Accurate and Reliable Classification of Unstructured Reports on Their Diagnostic Goal Using BERT Models

Max Tigo Rietberg; Van Bach Nguyen; Jeroen Geerdink; Onno Vijlbrief; Christin Seifert

doi:10.3390/diagnostics13071251

journal article Open Access Mar 27, 2023

Diagnostics Vol. 13 No. 7 pp. 1251 · MDPI AG

Abstract

Knowing the diagnostic goal of a medical report is valuable for understanding patient flows. This work focuses on extracting the reason for taking an MRI scan of Multiple Sclerosis (MS) patients (Diagnosis, Progression, or Monitoring) from the attached free-form reports. We investigate the performance of domain-specific and general state-of-the-art language models and their alignment with domain expertise. To this end, eXplainable Artificial Intelligence (XAI) techniques are used to gain insight into the inner workings of each model, and the resulting explanations are verified for trustworthiness. The verified XAI explanations are then compared with explanations from a domain expert to indirectly determine the reliability of the model. BERTje, a Dutch Bidirectional Encoder Representations from Transformers (BERT) model, outperforms RobBERT and MedRoBERTa.nl in both accuracy and reliability. MedRoBERTa.nl is a domain-specific model, while BERTje is a generic model, which shows that domain-specific models are not always superior. Our validation of BERTje in a small prospective study shows promising results for the potential uptake of the model in a practical setting.
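
The comparison between model explanations and expert explanations can be illustrated with a small sketch. The abstract does not state which agreement measure is used, so the Jaccard overlap of top-k tokens below is an assumption, as are the function names, the toy Dutch tokens, and the attribution scores. It is an illustration of the idea, not the authors' method.

```python
def top_k_tokens(attributions, k):
    """Return the set of k tokens with the largest absolute attribution."""
    ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return {token for token, _ in ranked[:k]}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined here as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical per-token attribution scores for one report (made-up values).
model_attributions = {
    "controle": 0.62,
    "ms": 0.41,
    "mri": 0.05,
    "nieuwe": -0.30,
    "laesies": 0.27,
}
# Hypothetical tokens a domain expert marked as decisive for the label.
expert_tokens = {"controle", "nieuwe", "laesies"}

model_tokens = top_k_tokens(model_attributions, k=3)
agreement = jaccard(model_tokens, expert_tokens)
print(f"model top-3: {sorted(model_tokens)}, agreement: {agreement:.2f}")
```

Averaging such a score over many reports would give one rough indication of how closely a model's explanations track expert reasoning. Other measures, such as rank correlation, would be equally reasonable choices.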

Details

Published: Mar 27, 2023
Vol/Issue: 13(7)
Pages: 1251

Authors

Max Tigo Rietberg

Faculty of EEMCS, University of Twente, 7500 AE Enschede, The Netherlands

Van Bach Nguyen

Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany

Jeroen Geerdink

Hospital Group Twente (ZGT), 7555 DL Hengelo, The Netherlands

Onno Vijlbrief

Hospital Group Twente (ZGT), 7555 DL Hengelo, The Netherlands

Christin Seifert

Institute for Artificial Intelligence in Medicine, University of Duisburg-Essen, 45131 Essen, Germany

Funding

Open Access Publication Fund of the University of Duisburg-Essen

Cite This Article

Max Tigo Rietberg, Van Bach Nguyen, Jeroen Geerdink, et al. (2023). Accurate and Reliable Classification of Unstructured Reports on Their Diagnostic Goal Using BERT Models. Diagnostics, 13(7), 1251. https://doi.org/10.3390/diagnostics13071251
