journal article Feb 24, 2025

Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study

Abstract
Background
Electronic health records (EHRs) and routine documentation practices play a vital role in patients’ daily care, providing a holistic record of health, diagnoses, and treatment. However, complex and verbose EHR narratives can overwhelm health care providers, increasing the risk of diagnostic inaccuracies. While large language models (LLMs) have showcased their potential in diverse language tasks, their application in health care must prioritize the minimization of diagnostic errors and the prevention of patient harm. Integrating knowledge graphs (KGs) into LLMs offers a promising approach because structured knowledge from KGs could enhance LLMs’ diagnostic reasoning by providing contextually relevant medical information.


Objective
This study introduces DR.KNOWS (Diagnostic Reasoning Knowledge Graph System), a model that integrates Unified Medical Language System–based KGs with LLMs to improve diagnostic predictions from EHR data by retrieving contextually relevant paths aligned with patient-specific information.


Methods
DR.KNOWS combines a stack graph isomorphism network for node embedding with an attention-based path ranker to identify and rank knowledge paths relevant to a patient’s clinical context. We evaluated DR.KNOWS on 2 real-world EHR datasets from different geographic locations, comparing its performance to baseline models, including QuickUMLS and standard LLMs (Text-to-Text Transfer Transformer and ChatGPT). To assess diagnostic reasoning quality, we designed and implemented a human evaluation framework grounded in clinical safety metrics.
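
As a rough illustration of the two components named above, the following minimal Python sketch shows a graph isomorphism network–style neighbor aggregation followed by an attention-style ranking of candidate paths. It is not the authors' implementation: the learned layers are omitted, the scoring rule (scaled dot product followed by softmax) is an assumption, and all node names, path names, vectors, and function names are hypothetical toy values.

```python
import math


def gin_aggregate(features, neighbors, eps=0.0):
    """One GIN-style aggregation step with the learned MLP omitted:
    h_v <- (1 + eps) * h_v + sum of the neighbors' feature vectors."""
    updated = {}
    for node, vec in features.items():
        agg = [(1.0 + eps) * x for x in vec]
        for nb in neighbors.get(node, []):
            for i, x in enumerate(features[nb]):
                agg[i] += x
        updated[node] = agg
    return updated


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_paths(context_vec, path_vecs, top_k=2):
    """Score each candidate path against the patient context with a
    scaled dot product, normalize with softmax, and return the top_k
    (path name, attention weight) pairs in descending order."""
    scale = math.sqrt(len(context_vec))
    names = list(path_vecs)
    scores = [
        sum(c * p for c, p in zip(context_vec, path_vecs[n])) / scale
        for n in names
    ]
    weights = softmax(scores)
    ranked = sorted(zip(names, weights), key=lambda t: t[1], reverse=True)
    return ranked[:top_k]


# Toy graph: hypothetical concept nodes with 2-dimensional features.
features = {"fever": [1, 0], "infection": [0, 1], "sepsis": [1, 1]}
neighbors = {
    "fever": ["infection"],
    "infection": ["fever", "sepsis"],
    "sepsis": ["infection"],
}
node_embeddings = gin_aggregate(features, neighbors)

# Toy path embeddings (hypothetical) ranked against a toy context vector.
context = [1.0, 0.0]
paths = {"path_a": [2.0, 0.0], "path_b": [0.0, 2.0], "path_c": [1.0, 1.0]}
top_paths = rank_paths(context, paths, top_k=2)
print(node_embeddings)
print(top_paths)
```

In a trained system, the node features, the context vector, and the scoring function would all be learned; the sketch only shows how ranked paths could be selected before being passed to an LLM prompt.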


Results
DR.KNOWS demonstrated notable improvements over baseline models, showing higher accuracy in extracting diagnostic concepts and enhanced diagnostic prediction metrics. Prompt-based fine-tuning of Text-to-Text Transfer Transformer with DR.KNOWS knowledge paths achieved the highest ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation–Longest Common Subsequence) and concept unique identifier F1-scores, highlighting the benefits of KG integration. Human evaluators found the diagnostic rationales of DR.KNOWS to be aligned strongly with correct clinical reasoning, indicating improved abstraction and reasoning. Recognized limitations include potential biases within the KG data, which we addressed by emphasizing case-specific path selection and proposing future bias-mitigation strategies.
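
For readers unfamiliar with the two automated metrics reported above, the following minimal Python sketch computes a ROUGE-L F-measure from the longest common subsequence of two token sequences and an F1-score over sets of concept unique identifiers. Whitespace tokenization, the equal weighting of precision and recall (`beta=1.0`), and the placeholder identifiers are assumptions made for illustration; the study's exact evaluation settings may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, tok_a in enumerate(a, start=1):
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-measure on whitespace tokens (an assumed tokenization)."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)
    precision = lcs / len(cand)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


def cui_f1(gold_cuis, predicted_cuis):
    """F1-score over sets of concept unique identifiers."""
    gold, predicted = set(gold_cuis), set(predicted_cuis)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up diagnosis strings and placeholder identifiers.
rouge_score = rouge_l("sepsis due to pneumonia", "pneumonia sepsis")
f1_score = cui_f1({"CUI_A", "CUI_B"}, {"CUI_B", "CUI_C"})
print(rouge_score, f1_score)
```

ROUGE-L rewards predictions that preserve the word order of the reference diagnosis, whereas the identifier-level F1-score checks whether the right medical concepts were predicted regardless of wording.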


Conclusions
DR.KNOWS offers a robust approach for enhancing diagnostic accuracy and reasoning by integrating structured KG knowledge into LLM-based clinical workflows. Although further work is required to address KG biases and extend generalizability, DR.KNOWS represents progress toward trustworthy artificial intelligence–driven clinical decision support, with a human evaluation framework focused on diagnostic safety and alignment with clinical standards.
Metrics
Citations: 45
References: 49
Details
Published: Feb 24, 2025
Vol/Issue: 4
Pages: e58670
Cite This Article
Yanjun Gao, Ruizhe Li, Emma Croxford, et al. (2025). Leveraging Medical Knowledge Graphs Into Large Language Models for Diagnosis Prediction: Design and Application Study. JMIR AI, 4, e58670. https://doi.org/10.2196/58670