journal article Oct 22, 2025

Interpretable Machine Learning Model for Predicting and Assessing the Risk of Diabetic Nephropathy: Prediction Model Study

Abstract

Background
Diabetic nephropathy (DN), a severe complication of diabetes, is characterized by proteinuria, hypertension, and progressive renal function decline, potentially leading to end-stage renal disease. The International Diabetes Federation projects that by 2045, 783 million people will have diabetes, with 30%‐40% of them developing DN. Current diagnostic approaches lack sufficient sensitivity and specificity for early detection and diagnosis, underscoring the need for an accurate, interpretable predictive model to enable timely intervention, reduce cardiovascular risks, and optimize health care costs.


Objective
This study aimed to develop and validate a machine learning–based predictive model for DN in patients with type 2 diabetes, with a focus on achieving high predictive accuracy while ensuring transparency and interpretability through explainable artificial intelligence techniques, thereby supporting early diagnosis, risk assessment, and personalized clinical decision-making.


Methods

Our retrospective cohort study investigated 1000 patients with type 2 diabetes using data from electronic medical records collected between 2015 and 2020. The sample comprised 444 patients with DN and 556 without, and the analysis focused on demographics, clinical metrics such as blood pressure and glucose levels, and renal function markers. Missing values were handled via multiple imputation, and class balance was achieved using the Synthetic Minority Oversampling Technique (SMOTE). Three gradient-boosting algorithms, namely Extreme Gradient Boosting (XGBoost), CatBoost, and Light Gradient-Boosting Machine (LightGBM), were used for their robustness in handling complex datasets. Model performance was assessed using accuracy, precision, recall, F1-score, specificity, and area under the curve. In addition, explainable machine learning techniques, namely Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP), were applied to enhance the transparency and interpretability of the models and to offer insight into their decision-making processes.
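To make the class-balancing step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the segment between a real minority sample and one of its nearest minority neighbours. It is a simplified illustration in plain Python, not the authors' implementation. The function name `smote_like_oversample`, the parameter `k`, and the toy feature vectors are assumptions introduced here; only the class counts (444 and 556) come from the abstract.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbours: all other minority samples, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy, made-up feature vectors (hypothetical, not study data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=3, k=2)

# With the reported class counts, fully balancing the dataset would
# require this many synthetic DN samples.
n_needed = 556 - 444
```

Because every synthetic point lies between two real minority samples, it stays inside the range of the observed minority data. The abstract does not say at which stage SMOTE was applied; a common precaution is to oversample only the training split so that synthetic points do not leak into the evaluation data.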



Results

XGBoost and LightGBM demonstrated superior performance, with XGBoost achieving the highest accuracy of 86.87%, a precision of 88.90%, a recall of 84.40%, an F1-score of 86.44%, and a specificity of 89.12%. LIME and SHAP analyses quantified the contribution of individual features to the models' predictions, identifying serum creatinine, albumin, and lipoproteins as significant predictors.
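For readers less familiar with these metrics, the sketch below shows how each one is derived from the four cells of a binary confusion matrix. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the study. The last lines compute the harmonic mean of the reported precision and recall.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": f1_from(precision, recall),
    }


def f1_from(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a 200-patient test set (not study data).
example = classification_metrics(tp=84, fp=11, tn=89, fn=16)

# Harmonic mean of the precision and recall reported in the abstract.
f1_reported_inputs = f1_from(0.8890, 0.8440)
print(round(100 * f1_reported_inputs, 2))  # prints 86.59
```

The value computed from the reported precision and recall (about 86.59%) is close to, but not identical with, the reported F1-score of 86.44%. Small gaps of this kind can arise when each metric is averaged separately over cross-validation folds or repeated runs; the abstract does not state how the figures were aggregated, so this is only one possible explanation.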



Conclusions
The developed machine learning model not only provides a robust predictive tool for early diagnosis and risk assessment of DN but also ensures transparency and interpretability, crucial for clinical integration. By enabling early intervention and personalized treatment strategies, this model has the potential to improve patient outcomes and optimize health care resource usage.

Details
Published: Oct 22, 2025
Vol/Issue: 13
Pages: e64979-e64979
Cite This Article
Yili Wen, Zhiqiang Wan, Huiling Ren, et al. (2025). Interpretable Machine Learning Model for Predicting and Assessing the Risk of Diabetic Nephropathy: Prediction Model Study. JMIR Medical Informatics, 13, e64979-e64979. https://doi.org/10.2196/64979