Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development

Ana Victoria Ponce‐Bobadilla; Vanessa Schmitt; Corinna S. Maier; Sven Mensing; Sven Stodtmann

doi:10.1111/cts.70056

Journal article · Open Access · Oct 28, 2024

Clinical and Translational Science Vol. 17 No. 11 · Wiley

Abstract

Despite increasing interest in using Artificial Intelligence (AI) and Machine Learning (ML) models for drug development, effectively interpreting their predictions remains a challenge, which limits their impact on clinical decisions. We address this issue by providing a practical guide to SHapley Additive exPlanations (SHAP), a popular feature‐based interpretability method that can be seamlessly integrated into supervised ML models to gain a deeper understanding of their predictions, thereby enhancing their transparency and trustworthiness. This tutorial focuses on the application of SHAP analysis to standard ML black‐box models for regression and classification problems. We provide an overview of various visualization plots and their interpretation, describe the available software for implementing SHAP, and highlight best practices, as well as special considerations when dealing with binary endpoints and time‐series models. To enhance the reader's understanding of the method, we also apply it to inherently explainable regression models. Finally, we discuss the limitations of the method and ongoing advancements aimed at tackling its current drawbacks.
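The abstract's point about inherently explainable regression models can be illustrated with a small worked example. The sketch below is not taken from the article and does not use the `shap` package; it computes exact Shapley values by brute-force enumeration of feature coalitions, under the assumption that features outside a coalition are replaced by values from a background dataset (an interventional value function). The model coefficients, the background rows, the instance `x`, and the function names `shapley_values` and `linear_predict` are all hypothetical. For a linear model under this assumption, each attribution should reduce to the coefficient times the feature's deviation from its background mean.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance, by enumerating all coalitions.

    Assumption: features outside a coalition take their values from the
    background rows, and the prediction is averaged over those rows.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        # Mean prediction with coalition features fixed to x and the
        # remaining features taken from each background row in turn.
        total = 0.0
        for row in background:
            z = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(z)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear regression model: y = 1.0 + 2.0*x0 - 1.0*x1 + 0.5*x2
COEFS = [2.0, -1.0, 0.5]
INTERCEPT = 1.0


def linear_predict(z):
    return INTERCEPT + sum(b * v for b, v in zip(COEFS, z))


# Hypothetical background data (feature means: 2.0, 3.0, 2.0) and instance.
background = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 0.0]]
x = [3.0, 1.0, 4.0]

phi = shapley_values(linear_predict, x, background)
base_value = sum(linear_predict(r) for r in background) / len(background)

# Closed form for a linear model: coefficient * (value - background mean).
means = [sum(col) / len(col) for col in zip(*background)]
closed_form = [b * (v - m) for b, v, m in zip(COEFS, x, means)]

print("Shapley values:", phi)
print("Closed form:   ", closed_form)
print("base + sum(phi):", base_value + sum(phi), "prediction:", linear_predict(x))
```

Up to floating-point rounding, the enumerated values should match the closed form, and the base value plus the attributions should add up to the model's prediction for `x`, which is the additivity property that SHAP plots rely on. Enumeration is only feasible for a handful of features; dedicated SHAP software uses faster exact or approximate algorithms.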

Metrics

Citations: 450
References: 50

Details

Published: Oct 28, 2024
Vol/Issue: 17(11)
