Abstract
eXplainable AI (XAI) involves two intertwined but separate challenges: the development of techniques to extract explanations from black-box AI models and the way such explanations are presented to users, i.e., the explanation user interface. Despite its importance, the second aspect has received limited attention in the literature so far. Effective AI explanation interfaces are fundamental for allowing human decision-makers to take advantage of, and effectively oversee, high-risk AI systems. Following an iterative design approach, we present the first cycle of prototyping-testing-redesigning of an explainable AI technique and its explanation user interface for clinical Decision Support Systems (DSS). We first present an XAI technique that meets the technical requirements of the healthcare domain: sequential, ontology-linked patient data, and multi-label classification tasks. We demonstrate its applicability to explain a clinical DSS, and we design a first prototype of an explanation user interface. Next, we test this prototype with healthcare providers and collect their feedback, with a two-fold outcome: first, we obtain evidence that explanations increase users’ trust in the XAI system, and second, we obtain useful insights into the perceived deficiencies of their interaction with the system, so that we can re-design a better, more human-centered explanation interface.
Details
Published: Dec 08, 2023
Vol/Issue: 13(4)
Pages: 1-35
Funding
European Union Award: ERC-2018-ADG G.A. 834756 (XAI)
UK Government Award: 10061955
HumanE AI Net Award: 952026
PNRR - M4C2 - Investimento 1.3, Partenariato Esteso Award: PE00000013
Cite This Article
Cecilia Panigutti, Andrea Beretta, Daniele Fadda, et al. (2023). Co-design of Human-centered, Explainable AI for Clinical Decision Support. ACM Transactions on Interactive Intelligent Systems, 13(4), 1-35. https://doi.org/10.1145/3587271