Effects of AI and Logic-Style Explanations on Users’ Decisions Under Different Levels of Uncertainty

Federico Maria Cau; Hanna Hauptmann; Lucio Davide Spano; Nava Tintarev

doi:10.1145/3588320

journal article Dec 08, 2023

ACM Transactions on Interactive Intelligent Systems Vol. 13 No. 4 pp. 1-42 · Association for Computing Machinery (ACM)

Abstract

Existing eXplainable Artificial Intelligence (XAI) techniques support people in interpreting AI advice. However, although previous work evaluates the users’ understanding of explanations, factors influencing the decision support are largely overlooked in the literature. This article addresses this gap by studying the impact of user uncertainty, AI correctness, and the interaction between AI uncertainty and explanation logic-styles for classification tasks. We conducted two separate studies: one requesting participants to recognize handwritten digits and one to classify the sentiment of reviews. To assess the decision making, we analyzed the task performance, agreement with the AI suggestion, and the user’s reliance on the XAI interface elements. Participants make their decision relying on three pieces of information in the XAI interface (image or text instance, AI prediction, and explanation). Participants were shown one explanation style (between-participants design) according to three styles of logical reasoning (inductive, deductive, and abductive). This allowed us to study how different levels of AI uncertainty influence the effectiveness of different explanation styles. The results show that user uncertainty and AI correctness on predictions significantly affected users’ classification decisions considering the analyzed metrics. In both domains (images and text), users relied mainly on the instance to decide. Users were usually overconfident about their choices, and this evidence was more pronounced for text. Furthermore, the inductive style explanations led to overreliance on the AI advice in both domains; it was the most persuasive, even when the AI was incorrect. The abductive and deductive styles have complex effects depending on the domain and the AI uncertainty levels.

Metrics

Citations: 29
References: 92

Details

Published: Dec 08, 2023
Vol/Issue: 13(4)
Pages: 1-42

Authors

Federico Maria Cau (University of Cagliari)
Hanna Hauptmann (Utrecht University)
Lucio Davide Spano (University of Cagliari)
Nava Tintarev (Maastricht University)

Funding

CRS4.Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna for collaboration on the RIALE

Sardinia Regional Government and by Fondazione di Sardegna, ADAM Award: CUP F74I19000900007

ASTRID Award: CUP F75F21001220007

Cite This Article

Federico Maria Cau, Hanna Hauptmann, Lucio Davide Spano, et al. (2023). Effects of AI and Logic-Style Explanations on Users’ Decisions Under Different Levels of Uncertainty. ACM Transactions on Interactive Intelligent Systems, 13(4), 1-42. https://doi.org/10.1145/3588320
