Abstract
Machine learning (ML) algorithms have advanced significantly in recent years, progressively evolving into artificial intelligence (AI) agents capable of solving complex, human-like intellectual challenges. Despite these advances, the interpretability of such sophisticated models lags behind, and many ML architectures remain "black boxes" that are too intricate and expansive for human interpretation. This gap has revived interest in explainable AI (XAI), a field that aims to explain these opaque ML models. However, XAI tools are often tightly coupled with the underlying ML models and are inefficient because they perform redundant computations. We introduce provenance-enabled explainable AI (PXAI). PXAI decouples XAI computation from ML models through a provenance graph that tracks the creation and transformation of all data within the model. PXAI improves the computational efficiency of XAI by excluding irrelevant and insignificant variables and computations from the provenance graph. Through various case studies, we demonstrate how PXAI enhances computational efficiency when interpreting complex ML models, confirming its potential as a valuable tool in the field of XAI.
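As a rough illustration of the idea of excluding irrelevant variables through a provenance graph, the following minimal sketch records which variables each derived value was computed from and then walks backward from one output to find everything that could have influenced it. This is not the PXAI implementation: the class `ProvenanceGraph`, its methods `record` and `relevant_to`, and the variable names are all hypothetical, chosen only for this example.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance graph (hypothetical, for illustration only).

    Each derived variable remembers the variables it was computed from.
    """

    def __init__(self):
        # Maps a variable name to the set of variables it was derived from.
        self.parents = defaultdict(set)

    def record(self, output, inputs):
        """Record that `output` was created from `inputs`."""
        self.parents[output].update(inputs)

    def relevant_to(self, target):
        """Return every variable that can reach `target`, found by a
        backward traversal over the recorded derivations."""
        seen, stack = set(), [target]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            # .get avoids creating empty entries for leaf variables.
            stack.extend(self.parents.get(node, ()))
        return seen


# A tiny made-up model with two independent computation paths.
graph = ProvenanceGraph()
graph.record("h1", ["x1", "x2"])  # h1 is derived from x1 and x2
graph.record("h2", ["x3"])        # h2 is derived from x3
graph.record("y", ["h1"])         # output y depends only on h1
graph.record("z", ["h2"])         # output z depends only on h2

all_variables = {"x1", "x2", "x3", "h1", "h2", "y", "z"}
relevant = graph.relevant_to("y")
irrelevant = all_variables - relevant
```

In this toy case, an explanation of `y` would only need to consider `x1`, `x2`, and `h1`; `x3`, `h2`, and `z` never contribute to `y` and could be skipped. Excluding variables that are reachable but insignificant would require an additional scoring step that this sketch does not attempt.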
Details
Published
Dec 18, 2024
Vol/Issue
2(6)
Pages
1-27
Funding
National Science Foundation Award: CNS-1704189
Cite This Article
Jiachi Zhang, Wenchao Zhou, Benjamin E. Ujcich (2024). Provenance-Enabled Explainable AI. Proceedings of the ACM on Management of Data, 2(6), 1-27. https://doi.org/10.1145/3698826