SHARQ: Explainability Framework for Association Rules on Relational Data

Hadar Ben-Efraim; Susan B. Davidson; Amit Somech

doi:10.1145/3709726

journal article Feb 10, 2025

Proceedings of the ACM on Management of Data Vol. 3 No. 1 pp. 1-25 · Association for Computing Machinery (ACM)

Abstract

Association rules are an important technique for gaining insights from large relational datasets consisting of tuples of elements (i.e., attribute-value pairs). However, it is difficult to explain the relative importance of data elements with respect to the rules in which they appear. This paper develops a measure of an element's contribution to a set of association rules based on Shapley values, denoted SHARQ (ShApley Rules Quantification). As is the case with many Shapley-based computations, the cost of a naive calculation of the score is exponential in the number of elements. To address this, we present an efficient framework for computing the exact SHARQ value of a single element, whose running time is practically linear in the number of rules. Going one step further, we develop an efficient multi-element SHARQ algorithm that amortizes the cost of the single-element SHARQ calculation over a set of elements. Based on the definition of SHARQ for elements, we describe two additional use cases for association rule explainability: rule importance and attribute importance. Extensive experiments over a novel benchmark dataset containing 67 instances of mined rule sets show the effectiveness of our approach.
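
To make the exponential cost of the naive calculation concrete, the following minimal sketch computes a textbook Shapley value by enumerating every coalition of elements. It is not the SHARQ definition or algorithm from the paper: the rule set `RULES`, the value function `toy_value` (total score of the rules whose elements are all present), and the element names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_value(element, elements, value):
    """Exact Shapley value of `element`, enumerating all coalitions.

    The loop visits 2^(n-1) coalitions, which is the exponential
    cost a naive calculation incurs.
    """
    others = [e for e in elements if e != element]
    n = len(elements)
    total = 0.0
    for k in range(len(others) + 1):
        # Standard Shapley weight for coalitions of size k.
        weight = factorial(k) * factorial(n - k - 1) / factorial(n)
        for coalition in combinations(others, k):
            s = frozenset(coalition)
            # Marginal contribution of `element` to coalition s.
            total += weight * (value(s | {element}) - value(s))
    return total


# Hypothetical toy rule set: (elements used by the rule, rule score).
RULES = [
    (frozenset({"age=young", "income=low"}), 1.0),
    (frozenset({"age=young"}), 1.0),
]


def toy_value(coalition):
    # Assumed value function (not from the paper): sum of the scores
    # of rules whose elements are all contained in the coalition.
    return sum(score for items, score in RULES if items <= coalition)


ELEMENTS = ["age=young", "income=low", "edu=hs"]
scores = {e: shapley_value(e, ELEMENTS, toy_value) for e in ELEMENTS}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Under this toy value function, an element that appears in no rule receives a score of zero, and the scores of all elements sum to the total score of the rule set. That is the efficiency property of Shapley values in general, not a claim about the paper's specific measure.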

Details

Published: Feb 10, 2025
Vol/Issue: 3(1)
Pages: 1-25

Authors

Hadar Ben-Efraim

Bar-Ilan University, Ramat Gan, Israel

Susan B. Davidson

University of Pennsylvania, Philadelphia, PA, USA

Amit Somech

Bar-Ilan University, Ramat Gan, Israel

Funding

BSF Award: 2022279

Cite This Article

Hadar Ben-Efraim, Susan B. Davidson, Amit Somech (2025). SHARQ: Explainability Framework for Association Rules on Relational Data. Proceedings of the ACM on Management of Data, 3(1), 1-25. https://doi.org/10.1145/3709726
