Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations

Robert S. Sullivan; Luca Longo

doi:10.3390/make5040072

journal article Open Access Oct 09, 2023

Machine Learning and Knowledge Extraction Vol. 5 No. 4 pp. 1433-1455 · MDPI AG

Abstract

Reinforcement Learning (RL) has shown promise in optimizing complex control and decision-making processes, but Deep Reinforcement Learning (DRL) lacks interpretability, which limits its adoption in regulated sectors such as manufacturing, finance, and healthcare. DRL's opaque decision-making also hinders efficiency and resource use, and this issue is amplified with every advancement. While many seek to move from Experience Replay to A3C, the latter demands more resources. Despite efforts to improve Experience Replay selection strategies, there is a tendency to keep the capacity high. We investigate training a Deep Convolutional Q-learning agent across 20 Atari games while intentionally reducing Experience Replay capacity from 1×10^6 to 5×10^2. We find that a reduction from 1×10^4 to 5×10^3 does not significantly affect rewards, offering a practical path to resource-efficient DRL. To illuminate agent decisions and align them with game mechanics, we employ a novel method: visualizing Experience Replay via the Deep SHAP Explainer. This approach fosters comprehension and yields transparent, interpretable explanations, though any capacity reduction must be made cautiously to avoid overfitting. Our study demonstrates the feasibility of reducing Experience Replay capacity and advocates for transparent, interpretable decision explanations using the Deep SHAP Explainer to support more resource-efficient use of Experience Replay.
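As a rough illustration of what Experience Replay capacity means in practice, the following minimal sketch shows a fixed-capacity buffer that evicts its oldest transitions once full. It is not the authors' implementation: the class name `ReplayBuffer`, the transition layout, and the uniform sampling strategy are assumptions made for this example, and the capacity of 500 is simply the smallest value named in the abstract.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity store of agent transitions (illustration only)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        # A deque with maxlen drops the oldest item when a new one is appended at capacity.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition):
        # Assumed layout: (state, action, reward, next_state, done).
        self.memory.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement (an assumption; other strategies exist).
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


# Fill a small buffer with more transitions than it can hold.
buffer = ReplayBuffer(capacity=500, seed=0)
for step in range(2_000):
    buffer.push((step, 0, 0.0, step + 1, False))

batch = buffer.sample(32)
oldest_step = min(t[0] for t in buffer.memory)
```

With a capacity of 500 and 2,000 pushes, only the most recent 500 transitions remain, so memory use is bounded by the capacity rather than by the length of training. A smaller buffer holds less varied experience, which is one way to read the abstract's caution about overfitting.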

Details

Published: Oct 09, 2023
Vol/Issue: 5(4)
Pages: 1433-1455

Authors

Robert S. Sullivan

Artificial Intelligence and Cognitive Load Research Lab, School of Computer Science, Technological University Dublin, Grangegorman, D07 ADY7 Dublin, Ireland

Luca Longo

Artificial Intelligence and Cognitive Load Research Lab, School of Computer Science, Technological University Dublin, Grangegorman, D07 ADY7 Dublin, Ireland

Cite This Article

Robert S. Sullivan, Luca Longo (2023). Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations. Machine Learning and Knowledge Extraction, 5(4), 1433-1455. https://doi.org/10.3390/make5040072
