Journal article | Open Access | May 27, 2025

Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead

Abstract
The significant advancements in Large Language Models (LLMs) have led to their widespread adoption across a range of Software Engineering (SE) tasks, including vulnerability detection and repair. Numerous studies have investigated how LLMs can improve these two tasks. Despite this growing research interest, no existing survey focuses specifically on the use of LLMs for vulnerability detection and repair. In this paper, we aim to bridge this gap with a systematic literature review of approaches that use LLMs to improve vulnerability detection and repair. The review covers research from leading SE, AI, and Security conferences and journals: 43 papers published across 25 distinct venues, plus 15 high-quality preprints, for a total of 58 papers. By answering three key research questions, we aim to (1) summarize the LLMs employed in the relevant literature, (2) categorize the LLM adaptation techniques used for vulnerability detection, and (3) classify the LLM adaptation techniques used for vulnerability repair. Based on our findings, we identify a series of limitations in existing studies. Finally, we outline a roadmap of opportunities that we believe are pertinent and crucial for future research.


Details
Published: May 27, 2025
Vol/Issue: 34(5)
Pages: 1-31
Funding: National Research Foundation, Award NRF-NRFI08-2022-0002
Cite This Article
Xin Zhou, Sicong Cao, Xiaobing Sun, et al. (2025). Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead. ACM Transactions on Software Engineering and Methodology, 34(5), 1-31. https://doi.org/10.1145/3708522