journal article Jul 01, 2026

A systematic mapping study on the research landscape of LLM-based code clone detection

DOI: 10.1016/j.infsof.2026.108096

References
62
[1]
Cordy "The NiCad clone detector" (2011)
[2]
Kamiya "CCFinder: A multilinguistic token-based code clone detection system for large scale source code" IEEE Trans. Softw. Eng. (2002) 10.1109/tse.2002.1019480
[3]
Tsantalis "Ten years of JDeodorant: Lessons learned from the hunt for smells" (2018)
[4]
Tsantalis "Identification of extract method refactoring opportunities for the decomposition of methods" J. Syst. Softw. (2011) 10.1016/j.jss.2011.05.016
[5]
E.A. AlOmar, A. Ivanov, Z. Kurbatova, Y. Golubev, M.W. Mkaouer, A. Ouni, T. Bryksin, L. Nguyen, A. Kini, A. Thakur, AntiCopyPaster: extracting code duplicates as soon as they are introduced in the IDE, in: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022, pp. 1–4. 10.1145/3551349.3559537
[6]
E.A. AlOmar, B. Knobloch, T. Kain, C. Kalish, M.W. Mkaouer, A. Ouni, AntiCopyPaster 2.0: Whitebox just-in-time code duplicates extraction, in: Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 2024, pp. 84–88. 10.1145/3639478.3640035
[7]
AlOmar "AntiCopyPaster 3.0: Just-in-time clone refactoring" ACM Trans. Softw. Eng. Methodol. (2025) 10.1145/3749100
[8]
AlOmar "Just-in-time code duplicates extraction" Inf. Softw. Technol. (2023) 10.1016/j.infsof.2023.107169
[9]
Martinez "Software refactoring research with large language models: A systematic literature review" J. Syst. Softw. (2025)
[10]
Roy "A survey on software clone detection research" (2007)
[11]
Fowler (2018)
[12]
AlOmar "On preserving the behavior in software refactoring: A systematic mapping study" Inf. Softw. Technol. (2021) 10.1016/j.infsof.2021.106675
[13]
AlOmar "Behind the intent of extract method refactoring: A systematic literature review" IEEE Trans. Softw. Eng. (2024) 10.1109/tse.2023.3345800
[14]
Lei "Deep learning application on code clone detection: A review of current knowledge" J. Syst. Softw. (2022) 10.1016/j.jss.2021.111141
[15]
Sheneamer "A survey of software clone detection techniques" Int. J. Comput. Appl. (2016)
[16]
Rattan "Software clone detection: A systematic review" Inf. Softw. Technol. (2013) 10.1016/j.infsof.2013.01.008
[17]
Husein "Large language models for code completion: A systematic literature review" (2025)
[18]
N. Raihan, M.L. Siddiq, J.C. Santos, M. Zampieri, Large language models in computer science education: A systematic literature review, in: Proceedings of the 56th ACM Technical Symposium on Computer Science Education V. 1, 2025, pp. 938–944. 10.1145/3641554.3701863
[19]
Kitchenham (2007)
[20]
Li "Understanding and addressing quality attributes of microservices architecture: A systematic literature review" Inf. Softw. Technol. (2021) 10.1016/j.infsof.2020.106449
[21]
Dybå "Empirical studies of agile software development: A systematic review" Inf. Softw. Technol. (2008) 10.1016/j.infsof.2008.01.006
[22]
Nashaat "An enhanced transformer-based framework for interpretable code clone detection" J. Syst. Softw. (2025) 10.1016/j.jss.2025.112347
[23]
Eagal "Analyzing the dependability of large language models for code clone generation" J. Syst. Softw. (2025) 10.1016/j.jss.2025.112548
[24]
Qian "Can large language models identify and refactor code clones? An empirical study" J. Syst. Softw. (2025)
[25]
Zhang "Assessing the code clone detection capability of large language models" (2024)
[26]
Rabbani "A comparative analysis of clone detection techniques on SemanticCloneBench" (2022)
[27]
Pinku "On the use of deep learning models for semantic clone detection" (2024)
[28]
Alam "GPTCloneBench: A comprehensive benchmark of semantic clones and cross-language clones using GPT-3 model and SemanticCloneBench" (2023)
[29]
Alam "Are classical clone detectors good enough for the AI era?" (2025)
[30]
Roy "Unveiling the potential of large language models in generating semantic and cross-language clones" (2023)
[31]
Pinku "Pathways to leverage transcompiler based data augmentation for cross-language clone detection" (2023)
[32]
M. Khajezade, J.J. Wu, F.H. Fard, G. Rodríguez-Pérez, M.S. Shehata, Investigating the efficacy of large language models for code clone detection, in: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, 2024, pp. 161–165. 10.1145/3643916.3645030
[33]
M.B. Moumoula, A.K. Kabore, J. Klein, T.F. Bissyande, Cross-lingual Code Clone Detection: When LLMs Fail Short Against Embedding-based Classifier, in: Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024, pp. 2474–2475. 10.1145/3691620.3695335
[34]
Li "ZC3: Zero-shot cross-language code clone detection" (2023)
[35]
K. Kitsios, F. Sovrano, E.T. Barr, A. Bacchelli, Detecting semantic clones of unseen functionality, in: International Conference on Automated Software Engineering, 2025. 10.1109/ase63991.2025.00112
[36]
Khajezade "Evaluating few-shot and contrastive learning methods for code clone detection" Empir. Softw. Eng. (2024) 10.1007/s10664-024-10441-z
[37]
Inoue "Improving accuracy of LLM-based code clone detection using functionally equivalent methods" (2024)
[38]
Y. Xie, X. Hou, Y. Zhao, K. Chen, H. Wang, LLM App Squatting and Cloning, in: Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, 2025, pp. 64–74. 10.1145/3696630.3728532
[39]
N. Sorokin, D. Abulkhanov, S. Nikolenko, V. Malykh, CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search, in: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), 2025, pp. 178–185. 10.18653/v1/2025.naacl-srw.17
[40]
Gupta "A generative ai-driven method-level semantic clone detection based on the structural and semantical comparison of methods" IEEE Access (2024) 10.1109/access.2024.3401770
[41]
Li "Nuanced code clone detection through LLM-based code revision and AST graph modeling" IEEE Access (2025) 10.1109/access.2025.3628856
[42]
Moumoula (2024)
[43]
Almatrafi "Code clone detection techniques based on large language models" IEEE Access (2025) 10.1109/access.2025.3549780
[44]
Bhaskar "A comprehensive analysis of unified approaches for revealing code clone detection" (2024)
[45]
Zhang "Exploring the boundaries between LLM code clone detection and code similarity assessment on human and AI-generated code" Big Data Cogn. Comput. (2025) 10.3390/bdcc9020041
[46]
López "On inter-dataset code duplication and data leakage in large language models" IEEE Trans. Softw. Eng. (2024)
[47]
Moumoula "The struggles of LLMs in cross-lingual code clone detection" Proc. ACM Softw. Eng. (2025) 10.1145/3715764
[48]
A. Almatrafi "Hybrid intelligent architecture for context-driven code clone detection" J. King Abdulaziz Univ.: Comput. Inf. Technol. Sci. (2025)
[49]
Z. Zhang, T. Saber, Assessing code clone detection capabilities of large language models on human and AI-generated code: Zero-shot and fine-tuning approaches, Available At SSRN 4979508.
[50]
Shirafuji "Refactoring programs using large language models with few-shot examples" (2023)

Showing 50 of 62 references

Details
Published: Jul 01, 2026
Vol/Issue: 195
Pages: 108096
Cite This Article
Bowen Jiang, Mitchell Ruffolo, Aaditya Kulkarni, et al. (2026). A systematic mapping study on the research landscape of LLM-based code clone detection. Information and Software Technology, 195, 108096. https://doi.org/10.1016/j.infsof.2026.108096