Abstract
The sequential recommendation problem has attracted considerable research attention in the past few years, leading to the rise of numerous recommendation models. In this work, we explore how Large Language Models (LLMs), which are currently having a disruptive effect on many AI-based applications, can be used to build or improve sequential recommendation approaches. Specifically, we design three orthogonal approaches, and hybrids of them, to leverage the power of LLMs in different ways. In addition, we investigate the potential of each approach by focusing on its technical aspects and identifying a range of alternative design choices for each one. We conduct extensive experiments on three datasets and explore a large variety of configurations, including different language models and baseline recommendation models, to obtain a comprehensive picture of the performance of each approach.

Among other observations, we highlight that initializing state-of-the-art sequential recommendation models such as BERT4Rec or SASRec with embeddings obtained from an LLM can lead to substantial gains in accuracy. Furthermore, we find that fine-tuning an LLM for recommendation tasks enables it to learn not only the tasks but also, to some extent, the concepts of a domain. We also show that fine-tuning OpenAI GPT leads to considerably better performance than fine-tuning Google PaLM 2. Overall, our extensive experiments indicate that leveraging LLMs holds substantial potential for future recommendation approaches. We publicly share the code and data of our experiments to ensure reproducibility.
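
As a rough illustration of what initializing a sequential model with LLM embeddings could look like, the sketch below reduces per-item LLM embeddings to the hidden size of the recommendation model and builds an item embedding table from them. The abstract does not say how embeddings are mapped to the model dimension, so the use of PCA, the function name `init_item_embeddings`, the zero padding row at index 0, and the random stand-in embeddings are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def init_item_embeddings(llm_embeddings: np.ndarray, hidden_dim: int) -> np.ndarray:
    """Project LLM item embeddings to hidden_dim via PCA (an assumed choice)
    and prepend a zero row reserved for the padding item at index 0."""
    n_items, llm_dim = llm_embeddings.shape
    if hidden_dim > min(n_items, llm_dim):
        raise ValueError("hidden_dim must not exceed min(n_items, llm_dim)")

    # Center the embeddings, then take the top principal directions via SVD.
    centered = llm_embeddings - llm_embeddings.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:hidden_dim].T  # shape: (n_items, hidden_dim)

    # Row 0 stays zero for padding; rows 1..n_items hold the item vectors.
    table = np.zeros((n_items + 1, hidden_dim), dtype=np.float32)
    table[1:] = reduced
    return table


# Hypothetical stand-in for embeddings returned by an LLM embedding endpoint.
rng = np.random.default_rng(0)
fake_llm_embeddings = rng.normal(size=(50, 128))

item_table = init_item_embeddings(fake_llm_embeddings, hidden_dim=16)
print(item_table.shape)  # (51, 16)
```

A table built this way could then be copied into the item embedding layer of a model such as SASRec or BERT4Rec before training, so that training starts from semantically informed item vectors rather than random ones.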


Details
Published: Nov 24, 2025
Vol/Issue: 4(2)
Pages: 1-35
Cite This Article: Artun Boz, Wouter Zorgdrager, Zoe Kotti, et al. (2025). Improving Sequential Recommendations with LLMs. ACM Transactions on Recommender Systems, 4(2), 1-35. https://doi.org/10.1145/3711667