Practical and ethical challenges of large language models in education: A systematic scoping review

Lixiang Yan; Lele Sha; Linxuan Zhao; Yuheng Li; Roberto Martinez‐Maldonado; Guanliang Chen; Xinyu Li; Yueqiao Jin; Dragan Gašević

doi:10.1111/bjet.13370

Journal article · Open Access · Aug 06, 2023

British Journal of Educational Technology Vol. 55 No. 1 pp. 90-112 · Wiley

Abstract

Educational technology innovations leveraging large language models (LLMs) have shown the potential to automate the laborious process of generating and analysing textual content. While various innovations have been developed to automate a range of educational tasks (e.g., question generation, feedback provision, and essay grading), there are concerns regarding the practicality and ethicality of these innovations. Such concerns may hinder future research and the adoption of LLM‐based innovations in authentic educational contexts. To address this, we conducted a systematic scoping review of 118 peer‐reviewed papers published since 2017 to pinpoint the current state of research on using LLMs to automate and support educational tasks. The findings revealed 53 use cases for LLMs in automating educational tasks, grouped into nine main categories: profiling/labelling, detection, grading, teaching support, prediction, knowledge representation, feedback, content generation, and recommendation. We also identified several practical and ethical challenges, including low technological readiness, a lack of replicability and transparency, and insufficient consideration of privacy and beneficence. The findings were distilled into three recommendations for future studies: updating existing innovations with state‐of‐the‐art models (e.g., GPT‐3/4), embracing the open‐sourcing of models and systems, and adopting a human‐centred approach throughout the development process. As the intersection of AI and education continues to evolve, the findings of this study can serve as an essential reference point for researchers, allowing them to leverage the strengths, learn from the limitations, and uncover potential research opportunities enabled by ChatGPT and other generative AI models.

Practitioner notes

What is currently known about this topic

Generating and analysing text‐based content are time‐consuming and laborious tasks.

Large language models are capable of efficiently analysing an unprecedented amount of textual content and completing complex natural language processing and generation tasks.

Large language models have been increasingly used to develop educational technologies that aim to automate the generation and analysis of textual content, such as automated question generation and essay scoring.

What this paper adds

A comprehensive list of educational tasks that could benefit from automation through LLM‐based innovations.

A structured assessment of the practicality and ethicality of existing LLM‐based innovations across seven aspects, using established frameworks.

Three recommendations to support future studies in developing LLM‐based innovations that are practical and ethical to implement in authentic educational contexts.

Implications for practice and/or policy

Updating existing innovations with state‐of‐the‐art models may further reduce the manual effort required to adapt existing models to different educational tasks.

The reporting standards of empirical research that aims to develop educational technologies using large language models need to be improved.

Adopting a human‐centred approach throughout the developmental process could contribute to resolving the practical and ethical challenges of large language models in education.