Abstract
Background
Prompt engineering, the practice of crafting effective prompts for large language models (LLMs), has garnered attention for its ability to harness the potential of LLMs. It is even more crucial in the medical domain because of the domain's specialized terminology and highly technical language. Clinical natural language processing applications must navigate complex language and ensure privacy compliance. Prompt engineering offers a novel approach: tailored prompts guide models in exploiting clinically relevant information from complex medical texts. Despite its promise, the efficacy of prompt engineering in the medical domain remains to be fully explored.


Objective
The aim of this study is to review research efforts and technical approaches in prompt engineering for medical applications and to provide an overview of opportunities and challenges for clinical practice.


Methods
Databases indexing the fields of medicine, computer science, and medical informatics were queried to identify relevant published papers. Because prompt engineering is an emerging field, preprint databases were also considered. Several data items were extracted from each study, including the prompt paradigm, the LLMs involved, the languages of the study, the domain of the topic, the baselines, and several learning, design, and architecture strategies specific to prompt engineering. We included studies published between 2022 and 2024 that apply prompt engineering–based methods to the medical domain, covering multiple prompt paradigms such as prompt learning (PL), prompt tuning (PT), and prompt design (PD).
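As a rough illustration of the extraction step described above, the items collected per study could be held in a small record type. This is a minimal sketch, not the review's actual extraction sheet: the class name `ExtractedStudy`, its field names, and the `in_scope` helper are all hypothetical and only mirror the items and inclusion window named in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical record layout: field names are illustrative and are not
# taken from the review's actual data extraction form.
@dataclass
class ExtractedStudy:
    title: str
    year: int
    paradigm: str  # one of "PD", "PL", "PT"
    llms: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    domain: Optional[str] = None
    baselines: List[str] = field(default_factory=list)  # empty if none reported

    def in_scope(self) -> bool:
        """Check the 2022-2024 publication window and a known paradigm label."""
        return 2022 <= self.year <= 2024 and self.paradigm in {"PD", "PL", "PT"}
```

A record with an empty `baselines` list would correspond to a study that reports no baseline, which is one of the reporting gaps the review tracks.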


Results
We included 114 recent prompt engineering studies. Among the 3 prompt paradigms, PD is the most prevalent (78 papers). In 12 papers, the terms PD, PL, and PT were used interchangeably. ChatGPT is the most commonly used LLM, and we identified 7 studies that used it on a sensitive clinical data set. Chain-of-thought, present in 17 studies, emerges as the most frequent PD technique. While PL and PT papers typically provide a baseline for evaluating prompt-based approaches, 61% (48/78) of the PD studies do not report any nonprompt-related baseline. Finally, we individually examined each key item of prompt engineering–specific information reported across papers and found that many studies neglect to mention these items explicitly, which poses a challenge for advancing prompt engineering research.
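For readers unfamiliar with chain-of-thought as a PD technique, the sketch below contrasts a plain prompt with a zero-shot chain-of-thought prompt, which appends a reasoning trigger phrase to the question. The function name `build_prompt` and the `Q:`/`A:` layout are assumptions made for illustration; the reviewed studies may use few-shot or other chain-of-thought variants.

```python
def build_prompt(question: str, chain_of_thought: bool = False) -> str:
    """Build a plain prompt, or a zero-shot chain-of-thought prompt.

    The Q:/A: layout is an illustrative assumption, not a format
    prescribed by the review.
    """
    prompt = f"Q: {question}\nA:"
    if chain_of_thought:
        # Trigger phrase commonly used for zero-shot chain-of-thought prompting.
        prompt += " Let's think step by step."
    return prompt


# Hypothetical example question, used only to show the two prompt forms.
question = "Which of the listed drugs is contraindicated in pregnancy?"
plain = build_prompt(question)
cot = build_prompt(question, chain_of_thought=True)
```

The only difference between the two strings is the trailing trigger phrase, which is why this technique counts as prompt design: no model parameters are changed.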


Conclusions
In addition to reporting on trends and the scientific landscape of prompt engineering, we provide reporting guidelines for future studies to help advance research in the medical field. We also make available tables and figures summarizing the medical prompt engineering papers, and we hope that future contributions will build on these existing works to advance the field.
Metrics
Citations: 96
References: 127
Details
Published: Sep 10, 2024
Vol/Issue: 26
Pages: e60501
Cite This Article
Jamil Zaghir, Marco Naguib, Mina Bjelogrlic, et al. (2024). Prompt Engineering Paradigms for Medical Applications: Scoping Review. Journal of Medical Internet Research, 26, e60501. https://doi.org/10.2196/60501