Dermatology “AI Babylon”: Cross-Language Evaluation of AI-Crafted Dermatology Descriptions

Emmanouil Karampinis; Christina-Marina Zoumpourli; Christina Kontogianni; Theofanis Arkoumanis; Dimitra Koumaki; Dimitrios Mantzaris; Konstantinos Filippakis; Maria-Myrto Papadopoulou; Melpomeni Theofili; Nkechi Anne Enechukwu; Nomtondo Amina Ouédraogo; Alexandros Katoulis; Efterpi Zafiriou; Dimitrios Sgouros

doi:10.3390/medicina62010227

journal article Open Access Jan 22, 2026

Medicina Vol. 62 No. 1 pp. 227 · MDPI AG

Abstract

Background and Objectives: Dermatology relies on a complex terminology encompassing lesion types, distribution patterns, colors, and specialized sites such as hair and nails, while dermoscopy adds an additional descriptive framework, making interpretation subjective and challenging. Our study aims to evaluate the ability of a chatbot (Gemini 2) to generate dermatology descriptions across multiple languages and image types, and to assess the influence of prompt language on readability, completeness, and terminology consistency. Our research is based on the concept that non-English prompts are not mere translations of the English prompts but are independently generated texts that reflect medical and dermatological knowledge learned from non-English material used in the chatbot’s training.

Materials and Methods: Five macroscopic and five dermoscopic images of common skin lesions were used. Images were uploaded to Gemini 2 with language-specific prompts requesting short paragraphs describing visible features and possible diagnoses. A total of 2400 outputs were analyzed for readability using LIX score and CLEAR (comprehensiveness, accuracy, evidence-based content, appropriateness, and relevance) assessment, while terminology consistency was evaluated via SNOMED CT mapping across English, French, German, and Greek outputs.

Results: English and French descriptions were found to be harder to read and more sophisticated, while SNOMED CT mapping revealed the largest terminology mismatch in German and the smallest in French. English texts and macroscopic images achieved the highest accuracy, completeness, and readability based on CLEAR assessment, whereas dermoscopic images and non-English texts presented greater challenges.

Conclusions: Overall, partial terminology inconsistencies and cross-lingual variations highlighted that the language of the prompt plays a critical role in shaping AI-generated dermatology descriptions.
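The LIX readability score named in the methods is conventionally defined as the average sentence length plus the percentage of long words (more than six letters). The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard formula. The function name `lix_score`, the letter-run word pattern, and the rule of splitting sentences on `.`, `!`, and `?` are illustrative assumptions, not the study's tooling.

```python
import re

# Assumption: a "word" is any run of Unicode letters, so Greek, French and
# German text is handled the same way as English. Hyphenated terms are
# therefore counted as separate words.
WORD_RE = re.compile(r"[^\W\d_]+")


def lix_score(text: str) -> float:
    """Return the LIX readability score of `text`.

    LIX = (words / sentences) + 100 * (long words / words),
    where a long word has more than six letters.
    """
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")

    # Assumption: sentences end at '.', '!' or '?'. Fragments without any
    # word (for example, trailing whitespace) are ignored.
    sentences = [s for s in re.split(r"[.!?]+", text) if WORD_RE.search(s)]
    sentence_count = max(len(sentences), 1)

    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / sentence_count + 100.0 * long_words / len(words)


if __name__ == "__main__":
    sample = "The lesion is red. It shows irregular borders."
    print(lix_score(sample))
```

For the sample text there are 8 words, 2 sentences, and 2 long words ("irregular", "borders"), giving 8/2 + 100 × 2/8 = 29.0. Higher values indicate text that is harder to read, which is how the cross-language comparison in the results should be interpreted.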

Details

Published: Jan 22, 2026
Vol/Issue: 62(1)
Pages: 227

Authors

Emmanouil Karampinis

Second Dermatology Department, School of Health Sciences, Aristotle University of Thessaloniki, 541 24 Thessaloniki, Greece; Department of Dermatology, Faculty of Medicine, School of Health Sciences, University General Hospital of Larissa, University of Thessaly, 411 10 Larissa, Greece

Christina-Marina Zoumpourli

1st Department of Dermatology and Venereology, “Andreas Sygros” Hospital, Medical School, National and Kapodistrian University of Athens, 161 21 Athens, Greece

Christina Kontogianni

Department of Internal Medicine, InnKlinikum Altötting, 84503 Altötting, Germany

Theofanis Arkoumanis

2nd Academic Department of General Surgery, Aretaieion Hospital, National and Kapodistrian University of Athens, 161 21 Athens, Greece

Dimitra Koumaki

Dermatology Department, University Hospital of Heraklion, 715 00 Heraklion, Greece

Dimitrios Mantzaris

Computational Intelligence and Health Informatics Lab, Nursing Department, University of Thessaly, 382 21 Larissa, Greece

Konstantinos Filippakis

Department of Internal Medicine, Chios General Hospital, 821 00 Chios, Greece

Maria-Myrto Papadopoulou

Department of Internal Medicine, General Hospital of Karditsa, 431 31 Karditsa, Greece

Melpomeni Theofili

2nd Department of Dermatology and Venereology, “Attikon” General University Hospital, Medical School, National and Kapodistrian University of Athens, 157 84 Athens, Greece

Nkechi Anne Enechukwu

Department of Dermatology, Nnamdi Azikiwe University Teaching Hospital, Nnewi 431101, Nigeria

Nomtondo Amina Ouédraogo

Dermatology Department, University Joseph Ki-Zerbo, Ouagadougou 03 BP 7021, Burkina Faso

Alexandros Katoulis

2nd Department of Dermatology and Venereology, “Attikon” General University Hospital, Medical School, National and Kapodistrian University of Athens, 157 84 Athens, Greece

Efterpi Zafiriou

Department of Dermatology, Faculty of Medicine, School of Health Sciences, University General Hospital of Larissa, University of Thessaly, 411 10 Larissa, Greece

Dimitrios Sgouros

2nd Department of Dermatology and Venereology, “Attikon” General University Hospital, Medical School, National and Kapodistrian University of Athens, 157 84 Athens, Greece

Cite This Article

Emmanouil Karampinis, Christina-Marina Zoumpourli, Christina Kontogianni, et al. (2026). Dermatology “AI Babylon”: Cross-Language Evaluation of AI-Crafted Dermatology Descriptions. Medicina, 62(1), 227. https://doi.org/10.3390/medicina62010227
