journal article Open Access Aug 07, 2024

Natural language processing with transformers: a review

DOI: 10.7717/peerj-cs.2222
Abstract
Natural language processing (NLP) tasks can be addressed with several deep learning architectures, and many different approaches have proven to be efficient. This study briefly summarizes the use cases for NLP tasks along with the main architectures. It presents transformer-based solutions for NLP tasks, such as the Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-Training (GPT) architectures. To that end, we followed a step-by-step review strategy: identify recent studies that include Transformers, apply filters to extract the most consistent studies, define inclusion and exclusion criteria, assess the strategy proposed in each study, and finally discuss the methods and architectures presented in the resulting articles. These steps facilitated the systematic summarization and comparative analysis of NLP applications based on Transformer architectures. The primary focus is the current state of the NLP domain, particularly its applications, language models, and data set types. The results provide insights into the challenges encountered in this research domain.
Details
Published: Aug 07, 2024
Vol/Issue: 10
Pages: e2222
Cite This Article
Georgiana Tucudean, Marian Bucos, Bogdan Dragulescu, et al. (2024). Natural language processing with transformers: a review. PeerJ Computer Science, 10, e2222. https://doi.org/10.7717/peerj-cs.2222