journal article Open Access Dec 15, 2023

From Turing to Transformers: A Comprehensive Review and Tutorial on the Evolution and Applications of Generative Transformer Models

Sci Vol. 5 No. 4 pp. 46 · MDPI AG
DOI: 10.3390/sci5040046
Abstract
In recent years, generative transformers have become increasingly prevalent in the field of artificial intelligence, especially within the scope of natural language processing. This paper provides a comprehensive overview of these models, beginning with the foundational theories introduced by Alan Turing and extending to contemporary generative transformer architectures. The manuscript serves as a review, historical account, and tutorial, aiming to offer a thorough understanding of the models’ importance, underlying principles, and wide-ranging applications. The tutorial section includes a practical guide for constructing a basic generative transformer model. Additionally, the paper addresses the challenges, ethical implications, and future directions in the study of generative models.
Metrics
Citations: 54
References: 95
Details
Published: Dec 15, 2023
Vol/Issue: 5(4)
Pages: 46
Funding: Research on Quality Assurance and Evaluation of Higher Education in Jiangsu Province (Award: 2023JSETKT032)
Cite This Article
Emma Yann Zhang, Adrian David Cheok, Zhigeng Pan, et al. (2023). From Turing to Transformers: A Comprehensive Review and Tutorial on the Evolution and Applications of Generative Transformer Models. Sci, 5(4), 46. https://doi.org/10.3390/sci5040046