THE RISE OF LARGE LANGUAGE MODELS: A BEGINNER’S SURVEY

Authors

  • Gustavo de Aquino Mouzinho
  • Leandro Youiti Silva Okimoto
  • Leonardo Yuto Suzuki Camelo
  • Nádila da Silva de Azevedo
  • Hendrio Luis de Souza Bragança
  • Rubens de Andrade Fernandes
  • Fabricio Ribeiro Seppe
  • Raimundo Claúdio Souza Gomes
  • Fábio de Sousa Cardoso

DOI:

https://doi.org/10.56238/arev7n11-118

Keywords:

Large Language Models, Generative AI, Natural Language Processing, Deep Learning

Abstract

Large Language Models (LLMs) have rapidly reshaped Natural Language Processing by shifting the field from task-specific systems to general-purpose models capable of following instructions and handling diverse tasks with minimal adaptation. This beginner-oriented survey traces the rise of LLMs, outlining the historical path from statistical methods and early neural architectures to Transformer-based models and the emergence of in-context learning. We introduce the core ingredients that make LLMs work—pre-training on large corpora, optional fine-tuning and alignment, and decoding strategies for generation—emphasizing how scale and self-attention enable broad generalization. At its core, this paper offers a compact, newcomer-friendly narrative of the LLM era, distilling the minimum set of concepts and milestones needed to build intuition about how modern models are trained and used.
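To make the decoding step concrete, the short Python sketch below contrasts greedy decoding with temperature-scaled sampling over a toy next-token distribution; the vocabulary and logits are invented for illustration, so this is a minimal sketch of the two strategies rather than code from any surveyed model.

    import numpy as np

    # Toy next-token distribution over an invented five-token vocabulary.
    vocab = ["the", "cat", "sat", "mat", "<eos>"]
    logits = np.array([2.0, 1.0, 0.5, 0.2, -1.0])

    def softmax(x):
        e = np.exp(x - x.max())  # subtract the max for numerical stability
        return e / e.sum()

    # Greedy decoding: always emit the single most probable token.
    greedy_token = vocab[int(np.argmax(logits))]

    # Temperature sampling: rescale logits, then draw from the distribution.
    # Lower temperature sharpens it (closer to greedy); higher adds diversity.
    temperature = 0.8
    probs = softmax(logits / temperature)
    sampled_token = vocab[int(np.random.choice(len(vocab), p=probs))]

    print("greedy:", greedy_token, "| sampled:", sampled_token)

In practice, samplers are usually combined with truncation heuristics such as top-k or nucleus (top-p) sampling (Holtzman et al., 2020) to avoid drawing from the low-probability tail.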


References

Agarwal, V., Jin, Y., Chandra, M., De Choudhury, M., Kumar, S., & Sastry, N. (2024). MedHalu: Hallucinations in responses to healthcare queries by large language models. arXiv. https://doi.org/10.48550/arXiv.2409.19492

Arslan, M., Ghanem, H., Munawar, S., & Cruz, C. (2024). A survey on RAG with LLMs. Procedia Computer Science, 246, 3781–3790. https://doi.org/10.1016/j.procs.2024.09.178

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., et al. (2022). Constitutional AI: Harmlessness from AI feedback. arXiv. https://doi.org/10.48550/arXiv.2212.08073

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv. https://doi.org/10.48550/arXiv.1409.0473

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. (2021). On the opportunities and risks of foundation models. arXiv. https://doi.org/10.48550/arXiv.2108.07258

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020). Curran Associates.

Carroll, A. J., & Borycz, J. (2024). Integrating large language models and generative artificial intelligence tools into information literacy instruction. The Journal of Academic Librarianship, 50, Article 102899. https://doi.org/10.1016/j.acalib.2024.102899

Chen, Y.-C., Hsu, P.-C., Hsu, C.-J., & Shan Shiu, D. (2024). Enhancing function-calling capabilities in LLMs: Strategies for prompt formats, data integration, and multilingual translation. arXiv. https://doi.org/10.48550/arXiv.2412.01130

Cheng, W., Sun, K., Zhang, X., & Wang, W. (2025). Security attacks on LLM-based code completion tools. arXiv. https://doi.org/10.48550/arXiv.2408.11006

Christiano, P., Leike, J., Brown, T. B., Martic, M., Legg, S., & Amodei, D. (2023). Deep reinforcement learning from human preferences. arXiv. https://doi.org/10.48550/arXiv.1706.03741

Dai, W., Li, J., Li, D., Tiong, A. M. H., Zhao, J., Wang, W., Li, B., Fung, P., & Hoi, S. (2023). InstructBLIP: Towards general-purpose vision-language models with instruction tuning. arXiv. https://doi.org/10.48550/arXiv.2305.06500

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019 (Vol. 1, pp. 4171–4186). Association for Computational Linguistics.

Ding, J., Nguyen, H., & Chen, H. (2024). Evaluation of question-answering based text summarization using LLM (Invited Paper). Proceedings of the 2024 IEEE International Conference on Artificial Intelligence Testing (AITest) (pp. 142–149). IEEE. https://doi.org/10.1109/AITest62860.2024.00025

Dong, Y., Zhang, H., Li, C., Guo, S., Leung, V. C., & Hu, X. (2024). Fine-tuning and deploying large language models over edges: Issues and approaches. arXiv. https://doi.org/10.48550/arXiv.2408.10691

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From local to global: A graph RAG approach to query-focused summarization. arXiv. https://doi.org/10.48550/arXiv.2404.16130

Franceschelli, G., & Musolesi, M. (2024). Creative beam search: LLM-as-a-judge for improving response generation. arXiv. https://doi.org/10.48550/arXiv.2405.00099

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., & Wang, H. (2024). Retrieval-augmented generation for large language models: A survey. arXiv. https://doi.org/10.48550/arXiv.2312.10997

Goyal, T., Li, J. J., & Durrett, G. (2023). News summarization and evaluation in the era of GPT-3. arXiv. https://doi.org/10.48550/arXiv.2209.12356

Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N. V., Wiest, O., & Zhang, X. (2024). Large language model based multi-agents: A survey of progress and challenges. arXiv. https://doi.org/10.48550/arXiv.2402.01680

Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv. https://doi.org/10.48550/arXiv.1503.02531

Hogan, A., Blomqvist, E., Cochez, M., d’Amato, C., de Melo, G., Gutierrez, C., Kirrane, S., Labra Gayo, J. E., Navigli, R., Neumaier, S., et al. (2021). Knowledge graphs. ACM Computing Surveys, 54(4), 1–37. https://doi.org/10.1145/3447772

Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2020). The curious case of neural text degeneration. arXiv. https://doi.org/10.48550/arXiv.1904.09751

Howard, J., & Ruder, S. (2018). Universal language model fine-tuning for text classification. arXiv. https://doi.org/10.48550/arXiv.1801.06146

Hu, E. J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., & Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv. https://doi.org/10.48550/arXiv.2106.09685

Huang, L., Yu, W., Ma, W., Zhong, W., Feng, Z., Wang, H., Chen, Q., Peng, W., Feng, X., Qin, B., et al. (2023). A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv. https://doi.org/10.48550/arXiv.2311.05232

Ji, B., Duan, X., Zhang, Y., Wu, K., & Zhang, M. (2024). Zero-shot prompting for LLM-based machine translation using in-domain target sentences. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 1–12. https://doi.org/10.1109/TASLP.2024.3519814

Johnson, L. E., & Rashad, S. (2024). An innovative system for real-time translation from American Sign Language (ASL) to spoken English using a large language model (LLM). Proceedings of the 2024 IEEE 15th Annual UEMCON (pp. 605–611). IEEE. https://doi.org/10.1109/UEMCON62879.2024.10754690

Jung, S. J., Kim, H., & Jang, K. S. (2024). LLM based biological named entity recognition from scientific literature. Proceedings of the 2024 IEEE International Conference on Big Data and Smart Computing (BigComp) (pp. 433–435). IEEE. https://doi.org/10.1109/BigComp60711.2024.00095

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., & Amodei, D. (2020). Scaling laws for neural language models. arXiv. https://doi.org/10.48550/arXiv.2001.08361

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474.

Li, B., Zhang, Y., Chen, L., Wang, J., Yang, J., & Liu, Z. (2023). Otter: A multi-modal model with in-context instruction tuning. arXiv. https://doi.org/10.48550/arXiv.2305.03726

Li, J., Li, D., Savarese, S., & Hoi, S. (2023). BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv. https://doi.org/10.48550/arXiv.2301.12597

Li, X. L., & Liang, P. (2021). Prefix-tuning: Optimizing continuous prompts for generation. arXiv. https://doi.org/10.48550/arXiv.2101.00190

Liang, P., Bommasani, R., Lee, T., Tsipras, D., Soylu, D., Yasunaga, M., Zhang, Y., Narayan, D., Wu, Y., Kumar, A., et al. (2023). Holistic evaluation of language models. arXiv. https://doi.org/10.48550/arXiv.2211.09110

Lin, H. (2024). Designing domain-specific large language models: The critical role of fine-tuning in public opinion simulation. arXiv. https://doi.org/10.48550/arXiv.2409.19308

Ling, C., Zhao, X., Lu, J., Deng, C., Zheng, C., Wang, J., Chowdhury, T., Li, Y., Cui, H., Zhang, X., et al. (2023). Domain specialization as the key to make large language models disruptive: A comprehensive survey. arXiv. https://doi.org/10.48550/arXiv.2305.18703

Manerba, M. M., Stańczak, K., Guidotti, R., & Augenstein, I. (2023). Social bias probing: Fairness benchmarking for language models. arXiv. https://doi.org/10.48550/arXiv.2311.09090

Mariño, J. B., Banchs, R. E., Crego, J. M., de Gispert, A., Lambert, P., Fonollosa, J. A. R., & Costa-jussà, M. R. (2006). N-gram-based machine translation. Computational Linguistics, 32(4), 527–549. https://doi.org/10.1162/coli.2006.32.4.527

Mehta, H., Kumar Bharti, S., & Doshi, N. (2024). Comparative analysis of part of speech (POS) tagger for Gujarati language using deep learning and pre-trained LLM. Proceedings of the 2024 3rd International Conference for Innovation in Technology (INOCON) (pp. 1–3). IEEE. https://doi.org/10.1109/INOCON60754.2024.10511678

Meng, X., Yan, X., Zhang, K., Liu, D., Cui, X., Yang, Y., Zhang, M., Cao, C., Wang, J., Wang, X., et al. (2024). The application of large language models in medicine: A scoping review. iScience, 27, Article 109713. https://doi.org/10.1016/j.isci.2024.109713

Miah, Md. S. U., Kabir, Md. M., Sarwar, T. B., Safran, M., Alfarhood, S., & Mridha, Md. F. (2024). A multimodal approach to cross-lingual sentiment analysis with ensemble of transformer and LLM. Scientific Reports, 14, Article 9603. https://doi.org/10.1038/s41598-024-60210-7

Mialon, G., Dessi, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., et al. (2023). Augmented language models: A survey. arXiv. https://doi.org/10.48550/arXiv.2302.07842

Microsoft. (2023). Introducing Microsoft 365 Copilot: A whole new way to work. Microsoft News/Blog.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv. https://doi.org/10.48550/arXiv.1301.3781

Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2024). A comprehensive overview of large language models. arXiv. https://doi.org/10.48550/arXiv.2307.06435

Navigli, R., Conia, S., & Ross, B. (2023). Biases in large language models: Origins, inventory, and discussion. ACM Journal of Data and Information Quality, 15, 1–21. https://doi.org/10.1145/3597307

OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., et al. (2024). GPT-4 technical report. arXiv. https://doi.org/10.48550/arXiv.2303.08774

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. (2022). Training language models to follow instructions with human feedback. arXiv. https://doi.org/10.48550/arXiv.2203.02155

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An imperative style, high-performance deep learning library. arXiv. https://doi.org/10.48550/arXiv.1912.01703

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1162

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv. https://doi.org/10.48550/arXiv.1802.05365

Prabhu, S. P. (2024). PEDAL: Enhancing greedy decoding with large language models using diverse exemplars. arXiv. https://doi.org/10.48550/arXiv.2408.08869

Procko, T. T., & Ochoa, O. (2024). Graph retrieval-augmented generation for large language models: A survey. Proceedings of the 2024 Conference on AI, Science, Engineering, and Technology (AIxSET) (pp. 166–169). IEEE. https://doi.org/10.1109/AIxSET62544.2024.00030

Qu, C., Dai, S., Wei, X., Cai, H., Wang, S., Yin, D., Xu, J., & Wen, J.-R. (2024). Tool learning with large language models: A survey. Frontiers of Computer Science. https://doi.org/10.1007/s11704-024-40678-2

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog/Technical report.

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2023). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv. https://doi.org/10.48550/arXiv.1910.10683

Rong, B., & Rutagemwa, H. (2024). Leveraging large language models for intelligent control of 6G integrated TN-NTN with IoT service. IEEE Network, 38, 136–142. https://doi.org/10.1109/MNET.2024.3384013

Sahoo, P., Singh, A. K., Saha, S., Jain, V., Mondal, S., & Chadha, A. (2024). A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv. https://doi.org/10.48550/arXiv.2402.07927

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv. https://doi.org/10.48550/arXiv.1707.06347

Shenoy, N., & Mbaziira, A. V. (2024). An extended review: LLM prompt engineering in cyber defense. Proceedings of the 2024 International Conference on Electrical, Computer and Energy Technologies (ICECET) (pp. 1–6). IEEE. https://doi.org/10.1109/ICECET61485.2024.10698605

Singhal, K., Azizi, S., Tu, T., Mahdavi, S. S., Wei, J., Chung, H. W., Scales, N., Tanwani, A., Cole-Lewis, H., et al. (2023). Large language models encode clinical knowledge. Nature, 620, 172–180. https://doi.org/10.1038/s41586-023-06291-2

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958.

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. arXiv. https://doi.org/10.48550/arXiv.1409.3215

Tonmoy, S., Zaman, S., Jain, V., Rani, A., Rawte, V., Chadha, A., & Das, A. (2024). A comprehensive survey of hallucination mitigation techniques in large language models. arXiv. https://doi.org/10.48550/arXiv.2401.01313

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv. https://doi.org/10.48550/arXiv.2302.13971

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Vilar, D., Freitag, M., Cherry, C., Luo, J., Ratnakar, V., & Foster, G. (2023). Prompting PaLM for translation: Assessing strategies and performance. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 15406–15427). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-long.859

Wang, X., Wang, Z., Gao, X., Zhang, F., Wu, Y., Xu, Z., Shi, T., Wang, Z., Li, S., Qian, Q., et al. (2024). Searching for best practices in retrieval-augmented generation. arXiv. https://doi.org/10.48550/arXiv.2407.01219

Xiong, H., Bian, J., Li, Y., Li, X., Du, M., Wang, S., Yin, D., & Helal, S. (2024). When search engine services meet large language models: Visions and challenges. arXiv. https://doi.org/10.48550/arXiv.2407.00128

Xu, H., Gan, W., Qi, Z., Wu, J., & Yu, P. S. (2024). Large language models for education: A survey. arXiv. https://doi.org/10.48550/arXiv.2405.13001

Xu, S., Li, Z., Mei, K., & Zhang, Y. (2024). AIOS compiler: LLM as interpreter for natural language programming and flow programming of AI agents. arXiv. https://doi.org/10.48550/arXiv.2405.06907

Ye, H., Liu, T., Zhang, A., Hua, W., & Jia, W. (2023). Cognitive mirage: A review of hallucinations in large language models. arXiv. https://doi.org/10.48550/arXiv.2309.06794

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al. (2024). A survey of large language models. arXiv. https://doi.org/10.48550/arXiv.2303.18223

Zhu, D., Chen, J., Shen, X., Li, X., & Elhoseiny, M. (2023). MiniGPT-4: Enhancing vision-language understanding with advanced large language models. arXiv. https://doi.org/10.48550/arXiv.2304.10592

Zhu, Y., Yuan, H., Wang, S., Liu, J., Liu, W., Deng, C., Chen, H., Liu, Z., Dou, Z., & Wen, J.-R. (2024). Large language models for information retrieval: A survey. arXiv. https://doi.org/10.48550/arXiv.2308.07107

Published

2025-11-13

Issue

Vol. 7 No. 11 (2025)

Section

Articles

How to Cite

MOUZINHO, Gustavo de Aquino; OKIMOTO, Leandro Youiti Silva; CAMELO, Leonardo Yuto Suzuki; DE AZEVEDO, Nádila da Silva; BRAGANÇA, Hendrio Luis de Souza; FERNANDES, Rubens de Andrade; SEPPE, Fabricio Ribeiro; GOMES, Raimundo Claúdio Souza; CARDOSO, Fábio de Sousa. THE RISE OF LARGE LANGUAGE MODELS: A BEGINNER’S SURVEY. ARACÊ, [S. l.], v. 7, n. 11, p. e9901, 2025. DOI: 10.56238/arev7n11-118. Available at: https://periodicos.newsciencepubl.com/arace/article/view/9901. Accessed: 5 Dec. 2025.