THE RISE OF LARGE LANGUAGE MODELS: A BEGINNER’S SURVEY
DOI: https://doi.org/10.56238/arev7n11-118

Keywords: Large Language Models, Generative AI, Natural Language Processing, Deep Learning

Abstract
Large Language Models (LLMs) have rapidly reshaped Natural Language Processing by shifting the field from task-specific systems to general-purpose models capable of following instructions and handling diverse tasks with minimal adaptation. This beginner-oriented survey traces the rise of LLMs, outlining the historical path from statistical methods and early neural architectures to Transformer-based models and the emergence of in-context learning. We introduce the core ingredients that make LLMs work—pre-training on large corpora, optional fine-tuning and alignment, and decoding strategies for generation—emphasizing how scale and self-attention enable broad generalization. At its core, this paper offers a compact, newcomer-friendly narrative of the LLM era, distilling the minimum set of concepts and milestones needed to build intuition about how modern models are trained and used.
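To make these core ingredients concrete for newcomers, the short sketch below is a toy illustration in Python/NumPy of the two mechanisms the abstract highlights: a single-head scaled dot-product self-attention step and a temperature-sampling decoding step. It is not code from any of the surveyed systems; the function names, tensor shapes, and temperature value are assumptions chosen purely for readability.

    # Illustrative sketch only (not from the surveyed systems): a toy single-head
    # self-attention step and a temperature-sampling decoding step in NumPy.
    # All shapes and names are assumptions chosen for clarity.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Scaled dot-product self-attention over a sequence X (seq_len x d_model)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise similarity, scaled by sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key dimension
        return weights @ V                          # each position mixes information from all others

    def sample_next_token(logits, temperature=0.8):
        """Pick the next token id from a vocabulary-sized logit vector via temperature sampling."""
        if temperature == 0:                        # temperature 0 degenerates to greedy decoding
            return int(np.argmax(logits))
        scaled = logits / temperature
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    # Tiny demo with random weights: 4 tokens, model width 8, vocabulary of 10.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    context = self_attention(X, Wq, Wk, Wv)            # contextualized token representations
    logits = context[-1] @ rng.normal(size=(8, 10))    # project the last position onto a toy vocabulary
    print("next token id:", sample_next_token(logits))

Setting the temperature to zero reduces sampling to greedy decoding; varying it (or combining it with top-k or nucleus sampling) corresponds to the decoding strategies for generation discussed in the survey.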
