THE RISE OF LARGE LANGUAGE MODELS: A BEGINNER’S SURVEY
DOI: https://doi.org/10.56238/arev7n11-118

Keywords: Large Language Models, Generative AI, Natural Language Processing, Deep Learning

Abstract
Large Language Models (LLMs) have rapidly reshaped Natural Language Processing by shifting the field from task-specific systems to general-purpose models capable of following instructions and handling diverse tasks with minimal adaptation. This beginner-oriented survey traces the rise of LLMs, outlining the historical path from statistical methods and early neural architectures to Transformer-based models and the emergence of in-context learning. We introduce the core ingredients that make LLMs work—pre-training on large corpora, optional fine-tuning and alignment, and decoding strategies for generation—emphasizing how scale and self-attention enable broad generalization. At its core, this paper offers a compact, newcomer-friendly narrative of the LLM era, distilling the minimum set of concepts and milestones needed to build intuition about how modern models are trained and used.
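To make these core ingredients concrete for newcomers, the short sketch below is a toy illustration in Python/NumPy of the two mechanisms the abstract highlights: a single-head scaled dot-product self-attention step and a temperature-sampling decoding step. It is not code from any of the surveyed systems; the function names, tensor shapes, and temperature value are assumptions chosen purely for readability.

    # Illustrative sketch only (not from the surveyed systems): a toy single-head
    # self-attention step and a temperature-sampling decoding step in NumPy.
    # All shapes and names are assumptions chosen for clarity.
    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Scaled dot-product self-attention over a sequence X (seq_len x d_model)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
        scores = Q @ K.T / np.sqrt(K.shape[-1])     # pairwise similarity, scaled by sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key dimension
        return weights @ V                          # each position mixes information from all others

    def sample_next_token(logits, temperature=0.8):
        """Pick the next token id from a vocabulary-sized logit vector via temperature sampling."""
        if temperature == 0:                        # temperature 0 degenerates to greedy decoding
            return int(np.argmax(logits))
        scaled = logits / temperature
        probs = np.exp(scaled - np.max(scaled))
        probs /= probs.sum()
        return int(np.random.choice(len(logits), p=probs))

    # Tiny demo with random weights: 4 tokens, model width 8, vocabulary of 10.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    context = self_attention(X, Wq, Wk, Wv)            # contextualized token representations
    logits = context[-1] @ rng.normal(size=(8, 10))    # project the last position onto a toy vocabulary
    print("next token id:", sample_next_token(logits))

Setting the temperature to zero reduces sampling to greedy decoding; varying it (or combining it with top-k or nucleus sampling) corresponds to the decoding strategies for generation discussed in the survey.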
