EVALUATION OF GENERATIVE ARTIFICIAL INTELLIGENCE MODELS IN TEACHING PROGRAMMING LOGIC: A COMPARATIVE ANALYSIS BETWEEN THE GEMMA AND META LLAMA MODELS

Authors

  • Henrique Augusto Santos Matos
  • Jefferson Sousa Sampaio Júnior
  • Aline Lopes da Silva
  • Dadilton Bastos Melo

DOI:

https://doi.org/10.56238/levv17n59-014

Keywords:

Generative Artificial Intelligence, Programming Education, Programming Logic, Language Models, Comparative Evaluation

Abstract

This study compares the performance of two Generative Artificial Intelligence (GAI) models, Gemma and Meta LLaMA, in the context of teaching programming logic. It analyzes which model performs better on three criteria: response time, textual consistency, and pedagogical adequacy. An experimental approach was adopted in which the models were run in a local environment using the LM Studio software, which allowed access to the models through an API. The automatic evaluation metrics were BLEU, ROUGE, METEOR, and BERTScore, together with response time measured in milliseconds. In addition, a subjective evaluation was conducted with 30 students from the Information Systems course, who rated the responses generated by the models for clarity, objectivity, and pedagogical usefulness. The results indicate that Meta LLaMA performed better in computational efficiency and in structural similarity of responses, while Gemma showed greater semantic richness and explanatory capacity in contexts that require deeper conceptual understanding. The human evaluation corroborated these findings, showing a preference for Meta LLaMA on objective questions and for Gemma on questions that require more detailed explanations. The study concludes that the models have complementary characteristics and recommends their combined use to enhance the teaching-learning process in programming logic.
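As a rough illustration of the kind of measurement the abstract describes, the sketch below times a single model call in milliseconds and scores the answer against a reference text with a simplified unigram-overlap F1, in the spirit of ROUGE-1. It is not the authors' evaluation code and not the official ROUGE package: the names `unigram_f1`, `timed_call`, and `fake_model` are hypothetical, and `fake_model` stands in for a request to a locally served model.

```python
import time
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def timed_call(generate, prompt: str):
    """Run generate(prompt) and return (response, elapsed time in milliseconds)."""
    start = time.perf_counter()
    response = generate(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response, elapsed_ms


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a call to a model served on the local machine.
    return "a loop repeats a block of code while a condition is true"


reference = "a loop repeats a block of code while its condition holds"
answer, ms = timed_call(fake_model, "What is a loop?")
score = unigram_f1(answer, reference)
print(f"{ms:.3f} ms, unigram F1 = {score:.3f}")
```

Replacing `fake_model` with a function that sends the prompt to each model would give comparable per-model timings and scores; a full evaluation would use established implementations of BLEU, ROUGE, METEOR, and BERTScore rather than this simplified overlap.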

Published

2026-04-08

How to Cite

MATOS, Henrique Augusto Santos; SAMPAIO JÚNIOR, Jefferson Sousa; DA SILVA, Aline Lopes; MELO, Dadilton Bastos. EVALUATION OF GENERATIVE ARTIFICIAL INTELLIGENCE MODELS IN TEACHING PROGRAMMING LOGIC: A COMPARATIVE ANALYSIS BETWEEN THE GEMMA AND META LLAMA MODELS. LUMEN ET VIRTUS, [S. l.], v. 17, n. 59, p. e12799, 2026. DOI: 10.56238/levv17n59-014. Available at: https://periodicos.newsciencepubl.com/LEV/article/view/12799. Accessed on: 8 Apr. 2026.