CLASSIFICATION OF OBESITY LEVELS USING MACHINE LEARNING MODELS: A COMPARATIVE ANALYSIS OF RANDOM FOREST, SVM, AND LOGISTIC REGRESSION UNDER A CLINICAL ARTIFICIAL INTELLIGENCE PERSPECTIVE

Authors

  • Vitor Ramos Machado
  • Wesley Junio Soares de Oliveira
  • Cleber Asmar Ganzaroli
  • Edyane Luzia Pires Franco
  • Gabriel dos Santos Cabral
  • Wellington Miguel Lopes dos Santos Júnior
  • Hugo Leonardo Souza Lara Leão
  • Heyde Francielle do Carmo França

DOI:

https://doi.org/10.56238/arev7n12-107

Keywords:

Machine Learning, Artificial Intelligence, Obesity, Random Forest, Digital Health

Abstract

The global rise in obesity prevalence has increased demand for analytical tools that improve diagnostic precision and risk stratification. This study evaluates three Machine Learning models (Random Forest, Support Vector Machine, and Multinomial Logistic Regression) for classifying adult obesity levels. The proposed pipeline comprises preprocessing, imputation, categorical encoding, normalization, cross-validation, and multi-criteria evaluation. Interpretability techniques based on Permutation Importance were incorporated to quantify the impact of each variable on the F1-macro metric, making the system more transparent from a clinical Artificial Intelligence perspective. A classical Body Mass Index baseline was also implemented to compare traditional clinical heuristics with supervised methods. Random Forest achieved the best performance, surpassing both the baseline and the other two algorithms. These findings highlight the potential of Machine Learning as a digital health support tool that offers more robust predictions than simplified rules.
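As an illustration of the Body Mass Index baseline and the F1-macro metric mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the standard WHO adult BMI cut-offs (18.5, 25, 30, 35, 40 kg/m²). The label strings, the function names `bmi_baseline` and `f1_macro`, and the sample records are hypothetical; the abstract does not state the exact class definitions or thresholds used in the study.

```python
from collections import Counter

# Assumption: WHO adult BMI cut-offs in kg/m^2. Label names are illustrative only;
# the study's actual class labels are not given in the abstract.
WHO_CUTOFFS = [
    (18.5, "underweight"),
    (25.0, "normal"),
    (30.0, "overweight"),
    (35.0, "obesity_I"),
    (40.0, "obesity_II"),
]


def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_m ** 2)


def bmi_baseline(weight_kg: float, height_m: float) -> str:
    """Rule-based class assignment: first cut-off that the BMI value falls below."""
    value = bmi(weight_kg, height_m)
    for upper_bound, label in WHO_CUTOFFS:
        if value < upper_bound:
            return label
    return "obesity_III"  # BMI of 40 or above


def f1_macro(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical records: (weight_kg, height_m, reference label)
    records = [
        (50.0, 1.80, "underweight"),
        (70.0, 1.75, "normal"),
        (90.0, 1.75, "overweight"),
        (130.0, 1.70, "obesity_III"),
    ]
    predicted = [bmi_baseline(w, h) for w, h, _ in records]
    reference = [label for _, _, label in records]
    print("baseline F1-macro:", f1_macro(reference, predicted))
```

Because F1-macro averages the per-class scores without weighting by class frequency, a rule that performs poorly on a rare obesity level is penalized as much as one that fails on a common level. This is one reason the metric suits a comparison between a fixed heuristic and trained classifiers.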

Published

2025-12-11

Section

Articles

How to Cite

MACHADO, Vitor Ramos; DE OLIVEIRA, Wesley Junio Soares; GANZAROLI, Cleber Asmar; FRANCO, Edyane Luzia Pires; CABRAL, Gabriel dos Santos; DOS SANTOS JÚNIOR, Wellington Miguel Lopes; LEÃO, Hugo Leonardo Souza Lara; FRANÇA, Heyde Francielle do Carmo. CLASSIFICATION OF OBESITY LEVELS USING MACHINE LEARNING MODELS: A COMPARATIVE ANALYSIS OF RANDOM FOREST, SVM, AND LOGISTIC REGRESSION UNDER A CLINICAL ARTIFICIAL INTELLIGENCE PERSPECTIVE. ARACÊ, [S. l.], v. 7, n. 12, p. e10958, 2025. DOI: 10.56238/arev7n12-107. Available at: https://periodicos.newsciencepubl.com/arace/article/view/10958. Accessed: 17 Feb. 2026.