LLM-BASED EXTERNAL CONTROL: EMPIRICAL EVALUATION OF PRUME AI

Authors

  • Alessandro de Souza Bezerra
  • Luciane Cavalcante Lopes

DOI:

https://doi.org/10.56238/arev7n9-271

Keywords:

Public Audit, Artificial Intelligence, Language Models, RAG, Provenance (PROV), Explainability, Compliance

Abstract

This article presents and evaluates PRUMe AI, an audit platform assisted by Language Models and anchored in retrieval-augmented generation (RAG) and PROV provenance trails, applied to documents typical of external control. Combining Design Science Research with a case study on a real sample from the Amazonas State Court of Accounts (TCE-AM; 150 documents including bids, contracts/addenda, and reports/opinions; 55% native PDFs, 45% scanned), we show that PRUMe AI performs screening, extraction, compliance checks, and explainable reporting with structured outputs and provenance records. The results indicate material gains: average screening time fell from 21.4 to 7.9 min/doc (-63%) and total analysis time from 39.2 to 17.8 min/doc (-55%), while coverage per cycle rose from 25% (manual process) to 82%. In a subset annotated by experts (n=20), we obtained F1=0.86 (contract fields) and F1=0.82 (clauses), with precision@k=0.91 in the prioritization of “points of attention.” In RAG-anchored checks, 94% of findings included textual citations; the average reliability was 0.88 and inter-rater agreement reached κ=0.78. PROV trails covered 96% of decisions, and repeated runs reproduced 92% of results. We discuss limitations (OCR/layout quality, missing metadata, ambiguous wording, and curation of the normative collection) and propose an evolution agenda (document pipeline optimization, knowledge governance for RAG, and training). We conclude that PRUMe AI offers a replicable path to increasing efficiency, coverage, and standardization with transparency and auditability in external control.
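The metrics reported in the abstract can be illustrated with a minimal Python sketch. The function names are hypothetical and the toy inputs in the usage note are invented for illustration; they are not the study's data or the authors' evaluation code. Only the time figures passed to `pct_change` come from the abstract. The sketch assumes the standard definitions of F1, precision@k, and Cohen's kappa.

```python
from collections import Counter


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = reduction)."""
    return (after - before) / before * 100.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked items marked relevant (1) rather than not relevant (0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal frequency, summed over labels.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Time reductions, using the per-document averages stated in the abstract.
screening_change = round(pct_change(21.4, 7.9))   # -63
analysis_change = round(pct_change(39.2, 17.8))   # -55
```

As a usage example with invented labels, `cohen_kappa(["ok", "ok", "flag", "flag"], ["ok", "ok", "flag", "ok"])` compares an observed agreement of 0.75 against a chance agreement of 0.5, and `precision_at_k([1, 1, 0, 1, 1], 4)` counts three relevant items among the top four.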

Published

2025-09-25

Section

Articles

How to Cite

BEZERRA, Alessandro de Souza; LOPES, Luciane Cavalcante. LLM-BASED EXTERNAL CONTROL: EMPIRICAL EVALUATION OF PRUME AI. ARACÊ, [S. l.], v. 7, n. 9, p. e8409, 2025. DOI: 10.56238/arev7n9-271. Available at: https://periodicos.newsciencepubl.com/arace/article/view/8409. Accessed: 5 Dec. 2025.