WALKING THROUGH MILLIONS OF DIMENSIONS
DOI: https://doi.org/10.56238/arev8n3-119

Keywords: Machine Learning, Error Geometry, High Dimensionality, Optimization, Bayesian Inference, Sociotechnical Systems

Abstract
This work proposes a geometric interpretation of machine learning as a process of navigation through a high-dimensional parameter space whose relief is induced by the error function. Neural networks are analyzed as systems that explore an abstract optimization landscape, where architecture, data, training algorithms, and sources of uncertainty shape the topology of learning. The approach establishes connections between optimization dynamics, generalization capacity, and Bayesian inference, suggesting that these phenomena can be understood within a unified geometric framework. Beyond the technical domain, we discuss how this perspective influences the interpretation, governance, and sociotechnical impact of artificial intelligence systems, offering an integrated conceptual language for analyzing their operation in human and computational contexts.
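The landscape metaphor at the core of the abstract can be made concrete with a minimal sketch: gradient descent walking downhill on a toy loss surface. The loss function, learning rate, and step count below are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

def loss(theta):
    # Toy "landscape": a quadratic bowl with a sinusoidal ripple,
    # standing in for the error surface over parameter space.
    return np.sum(theta ** 2) + 0.1 * np.sum(np.sin(5 * theta))

def grad(theta):
    # Analytic gradient of the toy loss above.
    return 2 * theta + 0.5 * np.cos(5 * theta)

def gradient_descent(theta0, lr=0.05, steps=200):
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * grad(theta)  # move downhill along the local slope
    return theta

rng = np.random.default_rng(0)
theta0 = rng.normal(size=5)          # a random starting point in parameter space
theta_final = gradient_descent(theta0)
print(loss(theta0), loss(theta_final))  # the error decreases along the walk
```

In a real network the same walk happens over millions of dimensions, where the ripple term hints at the non-convexity that makes the landscape's topology interesting.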