WALKING THROUGH MILLIONS OF DIMENSIONS
DOI: https://doi.org/10.56238/arev8n3-119

Keywords: Machine Learning, Error Geometry, High Dimension, Optimization, Bayesian Inference, Socio-Technical Systems

Abstract
This work proposes a geometric interpretation of machine learning as a navigation process through a high-dimensional parameter space whose topography is induced by the error function. Neural networks are analyzed as systems that explore an abstract optimization landscape in which architecture, data, training algorithms, and sources of uncertainty shape the geometry of learning. The approach establishes connections between optimization dynamics, generalization capacity, and Bayesian inference, suggesting that these phenomena can be understood within a unified geometric framework. Beyond the technical domain, the work discusses how this perspective influences the interpretation, governance, and socio-technical impact of artificial intelligence systems, offering an integrated conceptual language for analyzing their performance in human and computational contexts.
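As a minimal illustration of the navigation metaphor described above (not code from the article itself), the sketch below performs plain gradient descent on a synthetic quadratic error surface in a high-dimensional parameter space. The surface, its dimensionality, and the step size are illustrative assumptions chosen only to show the "walking downhill" dynamic.

```python
import numpy as np

# Illustrative assumptions: a synthetic quadratic error surface
# E(w) = 0.5 * sum_i c_i * w_i^2, "walked" by plain gradient descent.
rng = np.random.default_rng(0)
dim = 1_000_000                           # number of parameters (dimensions walked through)
curvatures = rng.uniform(0.1, 1.0, dim)   # local curvature of the landscape along each axis

def error(w):
    """Height of the error landscape at parameter vector w."""
    return 0.5 * np.sum(curvatures * w**2)

def gradient(w):
    """Local slope of the landscape: the direction of steepest ascent."""
    return curvatures * w

w = rng.normal(size=dim)                  # random starting point in parameter space
step_size = 0.5                           # how far each step moves

for step in range(50):
    w -= step_size * gradient(w)          # walk downhill along the negative gradient
    if step % 10 == 0:
        print(f"step {step:3d}  error = {error(w):.4f}")
```

On this toy surface the error shrinks monotonically toward the single minimum; on real neural-network landscapes the walk instead traverses saddle points, plateaus, and basins of differing flatness, which is precisely the richer geometry the article discusses.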