How the fundamental concepts of mathematics and physics explain deep learning
Starting from the Fermat's principle of least action, which governs classical and quantum mechanics and from the theory of exterior differential forms, which governs the geometry of curved manifolds, we show how to derive the equations governing neural networks in an intrinsic, coordinate invariant way, where the differential dW of the parameters W appears as the cotangent pullback of the differential of the loss function L: dW = -eta f*(dL) where f denotes the action of the network, and eta the learning rate. To be covariant, these equations imply a layer metric which is instrumental in pretraining and explains the role of conjugation when using complex numbers. The differential formalism also clarifies the relation of the gradient descent optimizer with Aristotelian and Newtonian mechanics and why large learning steps break the logic of the linearization procedure. We hope that this formal presentation of the differential geometry of neural networks will encourage some physicists to dive into deep learning, and reciprocally, that the specialists of deep learning will better appreciate the close interconnection of their subject with the foundations of classical and quantum field theory.
READ FULL TEXT