Transformers from Spin Models: Approximate Free Energy Minimization

In a previous project, "Deep Implicit Attention: A Mean-Field Theory Perspective on Attention Mechanisms", we introduced a mean-field theory perspective on transformer modules. We showed that their outputs can be understood as mean-field spin expectation values of simple Ising-like vector-spin systems. Physically, training a transformer module can be interpreted as a classical many-body system modulating its behavior by learning how to respond to being probed by incoming data.


