Deep Implicit Attention: A Mean-Field Theory Perspective on Attention Mechanisms

In this project, we model attention as the collective response of a statistical-mechanical system. We consider a vector generalization of an Ising-like spin system, treating incoming data as applied magnetic fields and the outputs of attention modules as spin expectation values. This lets us rephrase attention as an (inner-loop) fixed-point optimization.
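To make the fixed-point picture concrete, here is a minimal NumPy sketch of a generic mean-field self-consistency loop of the form m_i = f(beta * (h_i + sum_j J_ij m_j)). It is an illustration under stated assumptions, not this project's implementation: the function names (`radial_tanh`, `mean_field_fixed_point`), the radial tanh response function, the random symmetric couplings, and the plain iteration scheme are all hypothetical choices made for the example. The couplings are rescaled so the update is a contraction and the loop converges.

```python
import numpy as np


def radial_tanh(theta, eps=1e-12):
    """Squash each row vector to norm < 1 while keeping its direction.

    Assumed stand-in for the spin response function; not taken from the project.
    """
    norms = np.linalg.norm(theta, axis=-1, keepdims=True)
    return np.tanh(norms) * theta / (norms + eps)


def mean_field_fixed_point(fields, couplings, beta=1.0, tol=1e-8, max_iter=200):
    """Iterate m <- f(beta * (fields + couplings @ m)) until the update is below tol.

    fields:    (num_spins, dim) array, the input data acting as applied fields.
    couplings: (num_spins, num_spins) array of spin-spin interactions.
    Returns the approximate fixed point and the number of iterations used.
    """
    m = np.zeros_like(fields)
    for step in range(max_iter):
        m_next = radial_tanh(beta * (fields + couplings @ m))
        if np.max(np.abs(m_next - m)) < tol:
            return m_next, step + 1
        m = m_next
    return m, max_iter


# Toy data: 6 vector spins of dimension 4 (arbitrary sizes for the example).
rng = np.random.default_rng(0)
num_spins, dim = 6, 4
fields = rng.normal(size=(num_spins, dim))

# Random symmetric couplings without self-interaction, rescaled to spectral
# norm 0.5 so that the update map is a contraction for beta = 1.
couplings = rng.normal(size=(num_spins, num_spins))
couplings = 0.5 * (couplings + couplings.T)
np.fill_diagonal(couplings, 0.0)
couplings *= 0.5 / np.linalg.norm(couplings, 2)

magnetizations, num_steps = mean_field_fixed_point(fields, couplings)
print(magnetizations.shape, num_steps)
```

In this sketch `fields` plays the role of the incoming data and `magnetizations` the role of the module's output: one vector per input, obtained by solving for the fixed point rather than by a single feed-forward pass.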


There's unfortunately not much to read here yet...

