Log-Precision Transformers are Constant-Depth Uniform Threshold Circuits
We prove that transformer neural networks with logarithmic precision in the input length (and where the feedforward subnetworks are computable using linear space in their input length) can be simulated by constant-depth uniform threshold circuits. Thus, such transformers only recognize formal languages in 𝖳𝖢^0, the class of languages definable by constant-depth, polynomial-size threshold circuits. This demonstrates a connection between a practical claim in NLP and a theoretical conjecture in computational complexity theory: the claim that "attention is all you need" (Vaswani et al., 2017), i.e., that transformers are capable of all efficient computation, can hold only if all efficiently computable problems can be solved with logarithmic space, i.e., only if 𝖫 = 𝖯. We also construct a transformer that can evaluate any constant-depth threshold circuit on any input, proving that transformers can follow instructions that are representable in 𝖳𝖢^0.
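To spell out the implication behind the 𝖫 = 𝖯 statement, here is a minimal LaTeX sketch of the containment argument. It assumes the standard fact, not stated in the abstract itself, that logspace-uniform 𝖳𝖢^0 is contained in 𝖫, which is contained in 𝖯.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Standard containments among the classes mentioned in the abstract
% (for logspace-uniform circuits):
\[
  \mathsf{TC}^0 \subseteq \mathsf{L} \subseteq \mathsf{P}.
\]
% The paper's main theorem places log-precision transformers inside uniform TC^0:
\[
  \text{log-precision transformers} \subseteq \mathsf{TC}^0.
\]
% Under the strong reading of ``attention is all you need'' (transformers
% capture all of P), the chain collapses:
\[
  \mathsf{P} \subseteq \mathsf{TC}^0 \subseteq \mathsf{L} \subseteq \mathsf{P}
  \;\Longrightarrow\; \mathsf{L} = \mathsf{P}.
\]
\end{document}
```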