Do Transformers use variable binding?

by   Tommi Gröndahl, et al.

Increasing the explainability of deep neural networks (DNNs) requires evaluating whether they implement symbolic computation. One central symbolic capacity is variable binding: linking an input value to an abstract variable held in system-internal memory. Prior work on the computational abilities of DNNs has not resolved the question of whether their internal processes involve variable binding. We argue that the reason for this is fundamental, inherent in the way experiments in prior work were designed. We provide the first systematic evaluation of the variable binding capacities of the state-of-the-art Transformer networks BERT and RoBERTa. Our experiments are designed such that the model must generalize a rule across disjoint subsets of the input vocabulary, and cannot rely on associative pattern matching alone. The results show a clear discrepancy between classification and sequence-to-sequence tasks: BERT and RoBERTa can easily learn to copy or reverse strings even when trained on task-specific vocabularies that are switched in the test set; but both models completely fail to generalize across vocabularies in similar sequence classification tasks. These findings indicate that the effectiveness of Transformers in sequence modelling may lie in their extensive use of the input itself as an external "memory" rather than network-internal symbolic operations involving variable binding. Therefore, we propose a novel direction for future work: augmenting the inputs available to circumvent the lack of network-internal variable binding.


How to Stop Off-the-Shelf Deep Neural Networks from Overthinking

While deep neural networks (DNNs) can perform complex classification tas...

Neural Sequence-to-grid Module for Learning Symbolic Rules

Logical reasoning tasks over symbols, such as learning arithmetic operat...

BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data

Deep neural networks (DNNs) used for brain-computer-interface (BCI) clas...

B-cos Alignment for Inherently Interpretable CNNs and Vision Transformers

We present a new direction for increasing the interpretability of deep n...

Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

Harnessing the statistical power of neural networks to perform language ...

Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics

Neural sequence models trained with maximum likelihood estimation have l...

Analyzing the Nuances of Transformers' Polynomial Simplification Abilities

Symbolic Mathematical tasks such as integration often require multiple w...

Please sign up or login with your details

Forgot password? Click here to reset