Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages

09/02/2023
by Shunjie Wang, et al.

Although Transformers perform well on NLP tasks, recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages. These findings motivated us to consider their implications for modeling natural language, which is hypothesized to be mildly context-sensitive. We test Transformers' ability to learn a variety of mildly context-sensitive languages of varying complexity and find that they generalize well to unseen in-distribution data, but that their ability to extrapolate to longer strings is worse than that of LSTMs. Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior, which may have helped the models learn these languages.
