Grokking is a phenomenon where a model trained on an algorithmic task fi...
Many current NLP systems are built from language models trained to optim...
Characterizing the implicit structure of the computation within neural n...
Language models are often trained on text alone, without additional grou...
We prove that transformer neural networks with logarithmic precision in ...
One way to interpret the behavior of a black-box recurrent neural network...
Transformers have become a standard architecture for many NLP problems. ...
Language models trained on billions of tokens have recently led to unpre...
Much recent work in NLP has documented dataset artifacts, bias, and spur...
NLP is deeply intertwined with the formal study of language, both concep...
The capacity of neural networks like the widely adopted transformer is k...
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of sc...
We develop a formal hierarchy of the expressive capacity of RNN architec...
Counter machines have achieved a newfound relevance to the field of natu...
We train a diachronic long short-term memory (LSTM) part-of-speech tagge...
This work attempts to explain the types of computation that neural netwo...
Neural network architectures have been augmented with differentiable sta...
This paper analyzes the behavior of stack-augmented recurrent neural net...
We present a graph-based Tree Adjoining Grammar (TAG) parser that uses B...