Grokking is a phenomenon where a model trained on an algorithmic task fi...
Many current NLP systems are built from language models trained to optim...
Characterizing the implicit structure of the computation within neural n...
Language models are often trained on text alone, without additional grou...
We prove that transformer neural networks with logarithmic precision in ...
One way to interpret the behavior of a black-box recurrent neural network...
Transformers have become a standard architecture for many NLP problems. ...
Language models trained on billions of tokens have recently led to unpre...
Much recent work in NLP has documented dataset artifacts, bias, and spur...
NLP is deeply intertwined with the formal study of language, both concep...
The capacity of neural networks like the widely adopted transformer is k...
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of sc...
We develop a formal hierarchy of the expressive capacity of RNN architec...
Counter machines have achieved a newfound relevance to the field of natu...
We train a diachronic long short-term memory (LSTM) part-of-speech tagge...
This work attempts to explain the types of computation that neural netwo...
Neural network architectures have been augmented with differentiable sta...
This paper analyzes the behavior of stack-augmented recurrent neural net...
We present a graph-based Tree Adjoining Grammar (TAG) parser that uses B...