Arithmetic with Language Models: from Memorization to Computation

08/02/2023
by Davide Maltoni, et al.

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language model, trained to predict the next token, can perform arithmetic computations that generalize beyond the training data. Binary addition and multiplication constitute a good testbed for this purpose, since they require a very small vocabulary and exhibit relevant input/output discontinuities that make smooth input interpolation ineffective for novel data. We successfully trained a lightweight language model to learn these tasks and ran a number of experiments to investigate its extrapolation capabilities and internal information processing. Our findings support the hypothesis that the language model works as an Encoding-Regression-Decoding machine, where the computation takes place in the value space once the input token representation is mapped to an appropriate internal representation.
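To make the "small vocabulary" and "input/output discontinuity" points concrete, the following is a minimal sketch, not the authors' data pipeline. The prompt layout (`operand op operand = result`), the 4-bit operand width used in the usage note, and the helper names `to_bits`, `make_example`, and `output_change_for_input_flip` are all assumptions chosen for illustration.

```python
# Illustrative sketch only: the string format and helper names are
# hypothetical and are not taken from the paper.

def to_bits(value: int, width: int) -> str:
    """Return `value` as a fixed-width binary string, e.g. 5 -> '0101'."""
    return format(value, f"0{width}b")


def make_example(a: int, b: int, op: str, width: int = 4) -> str:
    """Build one training string over the vocabulary {0, 1, +, *, =}."""
    if op == "+":
        # The sum of two width-bit numbers fits in width + 1 bits.
        result, out_width = a + b, width + 1
    elif op == "*":
        # The product of two width-bit numbers fits in 2 * width bits.
        result, out_width = a * b, 2 * width
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"{to_bits(a, width)}{op}{to_bits(b, width)}={to_bits(result, out_width)}"


def hamming(x: str, y: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(x, y))


def output_change_for_input_flip(a: int, b: int, bit: int, op: str, width: int = 4) -> int:
    """Flip one bit of operand `a` and count how many output bits change."""
    base = make_example(a, b, op, width).split("=")[1]
    flipped = make_example(a ^ (1 << bit), b, op, width).split("=")[1]
    return hamming(base, flipped)
```

For instance, `make_example(7, 1, "+")` yields `0111+0001=01000`, while flipping only the lowest bit of the first operand (7 becomes 6) yields the result `00111`. A single-bit change in the input alters four of the five output bits because of carry propagation, which is one way to see why smoothly interpolating between nearby training inputs would not recover the correct output.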


research
07/28/2023

The Hydra Effect: Emergent Self-repair in Language Model Computations

We investigate the internal structure of language model computations usi...
research
07/07/2023

Teaching Arithmetic to Small Transformers

Large language models like GPT-4 exhibit emergent capabilities across ge...
research
01/31/2023

Numeracy from Literacy: Data Science as an Emergent Skill from Large Language Models

Large language models (LLM) such as OpenAI's ChatGPT and GPT-3 offer uni...
research
07/07/2023

Discovering Variable Binding Circuitry with Desiderata

Recent work has shown that computation in language models may be human-u...
research
09/29/2020

Improving Low Compute Language Modeling with In-Domain Embedding Initialisation

Many NLP applications, such as biomedical data and technical support, ha...
research
09/06/2023

GPT Can Solve Mathematical Problems Without a Calculator

Previous studies have typically assumed that large language models are u...
research
12/05/2022

Building Metadata Inference Using a Transducer Based Language Model

Solving the challenges of automatic machine translation of Building Auto...
