HistAlign: Improving Context Dependency in Language Generation by Aligning with History

05/08/2023
by   David Wan, et al.

Language models (LMs) can generate hallucinations and incoherent outputs, which highlights their weak context dependency. Cache-LMs, which augment LMs with a memory of recent history, can increase context dependency and have shown remarkable performance in diverse language generation tasks. However, we find that even with training, the performance gain stemming from the cache component of current cache-LMs is suboptimal due to the misalignment between the current hidden states and those stored in the memory. In this work, we present HistAlign, a new training approach that ensures good cache alignment so that the model receives useful signals from the history. We first validate the concept on a simple synthetic task where the memory is essential for correct predictions, and we show that the cache component of HistAlign is better aligned and improves overall performance. Next, we evaluate HistAlign on diverse downstream language generation tasks, including prompt continuation, abstractive summarization, and data-to-text. We demonstrate that HistAlign improves text coherence and faithfulness in open-ended and conditional generation settings, respectively. HistAlign also generalizes across different model families, showcasing its strength in improving the context dependency of LMs in diverse scenarios. Our code is publicly available at https://github.com/meetdavidwan/histalign
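
To make the cache idea in the abstract concrete, here is a minimal sketch of how a generic continuous-cache LM might mix a cache distribution into the base LM's next-token distribution. It is not HistAlign's training method and is not taken from the authors' repository. The function names (`cache_distribution`, `interpolate`), the dot-product scoring, the scale `theta`, and the mixing weight `lam` are all assumptions chosen for illustration.

```python
import math


def cache_distribution(query, keys, next_tokens, vocab_size, theta=1.0):
    """Turn a cache of (hidden state, next token) pairs into a vocabulary distribution.

    Assumption: similarity is a scaled dot product between the current hidden
    state (query) and each stored hidden state (key), normalized with a softmax.
    """
    if not keys:
        raise ValueError("cache must contain at least one entry")
    scores = [theta * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for weight, token in zip(weights, next_tokens):
        probs[token] += weight / total  # mass goes to the token that followed each key
    return probs


def interpolate(p_lm, p_cache, lam=0.2):
    """Linearly mix the base LM distribution with the cache distribution."""
    return [(1.0 - lam) * a + lam * b for a, b in zip(p_lm, p_cache)]


# Toy example with a 3-token vocabulary and two cached history positions.
keys = [[1.0, 0.0], [0.0, 1.0]]   # stored hidden states (hypothetical values)
next_tokens = [2, 0]              # token that followed each stored state
query = [1.0, 0.0]                # current hidden state, aligned with the first key
p_cache = cache_distribution(query, keys, next_tokens, vocab_size=3)
p_mixed = interpolate([0.5, 0.3, 0.2], p_cache, lam=0.2)
```

In this toy setup the query matches the first stored state, so the cache shifts probability toward token 2. If the current hidden state were poorly aligned with the stored ones, the similarity scores would carry little information and the cache term would contribute mostly noise, which is the kind of misalignment the abstract describes.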


