LM-Switch: Lightweight Language Model Conditioning in Word Embedding Space

by Chi Han, et al.

In recent years, large language models (LMs) have achieved remarkable progress across various natural language processing tasks. As pre-training and fine-tuning are costly and might negatively impact model performance, it is desirable to efficiently adapt an existing model to different conditions such as styles, sentiments, or narratives when facing different audiences or scenarios. However, efficient adaptation of a language model to diverse conditions remains an open challenge. This work is inspired by the observation that text conditions are often associated with the selection of certain words in a context. We therefore introduce LM-Switch, a theoretically grounded, lightweight, and simple method for conditioning generative language models. We begin by investigating the effect of conditions in Hidden Markov Models (HMMs) and establish a theoretical connection with language models. Our findings suggest that condition shifts in HMMs correspond to linear transformations in word embeddings. LM-Switch is then designed to apply a learnable linear factor in the word embedding space for language model conditioning. We show that LM-Switch can model diverse tasks and achieves performance comparable to or better than state-of-the-art baselines in LM detoxification and generation control, despite requiring no more than 1% of the parameters of baselines and little extra time overhead compared with base LMs. It is also able to learn from as little as a few sentences or a single document. Moreover, a learned LM-Switch can be transferred to other LMs of different sizes, achieving detoxification performance similar to the best baseline. We will make our code available to the research community following publication.
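The abstract describes LM-Switch as a learnable linear factor applied in the word embedding space. A minimal sketch of one plausible reading of this idea, in numpy: each output word embedding e_v is shifted to e_v + ε·W·e_v before computing logits, where W is the learned switch matrix and ε controls the conditioning strength. The variable names, the additive form, and the toy dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 8, 4  # toy vocabulary size and embedding dimension (illustrative)

E = rng.normal(size=(V, d))   # output word embedding matrix of the base LM
W = rng.normal(size=(d, d))   # learnable "switch" matrix (hypothetical name)

def switched_logits(h, eps):
    """Logits with each embedding shifted: e_v -> e_v + eps * W @ e_v."""
    E_switched = E + eps * (E @ W.T)   # row v becomes e_v + eps * W e_v
    return E_switched @ h              # shape (V,)

h = rng.normal(size=d)                 # a hidden state from the base LM
base_logits = switched_logits(h, 0.0)  # eps = 0 recovers the base LM exactly
cond_logits = switched_logits(h, 0.5)  # eps > 0 steers word probabilities
```

Because the transform is linear and additive, setting ε = 0 recovers the unmodified base model, and the same learned W could in principle be re-applied to the embeddings of a different LM, consistent with the transfer property the abstract claims.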

