Knowledge-in-Context: Towards Knowledgeable Semi-Parametric Language Models

10/28/2022
by Xiaoman Pan, et al.

Fully parametric language models generally require a huge number of parameters to store the knowledge needed to solve multiple natural language tasks in zero/few-shot settings. They are also hard to adapt to evolving world knowledge without costly re-training. In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory. Specifically, the external memory contains six different types of knowledge: entity, dictionary, commonsense, event, script, and causality knowledge. For each input instance, the KiC model adaptively selects a knowledge type and retrieves the most helpful pieces of knowledge. The input instance, along with its knowledge augmentation, is fed into a text-to-text model (e.g., T5) to generate the output answer, where both the input and the output are in natural language after prompting. Interestingly, we find that KiC can be viewed as a special mixture-of-experts (MoE) model, where the knowledge selector plays the role of the router that determines the sequence-to-expert assignment. This key observation inspires us to develop a novel algorithm for training KiC with an instance-adaptive knowledge selector. As a knowledge-rich semi-parametric language model, KiC needs only a much smaller parametric part to achieve superior zero-shot performance on unseen tasks. Evaluating on 40+ different tasks, we show that KiC-Large with 770M parameters outperforms large language models (LMs) that are 4-39x larger by a large margin. We also demonstrate that KiC exhibits emergent abilities at a much smaller model scale than fully parametric models.
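The abstract describes a three-step inference flow: a knowledge selector (acting like an MoE router) picks one of six knowledge types for each input instance, the most helpful pieces of that knowledge are retrieved from the external memory, and the knowledge-augmented, prompted input is fed to a text-to-text model such as T5. The sketch below illustrates that flow only; the KnowledgeMemory class, the word-overlap retrieval score, the prompt format, and the KnowledgeSelector routing head are hypothetical placeholders rather than the paper's actual implementation (only the Hugging Face T5 interface is real).

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# The six knowledge types named in the abstract.
KNOWLEDGE_TYPES = ["entity", "dictionary", "commonsense", "event", "script", "causality"]


class KnowledgeMemory:
    """Toy external memory: a list of text snippets per knowledge type (hypothetical)."""

    def __init__(self, snippets_by_type):
        self.snippets_by_type = snippets_by_type

    def retrieve(self, query, knowledge_type, top_k=2):
        # Placeholder relevance score: word overlap with the query.
        query_words = set(query.lower().split())
        candidates = self.snippets_by_type.get(knowledge_type, [])
        ranked = sorted(
            candidates,
            key=lambda s: len(query_words & set(s.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


class KnowledgeSelector(torch.nn.Module):
    """MoE-style router: maps a pooled encoding of the input to one knowledge type."""

    def __init__(self, hidden_size):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_size, len(KNOWLEDGE_TYPES))

    def forward(self, pooled_input):
        logits = self.proj(pooled_input)  # (batch, num_knowledge_types)
        return KNOWLEDGE_TYPES[int(logits.argmax(dim=-1)[0])]


def kic_generate(question, memory, tokenizer, model, selector):
    enc = tokenizer(question, return_tensors="pt")
    with torch.no_grad():
        # 1. Encode the instance and let the selector pick a knowledge type.
        hidden = model.encoder(input_ids=enc.input_ids).last_hidden_state
        knowledge_type = selector(hidden.mean(dim=1))
        # 2. Retrieve the most helpful snippets of the selected knowledge type.
        snippets = memory.retrieve(question, knowledge_type)
        # 3. Feed the knowledge-augmented, prompted input to the text-to-text model.
        prompt = f"knowledge [{knowledge_type}]: {' '.join(snippets)} question: {question}"
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids
        output_ids = model.generate(input_ids, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    selector = KnowledgeSelector(model.config.d_model)  # untrained router, for illustration only
    memory = KnowledgeMemory({"dictionary": ["insomnia: a persistent inability to fall or stay asleep."]})
    print(kic_generate("What does insomnia mean?", memory, tokenizer, model, selector))
```

In the paper's framing, the selector is trained jointly with the text-to-text model; here it is an untrained linear head used only to make the routing step concrete.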

Related research

10/01/2022 · Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Although large language models have achieved impressive zero-shot abilit...

07/19/2023 · Thrust: Adaptively Propels Large Language Models with External Knowledge
Although large-scale pre-trained language models (PTLMs) are shown to en...

02/09/2023 · Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Augmenting pretrained language models (LMs) with a vision encoder (e.g.,...

02/10/2020 · How Much Knowledge Can You Pack Into the Parameters of a Language Model?
It has recently been observed that neural language models trained on uns...

05/29/2022 · Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning
Prompt learning approaches have made waves in natural language processin...

08/26/2023 · Planning with Logical Graph-based Language Model for Instruction Generation
Despite the superior performance of large language models to generate na...

09/13/2023 · Unsupervised Contrast-Consistent Ranking with Language Models
Language models contain ranking-based knowledge and are powerful solvers...
