Discrete Prompt Compression with Reinforcement Learning

08/17/2023
by Hoyoun Jung, et al.

Instruction-tuned Language Models (LMs) are widely used to address various problems with task-specific prompts. Constraints on context window length and computational cost encourage the development of compressed prompts. Existing methods rely heavily on training embeddings, which are designed to accommodate multiple token meanings. This raises several problems: limited interpretability, a fixed number of embedding tokens, poor reusability across different LMs, and inapplicability to black-box APIs. This study proposes prompt compression with reinforcement learning (PCRL), a novel discrete prompt compression method that addresses these issues. PCRL employs a computationally efficient policy network that directly edits prompts. The PCRL training approach can be flexibly applied to various types of LMs, including both decoder-only and encoder-decoder architectures, and can be trained without gradient access to LMs or labeled data. PCRL achieves an average reduction of 24.6% in token count while preserving performance. Further, we demonstrate that the learned policy can be transferred to larger LMs, and through various analyses, we aid the understanding of token importance within prompts.
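
To make the idea of a policy that "directly edits prompts" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the edit is a per-token keep/drop decision, that the policy outputs a keep probability for each token, and that the reward combines an output-similarity score with the achieved token reduction. The function names, the similarity threshold, and the penalty value are all hypothetical; the abstract does not specify any of them.

```python
import random


def compress(tokens, keep_probs, rng):
    """Sample one keep/drop decision per token (assumed action space)."""
    mask = [rng.random() < p for p in keep_probs]
    kept = [tok for tok, keep in zip(tokens, mask) if keep]
    return kept, mask


def token_reduction(original, compressed):
    """Fraction of tokens removed from the original prompt."""
    return 1.0 - len(compressed) / len(original)


def reward(similarity, reduction, threshold=0.9, penalty=-1.0):
    """Hypothetical reward: pay out the reduction only if the LM output
    for the compressed prompt stays close to the output for the original
    prompt. `similarity` would come from comparing the two generated
    outputs, so no labels or LM gradients are needed."""
    return reduction if similarity >= threshold else penalty


tokens = "Please summarize the following article in one sentence".split()
# Stand-in for a policy network's output: probabilities of 0.0 and 1.0
# make the sampled mask deterministic for this illustration.
keep_probs = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

kept, mask = compress(tokens, keep_probs, random.Random(0))
reduction = token_reduction(tokens, kept)
print(" ".join(kept), reduction)
```

Because the compressed prompt is still a sequence of ordinary tokens, it can be read by a person and passed to any LM, including one behind a black-box API. In a real training loop the reward would feed a policy-gradient update of the network that produces `keep_probs`.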

