Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation

08/16/2023
by Xinshuo Hu, et al.

Large language models (LLMs) have been widely used in various applications but are known to suffer from issues related to untruthfulness and toxicity. While parameter-efficient modules (PEMs) have demonstrated their effectiveness in equipping models with new skills, leveraging PEMs for deficiency unlearning remains underexplored. In this work, we propose a PEM operation approach, namely Extraction-before-Subtraction (Ext-Sub), to enhance the truthfulness and detoxification of LLMs through the integration of an “expert” PEM and an “anti-expert” PEM. Remarkably, even the anti-expert PEM possesses valuable capabilities, since its proficiency in generating fabricated content itself requires language modeling and logical narrative competence. Rather than merely negating the parameters, our approach extracts and eliminates only the deficiency capability within the anti-expert PEM while preserving its general capabilities. To evaluate the effectiveness of our approach in terms of truthfulness and detoxification, we conduct extensive experiments on LLMs, also measuring basic abilities such as language modeling and mathematical reasoning. Our empirical results demonstrate that our approach effectively improves truthfulness and detoxification, while largely preserving the fundamental abilities of LLMs.
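The abstract describes the operation only at a high level. The sketch below shows one plausible reading of Extraction-before-Subtraction on flattened PEM weight vectors: the anti-expert parameters are split into a component aligned with the expert (assumed to carry the shared general capability, and kept) and an orthogonal residual (assumed to be the deficiency, and subtracted). The projection-based decomposition and the scaling factor `lam` are illustrative assumptions, not the paper's verbatim formulation.

```python
import torch

def ext_sub(expert: torch.Tensor, anti_expert: torch.Tensor,
            lam: float = 1.0) -> torch.Tensor:
    """Extraction-before-Subtraction sketch on flattened PEM weights.

    Instead of naively negating the anti-expert parameters, first split
    them into a component aligned with the expert (shared general
    ability, kept) and an orthogonal component (the deficiency, removed).
    """
    e = expert.flatten()
    a = anti_expert.flatten()
    # Projection of the anti-expert onto the expert direction:
    # assumed to carry the shared general capability.
    aligned = (torch.dot(a, e) / torch.dot(e, e)) * e
    # The residual, orthogonal to the expert, is treated as the
    # deficiency capability to be unlearned.
    deficiency = a - aligned
    # Subtract only the deficiency; `lam` is a hypothetical
    # scaling hyperparameter controlling unlearning strength.
    return (e - lam * deficiency).reshape(expert.shape)
```

Under this reading, an anti-expert that is perfectly parallel to the expert yields a zero deficiency term, so nothing useful is subtracted, whereas plain negation (expert minus anti-expert) would also cancel the shared capability.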


