Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning

by   Mingyang Zhang, et al.

Large pre-trained models (LPMs), such as LLaMA and ViT-G, have shown exceptional performance across various tasks. Although parameter-efficient fine-tuning (PEFT) has emerged to cheaply fine-tune these large models on downstream tasks, their deployment is still hindered by the vast model scale and computational costs. Neural network pruning offers a solution for model compression by removing redundant parameters, but most existing methods rely on computing parameter gradients. However, obtaining the gradients is computationally prohibitive for LPMs, which necessitates the exploration of alternative approaches. To this end, we propose a unified framework for efficient fine-tuning and deployment of LPMs, termed LoRAPrune. We first design a PEFT-aware pruning criterion, which utilizes the values and gradients of Low-Rank Adaption (LoRA), rather than the gradients of pre-trained parameters for importance estimation. We then propose an iterative pruning procedure to remove redundant parameters while maximizing the advantages of PEFT. Thus, our LoRAPrune delivers an accurate, compact model for efficient inference in a highly cost-effective manner. Experimental results on various tasks demonstrate that our method achieves state-of-the-art results. For instance, in the VTAB-1k benchmark, LoRAPrune utilizes only 0.76 outperforms magnitude and movement pruning methods by a significant margin, achieving a mean Top-1 accuracy that is 5.7 Moreover, our approach achieves comparable performance to PEFT methods, highlighting its efficacy in delivering high-quality results while benefiting from the advantages of pruning.


page 1

page 2

page 3

page 4


Pruning Pre-trained Language Models Without Fine-Tuning

To overcome the overparameterized problem in Pre-trained Language Models...

IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning

With the increasing size of pre-trained language models (PLMs), fine-tun...

Towards a Unified View of Parameter-Efficient Transfer Learning

Fine-tuning large pre-trained language models on downstream tasks has be...

Weight Reparametrization for Budget-Aware Network Pruning

Pruning seeks to design lightweight architectures by removing redundant ...

Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks

As a few large-scale pre-trained models become the major choices of vari...

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

Large pre-trained transformers have been receiving explosive attention i...

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models

Model compression by way of parameter pruning, quantization, or distilla...

Please sign up or login with your details

Forgot password? Click here to reset