CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

04/13/2022
by Chen Liang, et al.

Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, as the proportion of shared weights increases, the resulting models tend to become similar, and the benefits of using a model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of the bottom layers across all models and apply different perturbations to the hidden representations of different models, which effectively promotes model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance induced by this diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
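To make the three ingredients of the abstract concrete (shared bottom layers, per-model perturbations of hidden representations, and a prediction-consistency regularizer), here is a minimal, hypothetical PyTorch sketch. It is not the paper's released code: the class and function names, the Gaussian-noise perturbation, and the KL-to-ensemble-mean form of the regularizer are illustrative assumptions, chosen only to show how the pieces fit together.

```python
# Hypothetical sketch of CAMERO-style training; names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedBottomEnsemble(nn.Module):
    """K ensemble members sharing the bottom layers; each member perturbs the
    shared hidden representation and owns only a lightweight top layer."""
    def __init__(self, input_dim, hidden_dim, num_classes, num_members=4, noise_std=0.1):
        super().__init__()
        # Bottom layers are shared by all members, so they are stored once.
        self.shared_bottom = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One small prediction head per ensemble member.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_classes) for _ in range(num_members)]
        )
        self.noise_std = noise_std

    def forward(self, x):
        h = self.shared_bottom(x)
        logits = []
        for head in self.heads:
            # A different random perturbation of the hidden representation for
            # each member promotes diversity despite the shared bottom weights.
            h_pert = h + self.noise_std * torch.randn_like(h) if self.training else h
            logits.append(head(h_pert))
        return logits  # list of [batch, num_classes] tensors, one per member

def camero_style_loss(logits_list, labels, consistency_weight=1.0):
    """Average task loss plus a prediction-consistency regularizer:
    KL of each member's prediction to the (detached) ensemble-mean distribution."""
    task_loss = sum(F.cross_entropy(l, labels) for l in logits_list) / len(logits_list)
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    mean_prob = torch.stack(probs).mean(dim=0).detach()
    consistency = sum(
        F.kl_div(F.log_softmax(l, dim=-1), mean_prob, reduction="batchmean")
        for l in logits_list
    ) / len(logits_list)
    return task_loss + consistency_weight * consistency

# Toy usage
model = SharedBottomEnsemble(input_dim=16, hidden_dim=32, num_classes=3)
x, y = torch.randn(8, 16), torch.randint(0, 3, (8,))
loss = camero_style_loss(model(x), y)
loss.backward()
```

In this sketch the memory saving comes from keeping only one copy of the bottom layers, while the consistency term pulls the perturbed members' predictions toward their ensemble mean, mirroring the trade-off between diversity and variance described in the abstract.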

