Recycling diverse models for out-of-distribution generalization

by   Alexandre Ramé, et al.

Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: download a copy of a foundation model, and fine-tune it using some in-house data about the target task of interest. Consequently, the Internet is swarmed by a handful of foundation models fine-tuned on many diverse tasks. Yet, these individual fine-tunings often lack strong generalization and exist in isolation without benefiting from each other. In our opinion, this is a missed opportunity, as these specialized models contain diverse features. Based on this insight, we propose model recycling, a simple strategy that leverages multiple fine-tunings of the same foundation model on diverse auxiliary tasks, and repurposes them as rich and diverse initializations for the target task. Specifically, model recycling fine-tunes in parallel each specialized model on the target task, and then averages the weights of all target fine-tunings into a final model. Empirically, we show that model recycling maximizes model diversity by benefiting from diverse auxiliary tasks, and achieves a new state of the art on the reference DomainBed benchmark for out-of-distribution generalization. Looking forward, model recycling is a contribution to the emerging paradigm of updatable machine learning where, akin to open-source software development, the community collaborates to incrementally and reliably update machine learning models.


page 1

page 2

page 3

page 4


Llama 2: Open Foundation and Fine-Tuned Chat Models

In this work, we develop and release Llama 2, a collection of pretrained...

Jack and Masters of All Trades: One-Pass Learning of a Set of Model Sets from Foundation AI Models

For deep learning, size is power. Massive neural nets trained on broad d...

AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Large language models excel in many human-language tasks but often falte...

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

Foundation models have shown outstanding performance and generalization ...

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Foundation models are first pre-trained on vast unsupervised datasets an...

On the power of foundation models

With infinitely many high-quality data points, infinite computational po...

Big Learning: A Universal Machine Learning Paradigm?

Recent breakthroughs based on big/foundation models reveal a vague avenu...

Please sign up or login with your details

Forgot password? Click here to reset