Deep Model Assembling

by Zanlin Ni, et al.

Large deep learning models have achieved remarkable success in many scenarios. However, training large models is usually challenging, e.g., due to the high computational cost, the unstable and painfully slow optimization procedure, and the vulnerability to overfitting. To alleviate these problems, this work studies a divide-and-conquer strategy, i.e., dividing a large model into smaller modules, training them independently, and reassembling the trained modules to obtain the target model. This approach is promising since it avoids directly training large models from scratch. Nevertheless, implementing this idea is non-trivial, as it is difficult to ensure the compatibility of the independently trained modules. In this paper, we present an elegant solution to address this issue, i.e., we introduce a global, shared meta model to implicitly link all the modules together. This enables us to train highly compatible modules that collaborate effectively when they are assembled together. We further propose a module incubation mechanism that enables the meta model to be designed as an extremely shallow network. As a result, the additional overhead introduced by the meta model is minimized. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in terms of both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves the accuracy by 2.7% over the baseline on ImageNet-1K, while saving the training cost by 43%. Code is available at
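The core idea above can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' implementation: plain MLP blocks stand in for transformer modules, and the helper names (`make_block`, `hybrid_model`) are hypothetical. A shallow, shared meta model fills every module slot; "module incubation" swaps one target module into the meta model at a time and trains it there, so all modules learn against the same frozen context and remain compatible when assembled.

```python
import torch
import torch.nn as nn

def make_block(dim, depth):
    # Stand-in for one module of the large target model
    # (e.g., a stack of transformer blocks); MLPs keep the sketch light.
    return nn.Sequential(*[nn.Sequential(nn.Linear(dim, dim), nn.GELU())
                           for _ in range(depth)])

dim, n_modules = 16, 4

# Global, shared meta model: extremely shallow (one cheap layer per slot).
meta_blocks = nn.ModuleList(make_block(dim, depth=1) for _ in range(n_modules))

# The target model's modules, to be trained independently.
target_modules = [make_block(dim, depth=3) for _ in range(n_modules)]

def hybrid_model(i):
    # "Module incubation": place target module i inside the meta model so it
    # learns to be compatible with the shared meta blocks around it.
    blocks = [target_modules[j] if j == i else meta_blocks[j]
              for j in range(n_modules)]
    return nn.Sequential(*blocks)

# Train each module independently (these loops could run on separate machines).
x, y = torch.randn(8, dim), torch.randn(8, dim)
for i in range(n_modules):
    model = hybrid_model(i)
    # Only module i is updated; the shared meta model stays fixed,
    # implicitly linking all modules together.
    opt = torch.optim.SGD(target_modules[i].parameters(), lr=0.1)
    for _ in range(5):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

# Assemble the independently trained modules into the full target model.
assembled = nn.Sequential(*target_modules)
print(assembled(x).shape)  # torch.Size([8, 16])
```

Because every module is incubated against the same frozen meta blocks, each one learns input/output behavior that matches its neighbors' expectations, which is what makes the final assembly work without joint end-to-end training.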


