Mod-Squad: Designing Mixture of Experts As Modular Multi-Task Learners

by Zitian Chen, et al.

Optimization in multi-task learning (MTL) is more challenging than in single-task learning (STL), as the gradients from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts is activated for each task. This prevents the entire backbone model from being shared among all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
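
To make the idea of "mutual dependence between tasks and experts" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the dependence is measured as the mutual information of a joint task-expert routing distribution; the abstract does not spell out the exact form of the loss. The function names, the threshold, and the toy routing tables are all hypothetical and chosen only for illustration.

```python
import numpy as np


def task_expert_mutual_information(joint, eps=1e-12):
    """Mutual information I(T; E) in nats.

    `joint` is a (n_tasks, n_experts) array of non-negative routing
    frequencies; it is normalized here to a joint distribution P(T, E).
    Assumption: mutual information stands in for the "mutual dependence"
    mentioned in the abstract.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    p_task = joint.sum(axis=1, keepdims=True)    # marginal P(T)
    p_expert = joint.sum(axis=0, keepdims=True)  # marginal P(E)
    ratio = joint / (p_task * p_expert + eps)
    # Entries with zero joint probability contribute zero to the sum.
    return float(np.sum(joint * np.log(ratio + eps)))


def experts_for_task(joint, task, threshold=0.05):
    """Indices of experts that a task uses with frequency >= threshold.

    Hypothetical helper illustrating how a per-task subset of experts
    could be read off the routing table.
    """
    joint = np.asarray(joint, dtype=float)
    conditional = joint[task] / joint[task].sum()  # P(E | T = task)
    return [int(i) for i in np.flatnonzero(conditional >= threshold)]


# Toy routing tables (made-up numbers): 2 tasks, 4 experts.
uniform = np.full((2, 4), 1.0 / 8.0)          # every task uses every expert
specialized = np.array([[0.25, 0.25, 0.0, 0.0],
                        [0.0, 0.0, 0.25, 0.25]])  # disjoint expert sets

mi_uniform = task_expert_mutual_information(uniform)
mi_specialized = task_expert_mutual_information(specialized)
subset_task0 = experts_for_task(specialized, task=0)
subset_task1 = experts_for_task(specialized, task=1)
```

In this toy setting, the uniform table has mutual information close to zero because knowing the task says nothing about which expert is used, while the specialized table has a higher value because each task maps to its own experts. Under the stated assumption, a training objective that rewards higher mutual information would push routing toward the sparse, task-specific pattern the abstract describes, and the per-task subset is what one would keep when extracting a standalone model.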


Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners

Traditional multi-task learning (MTL) methods use dense networks that us...

M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

Multi-task learning (MTL) encapsulates multiple learned tasks in a singl...

Diversified Dynamic Routing for Vision Tasks

Deep learning models for vision tasks are trained on large datasets unde...

Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts

Accurate estimation of multiple quality variables is critical for buildi...

Eliciting Transferability in Multi-task Learning with Task-level Mixture-of-Experts

Recent work suggests that transformer models are capable of multi-task l...

RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction

In subcellular biological research, fluorescence staining is a key techn...

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

Vision Transformers (ViTs) have shown impressive performance and have be...
