UFO: Unified Feature Optimization

by Teng Xi, et al.

This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models in real-world, large-scale scenarios that require a collection of multiple AI functions. UFO aims to benefit each individual task through large-scale pretraining on all tasks. Compared with well-known foundation models, UFO has two different points of emphasis, namely a relatively smaller model size and no adaptation cost: 1) UFO squeezes a wide range of tasks into a moderate-sized unified model in a multi-task learning manner and further trims the model size when it is transferred to downstream tasks. 2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the trimmed model dedicated to one or more already-seen tasks. With these two characteristics, UFO provides great convenience for flexible deployment while maintaining the benefits of large-scale pretraining. A key merit of UFO is that the trimming process not only reduces model size and inference cost, but also improves accuracy on certain tasks. Specifically, multi-task training has a two-fold impact on the unified model: some closely related tasks benefit each other, while other tasks conflict with each other. UFO reduces the conflicts and preserves the mutual benefits through a novel Network Architecture Search (NAS) method. Experiments on a wide range of deep representation learning tasks (i.e., face recognition, person re-identification, vehicle re-identification and product retrieval) show that the model trimmed from UFO achieves higher accuracy than its single-task-trained counterpart while having a smaller model size, validating the concept of UFO. In addition, UFO supported the release of a 17-billion-parameter computer vision (CV) foundation model, which is the largest CV model in the industry.
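To make the trimming idea concrete, the following is a toy sketch only, not the paper's NAS method. It assumes a unified model made of four shared blocks and a hypothetical integer score per block and task (positive when the block helps the task, negative when it reflects a conflict with another task). A simple greedy backward elimination, swapped in here for the unspecified search procedure, drops every block whose removal does not lower the task score. The names `CONTRIBUTION`, `make_score` and `trim_for_task`, and all numbers, are illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical per-task contribution of each shared block (illustrative numbers).
# Positive: the block helps the task; negative: it conflicts; zero: irrelevant.
CONTRIBUTION: Dict[str, List[int]] = {
    "face": [4, 3, -1, 0],
    "vehicle": [4, -2, 3, 1],
}


def make_score(task: str) -> Callable[[FrozenSet[int]], int]:
    """Return a toy scoring function: the sum of contributions of the kept blocks."""
    weights = CONTRIBUTION[task]

    def score(kept: FrozenSet[int]) -> int:
        return sum(weights[b] for b in sorted(kept))

    return score


def trim_for_task(num_blocks: int, score: Callable[[FrozenSet[int]], int]) -> FrozenSet[int]:
    """Greedy backward elimination over blocks.

    Starting from the full unified model, repeatedly drop one block whose
    removal does not lower the task score, until no such block remains.
    """
    kept = set(range(num_blocks))
    best = score(frozenset(kept))
    changed = True
    while changed:
        changed = False
        for block in sorted(kept):
            candidate = frozenset(kept - {block})
            candidate_score = score(candidate)
            if candidate_score >= best:
                kept = set(candidate)
                best = candidate_score
                changed = True
                break  # restart the scan with the smaller model
    return frozenset(kept)


if __name__ == "__main__":
    for task in CONTRIBUTION:
        score = make_score(task)
        full = frozenset(range(4))
        trimmed = trim_for_task(4, score)
        print(task, sorted(trimmed), score(full), score(trimmed))
```

In this toy setting each trimmed model is both smaller than the unified model and scores higher on its own task, because the blocks that only serve conflicting tasks are removed. That mirrors the qualitative claim of the abstract; it says nothing about how the actual search is performed.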




