FocusFormer: Focusing on What We Need via Architecture Sampler

by Jing Liu, et al.

Vision Transformers (ViTs) have underpinned the recent breakthroughs in computer vision. However, designing ViT architectures is laborious and relies heavily on expert knowledge. To automate the design process and incorporate deployment flexibility, one-shot neural architecture search decouples supernet training from architecture specialization for diverse deployment scenarios. To cope with the enormous number of sub-networks in the supernet, existing methods treat all architectures as equally important and randomly sample some of them in each update step during training. During architecture search, however, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which creates a gap between training and deployment. In this paper, we devise a simple yet effective method, called FocusFormer, to bridge this gap. To this end, we propose to learn an architecture sampler that assigns higher sampling probabilities to architectures on the Pareto frontier under different resource constraints during supernet training, so that they are sufficiently optimized and their performance improves. During specialization, we can directly use the well-trained architecture sampler to obtain accurate architectures that satisfy the given resource constraint, which significantly improves search efficiency. Extensive experiments on CIFAR-100 and ImageNet show that FocusFormer improves the performance of the searched architectures while significantly reducing the search cost. For example, on ImageNet, our FocusFormer-Ti with 1.4G FLOPs outperforms AutoFormer-Ti by 0.5% in Top-1 accuracy.
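The idea of favoring Pareto-frontier architectures under a resource constraint can be illustrated with a minimal sketch. This is not the learned sampler described in the abstract: it swaps in a fixed, hand-set weighting in place of a trained one, and every name here (`pareto_frontier`, `sampling_weights`, `sample_architecture`, the `boost` parameter) is hypothetical. The FLOPs and accuracy values are toy numbers invented for illustration, not results from the paper.

```python
import random

# Toy candidate pool. All values are invented for illustration only.
ARCHS = [
    {"name": "A", "flops": 1.0, "acc": 70.0},
    {"name": "B", "flops": 1.4, "acc": 75.0},
    {"name": "C", "flops": 1.4, "acc": 73.0},  # dominated by B
    {"name": "D", "flops": 2.0, "acc": 74.0},  # dominated by B
    {"name": "E", "flops": 2.5, "acc": 78.0},
]


def pareto_frontier(archs):
    """Return architectures that no other one beats on both FLOPs and accuracy."""
    frontier = []
    for a in archs:
        dominated = any(
            b["flops"] <= a["flops"]
            and b["acc"] >= a["acc"]
            and (b["flops"] < a["flops"] or b["acc"] > a["acc"])
            for b in archs
        )
        if not dominated:
            frontier.append(a)
    return frontier


def sampling_weights(archs, boost=5.0):
    """Assumed heuristic: frontier members get `boost` times the base weight."""
    frontier_names = {a["name"] for a in pareto_frontier(archs)}
    return [boost if a["name"] in frontier_names else 1.0 for a in archs]


def sample_architecture(archs, flops_budget, rng, boost=5.0):
    """Sample one architecture within the FLOPs budget, favoring the frontier."""
    feasible = [a for a in archs if a["flops"] <= flops_budget]
    if not feasible:
        raise ValueError("no architecture satisfies the FLOPs budget")
    weights = sampling_weights(feasible, boost)
    return rng.choices(feasible, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    picked = sample_architecture(ARCHS, flops_budget=1.5, rng=rng)
    print(picked["name"])
```

With uniform sampling, the dominated candidates would be drawn as often as the frontier ones; the weighting above shifts training updates toward the architectures that the search stage would actually select.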



Pareto-aware Neural Architecture Generation for Diverse Computational Budgets

Designing feasible and effective architectures under diverse computation...

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

Neural architecture search (NAS) has shown great promise designing state...

Pareto-Frontier-aware Neural Architecture Generation for Diverse Budgets

Designing feasible and effective architectures under diverse computation...

CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment

The emergence of CNNs in mainstream deployment has necessitated methods ...

How to Simplify Search: Classification-wise Pareto Evolution for One-shot Neural Architecture Search

In the deployment of deep neural models, how to effectively and automati...

Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks

Transfer learning is a popular technique for improving the performance o...

TOFA: Transfer-Once-for-All

Weight-sharing neural architecture search aims to optimize a configurabl...
