Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search

by Muhtadyuzzaman Syed, et al.

Neural Architecture Search (NAS) has made automated machine learning possible by streamlining the manual development of deep neural network architectures: it defines a search space, a search strategy, and a performance estimation strategy. To meet the need for multi-platform deployment of Convolutional Neural Network (CNN) models, Once-For-All (OFA) proposed decoupling training from search to deliver a one-shot model whose sub-networks satisfy a range of accuracy-latency tradeoffs. We find that the performance estimation strategy used in OFA's search generalizes poorly across hardware deployment platforms, because it relies on single-hardware latency lookup tables that take a significant amount of time and manual effort to build beforehand. In this work, we present a framework for building latency predictors for neural network architectures, which addresses the need for heterogeneous hardware support and removes the overhead of lookup tables altogether. We introduce two generalizability strategies: fine-tuning, which starts from a base model trained on a specific hardware platform and NAS search space, and GPU-generalization, which trains a model on GPU hardware parameters such as Number of Cores, RAM Size, and Memory Bandwidth. With this, we provide a family of latency prediction models that achieve over 50% lower RMSE loss compared with ProxylessNAS. We also show that NAS using these latency predictors matches the performance of the lookup-table baseline, and exceeds it in certain cases.
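The GPU-generalization idea can be illustrated with a minimal sketch of how a predictor's input might be assembled: a sub-network encoding concatenated with the GPU hardware parameters named above. The function names, the one-hot layout, and the dictionary keys (`num_cores`, `ram_gb`, `mem_bandwidth_gbps`) are assumptions made for illustration, not the authors' implementation. The kernel-size and expand-ratio choices follow the commonly used OFA MobileNetV3 search space, and `rmse` is the standard root-mean-square error definition.

```python
import math
from typing import Dict, List, Sequence

# Per-layer choices as in the commonly used OFA MobileNetV3 search space.
KERNEL_CHOICES = (3, 5, 7)
EXPAND_CHOICES = (3, 4, 6)


def one_hot(value: int, choices: Sequence[int]) -> List[float]:
    """Return a one-hot vector marking which choice `value` is."""
    if value not in choices:
        raise ValueError(f"{value} is not one of {tuple(choices)}")
    return [1.0 if value == c else 0.0 for c in choices]


def encode_sample(
    kernel_sizes: Sequence[int],
    expand_ratios: Sequence[int],
    gpu: Dict[str, float],
) -> List[float]:
    """Concatenate a per-layer architecture encoding with GPU parameters.

    The key names in `gpu` are hypothetical; they stand in for Number of
    Cores, RAM Size, and Memory Bandwidth.
    """
    if len(kernel_sizes) != len(expand_ratios):
        raise ValueError("kernel_sizes and expand_ratios must align per layer")
    features: List[float] = []
    for k, e in zip(kernel_sizes, expand_ratios):
        features += one_hot(k, KERNEL_CHOICES)
        features += one_hot(e, EXPAND_CHOICES)
    features += [
        float(gpu["num_cores"]),
        float(gpu["ram_gb"]),
        float(gpu["mem_bandwidth_gbps"]),
    ]
    return features


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root-mean-square error between predicted and measured latencies."""
    if not predicted or len(predicted) != len(measured):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only; these are not measurements from the paper.
example_gpu = {"num_cores": 2560, "ram_gb": 8, "mem_bandwidth_gbps": 320}
example_features = encode_sample([3, 7], [6, 3], example_gpu)
```

A regressor trained on vectors like `example_features` could in principle serve several GPUs at once, since the hardware is part of the input rather than baked into a per-device lookup table. In practice the hardware parameters would likely need normalization before training.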




Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search

Many hardware-aware neural architecture search (NAS) methods have been d...

MAPLE: Microprocessor A Priori for Latency Estimation

Modern deep neural networks must demonstrate state-of-the-art accuracy w...

LETI: Latency Estimation Tool and Investigation of Neural Networks inference on Mobile GPU

A lot of deep learning applications are desired to be run on mobile devi...

NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction

With the wide and deep adoption of deep learning models in real applicat...

SONAR: Joint Architecture and System Optimization Search

There is a growing need to deploy machine learning for different tasks o...

CompOFA: Compound Once-For-All Networks for Faster Multi-Platform Deployment

The emergence of CNNs in mainstream deployment has necessitated methods ...

A Study on the Intersection of GPU Utilization and CNN Inference

There has been significant progress in developing neural network archite...
