Model Slicing for Supporting Complex Analytics with Elastic Inference Cost and Resource Constraints

04/03/2019
by Shaofeng Cai, et al.

Deep learning models have been used to support analytics beyond simple aggregation, where deeper and wider models have been shown to yield great results. These models consume a huge amount of memory and computational operations. However, most large-scale industrial applications are constrained by computational budgets. Current solutions are mainly based on model compression -- deploying a smaller model to save computational resources. Meanwhile, the peak workload of an inference service can be 10x higher than the average case, with even unpredictable extreme cases. Considerable computational resources may be wasted during off-peak hours; on the other hand, the system may crash when the workload exceeds its design capacity. Supporting such deep learning services under dynamic workloads cost-efficiently remains a challenging problem. We address this conflict with a general and novel training scheme called model slicing, which enables deep learning models to provide predictions within a prescribed computational resource budget dynamically. Model slicing can be viewed as an elastic computation solution that requires no additional computational resources, at the cost of a slight sacrifice in prediction accuracy. In a nutshell, a partial order is introduced over the basic components of each layer in the model, namely neurons in dense layers and channels in convolutional layers. Specifically, if one component participates in the forward pass, then all of its preceding components are also activated. Trained dynamically under this structural constraint, the model allows a narrower sub-model to be sliced off at inference time, whose run-time memory and computational cost is roughly quadratic in the width, controlled by a single parameter called the slice rate. Extensive experiments show that models trained with model slicing can support elastic inference cost effectively with minimal performance loss.
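To make the mechanism concrete, the following is a minimal sketch (not the authors' implementation) of how a dense layer could be sliced at inference time: only the first ⌈r·n⌉ components along each dimension are activated, so every active component's predecessors are active too, and the multiply-add count shrinks roughly quadratically with the slice rate `r`. The function name `sliced_dense` and the layer sizes are illustrative assumptions.

```python
import numpy as np

def sliced_dense(x, W, b, slice_rate):
    """Forward pass through a dense layer using only the first
    ceil(slice_rate * n) neurons along each dimension.

    This respects the partial-order constraint described in the
    abstract: a component is active only if all preceding components
    are active. The cost is ~quadratic in slice_rate, since both the
    input and output widths shrink linearly with it.
    """
    n_in = max(1, int(np.ceil(slice_rate * W.shape[1])))
    n_out = max(1, int(np.ceil(slice_rate * W.shape[0])))
    # Slice the leading rows/columns of the weight matrix and bias.
    return W[:n_out, :n_in] @ x[:n_in] + b[:n_out]

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
b = rng.standard_normal(8)
x = rng.standard_normal(8)

full = sliced_dense(x, W, b, 1.0)  # full-width model: 8x8 = 64 multiply-adds
half = sliced_dense(x, W, b, 0.5)  # half-width slice: 4x4 = 16 multiply-adds
print(full.shape, half.shape)      # (8,) (4,)
```

Note that halving the slice rate quarters the multiply-add count (16 vs 64), illustrating the roughly quadratic relationship between width and inference cost that the scheme exploits.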

