TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep LearningInference in Function as a Service Environments

by   Abdul Dakkak, et al.

Deep neural networks (DNNs) have become core computation components within low latency Function as a Service (FaaS) prediction pipelines: including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de-facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees, and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal -- suffering from "cold start" latency. A major cause of such inefficiency is the need to move large amount of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy, an efficient resource management layer that provides isolation, and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework and demonstrate up to 24x speedup in latency for image classification models and up to 210x speedup for large models. We achieve up to 8x system throughput improvement.


page 1

page 3

page 6

page 7

page 8

page 11

page 12

page 13


CacheNet: A Model Caching Framework for Deep Learning Inference on the Edge

The success of deep neural networks (DNN) in machine perception applicat...

No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy

With the rise of machine learning, inference on deep neural networks (DN...

The Case for Hierarchical Deep Learning Inference at the Network Edge

Resource-constrained Edge Devices (EDs), e.g., IoT sensors and microcont...

Accelerating Deep Learning Inference via Learned Caches

Deep Neural Networks (DNNs) are witnessing increased adoption in multipl...

Quantifying and Improving Performance of Distributed Deep Learning with Cloud Storage

Cloud computing provides a powerful yet low-cost environment for distrib...

RDMA Performance Isolation With Justitia

Despite its increasing popularity, most of RDMA's benefits such as ultra...

Cloudburst: Stateful Functions-as-a-Service

Function-as-a-Service (FaaS) platforms and "serverless" cloud computing ...

Please sign up or login with your details

Forgot password? Click here to reset