Optimizing the Transition Waste in Coded Elastic Computing

10/02/2019
by Hoang Dau, et al.

Distributed computing, in which a resource-intensive task is divided into subtasks and distributed among different machines, plays a key role in solving large-scale problems, e.g., machine learning for large datasets or massive computational problems arising in genomic research. Coded computing is a recently emerging paradigm in which redundancy is introduced into distributed computing to alleviate the impact of slow machines, or stragglers, on the completion time. Motivated by recently available services in the cloud computing industry, e.g., EC2 Spot or Azure Batch, where spare/low-priority virtual machines are offered at a fraction of the price of on-demand instances but can be preempted on short notice, we investigate coded computing solutions over elastic resources, where the set of available machines may change in the middle of the computation. Our contributions are twofold. We first propose an efficient method to minimize the transition waste, a newly introduced concept quantifying the total number of tasks that existing machines have to abandon or take on anew when a machine joins or leaves, for the cyclic elastic task allocation scheme recently proposed in the literature (Yang et al., ISIT'19). We then generalize this scheme and introduce new task allocation schemes based on finite geometry that achieve zero transition waste as long as the number of active machines varies within a fixed range. The proposed solutions can be applied on top of every existing coded computing scheme that tolerates stragglers.
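
As a rough illustration of the transition waste described above, the following minimal sketch counts, for every machine present both before and after a change, the tasks it abandons plus the tasks it takes on anew. The function name `transition_waste`, the machine labels, and the toy allocations are assumptions made for illustration only; they are not the paper's notation, and the allocations are not the cyclic scheme of Yang et al.

```python
from typing import Dict, Hashable, Set

Allocation = Dict[Hashable, Set[int]]  # machine id -> set of task indices


def transition_waste(before: Allocation, after: Allocation) -> int:
    """Count abandoned plus newly taken tasks over machines present in both allocations.

    Machines that leave or join are excluded: only "existing" machines
    (those in both `before` and `after`) contribute to the total.
    """
    existing = before.keys() & after.keys()
    # The symmetric difference holds tasks dropped (in before only)
    # and tasks picked up (in after only) by one machine.
    return sum(len(before[m] ^ after[m]) for m in existing)


# Hypothetical toy allocation: four machines, each assigned two of four tasks.
before = {"m1": {0, 1}, "m2": {1, 2}, "m3": {2, 3}, "m4": {3, 0}}

# Machine m4 leaves. Two hypothetical reallocations for the remaining machines:
after_small = {"m1": {0, 1, 2}, "m2": {1, 2, 3}, "m3": {2, 3, 0}}  # keep old tasks, add one
after_large = {"m1": {1, 2, 3}, "m2": {2, 3, 0}, "m3": {3, 0, 1}}  # shift every assignment

print(transition_waste(before, after_small))  # 3
print(transition_waste(before, after_large))  # 9
```

Under these assumptions, both reallocations give each remaining machine three tasks, yet the first changes far fewer assignments than the second. Choosing the new allocation to keep this count small is the kind of optimization the abstract refers to.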

research
12/16/2018

Coded Elastic Computing

Cloud providers have recently introduced low-priority machines to reduce...
research
07/20/2021

A New Design Framework for Heterogeneous Uncoded Storage Elastic Computing

Elasticity is one important feature in modern cloud computing systems an...
research
06/19/2022

Hierarchical coded elastic computing

Elasticity is offered by cloud service providers to exploit under-utiliz...
research
01/12/2020

Heterogeneous Computation Assignments in Coded Elastic Computing

We study the optimal design of a heterogeneous coded elastic computing (...
research
08/12/2020

Coded Elastic Computing on Machines with Heterogeneous Storage and Computation Speed

We study the optimal design of heterogeneous Coded Elastic Computing (CE...
research
01/13/2020

Coded Distributed Computing Schemes with Smaller Numbers of Input Files and Output Functions

Li et al. (IEEE Transaction on Information Theory, 64, 109-128, 2018) in...
research
08/08/2018

On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers

We study the expected completion time of some recently proposed algorith...
