Synergy via Redundancy: Adaptive Replication Strategies and Fundamental Limits

12/25/2020
by Gauri Joshi, et al.

The maximum possible throughput (the rate of job completion) of a multi-server system is typically the sum of the service rates of the individual servers. Recent work shows that launching multiple replicas of a job and canceling the rest as soon as one copy finishes can boost throughput, especially when the service time distribution has high variability. Redundancy can therefore create synergy among servers, so that their overall throughput exceeds the sum of their individual service rates. This work seeks the fundamental limit of the throughput boost achievable through job replication, and the optimal replication policy that achieves it. While most previous works consider upfront replication policies, we expand the set of possible policies to include delayed launch of replicas. The search for the optimal adaptive replication policy can be formulated as a Markov Decision Process, from which we derive two myopic replication policies, MaxRate and AdaRep, that adaptively replicate jobs. To quantify the optimality gap of these and other policies, we derive upper bounds on the service capacity, which provide fundamental limits on the throughput of queueing systems with redundancy.
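The synergy claim can be illustrated with a small closed-form calculation. This is a minimal sketch under assumptions that are not taken from the abstract: two servers that are never idle, replica service times that are independent and identically distributed, instantaneous cancellation at no cost, and a two-phase hyperexponential service time (rate `mu_fast` with probability `p`, rate `mu_slow` otherwise). The function names and parameters are hypothetical, and the comparison covers only full upfront replication, not the MaxRate or AdaRep policies.

```python
def mean_service_time(p, mu_fast, mu_slow):
    """E[X] for a hyperexponential X: Exp(mu_fast) w.p. p, else Exp(mu_slow)."""
    return p / mu_fast + (1.0 - p) / mu_slow


def mean_min_of_two(p, mu_fast, mu_slow):
    """E[min(X1, X2)] for two i.i.d. copies of X.

    The survival function of X is S(t) = p*exp(-mu_fast*t) + q*exp(-mu_slow*t),
    so E[min] is the integral of S(t)**2 over t >= 0, expanded term by term.
    """
    q = 1.0 - p
    return (p * p / (2.0 * mu_fast)
            + 2.0 * p * q / (mu_fast + mu_slow)
            + q * q / (2.0 * mu_slow))


def throughput_no_replication(p, mu_fast, mu_slow):
    """Two busy servers, each working on its own job (assumed saturated)."""
    return 2.0 / mean_service_time(p, mu_fast, mu_slow)


def throughput_full_replication(p, mu_fast, mu_slow):
    """Both servers run replicas of the same job; the slower copy is canceled."""
    return 1.0 / mean_min_of_two(p, mu_fast, mu_slow)


if __name__ == "__main__":
    # Highly variable service: 90% of runs are fast, 10% are very slow.
    print(throughput_no_replication(0.9, 10.0, 0.1))
    print(throughput_full_replication(0.9, 10.0, 0.1))
```

With these illustrative parameters the replicated system completes roughly 9.2 jobs per unit time against roughly 1.8 without replication. Setting `p = 1.0` makes the service time purely exponential, and the two throughputs coincide, which is consistent with the abstract's point that the gain depends on high variability.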


research
05/27/2020

Threshold-based rerouting and replication for resolving job-server affinity relations

We consider a system with several job types and two parallel server pool...
research
12/06/2019

Data Replication for Reducing Computing Time in Distributed Systems with Stragglers

In distributed computing systems with stragglers, various forms of redun...
research
01/09/2018

Optimal Content Replication and Request Matching in Large Caching Systems

We consider models of content delivery networks in which the servers are...
research
02/04/2023

Getting to "rate-optimal" in ranking & selection

In their 2004 seminal paper, Glynn and Juneja formally and precisely est...
research
08/08/2020

Achievable Stability in Redundancy Systems

We consider a system with N parallel servers where incoming jobs are imm...
research
07/25/2019

MDS coding is better than replication for job completion times

In a multi-server system, how can one get better performance than random...
