PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy

03/15/2021
by Saurabh Kadekodi, et al.

Data redundancy provides resilience in large-scale storage clusters, but imposes significant cost overhead. Substantial space-savings can be realized by tuning redundancy schemes to observed disk failure rates. However, prior design proposals for such tuning are unusable in real-world clusters, because the IO load of transitions between schemes overwhelms the storage infrastructure (termed transition overload). This paper analyzes traces for millions of disks from production systems at Google, NetApp, and Backblaze to expose and understand transition overload as a roadblock to disk-adaptive redundancy: transition IO under existing approaches can consume 100% of cluster IO. Building on the insights drawn, we present PACEMAKER, a low-overhead disk-adaptive redundancy orchestrator. PACEMAKER mitigates transition overload by (1) proactively organizing data layouts to make future transitions efficient, and (2) initiating transitions proactively in a manner that avoids urgency while not compromising on space-savings. Evaluation of PACEMAKER with traces from four large (110K-450K disks) production clusters shows that the transition IO requirement decreases to never needing more than 5% of cluster IO bandwidth (0.2-0.4% on average), while providing space-savings of 14-20%. We also describe and experiment with an integration of PACEMAKER into HDFS.
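To make the space-savings claim concrete, the following is a minimal sketch of the underlying arithmetic, assuming k-of-n erasure coding (k data chunks out of n total chunks per stripe). The two schemes used here, 6-of-9 and 10-of-13, and the helper names `storage_overhead` and `space_savings` are illustrative assumptions, not values or APIs taken from the paper.

```python
from fractions import Fraction


def storage_overhead(k: int, n: int) -> Fraction:
    """Raw bytes stored per byte of user data for a k-of-n erasure code."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(n, k)


def space_savings(old: tuple[int, int], new: tuple[int, int]) -> Fraction:
    """Fraction of raw capacity freed by re-encoding data from `old` to `new`."""
    return 1 - storage_overhead(*new) / storage_overhead(*old)


if __name__ == "__main__":
    default_scheme = (6, 9)   # hypothetical scheme for disks with higher failure rates
    wider_scheme = (10, 13)   # hypothetical scheme for disks observed to be more reliable

    print(float(storage_overhead(*default_scheme)))  # 1.5
    print(float(storage_overhead(*wider_scheme)))    # 1.3
    print(space_savings(default_scheme, wider_scheme))  # 2/15
```

Both hypothetical schemes tolerate three lost chunks per stripe, but the wider one stores 1.3 rather than 1.5 raw bytes per user byte, which frees 2/15 (about 13%) of the raw capacity occupied by the re-encoded data. Re-encoding existing data from one scheme to the other is the transition IO that the abstract identifies as the bottleneck.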


