On Byzantine Fault Tolerance in Multi-Master Kubernetes Clusters

04/11/2019
by Gor Mack Diouf, et al.

Docker container virtualization technology is being widely adopted in cloud computing environments because it is lightweight and efficient. However, it requires adequate control and management via an orchestrator. As a result, cloud providers are adopting the open-source Kubernetes platform as the standard orchestrator of containerized applications. To ensure application availability, Kubernetes relies on the replication mechanism of the Raft protocol. Despite its simplicity, Raft assumes that machines fail only by shutting down. In practice, a shutdown is rarely the only cause of a machine's malfunction. Software errors or malicious attacks can cause machines to exhibit Byzantine (i.e., arbitrary) behavior and thereby compromise the correctness and availability of the replication protocol. In this paper, we propose a Kubernetes multi-Master Robust (KmMR) platform to overcome this limitation. KmMR is based on the adaptation and integration of the BFT-SMaRt fault-tolerant replication protocol into the Kubernetes environment. Unlike Raft, BFT-SMaRt is resistant to both Byzantine and non-Byzantine faults. Experimental results show that KmMR is able to guarantee the continuity of services, even when the total number of tolerated faults is exceeded. In addition, under such conditions KmMR achieves an average consensus time 1000 times shorter than that of the conventional platform (with Raft). Finally, we show that KmMR incurs only a small additional cost in resource consumption compared to the conventional platform.
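
The difference between the two fault models can be illustrated with the standard replica-count bounds from the replication literature. These figures are not stated in the abstract and are included here as background assumptions: a majority-quorum, crash-tolerant protocol such as Raft needs n >= 2f + 1 replicas to tolerate f crash faults, while a Byzantine fault-tolerant protocol such as BFT-SMaRt needs n >= 3f + 1 replicas to tolerate f Byzantine faults. The function names below are hypothetical and chosen only for this sketch.

```python
# Minimal sketch of the textbook fault-tolerance bounds (an assumption of this
# example, not a result reported in the abstract).
# Crash-tolerant majority quorums (e.g. Raft): n >= 2f + 1
# Byzantine fault-tolerant replication (e.g. BFT-SMaRt): n >= 3f + 1


def max_crash_faults(n: int) -> int:
    """Largest number of crash faults tolerated by n replicas (n >= 2f + 1)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Largest number of Byzantine faults tolerated by n replicas (n >= 3f + 1)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def min_replicas_crash(f: int) -> int:
    """Smallest cluster size that tolerates f crash faults."""
    return 2 * f + 1


def min_replicas_byzantine(f: int) -> int:
    """Smallest cluster size that tolerates f Byzantine faults."""
    return 3 * f + 1


if __name__ == "__main__":
    # Compare the two models for a few hypothetical master-node counts.
    for n in (3, 4, 5, 7):
        print(
            f"{n} masters: tolerates {max_crash_faults(n)} crash fault(s), "
            f"{max_byzantine_faults(n)} Byzantine fault(s)"
        )
```

Under these bounds, a three-master cluster that survives one crashed master cannot mask even a single Byzantine master, which is the gap the abstract describes.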


research
01/11/2021

Strengthened Fault Tolerance in Byzantine Fault Tolerant Replication

Byzantine fault tolerant (BFT) state machine replication (SMR) is an imp...
research
07/01/2022

Automatic Integration of BFT State-Machine Replication into IoT Systems

Byzantine fault tolerance (BFT) can preserve the availability and integr...
research
10/02/2019

ROS Rescue : Fault Tolerance System for Robot Operating System

In this chapter we discuss the problem of master failure in ROS1.0 and i...
research
01/26/2018

Enhancing Byzantine fault tolerance using MD5 checksum and delay variation in Cloud services

Cloud computing management are beyond typical human narratives. However ...
research
05/03/2020

Behind the Last Line of Defense – Surviving SoC Faults and Intrusions

Today, leveraging the enormous modular power, diversity and flexibility ...
research
10/20/2017

Hardened Paxos Through Consistency Validation

Due to the emergent adoption of distributed systems when building applic...
research
05/16/2023

Availability Evaluation of IoT Systems with Byzantine Fault-Tolerance for Mission-critical Applications

Byzantine fault-tolerant (BFT) systems are able to maintain the availabi...
