Model-based Reinforcement Learning for Service Mesh Fault Resiliency in a Web Application-level

10/21/2021
by   Fanfei Meng, et al.

Microservice-based architectures enable different aspects of web applications to be created and updated independently, even after deployment. Associated technologies such as service mesh provide application-level fault resilience through attribute configurations that govern the behavior of request-response services, and the interactions among them, in the presence of failures. While this provides tremendous flexibility, the configured values of these attributes, and the relationships among them, can significantly affect the performance and fault resilience of the overall application. Furthermore, it is impossible to determine the best and worst combinations of attribute values with respect to fault resiliency via testing, due to the complexities of the underlying distributed system and the many possible attribute value combinations. In this paper, we present a model-based reinforcement learning workflow towards service mesh fault resiliency. Our approach enables the prediction of the most significant fault resilience behaviors at the web application level, starting from a single service and extending to aggregated multi-service management with efficient agent collaboration.

research
09/19/2022

Distributed Execution Indexing

This work-in-progress report presents both the design and partial evalua...
research
11/22/2021

Reliable Actors with Retry Orchestration

Enterprise cloud developers have to build applications that are resilien...
research
12/07/2018

PARIS: Predicting Application Resilience Using Machine Learning

Extreme-scale scientific applications can be more vulnerable to soft err...
research
09/11/2021

MODC: Resilience for disaggregated memory architectures using task-based programming

Disaggregated memory architectures provide benefits to applications beyo...
research
12/25/2022

An Adaptive Resilience Testing Framework for Microservice Systems

Resilience testing, which measures the ability to minimize service degra...
research
11/08/2022

Designing an Adaptive Application-Level Checkpoint Management System for Malleable MPI Applications

Dynamic resource management opens up numerous opportunities in High Perf...
research
07/30/2019

Observability and Chaos Engineering on System Calls for Containerized Applications in Docker

In this paper, we present a novel fault injection system called ChaosOrc...
