Affinity-Aware Resource Provisioning for Long-Running Applications in Shared Clusters

08/26/2022
by Clément Mommessin, et al.

Resource provisioning plays a pivotal role in determining the right amount of infrastructure resources to run applications and in reaching the global decarbonization goal. A significant portion of production clusters is now dedicated to long-running applications (LRAs), which typically take the form of microservices and run for hours or even months. It is therefore practically important to plan the placement of LRAs in a shared cluster ahead of time, so that the number of compute nodes they require is minimized, reducing the carbon footprint and lowering operational costs. Existing work on LRA scheduling is often application-agnostic and does not specifically address the constraining requirements imposed by LRAs, such as co-location affinity constraints and time-varying resource requirements. In this paper, we present an affinity-aware resource provisioning approach for deploying large-scale LRAs in a shared cluster subject to multiple constraints, with the objective of minimizing the number of compute nodes in use. We investigate a broad range of solution algorithms that fall into three main categories (Application-Centric, Node-Centric, and Multi-Node approaches) and tune them for typical large-scale real-world scenarios. Experimental studies driven by the Alibaba Tianchi dataset show that our algorithms can achieve competitive scheduling effectiveness and running time compared with the heuristics used in recent work, including Medea and LraSched.
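To make the problem concrete, the following is a minimal sketch of a greedy, application-centric first-fit placement under the two kinds of constraints the abstract names: affinity constraints and time-varying resource requirements. It is an illustration only, not the authors' algorithm; the paper's Application-Centric, Node-Centric, and Multi-Node algorithms are described in the full text. The names `App`, `Node`, `fits`, `place`, and `EXAMPLE_APPS` are hypothetical. The sketch also assumes a single normalized resource (CPU), demands given per discrete time slot, an intra-application cap `max_per_node`, and a set of `conflicts` (inter-application anti-affinity).

```python
from dataclasses import dataclass, field


@dataclass
class App:
    """Hypothetical LRA description (assumed model, not the paper's)."""
    name: str
    replicas: int                 # number of containers to place
    demand: list[float]           # per-time-slot CPU demand of ONE container
    max_per_node: int             # intra-app anti-affinity: cap per node
    conflicts: set[str] = field(default_factory=set)  # apps that must not share a node


@dataclass
class Node:
    load: list[float]                                  # CPU used in each time slot
    counts: dict[str, int] = field(default_factory=dict)
    blocked: set[str] = field(default_factory=set)     # apps excluded by hosted apps' conflicts


def fits(node: Node, app: App, capacity: float) -> bool:
    """Check affinity constraints and capacity in every time slot."""
    if node.counts.get(app.name, 0) >= app.max_per_node:
        return False
    # Anti-affinity in both directions: hosted apps may exclude `app`, and vice versa.
    if app.name in node.blocked or any(o in node.counts for o in app.conflicts):
        return False
    # Time-varying demand: the sum must stay under capacity slot by slot,
    # not just at the peak of each container considered separately.
    return all(l + d <= capacity + 1e-9 for l, d in zip(node.load, app.demand))


def place(apps: list[App], capacity: float, horizon: int):
    """Greedy first-fit; apps ordered by peak demand times replica count."""
    nodes: list[Node] = []
    assignment: dict[str, list[int]] = {}
    order = sorted(apps, key=lambda a: max(a.demand) * a.replicas, reverse=True)
    for app in order:
        for _ in range(app.replicas):
            idx = next((i for i, n in enumerate(nodes) if fits(n, app, capacity)), None)
            if idx is None:
                fresh = Node(load=[0.0] * horizon)
                if not fits(fresh, app, capacity):
                    raise ValueError(f"{app.name} cannot fit on an empty node")
                nodes.append(fresh)           # open a new compute node
                idx = len(nodes) - 1
            node = nodes[idx]
            for t, d in enumerate(app.demand):
                node.load[t] += d
            node.counts[app.name] = node.counts.get(app.name, 0) + 1
            node.blocked |= app.conflicts
            assignment.setdefault(app.name, []).append(idx)
    return nodes, assignment


# Hypothetical workload: demands over three time slots, node capacity 1.0.
EXAMPLE_APPS = [
    App("web", replicas=3, demand=[0.5, 0.3, 0.6], max_per_node=1),
    App("cache", replicas=2, demand=[0.2, 0.6, 0.2], max_per_node=2, conflicts={"db"}),
    App("db", replicas=1, demand=[0.4, 0.4, 0.3], max_per_node=1),
]

if __name__ == "__main__":
    nodes, assignment = place(EXAMPLE_APPS, capacity=1.0, horizon=3)
    print(len(nodes), assignment)
    # → 3 {'web': [0, 1, 2], 'cache': [0, 1], 'db': [2]}
```

In this toy instance, a `web` and a `cache` container share a node because their peaks fall in different time slots. Sizing by peak demand alone (0.6 + 0.6) would keep them apart. The `conflicts` set pushes `db` onto the only node without a `cache` container.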
