Artificial Intelligence (AI)-Centric Management of Resources in Modern Distributed Computing Systems
Contemporary Distributed Computing Systems (DCS) such as Cloud Data Centers are large-scale, complex, and heterogeneous, and they are distributed across multiple networks and geographical boundaries. Cloud applications have evolved from traditional workloads to microservice- and serverless-based ones. At the same time, underlying architectures have become more heterogeneous, and networks have transformed into large, hierarchical, software-defined systems. Fueling the pipeline from the edge to the cloud, Internet of Things (IoT)-based applications produce massive amounts of data that require real-time processing and fast response, especially in scenarios such as industrial automation or autonomous systems. Managing these resources efficiently to provide reliable services to end-users or applications is a challenging task. Existing Resource Management Systems (RMS) rely on either static or heuristic solutions. These are proving inadequate for such composite and dynamic systems, as well as for upcoming workloads that demand even higher bandwidth and throughput and lower latencies. The advent of Artificial Intelligence (AI) and the availability of data have opened up possibilities for exploring data-driven solutions to RMS tasks that are adaptive, accurate, and efficient. This paper aims to set out the motivations and the need for data-driven solutions in resource management. It identifies the associated challenges and outlines potential future research directions, detailing different RMS tasks and the scenarios where data-driven techniques can be applied. Finally, it proposes a conceptual AI-centric RMS model for DCS and presents two use cases demonstrating the feasibility of AI-centric approaches.