Launchpad: Learning to Schedule Using Offline and Online RL Methods

by Vanamala Venkataswamy, et al.

Deep reinforcement learning algorithms have succeeded in several challenging domains. Classic online RL job schedulers can learn efficient scheduling strategies, but they often take thousands of timesteps to explore the environment and adapt from a randomly initialized DNN policy. Existing RL schedulers overlook the importance of learning from historical data and improving upon custom heuristic policies. Offline reinforcement learning presents the prospect of policy optimization from pre-recorded datasets without online environment interaction. Following the recent success of data-driven learning, we explore two RL methods: 1) Behaviour Cloning and 2) Offline RL, both of which aim to learn policies from logged data without interacting with the environment. These methods address the cost and safety challenges of data collection, which are particularly pertinent to real-world applications of RL. Although the data-driven RL methods generate good results, we show that their performance is highly dependent on the quality of the historical datasets. Finally, we demonstrate that by incorporating prior expert demonstrations to pre-train the agent, we short-circuit the random exploration phase and learn a reasonable policy with online training. We utilize Offline RL as a launchpad to learn effective scheduling policies from prior experience collected using Oracle or heuristic policies. Such a framework is effective for pre-training from historical datasets and is well suited to continuous improvement with online data collection.
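To make the Behaviour Cloning idea concrete, the following is a minimal sketch of fitting a policy to logged decisions from a heuristic scheduler, with no environment interaction. It is not the authors' implementation: the shortest-job-first heuristic (`sjf_expert`), the state encoding (the runtimes of four queued jobs), and the linear softmax policy standing in for a DNN are all assumptions chosen for illustration.

```python
import numpy as np

def sjf_expert(states):
    """Hypothetical logging policy: pick the queued job with the shortest runtime."""
    return np.argmin(states, axis=1)

def behaviour_clone(states, actions, n_actions, epochs=300, lr=0.5):
    """Fit a linear softmax policy to logged (state, action) pairs
    by gradient descent on the cross-entropy loss."""
    n, d = states.shape
    W = np.zeros((d, n_actions))
    onehot = np.eye(n_actions)[actions]
    for _ in range(epochs):
        logits = states @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = states.T @ (probs - onehot) / n       # cross-entropy gradient
        W -= lr * grad
    return W

def act(W, states):
    """Greedy action of the cloned policy."""
    return np.argmax(states @ W, axis=1)

# Assumed logged dataset: each state holds the runtimes of 4 queued jobs.
rng = np.random.default_rng(0)
logged_states = rng.uniform(0.0, 1.0, size=(2000, 4))
logged_actions = sjf_expert(logged_states)

# Pre-train purely from the log.
W = behaviour_clone(logged_states, logged_actions, n_actions=4)

# Check how often the cloned policy matches the heuristic on unseen states.
test_states = rng.uniform(0.0, 1.0, size=(500, 4))
agreement = float(np.mean(act(W, test_states) == sjf_expert(test_states)))
print(f"agreement with logged heuristic: {agreement:.2f}")
```

In this sketch the cloned policy can at best imitate the heuristic that produced the log, which illustrates the dependence on dataset quality noted above. Under the launchpad approach, weights pre-trained this way would serve as the starting point for online training rather than a random initialization.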


