Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality

02/14/2022
by   Jiawei Huang, et al.
0

Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL). Despite the community's increasing interest, there lacks a formal theoretical formulation for the problem. In this paper, we propose such a formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective: we are interested in exploring an MDP and obtaining a near-optimal policy within minimal deployment complexity, whereas in each deployment the policy can sample a large batch of data. Using finite-horizon linear MDPs as a concrete structural model, we reveal the fundamental limit in achieving deployment efficiency by establishing information-theoretic lower bounds, and provide algorithms that achieve the optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible and can serve as a building block for other practically relevant settings; we give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be worth future investigation.

READ FULL TEXT
research
10/03/2022

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

We study the problem of deployment efficient reinforcement learning (RL)...
research
08/31/2020

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Reinforcement learning (RL) in episodic, factored Markov decision proces...
research
12/14/2020

Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

Several practical applications of reinforcement learning involve an agen...
research
05/22/2023

Offline Primal-Dual Reinforcement Learning for Linear MDPs

Offline Reinforcement Learning (RL) aims to learn a near-optimal policy ...
research
01/19/2023

Advanced Scaling Methods for VNF deployment with Reinforcement Learning

Network function virtualization (NFV) and software-defined network (SDN)...
research
05/17/2021

Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting

Low-complexity models such as linear function representation play a pivo...
research
06/14/2023

Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information

Partially observable Markov decision processes (POMDPs) have been widely...

Please sign up or login with your details

Forgot password? Click here to reset