Achieving Sample and Computational Efficient Reinforcement Learning by Action Space Reduction via Grouping

by Yining Li, et al.

Reinforcement learning often has to contend with the exponential growth of states and actions when exploring optimal control in high-dimensional spaces (commonly known as the curse of dimensionality). In this work, we address this issue by learning the inherent structure of action-wise similar MDPs to appropriately balance performance degradation against sample/computational complexity. In particular, we partition the action space into multiple groups based on similarity in transition distribution and reward function, and build a linear decomposition model to capture the differences between the intra-group transition kernels and the intra-group rewards. Both our theoretical analysis and experiments reveal a surprising and counter-intuitive result: while a more refined grouping strategy reduces the approximation error caused by treating actions in the same group as identical, it also leads to increased estimation error when the number of samples or the computational resources is limited. This finding highlights the grouping strategy as a new degree of freedom that can be optimized to minimize the overall performance loss. To address this trade-off, we formulate a general optimization problem for determining the optimal grouping strategy, which strikes a balance between performance loss and sample/computational complexity. We further propose a computationally efficient method for selecting a nearly-optimal grouping strategy, whose computational complexity is independent of the size of the action space.
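To make the grouping idea concrete, the following is a minimal sketch (not the paper's actual algorithm) of partitioning an action space by similarity of transition kernels and rewards. The greedy threshold rule, the max-norm similarity measure, and all variable names (`P`, `r`, `eps`) are illustrative assumptions; the paper instead optimizes the grouping strategy against the performance-loss/complexity trade-off.

```python
import numpy as np

def group_actions(P, r, eps):
    """Greedily partition actions whose transition rows and rewards are
    within eps (max norm) of a group representative.

    P: (A, S, S) array of per-action transition kernels P[a][s][s'].
    r: (A, S) array of per-action rewards r[a][s].
    Returns a list of groups, each a list of action indices.
    Hypothetical illustration of similarity-based action grouping.
    """
    groups = []  # list of lists of action indices
    reps = []    # one representative (P_a, r_a) per group
    for a in range(P.shape[0]):
        for g, (Pg, rg) in enumerate(reps):
            # Action joins the first group whose representative it matches.
            if np.max(np.abs(P[a] - Pg)) <= eps and np.max(np.abs(r[a] - rg)) <= eps:
                groups[g].append(a)
                break
        else:
            # No close-enough group: start a new one with this action.
            groups.append([a])
            reps.append((P[a], r[a]))
    return groups

# Toy MDP with 2 states and 4 actions: actions 0/1 are near-identical,
# as are actions 2/3, so grouping with eps=0.05 should yield two groups.
P = np.array([
    [[0.90, 0.10], [0.20, 0.80]],
    [[0.91, 0.09], [0.21, 0.79]],
    [[0.10, 0.90], [0.80, 0.20]],
    [[0.11, 0.89], [0.79, 0.21]],
])
r = np.array([
    [1.00, 0.00],
    [1.01, 0.01],
    [0.00, 1.00],
    [0.01, 0.99],
])
groups = group_actions(P, r, eps=0.05)
```

A coarser threshold (larger `eps`) merges more actions into fewer groups, lowering estimation cost at the price of a larger approximation error; this is exactly the trade-off the abstract identifies as a new degree of freedom.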


