Quantum Computing Provides Exponential Regret Improvement in Episodic Reinforcement Learning

by Bhargav Ganguly, et al.

In this paper, we investigate the problem of episodic reinforcement learning with quantum oracles for state evolution. To this end, we propose an Upper Confidence Bound (UCB) based quantum algorithmic framework to facilitate learning of a finite-horizon MDP. Our quantum algorithm achieves an exponential improvement in regret over its classical counterparts: a regret of Õ(1) as compared to Õ(√K), where K is the number of training episodes and Õ(·) hides logarithmic terms. To achieve this advantage, we exploit an efficient quantum mean estimation technique that provides a quadratic improvement in the number of i.i.d. samples needed to estimate the mean of sub-Gaussian random variables as compared to classical mean estimation. This improvement is key to the significant regret improvement in quantum reinforcement learning. We provide proof-of-concept experiments on various RL environments that demonstrate the performance gains of the proposed algorithmic framework.
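As a rough intuition for why a quadratic saving in samples can turn √K-type regret into logarithmic regret, consider the following heuristic sketch. It is not the paper's algorithm. It assumes that the estimation error after n samples scales as 1/√n classically and as 1/n with quantum mean estimation (the quadratic improvement described above), that the sample count grows linearly with the episode index, and that regret is driven by the sum of these error widths. Constants, logarithmic factors, the horizon, and the dependence on states and actions are all ignored, and the function name `cumulative_bonus` is hypothetical.

```python
import math


def cumulative_bonus(num_episodes: int, exponent: float) -> float:
    """Sum of per-episode error widths k**(-exponent) for k = 1..num_episodes."""
    return sum(k ** (-exponent) for k in range(1, num_episodes + 1))


for K in (100, 10_000, 1_000_000):
    # Assumed classical width ~ 1/sqrt(k): the sum grows on the order of sqrt(K).
    classical = cumulative_bonus(K, 0.5)
    # Assumed quantum width ~ 1/k: the sum is the harmonic number, on the order of ln(K).
    quantum = cumulative_bonus(K, 1.0)
    print(f"K={K:>9}  classical~{classical:10.1f}  quantum~{quantum:6.2f}  "
          f"sqrt(K)={math.sqrt(K):8.1f}  ln(K)={math.log(K):5.2f}")
```

Under these assumptions the classical column tracks √K while the quantum column tracks ln K, which is constant up to the logarithmic terms that Õ(·) hides.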


∙ 02/21/2023

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

While quantum reinforcement learning (RL) has attracted a surge of atten...
∙ 06/09/2022

Quantum Policy Iteration via Amplitude Estimation and Grover Search – Towards Quantum Advantage for Reinforcement Learning

We present a full implementation and simulation of a novel quantum reinf...
∙ 04/27/2023

Batch Quantum Reinforcement Learning

Training DRL agents is often a time-consuming process as a large number ...
∙ 09/26/2022

Quantum Speedups of Optimizing Approximately Convex Functions with Applications to Logarithmic Regret Stochastic Convex Bandits

We initiate the study of quantum algorithms for optimizing approximately...
∙ 02/26/2018

Variance Reduction Methods for Sublinear Reinforcement Learning

This work considers the problem of provably optimal reinforcement learni...
∙ 04/27/2023

Logarithmic-Regret Quantum Learning Algorithms for Zero-Sum Games

We propose the first online quantum algorithm for zero-sum games with Õ(...
∙ 11/06/2021

Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning

We study risk-sensitive reinforcement learning (RL) based on the entropi...
