Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control

06/17/2022
by Taylan Kargin, et al.

Thompson Sampling (TS) is an efficient method for decision-making under uncertainty, in which an action is sampled from a carefully prescribed distribution that is updated based on the observed data. In this work, we study the adaptive control of stabilizable linear-quadratic regulators (LQRs) with unknown system dynamics using TS. Previous works have established that Õ(√(T)) frequentist regret is optimal for the adaptive control of LQRs. However, the existing methods either work only in restrictive settings, require an a priori known stabilizing controller, or rely on computationally intractable approaches. We propose an efficient TS algorithm for the adaptive control of LQRs, TS-based Adaptive Control (TSAC), that attains Õ(√(T)) regret even for multidimensional systems, thereby solving the open problem posed in Abeille and Lazaric (2018). TSAC does not require an a priori known stabilizing controller and achieves fast stabilization of the underlying system by effectively exploring the environment in the early stages. Our result hinges on a novel lower bound on the probability that TS provides an optimistic sample. By carefully prescribing an early exploration strategy and a policy update rule, we show that TS achieves order-optimal regret in the adaptive control of multidimensional stabilizable LQRs. We empirically demonstrate the performance and efficiency of TSAC in several adaptive control tasks.
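To make the ingredients named in the abstract concrete (sampling a model from a data-driven distribution, early exploration, and a policy update rule), here is a minimal Python sketch of Thompson sampling on a scalar LQR. It is not the authors' TSAC algorithm: the scalar system, the unscaled Gaussian sampling around a regularized least-squares estimate, the fixed update interval, the Gaussian exploration noise, and every name and default value (`riccati_gain`, `sample_model`, `run_ts_lqr`, `explore_steps`, `update_every`) are illustrative assumptions.

```python
import math
import random


def riccati_gain(a, b, q=1.0, r=1.0, iters=500):
    """Scalar LQR gain k (u = k * x) via fixed-point iteration of the Riccati recursion."""
    p = q
    for _ in range(iters):
        p = q + a * a * p * r / (r + b * b * p)
    return -(a * b * p) / (r + b * b * p)


def solve2(V, s):
    """Solve the 2x2 linear system V @ theta = s by Cramer's rule."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    return ((V[1][1] * s[0] - V[0][1] * s[1]) / det,
            (V[0][0] * s[1] - V[1][0] * s[0]) / det)


def sample_model(theta_hat, V, rng):
    """Draw (a, b) from N(theta_hat, V^{-1}) using a 2x2 Cholesky factor of V^{-1}."""
    det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
    c00, c01, c11 = V[1][1] / det, -V[0][1] / det, V[0][0] / det
    l00 = math.sqrt(c00)
    l10 = c01 / l00
    l11 = math.sqrt(max(c11 - l10 * l10, 0.0))
    e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return theta_hat[0] + l00 * e0, theta_hat[1] + l10 * e0 + l11 * e1


def run_ts_lqr(a_true=1.05, b_true=1.0, T=2000,
               explore_steps=50, update_every=20, seed=0):
    """Simulate x' = a*x + b*u + w with unit-variance noise and unknown (a, b)."""
    rng = random.Random(seed)
    V = [[1.0, 0.0], [0.0, 1.0]]  # regularized design matrix: I + sum of z z^T
    s = [0.0, 0.0]                # sum of z * x_next, with z = (x, u)
    theta_hat = (0.0, 0.0)
    x, k, total_cost = 0.0, 0.0, 0.0
    for t in range(T):
        # Policy update rule (assumed): resample the model every `update_every` steps.
        if t >= explore_steps and (t - explore_steps) % update_every == 0:
            theta_hat = solve2(V, s)
            a_s, b_s = sample_model(theta_hat, V, rng)
            if abs(b_s) > 1e-3:  # keep the old gain if the sampled input gain is near zero
                k = riccati_gain(a_s, b_s)
        u = k * x
        if t < explore_steps:
            u += rng.gauss(0.0, 1.0)  # early exploration (assumed): extra input noise
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 1.0)
        total_cost += x * x + u * u
        # Least-squares bookkeeping for the regression x_next ~ a*x + b*u.
        V[0][0] += x * x
        V[0][1] += x * u
        V[1][0] += x * u
        V[1][1] += u * u
        s[0] += x * x_next
        s[1] += u * x_next
        x = x_next
    return theta_hat, k, total_cost / T
```

Calling `run_ts_lqr()` returns the last least-squares estimate of `(a, b)`, the final feedback gain, and the average per-step cost. The sketch starts with zero gain on an open-loop unstable system, so the state grows during the exploration phase before the first sampled controller is applied; this mirrors the setting without a known stabilizing controller only loosely and carries none of the paper's guarantees.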

research
07/23/2020

Explore More and Improve Regret in Linear Quadratic Regulators

Stabilizing the unknown dynamics of a control system and minimizing regr...
research
03/12/2020

Regret Bound of Adaptive Control in Linear Quadratic Gaussian (LQG) Systems

We study the problem of adaptive control in partially observable linear ...
research
05/23/2018

Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

We consider adaptive control of the Linear Quadratic Regulator (LQR), wh...
research
10/31/2021

Safe Adaptive Learning-based Control for Constrained Linear Quadratic Regulators with Regret Guarantees

We study the adaptive control of an unknown linear system with a quadrat...
research
03/24/2013

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

We study the problem of adaptive control of a high dimensional linear qu...
research
01/25/2022

Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems

We consider the problem of controlling a stochastic linear system with q...
research
11/02/2020

Exact Asymptotics for Linear Quadratic Adaptive Control

Recent progress in reinforcement learning has led to remarkable performa...
