An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

03/31/2023
by Victor G. Lopez, et al.

In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. In contrast to other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data-collection step. We then show that, using this persistently exciting data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which allows it to be solved more efficiently. Finally, a method to determine a stabilizing policy for initializing the algorithm, using only measured data, is proposed.
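
For context on the policy-iteration structure the abstract refers to, the sketch below shows the classical model-based Kleinman iteration for the continuous-time LQR problem: repeated policy evaluation via a Lyapunov equation followed by policy improvement. This is an illustrative sketch only, not the paper's algorithm; it assumes the system matrices A and B are known, whereas the paper's off-policy method reproduces the same updates from persistently exciting input-state data and replaces the model-based evaluation step with a Sylvester-transpose equation. All numerical values and the initial gain K0 below are hypothetical.

```python
# Illustrative sketch: model-based Kleinman policy iteration for continuous-time LQR.
# The paper's data-driven algorithm emulates this iteration without knowing A and B,
# using measured input-state trajectories generated by a persistently exciting input.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_policy_iteration(A, B, Q, R, K0, tol=1e-9, max_iter=50):
    """Alternate policy evaluation and improvement from a stabilizing initial gain K0."""
    K = K0
    for _ in range(max_iter):
        Acl = A - B @ K                                  # closed-loop matrix under current policy
        # Policy evaluation: solve the Lyapunov equation Acl' P + P Acl + Q + K' R K = 0
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement: K <- R^{-1} B' P
        K_next = np.linalg.solve(R, B.T @ P)
        if np.linalg.norm(K_next - K) < tol:
            return K_next, P
        K = K_next
    return K, P

# Hypothetical example system (values chosen for illustration only)
A = np.array([[0.0, 1.0], [-1.0, 2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[0.0, 5.0]])                              # assumed stabilizing initial gain
K_opt, P_opt = kleinman_policy_iteration(A, B, Q, R, K0)
```

In the data-driven setting considered in the paper, the same evaluation/improvement updates are obtained from recorded trajectories rather than from A and B, which is where the persistently exciting exploration input and the data-based stabilizing initialization come into play.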
