The remarkable success of reinforcement learning (RL) heavily relies on ...
Existing offline reinforcement learning (RL) methods face a few major ch...
The problem of constrained Markov decision process (CMDP) is investigate...
Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a s...
The general value function (GVF) is a powerful tool to represent both the pr...
Designing off-policy reinforcement learning algorithms is typically a ve...
The gradient descent-ascent (GDA) algorithm has been widely applied to s...
Safe reinforcement learning (SRL) problems are typically modeled as cons...
Two timescale stochastic approximation (SA) has been widely used in valu...
Generative adversarial imitation learning (GAIL) is a popular inverse re...
Min-max optimization captures many important machine learning problems s...
As an important type of reinforcement learning algorithm, actor-critic ...
The actor-critic (AC) algorithm is a popular method to find an optimal p...
Despite the wide applications of Adam in reinforcement learning (RL), th...
Temporal difference (TD) learning is a popular algorithm for policy eval...
Gradient-based temporal difference (GTD) algorithms are widely used in o...
Though the convergence of major reinforcement learning algorithms has be...
We consider the binary classification problem in which the objective fun...