Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

11/28/2021
by   Dailin Hu, et al.
0

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness. Most MaxEnt RL methods, however, use a constant tradeoff coefficient (temperature), contrary to the intuition that the temperature should be high early in training to avoid overfitting to noisy value estimates and decrease later in training as we increasingly trust high value estimates to truly lead to good rewards. Moreover, our confidence in value estimates is state-dependent, increasing every time we use more evidence to update an estimate. In this paper, we present a simple state-based temperature scheduling approach, and instantiate it for SQL as Count-Based Soft Q-Learning (CBSQL). We evaluate our approach on a toy domain as well as in several Atari 2600 domains and show promising results.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/06/2021

Target Entropy Annealing for Discrete Soft Actor-Critic

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in ...
research
10/28/2021

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

Temporal-Difference (TD) learning methods, such as Q-Learning, have prov...
research
07/03/2020

Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient

Exploration-exploitation dilemma has long been a crucial issue in reinfo...
research
05/19/2023

Regularization of Soft Actor-Critic Algorithms with Automatic Temperature Adjustment

This work presents a comprehensive analysis to regularize the Soft Actor...
research
09/11/2019

Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning

Cumulative entropy regularization introduces a regulatory signal to the ...
research
02/02/2023

MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion Control in Real Networks

Fast and efficient transport protocols are the foundation of an increasi...
research
02/07/2022

Soft Actor-Critic with Inhibitory Networks for Faster Retraining

Reusing previously trained models is critical in deep reinforcement lear...

Please sign up or login with your details

Forgot password? Click here to reset