From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

04/29/2018
by Zihang Dai, et al.

In this work, we study the credit assignment problem in reward augmented maximum likelihood (RAML) learning and establish a theoretical equivalence between the token-level counterpart of RAML and entropy-regularized reinforcement learning. Inspired by this connection, we propose two sequence prediction algorithms: one extends RAML with fine-grained credit assignment, and the other improves Actor-Critic with systematic entropy regularization. On two benchmark datasets, we show that the proposed algorithms outperform RAML and Actor-Critic, respectively, providing new alternatives for sequence prediction.
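
As background for the entropy-regularized side of this connection, the sketch below illustrates a standard identity from entropy-regularized reinforcement learning: for fixed scores Q and temperature tau, the policy proportional to exp(Q / tau) attains the soft value tau * log sum exp(Q / tau), which equals the expected score plus tau times the policy entropy. This is a generic illustration, not the paper's algorithms; the function names, the three example token scores, and the temperature are all assumptions chosen for the demonstration.

```python
import math

def soft_value(q_values, tau):
    """Entropy-regularized (soft) value: tau * log(sum(exp(q / tau)))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def soft_policy(q_values, tau):
    """Boltzmann policy with probabilities proportional to exp(q / tau)."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical scores for three candidate tokens (assumed values).
q = [1.0, 0.5, -2.0]
tau = 0.5

pi = soft_policy(q, tau)
lhs = soft_value(q, tau)
# Expected score under pi plus the entropy bonus weighted by tau.
rhs = sum(p * qi for p, qi in zip(pi, q)) + tau * entropy(pi)

print("policy:", [round(p, 4) for p in pi])
print("soft value:", round(lhs, 6), "expected score + tau * entropy:", round(rhs, 6))
```

Lowering tau concentrates the policy on the highest-scoring token, while raising it pushes the policy toward uniform; this is the sense in which the temperature controls the strength of the entropy regularization.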

research
06/08/2021

Towards Practical Credit Assignment for Deep Reinforcement Learning

Credit assignment is a fundamental problem in reinforcement learning, th...
research
02/14/2019

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

We propose a new policy iteration theory as an important extension of so...
research
07/06/2020

Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning

We present a multi-agent actor-critic method that aims to implicitly add...
research
02/28/2017

Bridging the Gap Between Value and Policy Based Reinforcement Learning

We establish a new connection between value and policy based reinforceme...
research
02/09/2022

Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization

In cooperative multi-agent systems, agents jointly take actions and rece...
research
09/30/2018

Efficient Sequence Labeling with Actor-Critic Training

Neural approaches to sequence labeling often use a Conditional Random Fi...
research
04/08/2020

Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms

In this paper we investigate some of the issues that arise from the scal...