Maximum Reward Formulation In Reinforcement Learning

10/08/2020
by Sai Krishna Gottipati, et al.

Reinforcement learning (RL) algorithms typically deal with maximizing the expected cumulative return (discounted or undiscounted, finite or infinite horizon). However, several crucial applications in the real world, such as drug discovery, do not fit within this framework because an RL agent only needs to identify states (molecules) that achieve the highest reward within a trajectory and does not need to optimize for the expected cumulative return. In this work, we formulate an objective function to maximize the expected maximum reward along a trajectory, derive a novel functional form of the Bellman equation, introduce the corresponding Bellman operators, and provide a proof of convergence. Using this formulation, we achieve state-of-the-art results on the task of molecule generation that mimics a real-world drug discovery pipeline.
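To make the contrast between the two objectives concrete, here is a minimal tabular sketch in Python. The abstract does not state the functional form of the modified Bellman equation, so the backup used below, Q(s, a) = max(r(s, a), gamma * max_a' Q(s', a')), is an assumption chosen for illustration. The three-state deterministic MDP, the function name `q_iteration`, and all of its parameters are likewise hypothetical.

```python
def q_iteration(next_state, reward, gamma=0.9, iters=50, max_reward=True):
    """Tabular Q-iteration on a deterministic toy MDP (illustrative only).

    next_state[s][a] is the successor state and reward[s][a] the reward.
    With max_reward=True the backup is the assumed max-reward form
        Q(s, a) = max(r, gamma * max_a' Q(s', a'));
    otherwise it is the standard cumulative form
        Q(s, a) = r + gamma * max_a' Q(s', a').
    """
    n_states = len(reward)
    n_actions = len(reward[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(iters):
        # Greedy value of every state under the current estimate.
        v = [max(row) for row in q]
        new_q = []
        for s in range(n_states):
            row = []
            for a in range(n_actions):
                r = reward[s][a]
                bootstrap = gamma * v[next_state[s][a]]
                row.append(max(r, bootstrap) if max_reward else r + bootstrap)
            new_q.append(row)
        q = new_q
    return q


# Hypothetical MDP: state 2 is absorbing with zero reward.
# From state 0, action 0 yields one reward of 1.0 and terminates;
# action 1 yields 0.6 now and another 0.6 from state 1.
next_state = [[2, 1], [2, 2], [2, 2]]
reward = [[1.0, 0.6], [0.6, 0.6], [0.0, 0.0]]

q_max = q_iteration(next_state, reward, max_reward=True)
q_cum = q_iteration(next_state, reward, max_reward=False)

best_max = q_max[0].index(max(q_max[0]))  # greedy action, max-reward backup
best_cum = q_cum[0].index(max(q_cum[0]))  # greedy action, cumulative backup
```

In this toy problem the cumulative objective prefers the path that collects two moderate rewards, while the max-reward backup prefers the single highest-reward transition. This mirrors the drug-discovery motivation, where only the best state found along a trajectory matters.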
