Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality

06/10/2022
by   Ming Yin, et al.

Goal-oriented reinforcement learning, where the agent must reach a goal state while minimizing the cost incurred along the way, has received significant attention in real-world applications. Its theoretical formulation, the stochastic shortest path (SSP) problem, has been intensively researched in the online setting. Nevertheless, it remains understudied when online interaction is prohibited and only historical data is provided. In this paper, we consider the offline stochastic shortest path problem when the state space and the action space are finite. We design simple value iteration-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning. Notably, our analysis of these simple algorithms yields strong instance-dependent bounds that imply near-minimax-optimal worst-case bounds. We hope our study helps illuminate the fundamental statistical limits of the offline SSP problem and motivates further studies beyond the scope of the current work.
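
To make the offline SSP setting concrete, the following is a minimal sketch of plain certainty-equivalent value iteration on a finite model estimated from logged transitions. It is not the paper's algorithm: the function names, the `(state, action, cost, next_state)` data layout, and the `max_value` cap used for unvisited state-action pairs are all assumptions made for illustration.

```python
import numpy as np


def estimate_model(data, n_states, n_actions):
    """Build empirical transitions and mean costs from logged tuples.

    `data` is assumed to be an iterable of (state, action, cost, next_state).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    cost_sum = np.zeros((n_states, n_actions))
    for s, a, c, s_next in data:
        counts[s, a, s_next] += 1
        cost_sum[s, a] += c
    n = counts.sum(axis=2)          # visit count per (state, action)
    visited = n > 0
    P = np.zeros_like(counts)
    P[visited] = counts[visited] / n[visited][:, None]
    C = np.zeros_like(cost_sum)
    C[visited] = cost_sum[visited] / n[visited]
    return P, C, visited


def ssp_value_iteration(P, C, visited, goal, max_value=1e6,
                        tol=1e-8, max_iter=10_000):
    """Value iteration for the estimated SSP; the goal state has value 0.

    Unvisited (state, action) pairs are assigned `max_value` (an assumed,
    crude stand-in for a principled pessimism penalty).
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = np.where(visited, C + P @ V, max_value)
        V_new = np.minimum(Q.min(axis=1), max_value)
        V_new[goal] = 0.0           # reaching the goal ends the episode
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = np.where(visited, C + P @ V, max_value)
    return V, Q.argmin(axis=1)      # cost-to-go estimate and greedy policy
```

A small usage example on a three-state chain with goal state 2, where action 0 moves one step forward at cost 1:

```python
data = [(0, 0, 1.0, 1), (1, 0, 1.0, 2), (0, 1, 5.0, 2), (1, 1, 1.0, 1)]
P, C, visited = estimate_model(data, n_states=3, n_actions=2)
V, policy = ssp_value_iteration(P, C, visited, goal=2)
```

The cap on unvisited pairs only keeps the greedy policy inside the data's support; it carries none of the statistical guarantees discussed in the abstract.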

