Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation

04/07/2021
by Kai Wang, et al.

In recent years, there has been great interest in applying reinforcement learning (RL) to recommendation systems (RS), along with significant challenges. In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, a high-variance environment, and the unspecific reward setting in recommendation. All these problems remain largely unexplored in the existing literature and make the application of RL challenging. We develop a model-based reinforcement learning framework called GoalRec. Inspired by the ideas of world models (model-based), value function estimation (model-free), and goal-based RL, we propose a novel disentangled universal value function designed for item recommendation. It can generalize to the various goals that the recommender may have, and it disentangles the stochastic environmental dynamics from the high-variance reward signals. As part of the value function, a high-capacity, reward-independent world model is trained to simulate complex environmental dynamics under a given goal, free from the sparse and high-variance reward signals. Based on the predicted environmental dynamics, the disentangled universal value function is tied to the user's future trajectory rather than to a monolithic state and a scalar reward. We demonstrate the superiority of GoalRec over previous approaches with respect to the above three practical challenges in a series of simulations and a real application.
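The split between a reward-independent world model and a goal-dependent value can be illustrated with a small sketch. This is not the GoalRec architecture: the linear "world model" with random weights, the dot-product read-out, the feature dimensions, and the names `predict_trajectory`, `goal_value`, and `recommend` are all assumptions made for illustration, and the sketch ignores any goal conditioning of the world model itself. It shows only the general idea that one trajectory predictor can be reused to score items under different goals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: user-state features, item features, and
# future-trajectory features (e.g. clicks, dwell time, returns).
STATE_DIM, ITEM_DIM, TRAJ_DIM = 8, 4, 3

# Stand-in for a trained world model: a fixed random linear map.
# A real model would be learned from logged trajectories, without rewards.
W = rng.normal(size=(TRAJ_DIM, STATE_DIM + ITEM_DIM))


def predict_trajectory(state, item):
    """Predict future-trajectory features for (state, item); uses no reward."""
    return W @ np.concatenate([state, item])


def goal_value(state, item, goal):
    """Assumed read-out: weight the predicted features by the goal vector."""
    return float(goal @ predict_trajectory(state, item))


def recommend(state, items, goal):
    """Score every candidate item under the goal and return the best index."""
    scores = np.array([goal_value(state, item, goal) for item in items])
    return int(np.argmax(scores)), scores


state = rng.normal(size=STATE_DIM)
items = rng.normal(size=(5, ITEM_DIM))

# Two hypothetical goals that reuse the same world model.
click_goal = np.array([1.0, 0.0, 0.0])
dwell_goal = np.array([0.0, 1.0, 0.0])

best_click, click_scores = recommend(state, items, click_goal)
best_dwell, dwell_scores = recommend(state, items, dwell_goal)
```

Changing the goal vector changes the ranking without retraining the trajectory predictor, which is the property the sketch is meant to convey.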


