On the Value of Bandit Feedback for Offline Recommender System Evaluation

by Olivier Jeunen, et al.

In academic literature, recommender systems are often evaluated on the task of next-item prediction. This procedure aims to answer the question: "Given the natural sequence of user-item interactions up to time t, can we predict which item the user will interact with at time t+1?" Evaluation results obtained with this methodology are then used as a proxy to predict which system will perform better in an online setting. The online setting, however, poses a subtly different question: "Given the natural sequence of user-item interactions up to time t, can we get the user to interact with a recommended item at time t+1?" From a causal perspective, the system performs an intervention, and we want to measure its effect. Next-item prediction is often used as a fall-back objective when information about interventions and their effects (the recommendations shown and whether they received a click) is unavailable. When this type of data is available, however, it can provide great value for reliably estimating online recommender system performance. Through a series of simulated experiments with the RecoGym environment, we show where traditional offline evaluation schemes fall short. Additionally, we show how so-called bandit feedback can be exploited for effective offline evaluation that more accurately reflects online performance.
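To make the idea of evaluating from bandit feedback concrete, the sketch below uses inverse propensity scoring (IPS), a standard counterfactual estimator. The abstract does not say which estimator the paper relies on, so IPS is an assumption here, and the log layout, the `ips_estimate` function, and the `always_a` policy are hypothetical names chosen for illustration. Each logged record holds the context, the recommended item, the probability with which the logging policy showed that item, and whether it was clicked.

```python
def ips_estimate(logs, target_policy):
    """Estimate the click-through rate a target policy would obtain online.

    logs: list of (context, action, propensity, reward) tuples, where
          propensity is the logging policy's probability of showing `action`
          (hypothetical layout, assumed for this sketch).
    target_policy: function (context, action) -> probability that the
          policy under evaluation would show `action` in `context`.
    """
    if not logs:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, propensity, reward in logs:
        if propensity <= 0.0:
            raise ValueError("logging propensities must be positive")
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to show this item than the logging policy.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)


# Toy bandit feedback: a uniform logging policy over two items, "a" and "b".
logs = [
    ("u1", "a", 0.5, 1),
    ("u1", "b", 0.5, 0),
    ("u2", "a", 0.5, 0),
    ("u2", "b", 0.5, 1),
]


def always_a(context, action):
    # Hypothetical target policy that always recommends item "a".
    return 1.0 if action == "a" else 0.0


estimated_ctr = ips_estimate(logs, always_a)
print(estimated_ctr)
```

Unlike next-item prediction, this estimate depends only on recommendations that were actually shown and their observed outcomes. It is unbiased only when the logging policy gives non-zero probability to every item the target policy might show.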




Rethinking Reinforcement Learning for Recommendation: A Prompt Perspective

Modern recommender systems aim to improve user experience. As reinforcem...

An Evaluation Framework for Interactive Recommender System

Traditional recommender systems present a relatively static list of reco...

Sequence-aware item recommendations for multiply repeated user-item interactions

Recommender systems are one of the most successful applications of machi...

On Offline Evaluation of Recommender Systems

In academic research, recommender models are often evaluated offline on ...

Existence conditions for hidden feedback loops in online recommender systems

We explore a hidden feedback loops effect in online recommender systems....

Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation

Both in academic and industry-based research, online evaluation methods ...

From Counter-intuitive Observations to a Fresh Look at Recommender System

Recently, a few papers report counter-intuitive observations made from e...
