Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions

07/25/2020
by James McInerney, et al.

Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content sequentially. Providing and evaluating good sequences of recommendations is therefore a central problem for these services. Prior reweighting-based counterfactual evaluation methods either suffer from high variance or make strong independence assumptions about rewards. We propose a new counterfactual estimator that allows for sequential interactions in the rewards, has lower variance, and is asymptotically unbiased. Our method uses graphical assumptions about the causal relationships within the slate to reweight the rewards observed under the logging policy so that they approximate the expected sum of rewards under the target policy. Extensive experiments in simulation and on a live recommender system show that our approach outperforms existing methods in terms of bias and data efficiency for the sequential track recommendation problem.
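To make the variance/assumption trade-off above concrete, here is a minimal sketch in Python. It contrasts the two families of prior reweighting estimators the abstract mentions (whole-slate reweighting, which makes no assumption about reward structure but has high variance, and position-wise reweighting, which assumes rewards are independent across positions) with a simple prefix-weighted estimator that assumes the reward at position k depends only on the items shown at positions up to k. This illustrates a generic per-decision importance-weighting idea under an assumed top-down causal structure; it is not the estimator proposed in the paper. The `LoggedSlate` record and all function names are hypothetical, and the sketch assumes both policies fill each position independently, so per-position probabilities are also marginal ones.

```python
from dataclasses import dataclass
from math import prod
from typing import Sequence


@dataclass
class LoggedSlate:
    """One logged slate (hypothetical record format for this sketch).

    logging_probs[k] / target_probs[k]: probability that the logging / target
    policy shows the item actually observed at position k. Assumes both
    policies choose each position independently.
    rewards[k]: observed reward at position k (e.g. 1 if the track was played).
    """
    logging_probs: Sequence[float]
    target_probs: Sequence[float]
    rewards: Sequence[float]

    def ratios(self) -> list[float]:
        # Per-position importance ratios pi_k / mu_k.
        return [p / m for p, m in zip(self.target_probs, self.logging_probs)]


def slate_ips(data: Sequence[LoggedSlate]) -> float:
    """Whole-slate IPS: one weight (product over all positions) scales the
    total reward. No reward-structure assumption, but high variance."""
    return sum(prod(s.ratios()) * sum(s.rewards) for s in data) / len(data)


def independent_ips(data: Sequence[LoggedSlate]) -> float:
    """Position-wise IPS: the reward at k is weighted only by the ratio at k.
    Low variance, but assumes no reward interactions across positions."""
    return sum(
        sum(w * r for w, r in zip(s.ratios(), s.rewards)) for s in data
    ) / len(data)


def prefix_ips(data: Sequence[LoggedSlate]) -> float:
    """Prefix (per-decision) IPS: the reward at k is weighted by the product
    of ratios at positions 0..k, assuming it depends only on that prefix."""
    total = 0.0
    for s in data:
        w = 1.0
        for ratio, r in zip(s.ratios(), s.rewards):
            w *= ratio  # cumulative weight of the prefix shown so far
            total += w * r
    return total / len(data)


# Two toy logged slates of length 3 (made-up numbers for illustration only).
data = [
    LoggedSlate([0.5, 0.5, 0.5], [0.25, 1.0, 0.5], [1, 0, 1]),
    LoggedSlate([0.5, 0.25, 0.5], [1.0, 0.5, 0.25], [1, 1, 0]),
]
print(slate_ips(data), independent_ips(data), prefix_ips(data))  # 3.0 2.75 3.75
```

The prefix estimator sits between the other two: each reward stays tied to everything shown before it, so sequential reward interactions are allowed, while ratios from positions after k never enter the weight for the reward at k, which avoids part of the variance of the whole-slate product.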


research
07/24/2020

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Counterfactual evaluation can estimate Click-Through-Rate (CTR) differen...
research
09/03/2023

Double Clipping: Less-Biased Variance Reduction in Off-Policy Evaluation

"Clipping" (a.k.a. importance weight truncation) is a widely used varian...
research
08/05/2023

Disentangled Counterfactual Reasoning for Unbiased Sequential Recommendation

Sequential recommender systems have achieved state-of-the-art recommenda...
research
10/06/2021

Learning the Optimal Recommendation from Explorative Users

We propose a new problem setting to study the sequential interactions be...
research
10/05/2021

Live Multi-Streaming and Donation Recommendations via Coupled Donation-Response Tensor Factorization

In contrast to traditional online videos, live multi-streaming supports ...
research
09/15/2022

Semi-Counterfactual Risk Minimization Via Neural Networks

Counterfactual risk minimization is a framework for offline policy optim...
