Sales Channel Optimization via Simulations Based on Observational Data with Delayed Rewards: A Case Study at LinkedIn

09/16/2022
by Diana M. Negoescu, et al.

Training models on data obtained from randomized experiments is ideal for making good decisions. However, randomized experiments are often time-consuming, expensive, risky, infeasible or unethical to perform, leaving decision makers little choice but to rely on observational data collected under historical policies when training models. This opens questions regarding not only which decision-making policies would perform best in practice, but also regarding the impact of different data collection protocols on the performance of various policies trained on the data, and the robustness of policy performance with respect to changes in problem characteristics such as action- or reward-specific delays in observing outcomes. We aim to answer such questions for the problem of optimizing sales channel allocations at LinkedIn, where sales accounts (leads) need to be allocated to one of three channels, with the goal of maximizing the number of successful conversions over a period of time. A key feature of the problem is the presence of stochastic delays in observing allocation outcomes, whose distribution is both channel- and outcome-dependent. We built a discrete-time simulation that can handle our problem features and used it to evaluate: a) a historical rule-based policy; b) a supervised machine learning policy (XGBoost); and c) multi-armed bandit (MAB) policies, under different scenarios involving: i) data collection used for training (observational vs randomized); ii) lead conversion scenarios; iii) delay distributions. Our simulation results indicate that LinUCB, a simple MAB policy, consistently outperforms the other policies, achieving an 18-47% improvement relative to a rule-based policy.
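The abstract highlights LinUCB as the best-performing policy for allocating each lead to one of three channels. Below is a minimal sketch of the standard LinUCB algorithm in that setting; the feature dimension, the exploration parameter `alpha`, and the lead features are illustrative assumptions, not details taken from the paper. Note how `update` is separate from `select`, which is what allows rewards to arrive after a stochastic delay.

```python
import numpy as np

class LinUCB:
    """Standard LinUCB: one ridge-regression model per arm (channel),
    with an upper-confidence-bound bonus for exploration."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        # Per-arm sufficient statistics: A = X^T X + I, b = X^T y
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        """Pick the arm with the highest UCB score for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                      # ridge estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Apply the observed outcome; with delayed rewards this is
        called only once the conversion (or non-conversion) is seen."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Usage: three sales channels, hypothetical 5-dimensional lead features
policy = LinUCB(n_arms=3, dim=5)
x = np.random.default_rng(0).normal(size=5)
arm = policy.select(x)
# ... outcome observed later, possibly after a channel-dependent delay ...
policy.update(arm, x, reward=1.0)
```

In the delayed-reward setting the paper studies, pending (arm, context) pairs would be buffered and `update` applied only when each outcome materializes, so the confidence bounds shrink at a rate governed by the delay distribution.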


