Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs

02/19/2021
by Yibo Yang, et al.

We present a new class of acquisition functions for online decision making in multi-armed and contextual bandit problems with extreme payoffs. Specifically, we model the payoff function as a Gaussian process and formulate a novel upper confidence bound (UCB) acquisition function that guides exploration towards the bandits deemed most relevant according to the variability of the observed rewards. This is achieved by computing a tractable likelihood ratio that quantifies the importance of the output relative to the inputs and essentially acts as an attention mechanism that promotes exploration of extreme rewards. We demonstrate the benefits of the proposed methodology across several synthetic benchmarks, as well as in a realistic example involving noisy sensor network data. Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes.
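The core idea admits a compact implementation. The JAX sketch below is illustrative only, not the authors' released library: the function names, the RBF kernel, the uniform input density p(x), and the Gaussian-KDE estimate of the output density are all our assumptions. A GP posterior yields a mean mu(x) and standard deviation sigma(x), a likelihood ratio w(x) = p(x) / p_mu(mu(x)) up-weights inputs whose predicted payoff is rare, and the acquisition scales the UCB exploration bonus by w(x).

```python
# Minimal sketch of output-weighted UCB for Gaussian-process bandits
# (illustrative; not the paper's released JAX library). Assumptions:
# RBF kernel with unit variance, uniform input density p(x), and a
# Gaussian KDE over predicted means for the output density p_mu.
import jax.numpy as jnp
from jax.scipy.stats import norm

def gp_posterior(X_train, y_train, X_cand, lengthscale=1.0, noise=1e-4):
    """Exact GP posterior mean and std under an RBF kernel."""
    def k(A, B):
        d2 = jnp.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return jnp.exp(-0.5 * d2 / lengthscale ** 2)
    K = k(X_train, X_train) + noise * jnp.eye(X_train.shape[0])
    L = jnp.linalg.cholesky(K)
    alpha = jnp.linalg.solve(L.T, jnp.linalg.solve(L, y_train))
    Ks = k(X_cand, X_train)
    mu = Ks @ alpha
    V = jnp.linalg.solve(L, Ks.T)
    var = 1.0 - jnp.sum(V ** 2, axis=0)  # prior variance k(x, x) = 1
    return mu, jnp.sqrt(jnp.clip(var, 1e-12, None))

def likelihood_ratio(mu, p_x=1.0, bandwidth=0.1):
    """w(x) = p(x) / p_mu(mu(x)); p_mu is a Gaussian KDE over the
    predicted means, so rare (extreme) payoff values get large weight."""
    diffs = (mu[:, None] - mu[None, :]) / bandwidth
    p_mu = jnp.mean(norm.pdf(diffs), axis=1) / bandwidth
    return p_x / p_mu

def output_weighted_ucb(mu, sigma, w, kappa=2.0):
    """UCB score with the exploration bonus scaled by the ratio w(x)."""
    return mu + kappa * w * sigma

# Usage: pick the next arm from a candidate set X_cand.
# mu, sigma = gp_posterior(X_train, y_train, X_cand)
# w = likelihood_ratio(mu)
# x_next = X_cand[jnp.argmax(output_weighted_ucb(mu, sigma, w))]
```

Read this way, the weighting concentrates exploration where the posterior mean takes values that are improbable under the estimated output distribution, which is one way to interpret the abstract's "attention mechanism" for extreme rewards.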


Related research

02/19/2020 · Residual Bootstrap Exploration for Bandit Algorithms
In this paper, we propose a novel perturbation-based exploration method ...

11/11/2018 · Adapting multi-armed bandits policies to contextual bandits scenarios
This work explores adaptations of successful multi-armed bandits policie...

03/21/2022 · Efficient Algorithms for Extreme Bandits
In this paper, we contribute to the Extreme Bandit problem, a variant of...

08/21/2023 · Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach
Online decision making plays a crucial role in numerous real-world appli...

07/13/2019 · Parameterized Exploration
We introduce Parameterized Exploration (PE), a simple family of methods ...

04/22/2020 · Bayesian Optimization with Output-Weighted Importance Sampling
In Bayesian optimization, accounting for the importance of the output re...

09/19/2022 · Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits
In autonomous robotic decision-making under uncertainty, the tradeoff be...
