Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

03/04/2019
by Hossein Aboutalebi, et al.

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant potential applications in adaptive clinical trials, which allow the treatment allocation probabilities of patients to change dynamically. However, most bandit learning algorithms are designed to minimize the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this paper, we define and study a new robustness criterion for bandit problems. Specifically, we consider optimizing a function of the distribution of returns as a regret measure, which gives practitioners more flexibility to define an appropriate regret measure. The learning algorithm we propose for this type of problem is a modification of the BESA algorithm [Baransi et al., 2014] that accommodates this more general notion of regret. We present a regret bound for our approach and evaluate it empirically both on synthetic problems and on a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
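
To make the idea concrete, the following is a minimal sketch of a two-armed BESA-style rule in which the empirical mean is replaced by a pluggable score computed on equal-size sub-samples. It is an illustration under stated assumptions, not the authors' algorithm: the names `besa_choose`, `mean_variance_score`, `risk_weight`, and `run` are hypothetical, and the mean-minus-variance score is only one possible function of the return distribution.

```python
import random
from statistics import mean, pvariance


def mean_variance_score(rewards, risk_weight=1.0):
    """Hypothetical safety-aware score: empirical mean minus a variance penalty."""
    return mean(rewards) - risk_weight * pvariance(rewards)


def besa_choose(history_a, history_b, score=mean, rng=random):
    """Return 0 or 1 by comparing scores on equal-size sub-samples of each arm."""
    n = min(len(history_a), len(history_b))
    if n == 0:
        # An arm with no observations yet is pulled first.
        return 0 if len(history_a) == 0 else 1
    # Sub-sample without replacement so both arms are judged on n observations.
    sub_a = rng.sample(history_a, n)
    sub_b = rng.sample(history_b, n)
    score_a, score_b = score(sub_a), score(sub_b)
    if score_a == score_b:
        # Tie: favor the arm that has been pulled less often.
        return 0 if len(history_a) <= len(history_b) else 1
    return 0 if score_a > score_b else 1


def run(arms, horizon, score=mean, seed=0):
    """Play two arms (callables taking an rng) for `horizon` rounds."""
    rng = random.Random(seed)
    histories = [[], []]
    for _ in range(horizon):
        arm = besa_choose(histories[0], histories[1], score, rng)
        histories[arm].append(arms[arm](rng))
    return histories


if __name__ == "__main__":
    # Assumed toy problem: arm 0 is steady, arm 1 has a higher mean but heavy spread.
    arms = [lambda r: r.gauss(0.5, 0.1), lambda r: r.gauss(0.6, 1.0)]
    pulls = run(arms, horizon=500, score=mean_variance_score)
    print([len(h) for h in pulls])
```

Passing `score=mean` gives a plain mean-based comparison, while any other function of the sub-sampled rewards can be swapped in to encode a different notion of regret.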

research
05/21/2017

Instrument-Armed Bandits

We extend the classic multi-armed bandit (MAB) model to the setting of n...
research
02/25/2014

Algorithms for multi-armed bandit problems

Although many algorithms for the multi-armed bandit problem are well-und...
research
06/22/2020

Bandit algorithms: Letting go of logarithmic regret for statistical robustness

We study regret minimization in a stochastic multi-armed bandit setting ...
research
01/04/2021

Etat de l'art sur l'application des bandits multi-bras

The multi-armed bandit offers the advantage to learn and exploit the alre...
research
05/19/2022

Adaptive Experiments and a Rigorous Framework for Type I Error Verification and Computational Experiment Design

This PhD thesis covers breakthroughs in several areas of adaptive experi...
research
10/28/2020

Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility

The demand for seamless Internet access under extreme user mobility, suc...
research
10/01/2020

Learning to be safe, in finite time

This paper aims to put forward the concept that learning to take safe ac...
