Optimal dynamic treatment allocation
In a treatment allocation problem the individuals to be treated often arrive gradually. Initially, when the first treatments are made, little is known about the effect of the treatments, but as more treatments are assigned the policy maker learns about their effects by observing outcomes. Thus, there is a tradeoff between exploring the available treatments to learn about their merits and exploiting the best treatment, i.e. administering it as often as possible, in order to maximise the cumulative welfare of all the assignments made. Furthermore, a policy maker may be interested not only in the expected effect of a treatment but also in its riskiness. Thus, we allow the welfare function to depend on the first and second moments of the distribution of treatment outcomes. We propose a dynamic treatment policy which attains the minimax optimal regret relative to the unknown best treatment in this dynamic setting. We allow for the data to arrive in batches as, say, unemployment programs only start once a month or blood samples are only sent to the laboratory for investigation in batches. Furthermore, we show that minimax optimality does not come at the price of overly aggressive experimentation, as we provide upper bounds on the expected number of times any suboptimal treatment is assigned. We also consider the case where the outcome of a treatment is only observed with a delay, as it may take time for the treatment to work. Thus, a doctor faces a tradeoff between getting imprecise information quickly by taking the measurement soon after the treatment is given, or getting precise information later at the expense of less information for the individuals who are treated in the meantime. Finally, using Danish register data, we show how our treatment policy can be used to assign unemployed individuals to active labor market policy programs in order to maximise the probability of ending the unemployment spell.
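The explore/exploit tradeoff with a mean-variance welfare criterion and batched feedback can be illustrated with a generic upper-confidence-bound sketch. This is not the paper's actual policy; the scoring rule, exploration bonus, and the `risk_aversion` parameter are illustrative assumptions.

```python
import math
import random
import statistics

def mean_variance_ucb(arms, horizon, batch_size, risk_aversion=0.5, seed=0):
    """Hedged sketch of a batched, risk-aware bandit policy (not the
    paper's exact procedure).

    Welfare of an arm is estimated as mean - risk_aversion * variance,
    plus an exploration bonus that shrinks as the arm is pulled more.
    `arms` maps an arm name to a callable producing a random outcome;
    outcomes of a batch are only revealed once the batch ends, mimicking
    batched/delayed feedback.
    """
    rng = random.Random(seed)
    history = {a: [] for a in arms}   # observed outcomes per arm
    assignments = []                  # sequence of arms actually assigned
    t = 0
    while t < horizon:
        # Score each arm using only the data observed in earlier batches.
        scores = {}
        for a, obs in history.items():
            if len(obs) < 2:
                scores[a] = float("inf")  # force initial exploration
            else:
                m = statistics.fmean(obs)
                v = statistics.pvariance(obs)
                bonus = math.sqrt(2 * math.log(max(t, 2)) / len(obs))
                scores[a] = m - risk_aversion * v + bonus
        best = max(scores, key=scores.get)
        # Assign the whole batch to the chosen arm; observe outcomes
        # only after the batch is complete.
        batch = [arms[best](rng) for _ in range(min(batch_size, horizon - t))]
        history[best].extend(batch)
        assignments.extend([best] * len(batch))
        t += len(batch)
    return assignments, history
```

With two hypothetical treatments of equal variance but different means, the policy quickly concentrates assignments on the better one after an initial exploration phase of one batch per arm.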