We consider the problem of control in the setting of reinforcement learn...
We present in this paper a family of generalized simultaneous perturbati...
We study the finite-time behaviour of the popular temporal difference (T...
In this paper, we present a stochastic gradient algorithm for minimizing...
In several applications such as clinical trials and financial portfolio ...
We consider the problem of sequentially learning to estimate, in the mea...
We propose approximate gradient ascent algorithms for risk-sensitive rei...
Utility-Based Shortfall Risk (UBSR) is a risk metric that is increasingl...
We propose policy-gradient algorithms for solving the problem of control...
We consider the problem of control in an off-policy reinforcement learni...
We consider the problem of optimizing an objective function with and wit...
We consider the problem of estimating a spectral risk measure (SRM) from...
Known finite-sample concentration bounds for the Wasserstein distance be...
While the objective in traditional multi-armed bandit problems is to fin...
Traditional multi-armed bandit problems are geared towards finding the a...
The classic objective in a reinforcement learning (RL) problem is to fin...
We introduce deterministic perturbation schemes for the recently propose...
In several real-world applications involving decision making under uncer...
Algorithms for bandit convex optimization and online learning often rely...
We study a risk-constrained version of the stochastic shortest path (SSP...
In many sequential decision-making problems we may want to manage risk b...
Online learning algorithms often need to recompute least squares regr...