While popularity bias is recognized to play a role in recommender (and ...
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a no...
We present the problem of reinforcement learning with exogenous terminat...
We consider a stochastic multi-armed bandit setting where feedback is li...
Q-learning (QL), a common reinforcement learning algorithm, suffers from...
The computational model of reinforcement learning is based upon the abil...
We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...
The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...
In recent years, advances in deep learning have enabled the application ...
State-of-the-art efficient model-based Reinforcement Learning (RL) algor...
We consider the combinatorial multi-armed bandit (CMAB) problem, where t...
Learning how to act when there are many available actions in each state ...