research
∙
01/09/2023
Minimax Weight Learning for Absorbing MDPs
Reinforcement learning policy evaluation problems are often modeled as f...
research
∙
08/20/2018
A General Framework of Multi-Armed Bandit Processes by Arm Switch Restrictions
This paper proposes a general framework of multi-armed bandit (MAB) proc...
research
∙
08/20/2018