Nadav Merlis

research

∙ 05/24/2023

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

While popularity bias is recognized to play a role in recommender (and ...

0 Guy Tennenholtz, et al. ∙

research

∙ 02/04/2023

Reinforcement Learning with History-Dependent Dynamic Contexts

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a no...

0 Guy Tennenholtz, et al. ∙

research

∙ 05/30/2022

Reinforcement Learning with a Terminator

We present the problem of reinforcement learning with exogenous terminat...

0 Guy Tennenholtz, et al. ∙

research

∙ 10/12/2021

Dare not to Ask: Problem-Dependent Guarantees for Budgeted Bandits

We consider a stochastic multi-armed bandit setting where feedback is li...

0 Nadav Merlis, et al. ∙

research

∙ 02/28/2021

Ensemble Bootstrapping for Q-Learning

Q-learning (QL), a common reinforcement learning algorithm, suffers from...

0 Oren Peer, et al. ∙

research

∙ 08/13/2020

Reinforcement Learning with Trajectory Feedback

The computational model of reinforcement learning is based upon the abil...

46 Yonathan Efroni, et al. ∙

research

∙ 08/10/2020

Lenient Regret for Multi-Armed Bandits

We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...

5 Nadav Merlis, et al. ∙

research

∙ 02/13/2020

Tight Lower Bounds for Combinatorial Multi-Armed Bandits

The Combinatorial Multi-Armed Bandit problem is a sequential decision-ma...

11 Nadav Merlis, et al. ∙

research

∙ 10/02/2019

Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

In recent years, advances in deep learning have enabled the application ...

0 Chen Tessler, et al. ∙

research

∙ 05/27/2019

Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

State-of-the-art efficient model-based Reinforcement Learning (RL) algor...

0 Yonathan Efroni, et al. ∙

research

∙ 05/08/2019

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

We consider the combinatorial multi-armed bandit (CMAB) problem, where t...

0 Nadav Merlis, et al. ∙

research

∙ 09/06/2018

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

Learning how to act when there are many available actions in each state ...

6 Tom Zahavy, et al. ∙
