Identifying Co-Adaptation of Algorithmic and Implementational Innovations in Deep Reinforcement Learning: A Taxonomy and Case Study of Inference-based Algorithms

by Hiroki Furuta, et al.

Recently, many algorithms have been devised for reinforcement learning (RL) with function approximation. While they have clear algorithmic distinctions, they also have many implementation differences that are algorithm-agnostic and sometimes subtle. Such mixing of algorithmic novelty and implementation craftsmanship makes rigorous analyses of the sources of performance improvements difficult. In this work, we focus on a series of inference-based actor-critic algorithms (MPO, AWR, and SAC) to decouple their algorithmic innovations and implementation decisions. We present unified derivations through a single control-as-inference objective, where we can categorize each algorithm as based on either Expectation-Maximization (EM) or direct Kullback-Leibler (KL) divergence minimization and treat the remaining specifications as implementation details. We performed extensive ablation studies and identified substantial performance drops whenever implementation details are mismatched with algorithmic choices. These results show which implementation details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms. For example, we identified that the tanh policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performance but also transfer to noticeable gains in SAC. We hope our work can inspire future work to further demystify sources of performance improvements across multiple algorithms and allow researchers to build on each other's algorithmic and implementational innovations.
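
To make the EM versus direct-KL split more concrete, here is a minimal NumPy sketch of the two policy-improvement styles on a toy single-state problem. It is an illustration under stated assumptions, not the paper's code: it assumes a 1-D Gaussian policy, a known Q-function, fixed temperatures `eta` and `alpha`, and it omits the KL trust regions, advantage baselines, and learned critics that MPO, AWR, and SAC actually use. All function names are hypothetical. The EM-style loss first builds non-parametric weights proportional to exp(Q / eta) (E-step) and then fits the policy by weighted maximum likelihood (M-step); the KL-style loss directly estimates KL(pi || exp(Q / alpha) / Z), up to a constant, from reparameterized samples.

```python
import numpy as np


def gaussian_log_prob(a, mean, std):
    """Log-density of a 1-D Gaussian policy N(mean, std^2) at actions a."""
    return -0.5 * ((a - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def em_style_loss(actions, q_values, mean, std, eta=1.0):
    """Hypothetical EM-style improvement step (simplified, MPO-like).

    E-step: non-parametric weights proportional to exp(Q / eta), normalized
    over the sampled actions. M-step: negative weighted log-likelihood of
    the parametric policy (no trust region in this sketch).
    """
    logits = q_values / eta
    weights = np.exp(logits - logits.max())  # subtract max for numerical stability
    weights /= weights.sum()
    loss = -np.sum(weights * gaussian_log_prob(actions, mean, std))
    return loss, weights


def kl_style_loss(noise, q_fn, mean, std, alpha=1.0):
    """Hypothetical direct-KL improvement step (simplified, SAC-like).

    Monte Carlo estimate of alpha * KL(pi || exp(Q / alpha) / Z) up to an
    additive constant, using reparameterized samples a = mean + std * noise.
    """
    actions = mean + std * noise
    return np.mean(alpha * gaussian_log_prob(actions, mean, std) - q_fn(actions))


# Toy single-state problem (assumption): Q(a) = -(a - 1)^2 peaks at a = 1.
def q_fn(a):
    return -(a - 1.0) ** 2


rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=10_000)  # actions drawn from the current policy N(0, 1)
noise = rng.normal(size=10_000)              # standard normal noise for reparameterization

# Both losses should prefer a policy mean closer to the high-Q action.
em_at_0, weights = em_style_loss(samples, q_fn(samples), mean=0.0, std=1.0)
em_at_1, _ = em_style_loss(samples, q_fn(samples), mean=1.0, std=1.0)
kl_at_0 = kl_style_loss(noise, q_fn, mean=0.0, std=1.0)
kl_at_1 = kl_style_loss(noise, q_fn, mean=1.0, std=1.0)
print(em_at_1 < em_at_0, kl_at_1 < kl_at_0)
```

The design difference the taxonomy rests on shows up in where the samples come from: the EM-style loss reweights fixed samples and only needs log-likelihoods of the policy, while the KL-style loss differentiates through samples drawn from the policy itself, which is one reason implementation details such as the tanh squashing can interact differently with each family.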


Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

We study the roots of algorithmic progress in deep policy gradient algor...

Towards Deeper Deep Reinforcement Learning

In computer vision and natural language processing, innovations in model...

VIREL: A Variational Inference Framework for Reinforcement Learning

Applying probabilistic models to reinforcement learning (RL) has become ...

An intelligent algorithmic trading based on a risk-return reinforcement learning algorithm

This scientific paper proposes a novel portfolio optimization model using...

Partially Detected Intelligent Traffic Signal Control: Environmental Adaptation

Partially Detected Intelligent Traffic Signal Control (PD-ITSC) systems ...

Resolving Implementation Ambiguity and Improving SURF

Speeded Up Robust Features (SURF) has emerged as one of the more popular...