On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
The lottery ticket hypothesis questions the role of overparameterization in supervised deep learning. But how does the distributional shift inherent to the reinforcement learning problem affect the performance of winning lottery tickets? In this work, we show that feed-forward networks trained via supervised policy distillation and reinforcement learning can be pruned to the same level of sparsity. Furthermore, we establish the existence of winning tickets for both on- and off-policy methods in a visual navigation and classic control task. Using a set of carefully designed baseline conditions, we find that the majority of the lottery ticket effect in reinforcement learning can be attributed to the identified mask. The resulting masked observation space eliminates redundant information and yields minimal task-relevant representations. The mask identified by iterative magnitude pruning provides an interpretable inductive bias. Its costly generation can be amortized by training dense agents with low-dimensional input and thereby at lower computational cost.
READ FULL TEXT