The streaming rollout of deep networks - towards fully model-parallel execution
Deep neural networks, and in particular recurrent networks, are promising candidates to control autonomous agents that interact in real-time with the physical world. However, this requires a seamless integration of temporal features into the network's architecture. For the training of and inference with recurrent neural networks, they are usually rolled out over time, and different rollouts exist. Conventionally, during inference the layers of a network are computed in a sequential manner resulting in sparse temporal integration of information and long response times. In this study, we present a theoretical framework to describe the set of all rollouts and demonstrate their differences in solving specific tasks. We prove that certain rollouts, also with only skip and no recurrent connections, enable earlier and more frequent responses, and show empirically that these early responses have better performance. The streaming rollout maximizes these properties and, in addition, enables a fully parallel execution of the network reducing the runtime on massively parallel devices. Additionally, we provide an open-source toolbox to design, train, evaluate, and online-interact with streaming rollouts.
READ FULL TEXT