CHAOS: A Parallelization Scheme for Training Convolutional Neural Networks on Intel Xeon Phi

by   Andre Viebke, et al.

Deep learning is an important component of big-data analytic tools and intelligent applications, such as, self-driving cars, computer vision, speech recognition, or precision medicine. However, the training process is computationally intensive, and often requires a large amount of time if performed sequentially. Modern parallel computing systems provide the capability to reduce the required training time of deep neural networks. In this paper, we present our parallelization scheme for training convolutional neural networks (CNN) named Controlled Hogwild with Arbitrary Order of Synchronization (CHAOS). Major features of CHAOS include the support for thread and vector parallelism, non-instant updates of weight parameters during back-propagation without a significant delay, and implicit synchronization in arbitrary order. CHAOS is tailored for parallel computing systems that are accelerated with the Intel Xeon Phi. We evaluate our parallelization approach empirically using measurement techniques and performance modeling for various numbers of threads and CNN architectures. Experimental results for the MNIST dataset of handwritten digits using the total number of threads on the Xeon Phi show speedups of up to 103x compared to the execution on one thread of the Xeon Phi, 14x compared to the sequential execution on Intel Xeon E5, and 58x compared to the sequential execution on Intel Core i5.


page 1

page 2

page 3

page 4


Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures

Many complex problems, such as natural language processing or visual obj...

A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks

Benefitting from large-scale training datasets and the complex training ...

Horn: A System for Parallel Training and Regularizing of Large-Scale Neural Networks

I introduce a new distributed system for effective training and regulari...

Overhead Management in Multi-Core Environment

In multi-core systems, various factors like inter-process communication,...

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis

Deep Neural Networks (DNNs) are becoming an important tool in modern com...

Scheduling Computation Graphs of Deep Learning Models on Manycore CPUs

For a deep learning model, efficient execution of its computation graph ...

A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms

This paper studies parallelization schemes for stochastic Vector Quantiz...

Please sign up or login with your details

Forgot password? Click here to reset