On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems

12/05/2017
by Wonje Choi, et al.

Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains, including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. To address this issue, we first analyze the on-chip traffic patterns that arise from the computational processes associated with training two deep CNN architectures, namely LeNet and CDBNet, to perform image classification. Leveraging this knowledge, we design a hybrid Network-on-Chip (NoC) architecture, consisting of both wireline and wireless links, to improve the performance of CPU-GPU based heterogeneous manycore platforms running the above-mentioned CNN training workloads. Compared to a highly optimized wireline mesh NoC, the proposed NoC achieves a 1.8x reduction in network latency and improves network throughput by a factor of 2.2 when training CNNs. For the considered CNN workloads, these network-level improvements translate into 25% savings in full-system energy-delay-product (EDP), demonstrating that the proposed hybrid NoC for heterogeneous manycore architectures can significantly accelerate the training of CNNs while remaining energy-efficient.
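The abstract names LeNet as one of the two CNN training workloads whose on-chip traffic is analyzed. As a rough illustration of what such a workload involves, the sketch below defines a minimal LeNet-style classifier and runs one training step in PyTorch. The layer sizes, 28x28 grayscale input, 10-class output, and hyperparameters are assumptions for illustration only, not the authors' exact configuration.

```python
# Minimal sketch of a LeNet-style CNN training step (illustrative only).
# All shapes and hyperparameters below are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                            # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                            # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One training step on random data, to show the shape of the workload.
model = LeNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

images = torch.randn(64, 1, 28, 28)   # hypothetical batch of 28x28 images
labels = torch.randint(0, 10, (64,))
loss = criterion(model(images), labels)
loss.backward()   # the forward/backward passes drive the CPU-GPU on-chip traffic
optimizer.step()
```

In a CPU-GPU manycore system, the forward and backward passes of steps like this generate the CPU-GPU and GPU-GPU data movement whose traffic patterns motivate the paper's hybrid wireline/wireless NoC design.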


