Efficient Deep Learning Using Non-Volatile Memory Technology

by Ahmet Inci, et al.

Embedded machine learning (ML) systems have become the dominant platform for deploying ML serving tasks and are projected to become equally important for training ML models. This brings the challenge of efficient deployment, in particular low-power, high-throughput implementations under stringent memory constraints. In this context, non-volatile memory (NVM) technologies such as spin-transfer torque magnetic RAM (STT-MRAM) and spin-orbit torque magnetic RAM (SOT-MRAM) offer significant advantages over conventional SRAM due to their non-volatility, higher cell density, and better scalability. While prior work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM++, a comprehensive framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models with the actual memory behavior of various DL workloads. DeepNVM++ relies on iso-capacity and iso-area performance and energy models for last-level caches implemented using conventional SRAM and the emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide up to 2.2x and 2.4x EDP reduction and accommodate 2.3x and 3.3x cache capacity compared to SRAM, respectively. We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM achieve orders-of-magnitude EDP reduction compared to SRAM for large cache capacities. DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies, but it can be used to characterize, model, and analyze any NVM technology for last-level caches in GPUs running DL applications.
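To make the energy-delay product metric concrete, the sketch below computes EDP = energy x delay per cache access for each technology and reports the reduction relative to an SRAM baseline, in the spirit of the iso-capacity comparison described above. The per-access energy and latency numbers are hypothetical placeholders, not values from the paper; only the comparison arithmetic is general.

```python
# Illustrative EDP comparison for last-level cache technologies.
# NOTE: the energy/latency figures below are hypothetical placeholders
# for demonstration; they are NOT measurements from DeepNVM++.

CACHE_TECHS = {
    # tech: (dynamic energy per access [nJ], access latency [ns])
    "SRAM":     (1.00, 2.00),  # hypothetical baseline
    "STT-MRAM": (0.70, 0.75),  # hypothetical
    "SOT-MRAM": (0.55, 0.77),  # hypothetical
}

def edp(energy_nj: float, delay_ns: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_nj * delay_ns

baseline = edp(*CACHE_TECHS["SRAM"])
for tech, (energy, delay) in CACHE_TECHS.items():
    reduction = baseline / edp(energy, delay)
    print(f"{tech:9s} EDP = {edp(energy, delay):6.3f} nJ*ns "
          f"({reduction:.2f}x reduction vs. SRAM)")
```

The same structure extends to the iso-area case: instead of fixing capacity, one would scale each technology's capacity by the SRAM-to-NVM cell-area ratio before evaluating energy and delay.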



