DeepNVM++: Cross-Layer Modeling and Optimization Framework of Non-Volatile Memories for Deep Learning

by Ahmet Inci, et al.

Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) offer significant advantages over conventional SRAM owing to their non-volatility, higher cell density, and scalability. While previous work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models with the actual memory behavior of various DL workloads. We present both iso-capacity and iso-area performance and energy analyses for systems whose last-level caches rely on conventional SRAM or on emerging STT-MRAM and SOT-MRAM technologies. In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional SRAM, respectively. Under iso-area assumptions, STT-MRAM and SOT-MRAM provide up to 2x and 2.3x EDP reduction and accommodate 2.3x and 3.3x the cache capacity of SRAM, respectively. We also perform a scalability analysis and show that STT-MRAM and SOT-MRAM achieve orders-of-magnitude EDP reduction compared to SRAM for large cache capacities. Our comprehensive cross-layer framework is demonstrated on STT-/SOT-MRAM technologies and can be used to characterize, model, and analyze any NVM technology for last-level caches in GPUs for DL applications.
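The EDP figures above compare designs by the product of total energy and execution time, and an "Nx EDP reduction" is the ratio of the baseline (SRAM) EDP to the candidate's EDP. The following minimal Python sketch illustrates only that arithmetic. The class name `CacheResult`, the function `edp_reduction`, and all energy and delay values are hypothetical placeholders chosen for illustration; they are not results from, or code of, DeepNVM++.

```python
from dataclasses import dataclass


@dataclass
class CacheResult:
    """Hypothetical per-workload measurement for one last-level cache design."""
    name: str
    energy_j: float  # total energy for the workload, in joules (placeholder)
    delay_s: float   # execution time for the workload, in seconds (placeholder)

    @property
    def edp(self) -> float:
        # Energy-delay product: weights energy and delay equally.
        return self.energy_j * self.delay_s


def edp_reduction(baseline: CacheResult, candidate: CacheResult) -> float:
    """Return baseline EDP / candidate EDP; a value > 1 means the candidate is better."""
    return baseline.edp / candidate.edp


# Illustrative placeholder numbers only (not measured data).
sram = CacheResult("SRAM", energy_j=2.0, delay_s=1.0)
stt = CacheResult("STT-MRAM", energy_j=1.0, delay_s=0.5)
print(f"{stt.name} EDP reduction vs {sram.name}: {edp_reduction(sram, stt):.1f}x")  # 4.0x
```

Because EDP multiplies the two terms, halving both energy and delay yields a 4x reduction, which is why EDP gains can exceed either the energy or the latency improvement alone.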




