Integrating Deep Learning in Domain Sciences at Exascale

by Rick Archibald, et al.

This paper presents some of the current challenges in designing deep learning artificial intelligence (AI) and integrating it with traditional high-performance computing (HPC) simulations. We evaluate existing packages for their ability to run deep learning models and applications efficiently on large-scale HPC systems, identify challenges, and propose new asynchronous parallelization and optimization techniques for current large-scale heterogeneous systems and upcoming exascale systems. These developments, along with existing HPC AI software capabilities, have been integrated into MagmaDNN, an open-source HPC deep learning framework. Many deep learning frameworks are targeted at data scientists and fall short in providing quality integration into existing HPC workflows. This paper discusses what an HPC deep learning framework requires and how those needs can be met (e.g., as in MagmaDNN) through deep integration with existing HPC libraries, such as MAGMA and its modular memory management, MPI, cuBLAS, cuDNN, MKL, and HIP. Advancements are also illustrated through algorithmic enhancements in reduced and mixed precision, as well as asynchronous optimization methods. Finally, we present illustrations and potential solutions for enhancing traditional compute- and data-intensive applications at ORNL and UTK with AI. The approaches and future challenges are illustrated in materials science, imaging, and climate applications.


Deploying Scientific AI Networks at Petaflop Scale on Secure Large Scale HPC Production Systems with Containers

There is an ever-increasing need for computational power to train comple...

Deploying AI Frameworks on Secure HPC Systems with Containers

The increasing interest in the usage of Artificial Intelligence techniqu...

Learning Everywhere: Pervasive Machine Learning for Effective High-Performance Computation

The convergence of HPC and data-intensive methodologies provide a promis...

A Software-Defined QoS Provisioning Framework for HPC Applications

With the emergence of large-scale data-intensive high-performance applic...

Enabling Dynamic and Intelligent Workflows for HPC, Data Analytics, and AI Convergence

The evolution of High-Performance Computing (HPC) platforms enables the ...

Accelerating WRF I/O Performance with ADIOS2 and Network-based Streaming

With the approach of Exascale computing power for large-scale High Perfo...

Driving asynchronous distributed tasks with events

Open-source matters, not just to the current cohort of HPC users but als...
