GACT: Activation Compressed Training for General Architectures

by Xiaoxuan Liu, et al.
Tsinghua University and UC Berkeley

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce the training memory footprint. This paper presents GACT, an ACT framework that supports a broad range of machine learning tasks for generic NN architectures with limited domain knowledge. By analyzing a linearized version of ACT's approximate gradient, we prove the convergence of GACT without prior knowledge of operator type or model architecture. To make training stable, we propose an algorithm that decides the compression ratio for each tensor by estimating its impact on the gradient at run time. We implement GACT as a PyTorch library that readily applies to any NN architecture. GACT reduces the activation memory for convolutional NNs, transformers, and graph NNs by up to 8.1x, enabling training with a 4.2x to 24.7x larger batch size, with negligible accuracy loss.
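The core ACT idea described above — compressing activations saved during the forward pass and decompressing them only when the backward pass needs them — can be sketched in a few lines without any framework. The sketch below is a minimal, framework-free illustration, not GACT's actual implementation: it uses a simple uniform 8-bit quantization scheme (GACT chooses per-tensor compression ratios adaptively) and a ReLU layer as the example operator.

```python
import numpy as np

def compress(x, bits=8):
    """Quantize a float32 activation to low-bit integers for cheap storage.

    Illustrative uniform quantization: store only the integer codes plus
    the (min, scale) needed to reconstruct an approximation later.
    """
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid div-by-zero for constant tensors
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def decompress(codes, lo, scale):
    """Reconstruct an approximate activation from its compressed form."""
    return codes.astype(np.float32) * scale + lo

# Forward pass of a ReLU layer: keep only the compressed activation
# (uint8 codes are ~4x smaller than the float32 tensor).
x = np.random.randn(4, 16).astype(np.float32)
y = np.maximum(x, 0.0)
saved = compress(y)

# Backward pass: decompress the saved activation to form the gradient
# mask, yielding an approximate gradient as in ACT.
grad_out = np.ones_like(y)
y_approx = decompress(*saved)
grad_in = grad_out * (y_approx > 0)
```

The trade-off this sketch exposes is exactly the one the paper analyzes: coarser quantization shrinks memory further but perturbs the reconstructed activation, and hence the gradient, which is why GACT bounds that perturbation and tunes the compression ratio per tensor at run time.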




ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

The increasing size of neural network models has been critical for impro...

Compressing neural network by tensor network with exponentially fewer variational parameters

Neural network (NN) designed for challenging machine learning tasks is i...

Provable Convergence of Tensor Decomposition-Based Neural Network Training

Advanced tensor decomposition, such as tensor train (TT), has been widel...

Dynamic Hard Pruning of Neural Networks at the Edge of the Internet

Neural Networks (NN), although successfully applied to several Artificia...

Faster Neural Network Training with Approximate Tensor Operations

We propose a novel technique for faster Neural Network (NN) training by ...

Sparse Weight Activation Training

Training convolutional neural networks (CNNs) is time-consuming. Prior w...

Two Instances of Interpretable Neural Network for Universal Approximations

This paper proposes two bottom-up interpretable neural network (NN) cons...
