SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

by Rui-Jie Zhu, et al.
University of California Santa Cruz
Beijing Kuaishou Technology Co., Ltd.

As the size of large language models continues to scale, so do the computational resources required to run them. Spiking neural networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse, event-driven activations to reduce the computational overhead of model inference. While SNNs have become competitive with non-spiking models on many computer vision tasks, they have also proven more challenging to train. As a result, their performance lags behind modern deep learning, and the effectiveness of SNNs in language generation has yet to be demonstrated. In this paper, inspired by the RWKV language model, we successfully implement `SpikeGPT', a generative language model with pure binary, event-driven spiking activation units. We train the proposed model in three variants: 45M, 125M and 260M parameters. To the best of our knowledge, this is 4x larger than any functional backprop-trained SNN to date. We achieve this by modifying the transformer block to replace multi-head self-attention, reducing quadratic computational complexity to linear with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our preliminary experiments show that SpikeGPT remains competitive with non-spiking models on tested benchmarks, while maintaining 5x less energy consumption when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.
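The two ideas the abstract combines — an RWKV-style recurrence that consumes tokens one at a time in O(d) per step instead of quadratic self-attention, and a binary, event-driven spiking activation — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`rwkv_style_step`, `spike`) are hypothetical, and the recurrence is a simplified elementwise form of the RWKV weighted-key-value update.

```python
import numpy as np

def rwkv_style_step(state, k, v, w, u):
    """One recurrent step of a simplified RWKV-style linear attention.
    Tokens are streamed in sequentially, so the cost per token is O(d)
    in the channel dimension, rather than O(T) per token (O(T^2) total)
    as in softmax self-attention. `state` holds running numerator and
    denominator accumulators over the token history."""
    num, den = state
    # Output mixes the accumulated history with the current token,
    # with `u` acting as a bonus weight for the current position.
    out = (num + np.exp(u + k) * v) / (den + np.exp(u + k))
    # Decay the history by the (channel-wise) decay `w`, then fold in
    # the current key/value pair for future steps.
    decay = np.exp(-np.exp(w))
    num = decay * num + np.exp(k) * v
    den = decay * den + np.exp(k)
    return out, (num, den)

def spike(x, threshold=1.0):
    """Binary, event-driven activation: emit a 1 wherever the input
    crosses the threshold, else 0. On neuromorphic hardware, only the
    1s (events) trigger computation, which is the source of the
    claimed energy savings."""
    return (x >= threshold).astype(np.float32)
```

In training, the non-differentiable `spike` function would typically be paired with a surrogate gradient; that machinery is omitted here to keep the recurrence itself visible.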


page 1

page 2

page 3

page 4


