RPU: The Ring Processing Unit

03/30/2023
by Deepraj Soni, et al.

Ring-Learning-with-Errors (RLWE) has emerged as the foundation of many important techniques for improving security and privacy, including homomorphic encryption and post-quantum cryptography. While promising, these techniques have seen limited use because of the extreme overheads of running them on general-purpose machines. In this paper, we present a novel vector Instruction Set Architecture (ISA) and microarchitecture for accelerating the ring-based computations of RLWE. The ISA, named B512, is developed to meet the needs of ring processing workloads while balancing high performance with support for general-purpose programming. Having an ISA rather than fixed hardware enables continued software improvement after fabrication and the ability to support evolving workloads. We then propose the ring processing unit (RPU), a high-performance, modular implementation of B512. The RPU has native large-word modular arithmetic support, capabilities for very wide parallel processing, and a large-capacity, high-bandwidth scratchpad to meet the needs of ring processing. We address the challenges of programming the RPU using a newly developed SPIRAL backend. A configurable simulator is built to characterize design tradeoffs and quantify performance. The best-performing design was implemented in RTL and used to validate simulator performance. In addition to our characterization, we show that an RPU using 20.5 mm² of GF 12nm can provide a speedup of 1485x over a CPU running a 64k, 128-bit NTT, a core RLWE workload.
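The abstract names the number-theoretic transform (NTT) as a core RLWE workload and stresses native large-word modular arithmetic. As a minimal illustrative sketch (not B512 code, SPIRAL output, or the paper's implementation), the Python below multiplies two polynomials in the RLWE ring Z_q[x]/(x^n + 1) using a negacyclic NTT. Assumptions, all chosen for illustration: the modulus is the 64-bit prime q = 2^64 - 2^32 + 1 (not the paper's 128-bit setting), n = 8, and every function name is hypothetical. Python's arbitrary-precision integers stand in for a wide modular datapath, so an NTT-friendly 128-bit prime could be substituted without code changes.

```python
"""Hypothetical sketch: negacyclic NTT multiplication in Z_q[x]/(x^n + 1)."""
import random

Q = 2**64 - 2**32 + 1  # assumed example modulus: prime, and 2^32 divides Q - 1


def find_psi(n: int, q: int) -> int:
    """Return a primitive 2n-th root of unity mod q (q prime, n a power of two, 2n | q - 1)."""
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        # psi^n = g^((q-1)/2) is -1 exactly when g is a quadratic non-residue;
        # with n a power of two, that forces psi to have order exactly 2n.
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def bit_reverse_copy(a: list[int]) -> list[int]:
    """Permute a into bit-reversed index order (len(a) must be a power of two)."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0] * n
    for i in range(n):
        rev, x = 0, i
        for _ in range(bits):
            rev = (rev << 1) | (x & 1)
            x >>= 1
        out[rev] = a[i]
    return out


def ntt(a: list[int], omega: int, q: int) -> list[int]:
    """Iterative radix-2 Cooley-Tukey cyclic NTT: A[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    a = bit_reverse_copy(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # primitive length-th root of unity
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                # Butterfly: one modular multiply, then a modular add and subtract.
                u = a[start + j]
                v = a[start + j + half] * w % q
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
                w = w * w_len % q
        length *= 2
    return a


def negacyclic_mul(a: list[int], b: list[int], q: int) -> list[int]:
    """Multiply a and b in Z_q[x]/(x^n + 1) via psi-twisted cyclic NTTs."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q  # primitive n-th root of unity
    # Twisting coefficient i by psi^i turns cyclic convolution into negacyclic.
    a_hat = ntt([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, q)
    b_hat = ntt([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]  # pointwise product
    # Inverse NTT: forward transform with omega^-1, then scale by n^-1 and untwist.
    c = ntt(c_hat, pow(omega, -1, q), q)
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [x * n_inv % q * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def schoolbook_negacyclic(a: list[int], b: list[int], q: int) -> list[int]:
    """O(n^2) reference product in Z_q[x]/(x^n + 1), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    a = [rng.randrange(Q) for _ in range(n)]
    b = [rng.randrange(Q) for _ in range(n)]
    assert negacyclic_mul(a, b, Q) == schoolbook_negacyclic(a, b, Q)
    print("NTT product matches schoolbook product")
```

The NTT reduces ring multiplication from O(n^2) to O(n log n) coefficient operations, and each butterfly is a modular multiply followed by a modular add and subtract. That is the kind of large-word modular arithmetic the abstract says the RPU supports natively, applied across many independent butterflies per stage, which is where wide parallel processing would help.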


research
08/07/2023

FPPU: Design and Implementation of a Pipelined Full Posit Processing Unit

By exploiting the modular RISC-V ISA this paper presents the customizati...
research
04/26/2022

MemFHE: End-to-End Computing with Fully Homomorphic Encryption in Memory

The increasing amount of data and the growing complexity of problems has...
research
04/16/2020

The MosaicSim Simulator (Full Technical Report)

As Moore's Law has slowed and Dennard Scaling has ended, architects are ...
research
10/11/2022

Medha: Microcoded Hardware Accelerator for computing on Encrypted Data

Homomorphic encryption (HE) enables computation on encrypted data, and h...
research
07/27/2023

Accelerating Polynomial Modular Multiplication with Crossbar-Based Compute-in-Memory

Lattice-based cryptographic algorithms built on ring learning with error...
research
06/10/2017

Proposal for a High Precision Tensor Processing Unit

This whitepaper proposes the design and adoption of a new generation of ...
research
02/25/2019

Acceleration of expensive computations in Bayesian statistics using vector operations

Many applications in Bayesian statistics are extremely computationally i...

Please sign up or login with your details

Forgot password? Click here to reset