A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors

by Reese Kuper, et al.

Because semiconductor power density no longer stays constant as process technology scales down, modern CPUs integrate capable data accelerators on chip to improve performance and efficiency across a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA), introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets in-memory data movement operations, which are common sources of overhead in datacenter workloads and infrastructure. It also supports a wider range of operations on streaming data, such as CRC32 calculations, delta record creation/merging, and data integrity field (DIF) operations, which makes it much more versatile. This paper introduces the latest features supported by DSA, examines its versatility in depth, and analyzes its throughput benefits through a comprehensive evaluation. Drawing on this analysis of its characteristics and on DSA's rich software ecosystem, we summarize several insights and guidelines that help programmers make the most of DSA, and we use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.




