Thread Batching for High-performance Energy-efficient GPU Memory Design

06/13/2019
by Bing Li, et al.

Massive multi-threading in GPUs imposes tremendous pressure on memory subsystems. Due to the rapid growth of thread-level parallelism in GPUs and the slow improvement of peak memory bandwidth, memory has become a bottleneck of GPU performance and energy efficiency. In this work, we propose an integrated architectural scheme to optimize memory accesses and thereby boost the performance and energy efficiency of GPUs. First, we propose thread batch enabled memory partitioning (TEMP) to improve GPU memory access parallelism. In particular, TEMP groups multiple thread blocks that share the same set of pages into a thread batch and applies a page coloring mechanism to bind each streaming multiprocessor (SM) to dedicated memory banks. TEMP then dispatches each thread batch to an SM to ensure highly parallel memory-access streaming from the different thread blocks. Second, a thread batch-aware scheduling (TBAS) scheme is introduced to improve GPU memory access locality and to reduce contention on memory controllers and interconnection networks. Experimental results show that the integration of TEMP and TBAS can achieve up to 10.3% performance improvement and 11.3% DRAM energy reduction across diverse GPU applications. We also evaluate the performance interference of mixed CPU+GPU workloads running on a heterogeneous system that employs our proposed schemes. Our results show that a simple solution can effectively ensure the efficient execution of both GPU and CPU applications.
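
The core of TEMP, as described in the abstract, is a grouping-and-binding policy: thread blocks that touch the same set of DRAM pages are batched together, pages are colored onto bank groups, and each batch is dispatched to the SM bound to its color. The sketch below illustrates that policy at a purely conceptual level in Python; all names (ThreadBlock, page_color, form_thread_batches, dispatch, NUM_BANK_COLORS) and the per-block address lists are assumptions for illustration, not the paper's actual hardware or runtime interface.

```python
# Conceptual sketch of TEMP-style thread batching (assumed names/parameters).
# The real mechanism lives in hardware/driver; this only illustrates the policy.

from collections import defaultdict
from dataclasses import dataclass

NUM_BANK_COLORS = 4   # assumed number of bank groups (page colors)
PAGE_SIZE = 4096      # assumed DRAM page size in bytes

@dataclass
class ThreadBlock:
    block_id: int
    addresses: list      # byte addresses this block is expected to touch

    def page_set(self) -> frozenset:
        # DRAM pages referenced by this thread block.
        return frozenset(a // PAGE_SIZE for a in self.addresses)

def page_color(page: int) -> int:
    # Page coloring: map a page to a bank group via low-order page bits.
    return page % NUM_BANK_COLORS

def form_thread_batches(blocks):
    """Group thread blocks that share the same set of pages into a batch."""
    batches = defaultdict(list)
    for tb in blocks:
        batches[tb.page_set()].append(tb)
    return list(batches.values())

def dispatch(batches, num_sms):
    """Bind each batch to the SM associated with its pages' bank color."""
    schedule = defaultdict(list)
    for batch in batches:
        pages = batch[0].page_set()
        color = page_color(min(pages)) if pages else 0
        sm = color % num_sms           # SMs partitioned across bank colors
        schedule[sm].append(batch)
    return schedule

if __name__ == "__main__":
    # Toy example: pairs of blocks sharing a page end up in the same batch,
    # and batches are spread over SMs according to their page color.
    blocks = [ThreadBlock(i, [(i // 2) * PAGE_SIZE]) for i in range(8)]
    print(dispatch(form_thread_batches(blocks), num_sms=4))
```

In this toy setup each batch maps to exactly one bank color, so an SM only streams requests to its dedicated banks; TBAS would additionally order batches on each SM to preserve row-buffer locality and reduce controller and interconnect contention.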


