DynaSOAr: A Parallel Memory Allocator for Object-oriented Programming on GPUs with Efficient Memory Access

by Matthias Springer, et al.

Object-oriented programming has long been regarded as too inefficient for SIMD high-performance computing, despite the fact that many important applications in HPC have an inherent object structure. On SIMD accelerators including GPUs, this is mainly due to performance problems with memory allocation: a few libraries support parallel memory allocation directly on accelerator devices, but all of them suffer from uncoalesced memory accesses. In this work, we present DynaSOAr, a C++/CUDA data layout DSL for object-oriented programming, combined with a parallel dynamic object allocator. DynaSOAr was designed for a class of object-oriented programs that we call Single-Method Multiple-Objects (SMMO), in which parallelism is expressed over a set of objects. DynaSOAr is the first GPU object allocator that provides a parallel do-all operation, which is the foundation of SMMO applications. DynaSOAr improves the usage of allocated memory with a Structure of Arrays (SOA) data layout and achieves low memory fragmentation through efficient management of free and allocated memory blocks with lock-free, hierarchical bitmaps. In our benchmarks, DynaSOAr speeds up application code by up to 3x over state-of-the-art allocators. Moreover, DynaSOAr manages heap memory more efficiently than other allocators, allowing programmers to run up to 2x larger problem sizes with the same amount of memory.
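
To make the three ideas in the abstract concrete (SOA layout, bitmap-managed slots, and a do-all over allocated objects), here is a minimal host-side C++ sketch. It is not DynaSOAr's API: the names `BodyBlock`, `allocate_slot`, `free_slot`, and `do_all` are hypothetical, the bitmap has a single level rather than a hierarchy, and the do-all loop runs sequentially where a GPU implementation would assign slots to threads.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical block of 64 object slots in Structure of Arrays form:
// each field lives in its own array, so threads that touch the same
// field of neighbouring objects read adjacent memory.
struct BodyBlock {
    static constexpr std::size_t kSlots = 64;
    std::array<float, kSlots> pos{};
    std::array<float, kSlots> vel{};
    std::atomic<std::uint64_t> used{0};  // bit i set => slot i is allocated
};

// Claim a free slot with a compare-and-swap loop (lock-free).
// Returns the slot index, or -1 if the block is full.
inline int allocate_slot(BodyBlock& b) {
    std::uint64_t cur = b.used.load(std::memory_order_relaxed);
    while (cur != ~std::uint64_t{0}) {
        int slot = 0;
        while ((cur >> slot) & 1u) ++slot;  // find the lowest clear bit
        const std::uint64_t desired = cur | (std::uint64_t{1} << slot);
        if (b.used.compare_exchange_weak(cur, desired,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
            return slot;
        }
        // On failure, cur now holds the latest bitmap value; retry.
    }
    return -1;
}

// Release a slot by clearing its bit.
inline void free_slot(BodyBlock& b, int slot) {
    assert(slot >= 0 && slot < static_cast<int>(BodyBlock::kSlots));
    b.used.fetch_and(~(std::uint64_t{1} << slot), std::memory_order_acq_rel);
}

// Sequential stand-in for a parallel do-all: call f(i) for every
// allocated slot i, based on a snapshot of the bitmap.
template <typename F>
void do_all(BodyBlock& b, F f) {
    const std::uint64_t bits = b.used.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < BodyBlock::kSlots; ++i) {
        if ((bits >> i) & 1u) f(i);
    }
}
```

A caller would allocate slots, write fields through `pos[i]` and `vel[i]`, and then run one method over all live objects with something like `do_all(block, [&](std::size_t i) { block.pos[i] += block.vel[i]; })`. The bitmap is what lets the do-all skip free slots without a separate list of live objects.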


Memory-Efficient Object-Oriented Programming on GPUs

Object-oriented programming is often regarded as too inefficient for hig...

SoaAlloc: A Lock-free Hierarchical Bitmap-based Object Allocator for GPUs

Designing dynamic memory allocators for GPUs is challenging because appl...

PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development

This paper describes PlinyCompute, a system for development of high-perf...

SoaAlloc: Accelerating Single-Method Multiple-Objects Applications on GPUs

We propose SoaAlloc, a dynamic object allocator for Single-Method Multip...

Object-oriented design for massively parallel computing

We define an abstract framework for object-oriented programming and show...

ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management

Low latency services such as credit-card fraud detection and website tar...

Lazy object copy as a platform for population-based probabilistic programming

This work considers dynamic memory management for population-based proba...
