Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations

11/30/2021
by Yuxin Chen, et al.

We present Atos, a task-parallel GPU dynamic scheduling framework that is especially suited to dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies. This yields higher GPU utilization, which matters most for problems with concurrency bottlenecks. Atos also offers implicit task-parallel load balancing in addition to data-parallel load balancing, giving users the flexibility to trade one against the other for best performance. Finally, Atos lets users adapt to different use cases by controlling the kernel strategy and the task-parallel granularity. We demonstrate that each of these controls is important in practice. We evaluate and analyze the performance of Atos vs. BSP on three applications: breadth-first search, PageRank, and graph coloring. Compared to a state-of-the-art BSP GPU implementation, Atos implementations achieve geomean speedups of 3.44x, 2.1x, and 2.77x and peak speedups of 12.8x, 3.2x, and 9.08x across the three case studies. Beyond quantifying the speedups, we extensively analyze the reasons behind each one. This deeper understanding allows us to derive general guidelines for selecting the optimal Atos configuration for different applications. Finally, our analysis provides insights for future dynamic scheduling framework designs.
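To make the contrast between a BSP formulation and a task-parallel formulation with relaxed dependencies concrete, here is a minimal CPU-only sketch in Python for breadth-first search. It is an illustration under stated assumptions, not Atos's API or implementation: the function names `bfs_bsp` and `bfs_task_relaxed`, the adjacency-dict graph representation, and the single sequential worklist are all hypothetical, and no GPU scheduling or load balancing is modeled. The first function processes one frontier per superstep with a barrier between levels. The second lets vertex tasks run in any order and corrects a depth whenever a shorter path is found later.

```python
from collections import deque


def bfs_bsp(adj, source):
    """Level-synchronous (BSP-style) BFS: one frontier per superstep."""
    depth = {source: 0}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in depth:  # first visit is final in level order
                    depth[v] = depth[u] + 1
                    next_frontier.append(v)
        # Conceptual global barrier: the next level starts only after
        # every vertex of the current level has been processed.
        frontier = next_frontier
    return depth


def bfs_task_relaxed(adj, source):
    """Worklist BFS with relaxed ordering (hypothetical sketch).

    Tasks are not forced into level order, so a vertex may first get a
    depth that is too large; it is re-enqueued whenever a smaller depth
    is found, and the result converges to the true BFS depths.
    """
    depth = {source: 0}
    worklist = deque([source])
    while worklist:
        u = worklist.pop()  # LIFO on purpose: deliberately not level order
        for v in adj[u]:
            candidate = depth[u] + 1
            if candidate < depth.get(v, float("inf")):
                depth[v] = candidate
                worklist.append(v)  # new task: propagate the improvement
    return depth


if __name__ == "__main__":
    # Small directed graph in which vertex 4 is reachable by a short
    # path (0 -> 1 -> 4) and by a longer one (0 -> 2 -> 3 -> 4).
    graph = {0: [1, 2], 1: [4], 2: [3], 3: [4], 4: [5], 5: []}
    print(bfs_bsp(graph, 0))
    print(bfs_task_relaxed(graph, 0))
```

In this sketch the relaxed version may do redundant work (vertex 4 is processed twice on the example graph) in exchange for not waiting at a per-level barrier. That trade-off is the kind of thing a task-parallel formulation has to weigh.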

