Performance portability through machine learning guided kernel selection in SYCL libraries

08/30/2020
by John Lawson, et al.

Automatically tuning parallel compute kernels allows libraries and frameworks to achieve good performance on a wide range of hardware; however, these techniques typically focus on finding optimal kernel parameters for particular input sizes and parameters. General purpose compute libraries must be able to cater to all inputs and parameters provided by a user, so these techniques are of limited use. Additionally, parallel programming frameworks such as SYCL require that the kernels be deployed in a binary format embedded within the library. As such, it is impractical to deploy a large number of possible kernel configurations without inflating the library size. Machine learning methods can be used to mitigate both of these problems and provide performance for general purpose routines with a limited number of kernel configurations. We show that unsupervised clustering methods can be used to select a subset of the possible kernels that should be deployed, and that simple classification methods can be trained to select from these kernels at runtime to give good performance. As these techniques are fully automated, relying only on benchmark data, the tuning process for new hardware or problems does not require any developer effort or expertise.
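
The two-stage idea described above (cluster benchmark results to pick a small set of kernels to deploy, then train a classifier to choose among them at runtime) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the benchmark numbers are invented, the function names (`select_kernels`, `train_selector`, `choose_kernel`) are hypothetical, plain k-means stands in for "unsupervised clustering", and a 1-nearest-neighbour lookup stands in for "simple classification".

```python
import math

def normalise(times):
    """Convert runtimes to relative performance in (0, 1]; 1.0 is the fastest config."""
    best = min(times)
    return [best / t for t in times]

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if the group is empty).
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def select_kernels(bench, k):
    """Stage 1: cluster performance vectors, keep the best config of each centroid."""
    perf = [normalise(times) for times in bench.values()]
    centroids = kmeans(perf, k)
    return sorted({max(range(len(c)), key=c.__getitem__) for c in centroids})

def train_selector(bench, deployed):
    """Stage 2: label every benchmarked size with its fastest *deployed* config."""
    return [
        ([math.log2(d) for d in size], min(deployed, key=lambda j: times[j]))
        for size, times in bench.items()
    ]

def choose_kernel(model, size):
    """Runtime selection: label of the nearest benchmarked size in log space."""
    x = [math.log2(d) for d in size]
    return min(model, key=lambda row: math.dist(row[0], x))[1]

# Invented benchmark data: (m, k, n) -> runtime in ms for four kernel configs.
BENCH = {
    (32, 32, 32): [0.10, 0.30, 0.50, 0.12],
    (64, 64, 64): [0.20, 0.45, 0.90, 0.25],
    (1024, 1024, 1024): [9.0, 4.0, 2.0, 8.5],
    (2048, 2048, 2048): [70.0, 30.0, 16.0, 66.0],
}

deployed = select_kernels(BENCH, k=2)      # indices of the configs to ship
model = train_selector(BENCH, deployed)    # training data for the runtime selector
print(deployed, choose_kernel(model, (48, 48, 48)))
```

Only the configurations returned by `select_kernels` would need to be compiled into the library binary; the selector then maps any user-supplied size to one of them. Because both stages consume nothing but benchmark timings, re-running them on data from a new device would retune the library without manual intervention.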
