Michael Garland

Featured Co-authors

Animesh Garg
96 publications
Jinjun Xiong
93 publications
Wen-mei Hwu
57 publications
John D. Owens
27 publications
Maxim Naumov
21 publications
Iuri Frosio
14 publications
Muhammad Osama
14 publications
Ganesh Gopalakrishnan
9 publications
Vikram Sharma Mailthody
7 publications
Seung Won Min
7 publications
Zaid Qureshi
5 publications

research

∙ 07/07/2023

CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs

Data compression and decompression have become vital components of big-d...

Jeongmin Park, et al.

research

∙ 01/09/2023

Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU

We introduce Stream-K, a work-centric parallelization of matrix multipli...

Muhammad Osama, et al.

research

∙ 08/31/2022

Efficient Sparsely Activated Transformers

Transformer-based neural networks have achieved state-of-the-art task pe...

Salar Latifi, et al.

research

∙ 03/09/2022

GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture

Graphics Processing Units (GPUs) have traditionally relied on the host C...

Zaid Qureshi, et al.

research

∙ 11/06/2019

A Programmable Approach to Model Compression

Deep neural networks frequently contain far more weights, represented at...

Vinu Joseph, et al.

research

∙ 07/19/2019

GPU-Accelerated Atari Emulation for Reinforcement Learning

We designed and implemented a CUDA port of the Atari Learning Environmen...

Steven Dalton, et al.

research

∙ 12/06/2017

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks

Training deep neural networks with Stochastic Gradient Descent, or its v...

Aditya Devarakonda, et al.
