Using PHAST to port Caffe library: First experiences and lessons learned

by Eduardo Gómez, et al.

Performance has always been a central concern in computing, but the viable ways to achieve it have changed throughout computing history. Today, technological limits have pushed the adoption of increasingly parallel multi-core and many-core architectures, and even of highly specialized hardware (Domain-Specific Architectures, or DSAs) built to solve very specific problems. In this context, one major problem is how to develop software once and run it seamlessly on multiple accelerator architectures. Ideally, a single programming model would automatically target the code to different kinds of parallel architectures and allow architecture-specific tuning with minimal, if any, changes to the source code, in pursuit of performance portability. A comprehensive solution is still lacking. In this work, we present the use of the PHAST Library, which allows users to code once, at a high level of abstraction and thus with high productivity, and to automatically target different parallel devices by changing the compilation process. As a case study, we have worked on porting the well-known Caffe deep-learning framework. The framework has been split into different parts and some of them have been ported, yielding a working, straightforward implementation that runs on both CPUs and GPUs. We conclude by discussing the lessons learned during the porting process and by analyzing the obtained performance with a view to completing the port and extending it in future work.




