This paper compares OpenCL, CUDA, and HIP as compilation targets for Futhark, a functional array language. We compare the performance of OpenCL versus CUDA, and OpenCL versus HIP, on the code generated by the Futhark compiler on a collection of 48 application benchmarks on two different GPUs. Despite the generated code being in most cases equivalent, we observe significant performance differences on the same hardware, ranging from 0.42x to 1.72x in the most extreme cases. We identify the root causes of most of these differences, many of which are due to relatively superficial details such as inconsistent defaults regarding compiler optimisation and numerical accuracy, although a few remain mysterious.
I work on functional array programming and optimising compilers.
Program Display Configuration
Fri 6 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Viennachange