Autotuning for Heterogeneous Systems

OpenCL is an open standard for programming many types of accelerators as well as conventional CPUs. It allows highly parallel computational kernels to be implemented and executed on computing devices. Developing and optimizing kernels is challenging even for experienced programmers: the code must be parallelized efficiently across thousands of independent threads and must respect many performance characteristics of current hardware, which differ significantly between hardware types and even between generations. Any automatic tool that eases the exploration of performance-related parameters is therefore of great value in this area.

We develop autotuning methods and the tool KTT (Kernel Tuning Toolkit). The programmer identifies parameters of the code that may influence performance, and the autotuning tool automatically searches the parameter space and picks the values that lead to the highest performance on a particular hardware device. Autotuning reduces the time needed for manual exploration of tuning parameters and allows developers to write flexible code that optimizes itself for the underlying hardware architecture automatically. We have extended state-of-the-art autotuning methods to support autotuning of global optimizations (affecting multiple kernels and host code) and dynamic autotuning (non-blocking tuning performed during the computation). Currently, we are working on improving tuning-space search methods for dynamic tuning.
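The basic search loop described above can be illustrated with a minimal sketch. It is not the KTT API: the names `tuning_space`, `autotune`, and `wall_clock`, as well as the example parameters `block_size` and `unroll`, are hypothetical and chosen only for illustration. The sketch assumes an exhaustive search over the Cartesian product of parameter values, filtered by a user-supplied constraint; real autotuners typically offer smarter search strategies as well.

```python
import itertools
import time


def tuning_space(parameters, constraint=lambda cfg: True):
    """Yield each valid configuration (a dict) from the Cartesian product
    of the parameter values. `parameters` maps a name to a list of values."""
    names = list(parameters)
    for values in itertools.product(*(parameters[name] for name in names)):
        cfg = dict(zip(names, values))
        if constraint(cfg):
            yield cfg


def autotune(measure, parameters, constraint=lambda cfg: True):
    """Exhaustive search: evaluate `measure(cfg)` (lower is better, e.g. a
    runtime in seconds) for every valid configuration and keep the best."""
    best_cfg, best_cost = None, float("inf")
    for cfg in tuning_space(parameters, constraint):
        cost = measure(cfg)
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost


def wall_clock(run, repeats=3):
    """Turn a function `run(cfg)` into a measure that returns the fastest
    of `repeats` wall-clock timings, to reduce measurement noise."""
    def measure(cfg):
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(cfg)
            timings.append(time.perf_counter() - start)
        return min(timings)
    return measure


if __name__ == "__main__":
    # Hypothetical tuning parameters and a hypothetical hardware limit.
    parameters = {"block_size": [32, 64, 128], "unroll": [1, 2, 4]}
    fits = lambda cfg: cfg["block_size"] * cfg["unroll"] <= 256

    def run(cfg):
        # Stand-in for launching a kernel built with the given configuration.
        data = list(range(4096))
        step = cfg["block_size"]
        return sum(sum(data[i:i + step]) for i in range(0, len(data), step))

    best_cfg, best_time = autotune(wall_clock(run), parameters, fits)
    print(best_cfg, best_time)
```

Passing the measurement in as a function keeps the search independent of how performance is obtained, so the same loop works with wall-clock timing, profiler counters, or a synthetic cost model.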

Results (further papers are under review or in preparation):

  • Jiří Filipovič, Filip Petrovič, Siegfried Benkner. Autotuning of OpenCL Kernels with Global Optimizations. In 1st Workshop on Autotuning and Adaptivity Approaches for Energy Efficient HPC Systems (ANDARE). 2017.
  • Jaroslav Oľha, Jana Hozzová, Jan Fousek, Jiří Filipovič. Exploiting historical data: pruning autotuning spaces and estimating the number of tuning steps. In Seventeenth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms. 2019.
  • Filip Petrovič, David Střelák, Jana Hozzová, Jaroslav Oľha, Richard Trembecký, Siegfried Benkner, Jiří Filipovič. A Benchmark Set of Highly-efficient CUDA and OpenCL Kernels and its Dynamic Autotuning with Kernel Tuning Toolkit. arXiv:1910.08498

This work was supported by the project OP RD&E CERIT Scientific Cloud CZ.02.1.01/0.0/0.0/16_013/0001802
