OpenCL kernels can be executed on a broad range of hardware devices that provide different levels of concurrency, use different caching mechanisms, and incur different kernel execution overheads. Searching for the optimal granularity of computational kernels is therefore not an easy task. Larger kernels may bring lower overhead, higher parallelism, and better memory locality, but they may also consume more resources, such as registers or cache, and thus be inefficient.
In this research, we focus on kernel fusion. When a computation is realized by multiple kernels, fusing them may improve data locality, parallelism, or serial efficiency. However, developing libraries of fused kernels is highly impractical: the number of potential kernel combinations is very high, and fusion may decrease performance in some cases. Instead, the programmer may define the computation as a data flow between simple kernels, and a source-to-source compiler then generates the fusions automatically according to the data dependencies between kernels and the targeted hardware device. We have developed a source-to-source compiler that fuses kernels performing (potentially nested) map and reduce operations. Currently, we are extending kernel fusion to more generic kernels.
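To illustrate the idea of fusing a map with a reduce, the following is a minimal sketch in plain, sequential C rather than OpenCL. It is not the output of the compiler described above. The function names (map_square, reduce_sum, sum_of_squares_unfused, sum_of_squares_fused) and the sum-of-squares computation are hypothetical, chosen only for illustration. In the unfused version, the intermediate array is written to and read back from memory. In the fused version, each intermediate value stays in a local variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "map" kernel: out[i] = in[i] * in[i]. */
static void map_square(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = in[i] * in[i];
    }
}

/* Hypothetical "reduce" kernel: sum of all elements. */
static float reduce_sum(const float *in, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += in[i];
    }
    return acc;
}

/* Unfused: two separate kernels; the intermediate result is stored
 * in tmp, so every element passes through memory twice. */
static float sum_of_squares_unfused(const float *in, float *tmp, size_t n) {
    assert(tmp != NULL || n == 0);
    map_square(in, tmp, n);
    return reduce_sum(tmp, n);
}

/* Fused: a single pass; the squared value lives only in a local
 * variable, so no intermediate array is needed. */
static float sum_of_squares_fused(const float *in, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float sq = in[i] * in[i]; /* map step */
        acc += sq;                /* reduce step */
    }
    return acc;
}
```

Both versions compute the same result. On a real device, the fused form would save one kernel launch and the memory traffic for the intermediate array, while possibly using more registers per work-item, which is the trade-off described above.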
This work was supported by the project OP RD&E CERIT Scientific Cloud (CZ.02.1.01/0.0/0.0/16_013/0001802).