CERIT-SC's HPC team developed a novel GPU-friendly algorithm that effectively accelerates 3D Fourier Reconstruction

25 Mar 2019

Our accelerated 3D Fourier Reconstruction has been published in The International Journal of High Performance Computing Applications.

3D Fourier Reconstruction is used to reconstruct volumes from data obtained by cryo-electron microscopy, and it is a significant computational bottleneck in the processing pipeline. We have introduced a novel GPU-friendly algorithm that, compared with state-of-the-art implementations, improves cache locality and removes race conditions when writing into the 3D volume in parallel. The algorithm has been auto-tuned with the Kernel Tuning Toolkit (https://github.com/Fillo7/KTT).
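This announcement does not spell out how the write races are removed. As an illustration only, the sketch below contrasts two generic loop orders on a 1D toy grid: a sample-centric "scatter", where several samples may update the same cell, and a cell-centric "gather", where each cell is written by exactly one iteration. The function names, the triangular kernel, and the 1D setting are assumptions made for this example and are not taken from the paper or from Xmipp.

```python
import math


def weight(distance, radius):
    """Triangular kernel: 1 at distance 0, falling to 0 at the radius."""
    return max(0.0, 1.0 - abs(distance) / radius)


def scatter(samples, grid_size, radius=1.0):
    """Sample-centric loop: each sample adds to the cells around it.

    If the outer loop ran in parallel, two samples could update the same
    cell at once, so the += would need atomics or locks.
    """
    grid = [0.0] * grid_size
    for pos, value in samples:
        lo = max(0, math.ceil(pos - radius))
        hi = min(grid_size - 1, math.floor(pos + radius))
        for cell in range(lo, hi + 1):
            grid[cell] += value * weight(cell - pos, radius)
    return grid


def gather(samples, grid_size, radius=1.0):
    """Cell-centric loop: each cell sums the samples that fall near it.

    Every cell is written once, by its own iteration, so the outer loop
    can run in parallel without synchronization.
    """
    grid = [0.0] * grid_size
    for cell in range(grid_size):
        total = 0.0
        for pos, value in samples:
            total += value * weight(cell - pos, radius)
        grid[cell] = total
    return grid
```

Both functions compute the same grid. The gather form trades the synchronized writes for extra reads, since each cell has to look at the samples that might touch it.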

We have integrated the algorithm into the widely used software package Xmipp, version 3.19, reaching an 11.4× speedup compared to the original parallel CPU implementation when using a GPU with comparable power consumption. Moreover, we have reached a 31.7× speedup using four GPUs and a 2.14×–5.96× speedup compared to an optimized GPU implementation based on state-of-the-art algorithms. The paper is available at https://journals.sagepub.com/doi/abs/10.1177/1094342019832958?journalCode=hpcc

We have released a new tool that improves the setup of the cuFFT library.

The fast Fourier transform (FFT) is often one of the most computationally demanding kernels in scientific applications. Although a lot of attention has been devoted to tuning its performance on various hardware devices, FFT libraries usually have many possible settings, and it is not always easy to deduce which settings give the best performance. In practice, we can often slightly modify the FFT settings; for example, we can pad or crop the input data.
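To make the padding and cropping idea concrete, here is a minimal sketch that looks for a nearby transform size whose prime factors are all small. FFT libraries, cuFFT included, are typically fastest for sizes of the form 2^a · 3^b · 5^c · 7^d. The helper names and the exact prime set used here are assumptions for illustration and do not reproduce any particular tool's search.

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """Return True if n has no prime factors other than those in `primes`."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_padded_size(n, primes=(2, 3, 5, 7)):
    """Smallest size >= n whose prime factors all lie in `primes`."""
    size = max(n, 1)
    while not is_smooth(size, primes):
        size += 1
    return size


def prev_cropped_size(n, primes=(2, 3, 5, 7)):
    """Largest size <= n (but at least 1) with prime factors in `primes`."""
    size = n
    while size > 1 and not is_smooth(size, primes):
        size -= 1
    return size
```

For a signal of length 1021 (a prime), `next_padded_size(1021)` gives 1024 and `prev_cropped_size(1021)` gives 1008. Whether padding or cropping is acceptable depends on the application, and the smallest smooth size is not always the fastest one, which is why measuring the candidates can still pay off.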

We have developed a new tool, cuFFTAdvisor (https://github.com/DStrelak/cuFFTAdvisor), which proposes and, using auto-tuning, finds the best configuration of the cuFFT library (a popular library for computing FFTs on GPUs) for given constraints on input size and plan settings. We show experimentally that our tool can propose different settings of the transformation, resulting in an average 6× speedup using fast heuristics and a 6.9× speedup using auto-tuning. The paper is available here: https://dl.acm.org/citation.cfm?id=3295817
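The difference between proposing settings by heuristics and finding them by auto-tuning can be sketched generically: a heuristic ranks candidates without running them, while auto-tuning measures each candidate and keeps the cheapest. The code below is a minimal sketch of the measuring step only. `autotune`, `time_once`, and the `measure` callback are hypothetical names for this example and are not cuFFTAdvisor's interface.

```python
import time


def time_once(run, candidate):
    """Wall-clock time, in seconds, of a single call to run(candidate)."""
    start = time.perf_counter()
    run(candidate)
    return time.perf_counter() - start


def autotune(candidates, measure, repeats=3):
    """Return (best_candidate, best_cost) over all candidates.

    measure(candidate) returns a cost such as a runtime in seconds.
    Each candidate is measured `repeats` times and its lowest cost is
    kept, which reduces the influence of noisy measurements.
    """
    best, best_cost = None, float("inf")
    for candidate in candidates:
        cost = min(measure(candidate) for _ in range(repeats))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

With a user-supplied function that executes a transform of a given size (here a hypothetical `fft_of_size`), a call could look like `autotune(sizes, lambda n: time_once(fft_of_size, n))`. On a GPU, the timed function would also have to synchronize with the device before the clock is read.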
