The Kernel Tuner documentation

Kernel Tuner is a software development tool for the creation of highly-optimized and tuned GPU applications.

The Kernel Tuner documentation pages are mostly about Kernel Tuner itself, but there are also a number of related repositories that are considered part of the Kernel Tuner family.

Quick install

The easiest way to install Kernel Tuner is with pip:

To tune CUDA kernels:

  • First, make sure you have the CUDA Toolkit installed

  • Then type: pip install kernel_tuner[cuda]

To tune OpenCL kernels:

  • First, make sure you have an OpenCL compiler for your intended OpenCL platform

  • Then type: pip install kernel_tuner[opencl]

To tune HIP kernels:

  • First, make sure you have a HIP runtime and compiler installed

  • Then type: pip install kernel_tuner[hip]

Or, to install support for all of the above:

  • pip install kernel_tuner[cuda,opencl,hip]

More information about how to install Kernel Tuner and its dependencies can be found under Installation.

Example usage

The following shows a simple example for tuning a CUDA kernel:

import numpy
from kernel_tuner import tune_kernel

kernel_string = """
__global__ void vector_add(float *c, float *a, float *b, int n) {
    int i = blockIdx.x * block_size_x + threadIdx.x;
    if (i<n) {
        c[i] = a[i] + b[i];
    }
}
"""

size = 10000000

a = numpy.random.randn(size).astype(numpy.float32)
b = numpy.random.randn(size).astype(numpy.float32)
c = numpy.zeros_like(b)
n = numpy.int32(size)
args = [c, a, b, n]

tune_params = dict()
tune_params["block_size_x"] = [32, 64, 128, 256, 512]

tune_kernel("vector_add", kernel_string, size, args, tune_params)
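Kernel Tuner benchmarks every combination of the values listed in tune_params, so the example above tries five configurations of block_size_x. The sketch below shows how the search-space size grows when more tunable parameters are added; the second parameter name is purely illustrative and not part of the example above.

```python
from functools import reduce
from operator import mul

# Hypothetical tuning setup with two tunable parameters; "tile_size_y"
# is an illustrative name, not one used by the vector_add example.
tune_params = {
    "block_size_x": [32, 64, 128, 256, 512],
    "tile_size_y": [1, 2, 4],
}

# Without restrictions, the brute-force strategy benchmarks the Cartesian
# product of all parameter value lists.
search_space_size = reduce(mul, (len(v) for v in tune_params.values()), 1)
print(search_space_size)  # 5 * 3 = 15
```

tune_kernel returns the benchmark results along with environment information, so `results, env = tune_kernel(...)` lets you inspect the measured time of each configuration afterwards.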

Citation

If you use Kernel Tuner in research or research software, please cite the most relevant among the following publications:

The first paper on Kernel Tuner (note that the capabilities of Kernel Tuner have expanded significantly since this publication):

@article{kerneltuner,
  author  = {Ben van Werkhoven},
  title   = {Kernel Tuner: A search-optimizing GPU code auto-tuner},
  journal = {Future Generation Computer Systems},
  year = {2019},
  volume  = {90},
  pages = {347-358},
  url = {https://www.sciencedirect.com/science/article/pii/S0167739X18313359},
  doi = {10.1016/j.future.2018.08.004}
}

To cite Kernel Tuner’s Bayesian Optimization strategy, please use the following:

@inproceedings{willemsen2021bayesian,
  author = {Willemsen, Floris-Jan and Van Nieuwpoort, Rob and Van Werkhoven, Ben},
  title = {Bayesian Optimization for auto-tuning GPU kernels},
  booktitle = {International Workshop on Performance Modeling, Benchmarking and Simulation
     of High Performance Computer Systems (PMBS) at Supercomputing (SC21)},
  year = {2021},
  url = {https://arxiv.org/abs/2111.14991}
}

For a performance comparison of different optimization algorithms for auto-tuning and an analysis of tuning difficulty for different GPUs:

@article{schoonhoven2022benchmarking,
  title={Benchmarking optimization algorithms for auto-tuning GPU kernels},
  author={Schoonhoven, Richard and van Werkhoven, Ben and Batenburg, K Joost},
  journal={IEEE Transactions on Evolutionary Computation},
  year={2022},
  publisher={IEEE}
}

To cite Kernel Tuner’s capabilities for measuring and optimizing the energy consumption of GPU kernels, please use the following:

@inproceedings{schoonhoven2022going,
  author = {Schoonhoven, Richard and Veenboer, Bram and van Werkhoven, Ben and Batenburg, K Joost},
  title = {Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning},
  booktitle = {International Workshop on Performance Modeling, Benchmarking and Simulation
     of High Performance Computer Systems (PMBS) at Supercomputing (SC22)},
  year = {2022},
  url = {https://arxiv.org/abs/2211.07260}
}