KernelBuilder

struct KernelBuilder : public kernel_launcher::ConfigSpace

A KernelBuilder is essentially a blueprint that describes the information required to compile and run a CUDA kernel for a given configuration. Most methods take expressions that will be evaluated for a particular Config.

The most important methods are:

  • tuning_key: set the tuning key.

  • problem_size: set the problem size.

  • block_size: set the thread block size.

  • grid_divisors: calculate grid size by dividing the problem size.

  • shared_memory: set the amount of shared memory in bytes.

  • template_arg: Add a template argument.

  • compiler_flag: Add a compiler flag.

  • define: Define a preprocessor variable.
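The idea that builder methods store expressions and only evaluate them once a concrete configuration is known can be sketched in plain C++. This is a minimal illustration of the concept, not the library's implementation: ToyConfig, ToyExpr, ToyBuilder, param, constant, and block_size_for are all hypothetical names that do not exist in kernel_launcher.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT kernel_launcher types.
using ToyConfig = std::map<std::string, uint32_t>;          // parameter name -> chosen value
using ToyExpr = std::function<uint32_t(const ToyConfig&)>;  // value computed later from a config

// Expression that looks up a tunable parameter by name.
inline ToyExpr param(const std::string& name) {
    return [name](const ToyConfig& c) { return c.at(name); };
}

// Expression that always yields the same constant.
inline ToyExpr constant(uint32_t v) {
    return [v](const ToyConfig&) { return v; };
}

struct ToyBuilder {
    ToyExpr block_x = constant(1);

    // Stores the expression and returns *this so that calls can be chained.
    ToyBuilder& block_size(ToyExpr x) {
        block_x = std::move(x);
        return *this;
    }

    // Evaluates the stored expression for one concrete configuration.
    uint32_t block_size_for(const ToyConfig& c) const {
        return block_x(c);
    }
};
```

With this toy model, a single builder yields a different block size for each configuration it is evaluated against, which is the sense in which the builder acts as a blueprint.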

Public Functions

KernelBuilder(std::string kernel_name, KernelSource kernel_source, ConfigSpace space = {})

Construct a new KernelBuilder.

Parameters:
  • kernel_name – Function name of the kernel. This should be the fully qualified name of the function, i.e., including the namespace. The name should not contain template parameters (these can be added by calling template_args).

  • kernel_source – The kernel source code. Can be either the file name as a string or a KernelSource instance.

KernelBuilder &tuning_key(std::string)

Set the tuning key that will be used to find wisdom files for this kernel.

Returns:

this

KernelBuilder &problem_size(ProblemSize p)

Set problem size for this kernel by providing a ProblemSize instance.

Returns:

this

KernelBuilder &problem_size(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)

Set the problem size for this kernel by providing an expression for each dimension. These expressions can contain ArgExpr expressions such as arg0, arg1, etc.

Returns:

this

KernelBuilder &problem_size(ProblemProcessor f)

Set the problem size for this kernel by providing a function to extract the problem size from the kernel arguments.

Returns:

this

KernelBuilder &argument_processor(ArgumentsProcessor f)

Add an ArgumentsProcessor to the list of processors for this kernel. Processors are user-defined functions that can modify the kernel arguments before they are passed to the actual kernel.

Parameters:

f – The processor to add.

Returns:

this

KernelBuilder &buffer_size(ArgExpr arg, TypedExpr<size_t> len)

Set the size (in number of elements) for the given argument of this kernel. For example, the following sets the length of the buffer given by the 5th argument (index=4) to the integer value given by the 1st argument (index=0).

builder.buffer_size(4, arg0);

Alternatively, it is recommended to use this function in combination with the args function for more readable variable names. For example, the following kernel takes three arguments (n, A, B), where the sizes of A and B are given by the expressions n and 2 * n, respectively.

auto [n, A, B] = args<3>();
builder.buffer_size(A, n);
builder.buffer_size(B, 2 * n);

Parameters:
  • arg – The argument buffer that this size is applied to.

  • len – The length of the buffer in number of elements.

Returns:

this

template<typename ...Ts>
inline KernelBuilder &buffers(Ts... buffers)

Short-hand for using KernelBuilder::buffer_size(...). For example, the following kernel takes three arguments (n, A, B) where the size of A and B is given by the expressions n and 2 * n.

auto [n, A, B] = args<3>();
builder.buffers(A[n], B[2 * n]);

Parameters:

buffers – Expressions of type ArgBuffer.

Returns:

this

KernelBuilder &block_size(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)

Set the block size for this kernel (i.e., number of threads per thread block).

Parameters:
  • x – Block size along X.

  • y – Block size along Y.

  • z – Block size along Z.

Returns:

this

KernelBuilder &grid_size(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)

Set the grid size for this kernel (i.e., number of thread blocks along each direction).

Parameters:
  • x – Grid size along X.

  • y – Grid size along Y.

  • z – Grid size along Z.

Returns:

this

KernelBuilder &grid_divisors(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)

Set the grid size for this kernel (i.e., number of thread blocks along each direction) by dividing the problem_size by the given divisors and rounding up. For example, if the problem size is (100, 100) and the divisors are (5, 15), then the grid size will be (20, 7), since 100 / 15 rounded up is 7.

Parameters:
  • x – Grid divisor along X.

  • y – Grid divisor along Y.

  • z – Grid divisor along Z.

Returns:

this
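The arithmetic behind the grid_divisors example can be reproduced with a small stand-alone sketch. It assumes the division rounds up, as the (100, 100) / (5, 15) = (20, 7) example implies. The helpers div_ceil and grid_from_divisors are hypothetical names used only for illustration and are not part of kernel_launcher.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of kernel_launcher): integer division rounded up.
// Written as quotient plus remainder check to avoid overflow in (a + b - 1).
inline uint32_t div_ceil(uint32_t a, uint32_t b) {
    return a / b + (a % b != 0 ? 1u : 0u);
}

// Hypothetical helper: number of thread blocks per dimension when the
// problem size is divided by the given divisors, rounding up in each dimension.
inline std::array<uint32_t, 3> grid_from_divisors(
    std::array<uint32_t, 3> problem,
    std::array<uint32_t, 3> divisors) {
    return {{
        div_ceil(problem[0], divisors[0]),
        div_ceil(problem[1], divisors[1]),
        div_ceil(problem[2], divisors[2]),
    }};
}
```

Calling grid_from_divisors({100, 100, 1}, {5, 15, 1}) gives the grid size from the example above. Rounding up means the last block along a dimension may cover indices beyond the problem size, so kernels typically need a bounds check.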

KernelBuilder &shared_memory(TypedExpr<uint32_t> smem)

Set the amount of shared memory in bytes.

Returns:

this

KernelBuilder &define(std::string name, TypedExpr<std::string> value)

Add a preprocessor variable definition with the provided name and value.

Returns:

this

KernelBuilder &include_header(KernelSource source)

Add a header file that must be pre-included during compilation.

Returns:

this

template<typename ...Ts>
inline KernelBuilder &template_args(TypedExpr<TemplateArg> first, Ts&&... rest)

Add one or more template arguments. Each argument must be convertible to an instance of TemplateArg.

Returns:

this

template<typename ...Ts>
inline KernelBuilder &template_types()

Add one or more types Ts... as template arguments to this kernel.

Short-hand for:

builder.template_args(type_of<Ts>()...);

Returns:

this

template<typename T>
inline KernelBuilder &template_type()

Add type T as a template argument to this kernel.

Short-hand for:

builder.template_arg(type_of<T>());

Returns:

this

template<typename ...Ts>
inline KernelBuilder &compiler_flags(TypedExpr<std::string> first, Ts&&... rest)

Add one or more compilation flags that will be passed to the compiler. Each argument must be convertible to a string.

Returns:

this

inline std::array<TypedExpr<uint32_t>, 3> tune_block_size(std::vector<uint32_t> xs, std::vector<uint32_t> ys = {1u}, std::vector<uint32_t> zs = {1u})

Short-hand for:

builder.block_size(
     builder.tune("BLOCK_SIZE_X", xs),
     builder.tune("BLOCK_SIZE_Y", ys),
     builder.tune("BLOCK_SIZE_Z", zs));
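To get a feel for how many block-size candidates such a call introduces, the combinations of the three value lists can be enumerated in plain C++. This sketch assumes that each tunable parameter takes exactly one value from its list and that no restrictions have been added to the configuration space. The function enumerate_block_sizes is a hypothetical helper and not part of kernel_launcher.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of kernel_launcher): lists every (x, y, z)
// combination of the candidate values, i.e. their Cartesian product.
inline std::vector<std::array<uint32_t, 3>> enumerate_block_sizes(
    const std::vector<uint32_t>& xs,
    const std::vector<uint32_t>& ys = {1u},
    const std::vector<uint32_t>& zs = {1u}) {
    std::vector<std::array<uint32_t, 3>> out;
    out.reserve(xs.size() * ys.size() * zs.size());
    for (uint32_t x : xs) {
        for (uint32_t y : ys) {
            for (uint32_t z : zs) {
                out.push_back(std::array<uint32_t, 3>{{x, y, z}});
            }
        }
    }
    return out;
}
```

Under these assumptions, three values for X and two for Y produce six candidate block sizes, so the number of candidates grows multiplicatively with each extra list entry.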
template<typename T = Value>
inline TypedExpr<T> tune_define(std::string name, std::vector<T> values)

Short-hand for:

builder.define(name, builder.tune(name, values));

KernelInstance compile(const Config &config, const std::vector<TypeInfo> &param_types, const ICompiler &compiler = default_compiler(), CudaContextHandle ctx = CudaContextHandle::current()) const

Compile an instance of this kernel using the given configuration.

Parameters:
  • config – The configuration.

  • param_types – The types of the parameters of this kernel.

  • compiler – The CUDA compiler to use.

  • ctx – The CUDA context for this CUDA kernel.

Returns:

The compiled KernelInstance.

inline const std::string &kernel_name() const

Returns the function name of this kernel.

inline const std::string &tuning_key() const

Returns the tuning key for this kernel.