KernelBuilder
-
struct KernelBuilder : public kernel_launcher::ConfigSpace
A KernelBuilder is essentially a blueprint that describes the information required to compile and run a CUDA kernel for a given configuration. Most methods take expressions that will be evaluated for a particular Config.
The most important methods are:
tuning_key: set the tuning key.
problem_size: set the problem size.
block_size: set the thread block size.
grid_divisors: calculate the grid size by dividing the problem size.
shared_mem: set the amount of shared memory in bytes.
template_arg: add a template argument.
compiler_flag: add a compiler flag.
define: define a preprocessor variable.
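The idea that a builder stores expressions and only evaluates them for a particular configuration can be illustrated with a small standalone sketch. It does not use the library: MiniConfig, MiniExpr, MiniBlueprint, param, and constant are hypothetical stand-ins invented for this illustration, and the real Config and TypedExpr types are assumed to be considerably richer.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins, NOT the library's Config and TypedExpr types.
using MiniConfig = std::map<std::string, std::uint32_t>;
using MiniExpr = std::function<std::uint32_t(const MiniConfig&)>;

// Expression that reads a tunable parameter from the configuration.
inline MiniExpr param(const std::string& name) {
    return [name](const MiniConfig& config) { return config.at(name); };
}

// Expression that always yields the same value.
inline MiniExpr constant(std::uint32_t value) {
    return [value](const MiniConfig&) { return value; };
}

// Blueprint that stores unevaluated expressions for the block size.
struct MiniBlueprint {
    MiniExpr block_x = constant(1);
    MiniExpr block_y = constant(1);

    // The setter returns *this so that calls can be chained.
    MiniBlueprint& block_size(MiniExpr x, MiniExpr y) {
        assert(x && y);  // reject empty std::function objects
        block_x = std::move(x);
        block_y = std::move(y);
        return *this;
    }

    // Threads per block for one concrete configuration.
    std::uint32_t threads_per_block(const MiniConfig& config) const {
        return block_x(config) * block_y(config);
    }
};
```

In this sketch the same blueprint yields different thread counts for different configurations, because the expressions are evaluated only when a configuration is supplied.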
Public Functions
-
KernelBuilder(std::string kernel_name, KernelSource kernel_source, ConfigSpace space = {})
Construct a new KernelBuilder.
- Parameters:
kernel_name – Function name of the kernel. This should be the fully qualified name of the function, i.e., including the namespace. The name should not contain template parameters (these can be added by calling template_args).
kernel_source – The kernel source code. Can be either the file name as a string or a KernelSource instance.
-
KernelBuilder &tuning_key(std::string)
Set the tuning key that will be used to find wisdom files for this kernel.
- Returns:
this
-
KernelBuilder &problem_size(ProblemSize p)
Set the problem size for this kernel by providing a ProblemSize instance.
- Returns:
this
-
KernelBuilder &problem_size(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)
Set the problem size for this kernel by providing an expression for each dimension. These expressions can contain ArgExpr expressions such as arg0, arg1, etc.
- Returns:
this
-
KernelBuilder &problem_size(ProblemProcessor f)
Set the problem size for this kernel by providing a function to extract the problem size from the kernel arguments.
- Returns:
this
-
KernelBuilder &argument_processor(ArgumentsProcessor f)
Add an ArgumentsProcessor to the list of processors for this kernel. Processors are user-defined functions that can modify the kernel arguments before they are passed to the actual kernel.
- Parameters:
f – The processor to add.
- Returns:
this
-
KernelBuilder &buffer_size(ArgExpr arg, TypedExpr<size_t> len)
Set the size (in number of elements) for the given argument of this kernel. For example, the following sets the length of the buffer given by the 5th argument (index=4) to the integer value given by the 1st argument (index=0).
builder.buffer_size(4, arg0);
It is recommended to combine this function with the args function for more readable variable names. For example, the following kernel takes three arguments (n, A, B) where the sizes of A and B are given by the expressions n and 2 * n.
auto [n, A, B] = args<3>();
builder.buffer_size(A, n);
builder.buffer_size(B, 2 * n);
- Parameters:
arg – The argument buffer that this size is applied to.
len – The length of the buffer in number of elements.
- Returns:
this
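To show how a length such as 2 * n can be an expression rather than a number, here is a standalone sketch that does not use the library. LenExpr and scalar_arg are hypothetical names invented for this illustration, and scalar arguments are modelled as plain std::size_t values; the library's ArgExpr and TypedExpr types are assumed to work along similar lines, but that is not confirmed here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for an expression over kernel arguments.
// Scalar arguments are modelled as plain std::size_t values.
struct LenExpr {
    std::function<std::size_t(const std::vector<std::size_t>&)> eval;
};

// Expression that reads the scalar argument at position `index`.
inline LenExpr scalar_arg(std::size_t index) {
    return LenExpr{[index](const std::vector<std::size_t>& args) {
        assert(index < args.size());  // the argument must exist at launch time
        return args[index];
    }};
}

// `k * expr` builds a new expression; nothing is computed yet.
inline LenExpr operator*(std::size_t k, LenExpr expr) {
    return LenExpr{[k, expr](const std::vector<std::size_t>& args) {
        return k * expr.eval(args);
    }};
}
```

With `LenExpr n = scalar_arg(0);`, the expression `2 * n` is only turned into a concrete element count when it is evaluated against the arguments of one particular launch.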
-
template<typename ...Ts>
inline KernelBuilder &buffers(Ts... buffers)
Short-hand for using KernelBuilder::buffer_size(...). For example, the following kernel takes three arguments (n, A, B) where the sizes of A and B are given by the expressions n and 2 * n.
auto [n, A, B] = args<3>();
builder.buffers(A[n], B[2 * n]);
- Parameters:
buffers – Expressions of type ArgBuffer.
- Returns:
this
-
KernelBuilder &block_size(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)
Set the block size for this kernel (i.e., number of threads per thread block).
- Parameters:
x – Block size along X.
y – Block size along Y.
z – Block size along Z.
- Returns:
this
-
KernelBuilder &grid_size(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)
Set the grid size for this kernel (i.e., number of thread blocks along each direction).
- Parameters:
x – Grid size along X.
y – Grid size along Y.
z – Grid size along Z.
- Returns:
this
-
KernelBuilder &grid_divisors(TypedExpr<uint32_t> x, TypedExpr<uint32_t> y = 1, TypedExpr<uint32_t> z = 1)
Set the grid size for this kernel (i.e., number of thread blocks along each direction) by dividing the problem_size by the given divisors, rounding up. For example, if the problem size is (100, 100) and the divisors are (5, 15), then the grid size will be (20, 7).
- Parameters:
x – Grid divisor along X.
y – Grid divisor along Y.
z – Grid divisor along Z.
- Returns:
this
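The arithmetic in the example above is a ceiling division per dimension. The following standalone sketch reproduces it; Dim3, div_ceil, and grid_from_divisors are hypothetical names for this illustration and are not part of the library. The rounding-up behavior is inferred from the documented example, in which 100 divided by 15 gives 7.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Dim3 = std::array<std::uint32_t, 3>;

// Ceiling division: the smallest q such that q * divisor >= value.
inline std::uint32_t div_ceil(std::uint32_t value, std::uint32_t divisor) {
    assert(divisor != 0);  // dividing by zero is a caller error
    return value / divisor + (value % divisor != 0 ? 1u : 0u);
}

// Grid size obtained by dividing the problem size by the divisors.
inline Dim3 grid_from_divisors(const Dim3& problem, const Dim3& divisors) {
    return Dim3{div_ceil(problem[0], divisors[0]),
                div_ceil(problem[1], divisors[1]),
                div_ceil(problem[2], divisors[2])};
}
```

Rounding up ensures that the grid covers the whole problem even when a dimension is not a multiple of its divisor, so kernels typically need a bounds check for the excess threads.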
Set the amount of shared memory in bytes.
- Returns:
this
-
KernelBuilder &define(std::string name, TypedExpr<std::string> value)
Add a preprocessor variable definition with the provided name and value.
- Returns:
this
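A name/value definition is commonly handed to a C or C++ compiler as a -D flag. The sketch below formats such a flag; define_flag is a hypothetical helper for this illustration, and whether the library passes definitions in exactly this form is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: format a definition as a "-DNAME=value" flag.
inline std::string define_flag(const std::string& name, const std::string& value) {
    assert(!name.empty());  // a macro needs a name
    return "-D" + name + "=" + value;
}
```

Inside the kernel source, the defined name can then be used like any other macro, for example as an array extent or a loop bound.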
-
KernelBuilder &include_header(KernelSource source)
Add a header file that must be pre-included during compilation.
- Returns:
this
-
template<typename ...Ts>
inline KernelBuilder &template_args(TypedExpr<TemplateArg> first, Ts&&... rest)
Add one or more template arguments. Each argument must be convertible to an instance of TemplateArg.
- Returns:
this
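Because the kernel name is given without template parameters, the template arguments have to be combined with it before a specific instantiation can be compiled. The standalone sketch below shows one way to form such a name; instantiate_name is a hypothetical helper for this illustration, and how the library builds the instantiated name internally is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: join a kernel name and its template arguments
// into a name such as "my_ns::vector_add<float, 128>".
inline std::string instantiate_name(const std::string& kernel_name,
                                    const std::vector<std::string>& template_args) {
    // The base name must not already carry template parameters.
    assert(kernel_name.find('<') == std::string::npos);
    if (template_args.empty()) {
        return kernel_name;
    }
    std::string out = kernel_name + "<";
    for (std::size_t i = 0; i < template_args.size(); ++i) {
        if (i > 0) {
            out += ", ";
        }
        out += template_args[i];
    }
    return out + ">";
}
```

Keeping the base name and the arguments separate lets the same builder describe many instantiations, one per configuration.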
-
template<typename ...Ts>
inline KernelBuilder &template_types()
Add one or more types Ts... as template arguments to this kernel. Short-hand for:
builder.template_args(type_of<Ts>()...);
- Returns:
this
-
template<typename T>
inline KernelBuilder &template_type()
Add type T as a template argument to this kernel. Short-hand for:
builder.template_arg(type_of<T>());
- Returns:
this
-
template<typename ...Ts>
inline KernelBuilder &compiler_flags(TypedExpr<std::string> first, Ts&&... rest)
Add one or more compilation flags that will be passed to the compiler. Each argument must be convertible to a string.
- Returns:
this
-
inline std::array<TypedExpr<uint32_t>, 3> tune_block_size(std::vector<uint32_t> xs, std::vector<uint32_t> ys = {1u}, std::vector<uint32_t> zs = {1u})
Short-hand for:
builder.block_size(
    builder.tune("BLOCK_SIZE_X", xs),
    builder.tune("BLOCK_SIZE_Y", ys),
    builder.tune("BLOCK_SIZE_Z", zs));
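The tune function is not described in this section; assuming that each tunable parameter takes one value from its list, the candidate block sizes are the Cartesian product of xs, ys, and zs. The standalone sketch below enumerates that product. BlockSize and candidate_block_sizes are hypothetical names for this illustration, and any restrictions the library may apply to the search space are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

using BlockSize = std::array<std::uint32_t, 3>;

// Enumerate every (x, y, z) combination of the candidate values.
inline std::vector<BlockSize> candidate_block_sizes(
        const std::vector<std::uint32_t>& xs,
        const std::vector<std::uint32_t>& ys = {1u},
        const std::vector<std::uint32_t>& zs = {1u}) {
    assert(!xs.empty() && !ys.empty() && !zs.empty());
    std::vector<BlockSize> out;
    out.reserve(xs.size() * ys.size() * zs.size());
    for (std::uint32_t z : zs) {
        for (std::uint32_t y : ys) {
            for (std::uint32_t x : xs) {
                out.push_back(BlockSize{x, y, z});
            }
        }
    }
    return out;
}
```

The number of candidates grows multiplicatively with the length of each list, which is worth keeping in mind when choosing values to tune.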
-
template<typename T = Value>
inline TypedExpr<T> tune_define(std::string name, std::vector<T> values)
Short-hand for:
builder.define(name, builder.tune(name, values));
-
KernelInstance compile(const Config &config, const std::vector<TypeInfo> &param_types, const ICompiler &compiler = default_compiler(), CudaContextHandle ctx = CudaContextHandle::current()) const
Compile an instance of this kernel using the given configuration.
- Parameters:
config – The configuration.
param_types – The types of the parameters of this kernel.
compiler – The CUDA compiler to use.
ctx – The CUDA context for this CUDA kernel.
- Returns:
-
inline const std::string &kernel_name() const
Returns the function name of this kernel.
-
inline const std::string &tuning_key() const
Returns the tuning key for this kernel.