Getting started

Kernel Float is a header-only library that makes it easy to work with vector types and low-precision floating-point types, mainly focusing on CUDA kernel code.

Installation

The easiest way to use the library is to get the single header file from GitHub:

wget https://raw.githubusercontent.com/KernelTuner/kernel_float/main/single_include/kernel_float.h

Next, include this file in your program. It is convenient to define a namespace alias kf to shorten the full name kernel_float:

#include "kernel_float.h"
namespace kf = kernel_float;

Vector types

Kernel Float essentially offers a single data type kernel_float::vec<T, N> that stores N elements of type T. The simplest way to initialize a vector is using list-initialization:

kf::vec<float, 4> my_vector = {1.0f, 2.0f, 3.0f, 4.0f};

It is also possible to automatically derive the type using make_vec:

// The type will be vec<double, 3>
auto a = kf::make_vec(1.0, 2.0, 3.0);

// The type will be vec<int, 2>
auto b = kf::make_vec(7, 7);

// The type will be vec<bool, 4>
auto c = kf::make_vec(true, true, false, true);

// This does not compile!
auto d = kf::make_vec();

There are also many helper methods available to generate vectors; see the API reference. Some examples are range, fill, ones, and zeros.

// Generates [0, 1, 2, 3]
kf::vec<int, 4> a = kf::range<int, 4>();

// Generates [42.0, 42.0, 42.0, 42.0]
kf::vec<double, 4> b = kf::fill<4>(42.0);

// Generates [0, 0, 0, 0]
kf::vec<int, 4> c = kf::zeros<int, 4>();

// Generates [true, true, true, true]
kf::vec<bool, 4> d = kf::ones<bool, 4>();

You can also use the *_like functions to generate a vector based on another vector:

// Generates [1.0, 2.0, 3.0, 4.0]
kf::vec<float, 4> a = {1.0f, 2.0f, 3.0f, 4.0f};

// Generates [0.0, 0.0, 0.0, 0.0]
kf::vec<float, 4> b = kf::zeros_like(a);

// Generates [1.0, 1.0, 1.0, 1.0]
kf::vec<float, 4> c = kf::ones_like(a);

Accessing elements

Accessing elements can be done using the regular [] operator.

// Generate [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
kf::vec<float, 6> a = kf::range<float, 6>();

// Returns 2.0
float x = a[2];

// Set element at 2 to 42.0
a[2] = 42.0;

// Returns 42.0
float y = a[2];

You can get a pointer to the vector buffer by calling data:

// Generate vector
kf::vec<float, 4> v = {1.0f, 2.0f, 3.0f, 4.0f};

float* address = v.data();

// Set element at 0 to element at 1
address[0] = address[1];

Iteration can be done by using a regular for-loop:

kf::vec<float, 4> vector = {1.0f, 2.0f, 3.0f, 4.0f};

for (float x : vector) {
  printf("x=%f\n", x);
}

Operator overloading

The arithmetic operators +, -, *, /, and % are overloaded to perform element-wise operations.

// Generate [1.0f, 2.0f, 3.0f]
kf::vec<float, 3> a = {1.0f, 2.0f, 3.0f};

// Generate [1.0f, 1.0f, 1.0f]
kf::vec<float, 3> b = kf::ones<float, 3>();

// Add them together to create [2.0f, 3.0f, 4.0f]
kf::vec<float, 3> c = a + b;

The comparison operators <, >, ==, !=, <=, >= are overloaded to perform element-wise operations. Note that the returned value is a vector containing 0s (false) and 1s (true). The element type and vector length will match the inputs.

// Generate doubles
kf::vec<double, 5> a = {4.0, -100.0, 0.0, 0.5, -3.0};

// Generate zeros
kf::vec<double, 5> zeros = kf::zeros_like(a);

// Generates [false, true, false, false, true]
kf::vec<bool, 5> result = a < zeros;

The logical operators && and || are NOT overloaded. This is because there is no method to simulate the short-circuiting behavior. Instead, the operators ! (not), & (and), | (or), and ^ (xor) are overloaded to behave as logical operators.

// Generate doubles
kf::vec<double, 5> a = {4.0, -100.0, 0.0, 0.5, -3.0};

// Generate zeros and ones
kf::vec<double, 5> zeros = kf::zeros_like(a);
kf::vec<double, 5> ones = kf::ones_like(a);

// Generates [false, false, true, true, false]
kf::vec<bool, 5> result = (a >= zeros) & (a <= ones);

// Using `&&` instead of `&` results in a compilation error!
// kf::vec<bool, 5> fail = (a >= zeros) && (a <= ones);

If the two inputs of a binary operator do not match (either element type and/or vector length), Kernel Float will automatically perform type promotion (described on the page Type Promotion). This allows our example to be simplified to just:

// Generate doubles
kf::vec<double, 5> a = {4.0, -100.0, 0.0, 0.5, -3.0};

// Generates [false, false, true, true, false]
kf::vec<bool, 5> result = (a >= 0.0) & (a <= 1.0);

Mathematical functions

Many mathematical functions (like log, sin, cos) are also available; see the API reference for the full list of functions. These always work element-wise:

// Input vector
kf::vec<float, 4> x = {0.0f, 1.0f, 2.0f, 3.0f};

// Gives [0.0, 0.84147098, 0.9092974, 0.14112001]
kf::vec<float, 4> a = kf::sin(x);

// Gives [1.0, 0.54030231, -0.41614684, -0.9899925]
kf::vec<float, 4> b = kf::cos(x);

// Gives [0.0, 1.0, 1.4142135, 1.7320508]
kf::vec<float, 4> c = kf::sqrt(x);

// Gives [1.0, 2.7182818, 7.3890561, 20.085537]
kf::vec<float, 4> d = kf::exp(x);

// Gives [0, 0, 0, 0]
kf::vec<bool, 4> e = kf::isnan(x);

In some cases, certain operations might not be natively supported by the platform for some floating-point types. In these cases, Kernel Float falls back to performing the operations in 32-bit precision.