Memory read/write


template<typename T, typename I, typename M = bool, typename E = broadcast_vector_extent_type<I, M>>
inline vector<T, E> kernel_float::read(const T *ptr, const I &indices, const M &mask = true)

Load the elements from the buffer ptr at the locations specified by indices.

The mask should be a vector of booleans where true indicates that the value should be loaded and false indicates that the value should be skipped. This can be used to prevent reading out of bounds.

// Load 2 elements at data[0] and data[8], skip data[2] and data[4]
vec<T, 4> values = read(data, make_vec(0, 2, 4, 8), make_vec(true, false, false, true));
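
To make the masked-read semantics concrete, here is a minimal plain C++ sketch of a gather with a mask. It uses std::array instead of kernel_float vectors, and the name masked_gather is hypothetical. Leaving skipped lanes at T{} is an assumption of this sketch; the description above does not say what skipped lanes contain.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a masked read; NOT kernel_float's implementation.
// Lanes whose mask entry is false never touch memory.
template <typename T, std::size_t N>
std::array<T, N> masked_gather(const T* ptr,
                               const std::array<std::size_t, N>& indices,
                               const std::array<bool, N>& mask) {
    assert(ptr != nullptr);
    std::array<T, N> out{};  // assumption: skipped lanes stay value-initialized
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            out[i] = ptr[indices[i]];  // load only the enabled lanes
        }
    }
    return out;
}
```

With indices {0, 2, 4, 8} and mask {true, false, false, true}, only ptr[0] and ptr[8] are dereferenced, so a buffer in which the other positions are invalid would still be read safely.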


template<typename T, typename V, typename I, typename M = bool, typename E = broadcast_vector_extent_type<V, I, M>>
inline void kernel_float::write(T *ptr, const I &indices, const V &values, const M &mask = true)

Store the elements from the vector values in the buffer ptr at the locations specified by indices.

The mask should be a vector of booleans where true indicates that the value should be stored and false indicates that the value should be skipped. This can be used to prevent writing out of bounds.

// Store 2 elements at data[0] and data[8], skip data[2] and data[4]
auto values = make_vec(42, 13, 87, 12);
auto mask = make_vec(true, false, false, true);
write(data, make_vec(0, 2, 4, 8), values, mask);
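
The out-of-bounds use case mentioned above can be sketched in plain C++ as follows. Both helpers (in_bounds_mask and masked_scatter) are hypothetical names operating on std::array, not kernel_float functions; the sketch only illustrates how a mask derived from a bounds check keeps a scatter inside the buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: true for every index that lies inside [0, size).
template <std::size_t N>
std::array<bool, N> in_bounds_mask(const std::array<std::size_t, N>& indices,
                                   std::size_t size) {
    std::array<bool, N> mask{};
    for (std::size_t i = 0; i < N; ++i) {
        mask[i] = indices[i] < size;
    }
    return mask;
}

// Hypothetical stand-in for a masked write; NOT kernel_float's implementation.
template <typename T, std::size_t N>
void masked_scatter(T* ptr,
                    const std::array<std::size_t, N>& indices,
                    const std::array<T, N>& values,
                    const std::array<bool, N>& mask) {
    assert(ptr != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            ptr[indices[i]] = values[i];  // store only the enabled lanes
        }
    }
}
```

For a buffer of 4 elements and indices {0, 2, 4, 8}, the computed mask is {true, true, false, false}, so the stores to positions 4 and 8 are skipped.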


template<size_t N, typename T>
inline vector<T, extent<N>> kernel_float::read(const T *ptr)

Load N elements at the locations ptr[0], ptr[1], ptr[2], ....

// Load 4 elements at locations data[0], data[1], data[2], data[3]
vec<T, 4> values = read<4>(data);

// Load 4 elements at locations data[10], data[11], data[12], data[13]
vec<T, 4> values2 = read<4>(data + 10);


template<typename V, typename T>
inline void kernel_float::write(T *ptr, const V &values)

Store the N elements of values at the locations ptr[0], ptr[1], ptr[2], ....

// Store 4 elements at locations data[0], data[1], data[2], data[3]
vec<float, 4> values = {1.0f, 2.0f, 3.0f, 4.0f};
write(data, values);

// Store 4 elements at locations data[10], data[11], data[12], data[13]
write(data + 10, values);


template<size_t Align, size_t N = Align, typename T>
inline vector<T, extent<N>> kernel_float::read_aligned(const T *ptr)

Load N elements at the locations ptr[0], ptr[1], ptr[2], ....

It is assumed that ptr is sufficiently aligned for all N elements to be loaded at once using a vector operation. If the pointer is not aligned, the behavior is undefined.

// Load 4 elements at locations data[0], data[1], data[2], data[3]
vec<T, 4> values = read_aligned<4>(data);

// Load 4 elements at locations data[10], data[11], data[12], data[13]
vec<T, 4> values2 = read_aligned<4>(data + 10);
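
Because a misaligned pointer leads to undefined behavior, it can help to verify alignment in debug builds before calling an aligned access. The helpers below are a hypothetical sketch, not part of kernel_float, and they assume that the alignment is measured in elements (that is, Align * sizeof(T) bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when ptr is aligned to AlignElems elements of T,
// i.e. to AlignElems * sizeof(T) bytes (an assumption of this sketch).
template <std::size_t AlignElems, typename T>
bool is_aligned_to_elements(const T* ptr) {
    static_assert(AlignElems > 0, "alignment must be positive");
    const auto address = reinterpret_cast<std::uintptr_t>(ptr);
    return address % (AlignElems * sizeof(T)) == 0;
}

// Debug-build guard that could be placed in front of an aligned access.
template <std::size_t AlignElems, typename T>
const T* checked_aligned(const T* ptr) {
    assert(is_aligned_to_elements<AlignElems>(ptr) && "pointer is not sufficiently aligned");
    return ptr;
}
```

Under this assumption, an offset that is not a multiple of the alignment (for example base + 2 with a 4-element alignment) fails the check even when the base pointer passes it.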


template<size_t Align, typename V, typename T>
inline void kernel_float::write_aligned(T *ptr, const V &values)

Store the N elements of values at the locations ptr[0], ptr[1], ptr[2], ....

It is assumed that ptr is sufficiently aligned for all N elements to be stored at once using a vector operation. If the pointer is not aligned, the behavior is undefined.

// Store 4 elements at locations data[0], data[1], data[2], data[3]
vec<float, 4> values = {1.0f, 2.0f, 3.0f, 4.0f};
write_aligned<4>(data, values);

// Store 4 elements at locations data[10], data[11], data[12], data[13]
write_aligned<4>(data + 10, values);


template<typename T, size_t N = (32), typename U>
inline vector_ptr<T, N, U> kernel_float::assert_aligned(U *ptr)

Creates a vector_ptr<T, N> from a raw pointer U* by asserting a specific alignment N.

Template Parameters:
  • T – The type of the elements as viewed by the user. This type may differ from U.

  • N – The alignment constraint for the vector_ptr. Defaults to KERNEL_FLOAT_MAX_ALIGNMENT (32 in the signature above).

  • U – The type of the elements pointed to by the raw pointer.


template<typename T, size_t N, typename U = T>
struct vector_ptr

A wrapper for a pointer that enables vectorized access and supports type conversions.

The vector_ptr<T, N, U> type is designed to function as if it were a vec<T, N>* pointer, allowing vec<T, N> elements to be read and written. However, the underlying storage is a pointer of type U*, and values are converted automatically between T and U when items are read or written.

For example, a vector_ptr<double, N, half> is useful where the data is stored in low precision (here 16-bit) but should be accessed as if it were stored in a higher-precision format (here 64-bit).

Template Parameters:
  • T – The type of the elements as viewed by the user.

  • N – The alignment of T in number of elements.

  • U – The underlying storage type, defaults to T.
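
The idea of viewing low-precision storage through a higher-precision vector pointer can be sketched in plain C++. The class below is a hypothetical miniature (converting_ptr is not a kernel_float name), it uses float storage viewed as double because half is not a standard C++ type, and it assumes that index counts groups of N elements, as a vec<T, N>* would.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical miniature of the concept; kernel_float's vector_ptr is more general.
template <typename T, std::size_t N, typename U = T>
class converting_ptr {
public:
    explicit converting_ptr(U* p) : ptr_(p) {}

    // Read the index-th group of N elements, converting each from U to T.
    std::array<T, N> read(std::size_t index) const {
        assert(ptr_ != nullptr);
        std::array<T, N> out{};
        for (std::size_t i = 0; i < N; ++i) {
            out[i] = static_cast<T>(ptr_[index * N + i]);
        }
        return out;
    }

    // Write N elements to the index-th group, converting each from T to U.
    void write(std::size_t index, const std::array<T, N>& values) const {
        assert(ptr_ != nullptr);
        for (std::size_t i = 0; i < N; ++i) {
            ptr_[index * N + i] = static_cast<U>(values[i]);
        }
    }

private:
    U* ptr_;  // underlying storage, possibly in a different precision than T
};
```

A converting_ptr<double, 4, float> over a float buffer would then return std::array<double, 4> from read(1), covering elements 4 through 7 of the buffer.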

Public Functions

vector_ptr() = default

Default constructor sets the pointer to NULL.

inline explicit vector_ptr(pointer_type p)

Constructor from a given pointer. It is up to the user to ensure that the pointer is aligned to N elements.

template<typename T2, size_t N2>
inline vector_ptr(vector_ptr<T2, N2, U> p, enable_if_t<(N2 % N == 0), int> = {})

Constructs a vector_ptr from another vector_ptr with potentially different alignment and type. This constructor only allows conversion if the alignment of the source is a multiple of the alignment of the target (N2 % N == 0), which guarantees that the source is at least as strictly aligned as the target.
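
The N2 % N == 0 condition in the signature above can be illustrated with a small stand-alone type. The aligned_tag template below is hypothetical and carries only an alignment (in elements); it mirrors the enable_if condition but is not kernel_float code.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical tag type that carries only an alignment, measured in elements.
template <std::size_t N>
struct aligned_tag {
    aligned_tag() = default;

    // Enabled only when the source alignment N2 is a multiple of N.
    template <std::size_t N2, typename std::enable_if<(N2 % N == 0), int>::type = 0>
    aligned_tag(aligned_tag<N2>) {}
};

// 8-element alignment implies 4-element alignment, so the conversion is accepted.
static_assert(std::is_convertible<aligned_tag<8>, aligned_tag<4>>::value, "8 -> 4 allowed");
// 4-element alignment does not imply 8-element alignment.
static_assert(!std::is_convertible<aligned_tag<4>, aligned_tag<8>>::value, "4 -> 8 rejected");
// 6 is larger than 4 but not a multiple of it, so this is rejected as well.
static_assert(!std::is_convertible<aligned_tag<6>, aligned_tag<4>>::value, "6 -> 4 rejected");
```

The last assertion shows why the condition is expressed as divisibility: a larger alignment that is not a multiple of the target does not guarantee that the target's alignment holds.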

template<size_t K = N>
inline vector_ref<T, K, U, N> at(size_t index) const

Accesses a reference to a vector at a specific index with optional alignment considerations.

Template Parameters:
  • K – The number of elements in the vector to access, defaults to the alignment N.

Parameters:
  • index – The index at which to access the vector.

template<size_t K = N>
inline vector<value_type, extent<K>> read(size_t index) const

Accesses a vector at a specific index.

Template Parameters:
  • K – The number of elements to read, defaults to N.

Parameters:
  • index – The index from which to read the data.

inline const vector<value_type, extent<N>> operator[](size_t index) const

Shorthand for read(index).

inline const vector<value_type, extent<N>> operator*() const

Shorthand for read(0).

template<size_t K = N, typename V>
inline void write(size_t index, const V &values) const

Writes data to a specific index.

Template Parameters:
  • K – The number of elements to write, defaults to N.

  • V – The type of the values being written.

Parameters:
  • index – The index at which to write the data.

  • values – The vector of values to write.

inline vector_ref<T, N, U, N> operator()(size_t index) const

Shorthand for at(index). Returns a vector reference that can be used to assign through this pointer, unlike operator[], which does not allow assignment.

inline pointer_type get() const

Gets the raw data pointer managed by this vector_ptr.