CUDASIMDTypes.jl
CUDASIMDTypes.BFloat16x2 — Type
struct BFloat16x2
A SIMD type holding 2 BFloat16 in a combined 32-bit value.
CUDASIMDTypes.Float16x2 — Type
struct Float16x2
A SIMD type holding 2 Float16 in a combined 32-bit value.
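A minimal sketch of what "two Float16 in one 32-bit value" means at the bit level, assuming the first value occupies the low 16 bits and the second the high 16 bits (the lane order is an assumption). pack_f16x2 and unpack_f16x2 are hypothetical helpers written for illustration, not part of the CUDASIMDTypes API; they use only Base reinterpret.

```julia
# Hypothetical helpers (not the CUDASIMDTypes API): pack/unpack two Float16
# values in one UInt32, ASSUMING the first value occupies the low 16 bits.
pack_f16x2(lo::Float16, hi::Float16)::UInt32 =
    UInt32(reinterpret(UInt16, lo)) | (UInt32(reinterpret(UInt16, hi)) << 16)

function unpack_f16x2(w::UInt32)::Tuple{Float16,Float16}
    lo = reinterpret(Float16, UInt16(w & 0xffff))  # low half -> first value
    hi = reinterpret(Float16, UInt16(w >> 16))      # high half -> second value
    return (lo, hi)
end

w = pack_f16x2(Float16(1.0), Float16(-2.0))
println(string(w, base = 16))  # prints c0003c00 (bit patterns 0xc000, 0x3c00)
```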
CUDASIMDTypes.Int16x2 — Type
struct Int16x2
A SIMD type holding 2 16-bit integers in a combined 32-bit value.
CUDASIMDTypes.Int2x16 — Type
struct Int2x16
A SIMD type holding 16 2-bit integers in a combined 32-bit value.
CUDASIMDTypes.Int2x4 — Type
struct Int2x4
A SIMD type holding 4 2-bit integers in a combined 8-bit value.
CUDASIMDTypes.Int4x2 — Type
struct Int4x2
A SIMD type holding 2 4-bit integers in a combined 8-bit value.
CUDASIMDTypes.Int4x8 — Type
struct Int4x8
A SIMD type holding 8 4-bit integers in a combined 32-bit value.
CUDASIMDTypes.Int8x4 — Type
struct Int8x4
A SIMD type holding 4 8-bit integers in a combined 32-bit value.
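The integer types above differ only in lane width (2, 4, 8 or 16 bits). A minimal sketch of packing signed values into lanes of a 32-bit word and sign-extending them back out, assuming lane 1 sits in the least significant bits. packlanes and getlane are hypothetical helpers, not CUDASIMDTypes functions; the 8-bit types Int2x4 and Int4x2 would presumably follow the same pattern in a UInt8.

```julia
# Hypothetical helpers (not the CUDASIMDTypes API). ASSUMPTION: lane 1
# occupies the least significant `bits` bits of the 32-bit word.

# Pack signed integers into lanes of width `bits`, keeping only the low bits
# of each value (two's complement truncation, no saturation).
function packlanes(vals, bits::Integer)::UInt32
    mask = (UInt32(1) << bits) - UInt32(1)
    w = UInt32(0)
    for (i, v) in enumerate(vals)
        w |= (reinterpret(UInt32, Int32(v)) & mask) << (bits * (i - 1))
    end
    return w
end

# Extract lane `i` and sign-extend it: shift the lane to the top of the word,
# then an arithmetic right shift on Int32 replicates its sign bit.
function getlane(w::UInt32, i::Integer, bits::Integer)::Int32
    top = (w >> (bits * (i - 1))) << (32 - bits)
    return reinterpret(Int32, top) >> (32 - bits)
end

w = packlanes([1, -1, 7, -8], 4)   # four lanes of an Int4x8-style word
println(string(w, base = 16))      # prints 87f1
println([getlane(w, i, 4) for i in 1:4])
```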
CUDASIMDTypes.bitifelse — Method
bitifelse(cond, x, y)
Bitwise version of ifelse.
For each bit of the output, the corresponding bit of cond determines whether the corresponding bit of x (bit set) or of y (bit clear) is selected.
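Assuming a set bit in cond selects x (mirroring ifelse, where true selects x), the operation reduces to a mask-and-merge. bitifelse_ref is a hypothetical plain-Julia reference for 32-bit words, not the package's implementation.

```julia
# Hypothetical reference (not the package implementation): a set bit in
# `cond` takes the bit from `x`, a clear bit takes it from `y`.
bitifelse_ref(cond::UInt32, x::UInt32, y::UInt32)::UInt32 = (cond & x) | (~cond & y)

bitifelse_ref(0xff00ff00, 0x12345678, 0xabcdef01)  # == 0x12cd5601
```

The same select can also be written as a single three-input logical operation: with cond, x, y as the three inputs, its lop3 look-up table is 0xca.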
CUDASIMDTypes.cvt_pack_s16 — Method
d = cvt_pack_s16(a::Int32, b::Int32)
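d::UInt32
d[1] = sat(b)
d[2] = sat(a)
Here d[1] and d[2] are the low and high 16-bit halves of d, and sat(x) saturates x to the Int16 range.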
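A plain-Julia sketch of these semantics, assuming d[1] is the least significant 16-bit lane and sat clamps to the Int16 range (consistent with the PTX cvt.pack.sat.s16.s32 description). sat16 and cvt_pack_s16_ref are hypothetical names for a CPU reference, not the package code, which issues the PTX instruction on the GPU.

```julia
# Hypothetical reference for the documented semantics (not the package code).
# sat16 clamps to the Int16 range and returns the 16-bit pattern as UInt32.
sat16(x::Int32)::UInt32 =
    UInt32(reinterpret(UInt16, Int16(clamp(x, typemin(Int16), typemax(Int16)))))

# d[1] (low 16 bits) = sat(b), d[2] (high 16 bits) = sat(a)
cvt_pack_s16_ref(a::Int32, b::Int32)::UInt32 = sat16(b) | (sat16(a) << 16)

cvt_pack_s16_ref(Int32(100_000), Int32(-1))  # == 0x7fffffff
```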
CUDASIMDTypes.cvt_pack_s8 — Method
d = cvt_pack_s8(a::Int32, b::Int32, c::UInt32)
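d::UInt32
d[1] = sat(b)
d[2] = sat(a)
d[3] = c[1]
d[4] = c[2]
Here d[i] is the i-th byte of d counting from the least significant, sat(x) saturates x to the Int8 range, and c[1], c[2] are the two low bytes of c.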
CUDASIMDTypes.cvt_pack_s8 — Method
d = cvt_pack_s8(a::Int32, b::Int32)
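d::UInt32
d[1] = sat(b)
d[2] = sat(a)
d[3] = 0
d[4] = 0
Here d[i] is the i-th byte of d counting from the least significant, and sat(x) saturates x to the Int8 range.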
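A plain-Julia sketch covering both cvt_pack_s8 methods, assuming d[1] is the least significant byte and sat clamps to the Int8 range (consistent with PTX cvt.pack.sat.s8.s32). sat8 and cvt_pack_s8_ref are hypothetical names for a CPU reference, not the package code.

```julia
# Hypothetical reference for both documented methods (not the package code).
# sat8 clamps to the Int8 range and returns the 8-bit pattern as UInt32.
sat8(x::Int32)::UInt32 =
    UInt32(reinterpret(UInt8, Int8(clamp(x, typemin(Int8), typemax(Int8)))))

# d[1] = sat(b), d[2] = sat(a), d[3], d[4] = the two low bytes of c
cvt_pack_s8_ref(a::Int32, b::Int32, c::UInt32)::UInt32 =
    sat8(b) | (sat8(a) << 8) | ((c & 0x0000ffff) << 16)

# Two-argument form: the upper two bytes are zero
cvt_pack_s8_ref(a::Int32, b::Int32)::UInt32 = cvt_pack_s8_ref(a, b, UInt32(0))

cvt_pack_s8_ref(Int32(300), Int32(-300), 0xdeadbeef)  # == 0xbeef7f80
```

Passing the result of one call as c to another packs four saturated values into a single word.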
CUDASIMDTypes.dp4a — Method
d = dp4a(a::UInt32, b::UInt32, c::Int32)
d::Int32
d = a[1] * b[1] + a[2] * b[2] + a[3] * b[3] + a[4] * b[4] + c
Here a[i] and b[i] denote the i-th byte of a and b, counting from the least significant.
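A plain-Julia sketch of the four-way byte dot product. The signedness of the bytes is not stated above; this sketch assumes each byte is read as a signed Int8, as in the PTX dp4a.s32.s32 variant. dp4a_ref is a hypothetical CPU reference, not the package code.

```julia
# Hypothetical reference (not the package code). ASSUMPTION: each byte of a
# and b is read as a signed Int8, as in PTX dp4a.s32.s32; an unsigned variant
# would read them as UInt8 instead.
function dp4a_ref(a::UInt32, b::UInt32, c::Int32)::Int32
    d = c
    for i in 0:3
        ai = Int32(reinterpret(Int8, UInt8((a >> (8i)) & 0xff)))  # byte a[i+1]
        bi = Int32(reinterpret(Int8, UInt8((b >> (8i)) & 0xff)))  # byte b[i+1]
        d += ai * bi  # Julia Int32 arithmetic wraps on overflow
    end
    return d
end

dp4a_ref(0x04030201, 0x01010101, Int32(10))  # 1 + 2 + 3 + 4 + 10 == 20
```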
CUDASIMDTypes.lop3 — Method
lop3(a, b, c, lut)
Arbitrary logical operation on 3 inputs.
Call the PTX lop3 instruction. This computes a bitwise logical operation on the inputs a, b, and c; the operation is specified by the look-up table lut.
See make_lop3_lut for creating the look-up table lut.
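A plain-Julia sketch of how a 3-input look-up table is applied bit by bit. For each bit position, the bits of a, b and c form the index 4a + 2b + c, and the output bit is bit number index of lut; this is the PTX convention in which the table for a function f equals f(0xf0, 0xcc, 0xaa). lop3_ref and its UInt8 lut argument are hypothetical, not the package's signature.

```julia
# Hypothetical reference (not the package code). Each output bit is the bit of
# `lut` at index 4a + 2b + c, where a, b, c are the input bits at that position.
function lop3_ref(a::UInt32, b::UInt32, c::UInt32, lut::UInt8)::UInt32
    d = UInt32(0)
    for i in 0:31
        idx = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        d |= UInt32((lut >> idx) & 0x01) << i
    end
    return d
end

# 0x96 is the table for a ⊻ b ⊻ c (three-way XOR)
lop3_ref(0x0f0f0f0f, 0x00ff00ff, 0x12345678, 0x96)
```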
CUDASIMDTypes.make_lop3_lut — Method
make_lop3_lut(f)
Create a look-up table for lop3.
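The PTX lop3 documentation builds the table by evaluating the desired function on the constants 0xf0, 0xcc and 0xaa, whose bit i equals the a-, b- and c-input bits for table index i = 4a + 2b + c. A minimal sketch of that construction; make_lop3_lut_ref is a hypothetical stand-in, and whether make_lop3_lut works exactly this way is an assumption.

```julia
# Hypothetical reference (not the package code): evaluate f on the three
# constant patterns; bit i of the result is f applied to the input bits of
# table index i = 4a + 2b + c.
make_lop3_lut_ref(f) = UInt8(f(0xf0, 0xcc, 0xaa) & 0xff)

make_lop3_lut_ref((a, b, c) -> a & b | c)        # == 0xea
make_lop3_lut_ref((a, b, c) -> (a & b | c) ⊻ a)  # == 0x1a
```

Because f receives ordinary UInt8 values, any combination of &, |, ⊻ and ~ yields a valid table.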
CUDASIMDTypes.prmt — Method
prmt(a, b, op)
Permute bytes from a pair of inputs.
Call the PTX prmt instruction. This picks four arbitrary bytes from the input values a and b; the selector op determines which bytes are chosen.
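A plain-Julia sketch of the default (generic) mode of PTX prmt.b32, assuming prmt(a, b, op) passes its arguments in PTX order with op as the selector. The eight source bytes are numbered 0 to 3 from a and 4 to 7 from b; nibble i of op picks result byte i, and if the nibble's high bit is set the selected byte's sign bit is replicated across the result byte. prmt_ref is hypothetical and ignores the other PTX modes (f4e, b4e, rc8, and so on).

```julia
# Hypothetical reference (not the package code) for the default mode of PTX
# prmt.b32. Source bytes 0-3 come from a, bytes 4-7 from b.
function prmt_ref(a::UInt32, b::UInt32, op::UInt32)::UInt32
    src = (UInt64(b) << 32) | UInt64(a)
    d = UInt32(0)
    for i in 0:3
        sel = (op >> (4i)) & 0xf                          # selector nibble i
        byte = UInt32((src >> (8 * (sel & 0x7))) & 0xff)  # chosen source byte
        if (sel & 0x8) != 0
            # sign-replicate mode: 0xff if the byte's msb is set, else 0x00
            byte = (byte & 0x80) != 0 ? UInt32(0xff) : UInt32(0)
        end
        d |= byte << (8i)
    end
    return d
end

prmt_ref(0x33221100, 0x77665544, 0x00000123)  # byte-reverses a: 0x00112233
```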