GPUifyLoops.contextualize - Method
contextualize(::Dev, f)

This contextualizes the function f for a given device type Dev.

For the device CUDA(), contextualize replaces calls to math library functions. For example, cos and sin are replaced with CUDAnative.cos and CUDAnative.sin, respectively.

The full list of functions that are replaced is (:cos, :cospi, :sin, :sinpi, :tan, :acos, :asin, :atan, :cosh, :sinh, :tanh, :acosh, :asinh, :atanh, :log, :log10, :log1p, :log2, :exp, :exp2, :exp10, :expm1, :ldexp, :abs, :sqrt, :cbrt, :ceil, :floor).
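The replacement idea can be illustrated in plain Julia with a toy expression rewriter. This is only a sketch of the concept, not how GPUifyLoops implements contextualize: the DeviceMath module, the REPLACED tuple, and the rewrite function are hypothetical names introduced here, and DeviceMath simply forwards to Base so the example runs without a GPU.

```julia
# Hypothetical stand-in for a device math library (plays the role of CUDAnative).
module DeviceMath
    sin(x) = Base.sin(x)
    cos(x) = Base.cos(x)
end

# Small subset of the replaced names, for illustration only.
const REPLACED = (:sin, :cos)

# Leaves (symbols, literals) are returned unchanged.
rewrite(x) = x

# Recursively rewrite calls such as sin(x) into DeviceMath.sin(x).
function rewrite(ex::Expr)
    args = Any[rewrite(a) for a in ex.args]
    if ex.head === :call && args[1] isa Symbol && args[1] in REPLACED
        args[1] = Expr(:., :DeviceMath, QuoteNode(args[1]))
    end
    return Expr(ex.head, args...)
end

original  = :(sin(x) + cos(2x))
rewritten = rewrite(original)

# The rewritten expression computes the same value as the original.
x = 0.5
@assert eval(rewritten) ≈ sin(x) + cos(2x)
```

Calls to functions outside the list (here + and *) are left untouched, which mirrors the fact that only the listed math functions are swapped.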

Examples

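# Assumed packages: CUDAnative provides @cuda and threadIdx, CuArrays provides CuArray.
using GPUifyLoops, CUDAnative, CuArrays

# Generic kernel: applies f to every element of A on the device Dev.
function kernel!(::Dev, A, f) where {Dev}
    @setup Dev
    @loop for i in (1:size(A,1); threadIdx().x)
        A[i] = f(A[i])
    end
end

g(x) = sin(x)
# CPU entry point: call the kernel directly.
kernel!(A::Array) = kernel!(CPU(), A, contextualize(CPU(), g))
# GPU entry point: launch one thread per element.
kernel!(A::CuArray) =
    @cuda threads=length(A) kernel!(CUDA(), A, contextualize(CUDA(), g))

a = rand(Float32, 1024)
b, c = copy(a), CuArray(a)

kernel!(b)
kernel!(c)

@assert g.(a) ≈ b
@assert g.(a) ≈ Array(c)  # copy back to the host before comparing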

GPUifyLoops.@loop - Macro
@loop for i in (A; B)
    # body
end

Takes a for i in (A; B) expression and, on the CPU, lowers it to:

for i in A
    # body
end

and on the GPU:

for i in B
    if !(i in A)
        continue
    end
    # body
end
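
The two lowered forms can be compared in plain Julia without a GPU. The sketch below emulates the GPU threads serially: each hypothetical thread id plays the role of B, as threadIdx().x does in the example for contextualize. The function names cpu_indices and gpu_indices and the thread count are assumptions made for illustration.

```julia
# CPU lowering: iterate over A directly.
function cpu_indices(A)
    visited = Int[]
    for i in A
        push!(visited, i)
    end
    return visited
end

# GPU lowering, emulated serially: every thread iterates over its own B
# and skips indices that fall outside A.
function gpu_indices(A, nthreads)
    visited = Int[]
    for tid in 1:nthreads   # stand-in for the threads of a launch
        B = tid             # a scalar iterates over itself in Julia
        for i in B
            if !(i in A)
                continue
            end
            push!(visited, i)
        end
    end
    return visited
end

A = 1:5
@assert cpu_indices(A) == gpu_indices(A, 8)
```

The guard is what makes it safe to launch more threads than there are elements in A: the surplus threads skip the body.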

GPUifyLoops.@scratch - Macro

@scratch T Dims M

Allocates scratch memory.

  • T is the type of the array
  • Dims is a tuple of array dimensions
  • M is the number of dimensions at the tail that are implicit on the GPU
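
One reading of the role of M is that the trailing M dimensions are covered by the GPU threads themselves, so each thread only needs storage for the leading dimensions, while the CPU needs the full array. The helper below is a hypothetical illustration of that shape arithmetic only. It is not part of GPUifyLoops and does not allocate anything.

```julia
# Hypothetical helper: which dimensions need explicit storage on each device?
# On the GPU the last M dimensions are assumed to be implicit (one per thread).
scratch_dims(dims::Tuple, M::Integer, on_gpu::Bool) =
    on_gpu ? dims[1:end-M] : dims

dims = (3, 4, 5)
@assert scratch_dims(dims, 2, false) == (3, 4, 5)  # CPU keeps every dimension
@assert scratch_dims(dims, 2, true)  == (3,)       # GPU keeps only the leading one
```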

GPUifyLoops.launch - Method

launch(::Device, f, args...; kwargs...)

Launch a kernel on the GPU. kwargs are passed to @cuda and can be any of the compilation and runtime arguments normally passed to @cuda.

GPUifyLoops.launch_config - Method
launch_config(::F, maxthreads, args...; kwargs...)

Calculate a valid launch configuration based on typeof(F), the maximum number of threads, the function's arguments, and the particular launch configuration passed to the call.

Return a NamedTuple that has blocks, threads, shmem, and stream. All of these are optional, but specifying blocks and threads is recommended.
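
As a rough sketch of the kind of value such a method might return, the plain-Julia function below computes threads and blocks for a one-dimensional problem. The name simple_launch_config and its n parameter are hypothetical; the signature deliberately differs from launch_config so that it runs without GPUifyLoops, and shmem and stream are left out because they are optional.

```julia
# Hypothetical helper: cover n work items without exceeding maxthreads
# threads per block.
function simple_launch_config(maxthreads::Integer, n::Integer)
    threads = min(maxthreads, n)   # never request more threads than items
    blocks  = cld(n, threads)      # round up so every item is covered
    return (threads = threads, blocks = blocks)
end

cfg = simple_launch_config(256, 1000)
@assert cfg.threads == 256
@assert cfg.blocks == 4
@assert cfg.threads * cfg.blocks >= 1000
```

Rounding the block count up means the last block may contain idle threads, which is why kernels usually guard the loop body with a bounds check.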