Reflection
Because it uses a different compilation toolchain, CUDAnative.jl offers counterparts to the code_* reflection functionality from Base:
CUDAnative.code_sass — Function

code_sass([io], f, types; cap::VersionNumber)

Prints the SASS code generated for the method matching the given generic function and type signature to io, which defaults to stdout.

The following keyword arguments are supported:

- cap: which device to generate code for
- kernel: treat the function as an entry-point kernel
- verbose: enable verbose mode, which displays code generation statistics

See also: @device_code_sass
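A minimal sketch of calling code_sass directly. Running this requires a CUDA-capable GPU and toolkit; the kernel and the capability value are illustrative assumptions:

```julia
using CUDAnative

# A trivial kernel, purely for illustration.
dummy_kernel(p) = (unsafe_store!(p, 42f0); return)

# Print the SASS generated for this signature. The capability value
# is an assumption; pass the compute capability of your own device.
code_sass(dummy_kernel, Tuple{Ptr{Float32}}, cap=v"7.0")
```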
Convenience macros
For ease of use, CUDAnative.jl also implements @device_code_* macros wrapping the above reflection functionality. These macros evaluate their expression argument while tracing compilation, and finally print or return the code for every invoked CUDA kernel. Note that this evaluation can have side effects, unlike the similarly-named @code_* macros in Base, which are side-effect free.
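For example, one might wrap a kernel launch with @device_code_llvm. This is a sketch assuming a CUDA-capable GPU and the companion CuArrays.jl package for device arrays; the kernel is hypothetical:

```julia
using CUDAnative, CuArrays

# Hypothetical kernel writing a single value.
kernel(a) = (a[1] = 1f0; return)

a = CuArray(zeros(Float32, 1))

# Prints the LLVM IR of every kernel compiled during the launch.
# Unlike Base's @code_llvm, the expression is actually evaluated,
# so the kernel really runs on the device.
@device_code_llvm @cuda kernel(a)
```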
CUDAnative.@device_code_lowered — Macro

@device_code_lowered ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_lowered for every compiled GPU kernel.

See also: InteractiveUtils.@code_lowered
CUDAnative.@device_code_typed — Macro

@device_code_typed ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_typed for every compiled GPU kernel.

See also: InteractiveUtils.@code_typed
CUDAnative.@device_code_warntype — Macro

@device_code_warntype [io::IO=stdout] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_warntype to io for every compiled GPU kernel.

See also: InteractiveUtils.@code_warntype
CUDAnative.@device_code_llvm — Macro

@device_code_llvm [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_llvm to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_llvm.

See also: InteractiveUtils.@code_llvm
CUDAnative.@device_code_ptx — Macro

@device_code_ptx [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of GPUCompiler.code_native to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_native.
CUDAnative.@device_code_sass — Macro

@device_code_sass [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of CUDAnative.code_sass to io for every compiled CUDA kernel. For other supported keywords, see CUDAnative.code_sass.
CUDAnative.@device_code — Macro

@device_code dir::AbstractString=... [...] ex

Evaluates the expression ex and dumps all intermediate forms of code to the directory dir.
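A sketch of dumping all intermediate forms to a scratch directory, assuming a CUDA-capable GPU and CuArrays.jl for device arrays; the kernel is illustrative:

```julia
using CUDAnative, CuArrays

increment!(a) = (a[1] += 1f0; return)

a = CuArray(zeros(Float32, 1))
dir = mktempdir()

# Dumps the intermediate forms (lowered, typed, LLVM IR, PTX, SASS)
# for every kernel compiled while the expression runs.
@device_code dir=dir @cuda increment!(a)
readdir(dir)  # one set of files per compiled kernel
```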
Version and related queries
CUDAnative.version — Function

version()

Returns the version of the CUDA toolkit in use.

version(k::HostKernel)

Queries the PTX and SM versions a kernel was compiled for. Returns a named tuple.
CUDAnative.maxthreads — Function

maxthreads(k::HostKernel)

Queries the maximum number of threads a kernel can use in a single block.
CUDAnative.registers — Function

registers(k::HostKernel)

Queries the register usage of a kernel.
CUDAnative.memory — Function

memory(k::HostKernel)

Queries the local, shared and constant memory usage of a compiled kernel in bytes. Returns a named tuple.
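These queries all operate on a compiled HostKernel. A minimal sketch, assuming a CUDA-capable GPU, CuArrays.jl for device arrays, and that @cuda accepts a launch=false keyword to compile without launching (an assumption; obtaining the kernel object via cufunction is an alternative):

```julia
using CUDAnative, CuArrays

kernel(a) = (a[1] = 1f0; return)
a = CuArray(zeros(Float32, 1))

# Compile without launching; the `launch=false` form is assumed here.
k = @cuda launch=false kernel(a)

version(k)     # named tuple with the PTX and SM versions
maxthreads(k)  # maximum threads per block
registers(k)   # register usage
memory(k)      # named tuple: local, shared and constant memory in bytes
```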