Reflection
Because it uses a different compilation toolchain, CUDAnative.jl offers counterparts to the code_* reflection functionality from Base:
CUDAnative.code_sass — Function

code_sass([io], f, types; cap::VersionNumber)

Prints the SASS code generated for the method matching the given generic function and type signature to io, which defaults to stdout.

The following keyword arguments are supported:

- cap: which device to generate code for
- kernel: treat the function as an entry-point kernel
- verbose: enable verbose mode, which displays code generation statistics

See also: @device_code_sass
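A minimal sketch of calling code_sass directly. Running this requires a CUDA-capable GPU and toolkit; the kernel and the capability value are illustrative assumptions:

```julia
using CUDAnative

# A trivial kernel, purely for illustration.
dummy_kernel(p) = (unsafe_store!(p, 42f0); return)

# Print the SASS generated for this signature. The capability value
# is an assumption; pass the compute capability of your own device.
code_sass(dummy_kernel, Tuple{Ptr{Float32}}, cap=v"7.0")
```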
Convenience macros
For ease of use, CUDAnative.jl also implements @device_code_* macros wrapping the above reflection functionality. These macros evaluate their expression argument while tracing compilation, and finally print or return the code for every invoked CUDA kernel. Note that this evaluation can have side effects, unlike the similarly-named @code_* macros in Base, which are side-effect free.
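For example, one might wrap a kernel launch with @device_code_llvm. This is a sketch assuming a CUDA-capable GPU and the companion CuArrays.jl package for device arrays; the kernel is hypothetical:

```julia
using CUDAnative, CuArrays

# Hypothetical kernel writing a single value.
kernel(a) = (a[1] = 1f0; return)

a = CuArray(zeros(Float32, 1))

# Prints the LLVM IR of every kernel compiled during the launch.
# Unlike Base's @code_llvm, the expression is actually evaluated,
# so the kernel really runs on the device.
@device_code_llvm @cuda kernel(a)
```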
CUDAnative.@device_code_lowered — Macro

@device_code_lowered ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_lowered for every compiled GPU kernel.

See also: InteractiveUtils.@code_lowered
CUDAnative.@device_code_typed — Macro

@device_code_typed ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_typed for every compiled GPU kernel.

See also: InteractiveUtils.@code_typed
CUDAnative.@device_code_warntype — Macro

@device_code_warntype [io::IO=stdout] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_warntype to io for every compiled GPU kernel.

See also: InteractiveUtils.@code_warntype
CUDAnative.@device_code_llvm — Macro

@device_code_llvm [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_llvm to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_llvm.

See also: InteractiveUtils.@code_llvm
CUDAnative.@device_code_ptx — Macro

@device_code_ptx [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of GPUCompiler.code_native to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_native.
CUDAnative.@device_code_sass — Macro

@device_code_sass [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of CUDAnative.code_sass to io for every compiled CUDA kernel. For other supported keywords, see CUDAnative.code_sass.
CUDAnative.@device_code — Macro

@device_code dir::AbstractString=... [...] ex

Evaluates the expression ex and dumps all intermediate forms of code to the directory dir.
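A sketch of dumping all intermediate forms to a scratch directory, assuming a CUDA-capable GPU and CuArrays.jl for device arrays; the kernel is illustrative:

```julia
using CUDAnative, CuArrays

increment!(a) = (a[1] += 1f0; return)

a = CuArray(zeros(Float32, 1))
dir = mktempdir()

# Dumps the intermediate forms (lowered, typed, LLVM IR, PTX, SASS)
# for every kernel compiled while the expression runs.
@device_code dir=dir @cuda increment!(a)
readdir(dir)  # one set of files per compiled kernel
```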
Version and related queries
CUDAnative.version — Function

version()

Returns the version of the CUDA toolkit in use.

version(k::HostKernel)

Queries the PTX and SM versions a kernel was compiled for. Returns a named tuple.
CUDAnative.maxthreads — Function

maxthreads(k::HostKernel)

Queries the maximum number of threads a kernel can use in a single block.
CUDAnative.registers — Function

registers(k::HostKernel)

Queries the register usage of a kernel.
CUDAnative.memory — Function

memory(k::HostKernel)

Queries the local, shared and constant memory usage of a compiled kernel in bytes. Returns a named tuple.
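These queries all operate on a compiled HostKernel. A minimal sketch, assuming a CUDA-capable GPU, CuArrays.jl for device arrays, and that @cuda accepts a launch=false keyword to compile without launching (an assumption; obtaining the kernel object via cufunction is an alternative):

```julia
using CUDAnative, CuArrays

kernel(a) = (a[1] = 1f0; return)
a = CuArray(zeros(Float32, 1))

# Compile without launching; the `launch=false` form is assumed here.
k = @cuda launch=false kernel(a)

version(k)     # named tuple with the PTX and SM versions
maxthreads(k)  # maximum threads per block
registers(k)   # register usage
memory(k)      # named tuple: local, shared and constant memory in bytes
```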