GPUCompiler.CompilerConfig — Type
CompilerConfig(target, params; kernel=true, entry_abi=:specfunc, name=nothing, always_inline=false)
Construct a CompilerConfig that will be used to drive compilation for the given target and params.
Several keyword arguments can be used to customize the compilation process:
kernel: specifies whether the function should be compiled as a kernel or as a regular function. This determines the calling convention and is used for validation.
entry_abi: can be either :specfunc (the default) or :func. :specfunc expects the arguments to be passed in registers; simple return values are returned in registers as well, and complex return values are returned on the stack using sret. The calling convention is fastcc. The :func ABI is simpler: the first argument is the function itself (to support closures), the second argument is a pointer to a vector of boxed Julia values, and the third argument is the number of values. The return value is also boxed. The :func ABI internally calls the :specfunc ABI, but is generally easier to invoke directly.
name: the name that will be used for the entry-point function. If nothing (the default), the name is generated automatically.
always_inline: specifies whether the Julia front-end should inline all functions into one if possible.
Base.precompile — Method
precompile(job::CompilerJob)
Compile the GPUCompiler job. In particular, this will run inference using the foreign abstract interpreter.
GPUCompiler.cached_compilation — Method
cached_compilation(cache::Dict{Any}, src::MethodInstance, cfg::CompilerConfig, compiler, linker)
Compile a method instance src with configuration cfg by invoking compiler and linker and storing the result in cache.
The cache argument should be a dictionary that can be indexed using any value and that stores whatever the linker function returns. The compiler function should take a CompilerJob and return data that can be cached across sessions (e.g., LLVM IR). This data is then forwarded, along with the CompilerJob, to the linker function, which is allowed to create session-dependent objects (e.g., a CuModule).
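The two-stage contract described above (a compiler step that yields session-independent data, then a linker step that yields a session-dependent object which is what gets cached) can be illustrated without GPUCompiler at all. The following is a minimal sketch in plain Julia. All names in it (toy_compiler, toy_linker, toy_cached_compilation) are hypothetical stand-ins, the "job" is just a string, and the real cached_compilation may differ in how it keys and invalidates entries.

```julia
# Toy stand-ins, NOT the GPUCompiler API: the "job" here is just a String.
compile_calls = Ref(0)
link_calls = Ref(0)

# Stage 1: produce session-independent data (a string standing in for LLVM IR).
function toy_compiler(job::String)
    compile_calls[] += 1
    return "; IR for $job"
end

# Stage 2: turn that data into a session-dependent object (here, a closure).
function toy_linker(job::String, ir::String)
    link_calls[] += 1
    return () -> "running $job ($(length(ir)) bytes of IR)"
end

# Look up `key` in `cache`; on a miss, run the compiler, then the linker,
# and store whatever the linker returned.
function toy_cached_compilation(cache::Dict{Any}, key, compiler, linker)
    return get!(cache, key) do
        linker(key, compiler(key))
    end
end

cache = Dict{Any,Any}()
f1 = toy_cached_compilation(cache, "kernel_a", toy_compiler, toy_linker)
f2 = toy_cached_compilation(cache, "kernel_a", toy_compiler, toy_linker)  # cache hit
println(f1())
```

The second call returns the stored object without invoking either callback again, which is the behavior the cache argument is meant to provide.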
GPUCompiler.code_llvm — Method
code_llvm([io], job; optimize=true, raw=false, dump_module=false)
Prints the device LLVM IR generated for the given compiler job to io (default stdout).
The following keyword arguments are supported:
optimize: determines if the code is optimized, which includes kernel-specific optimizations if kernel is true
raw: return the raw IR including all metadata
dump_module: display the entire module instead of just the function
See also: @device_code_llvm, InteractiveUtils.code_llvm
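As a rough illustration of how a CompilerConfig, a CompilerJob, and code_llvm fit together, the sketch below targets the host CPU. It assumes GPUCompiler is installed and that NativeCompilerTarget, AbstractCompilerParams, CompilerJob, and runtime_module are available under the GPUCompiler module as in the package's bundled examples. DemoRuntime, DemoParams, and demo_kernel are hypothetical names, and the set of runtime stubs a given GPUCompiler version expects may differ from the ones listed here.

```julia
using GPUCompiler

# Hypothetical dummy runtime: stub versions of the hooks the compiler looks up.
# The exact set required is an assumption and may vary between versions.
module DemoRuntime
    signal_exception() = return
    malloc(sz) = C_NULL
    report_oom(sz) = return
    report_exception(ex) = return
    report_exception_name(ex) = return
    report_exception_frame(idx, func, file, line) = return
end

# Hypothetical parameter type used only to select the dummy runtime above.
struct DemoParams <: GPUCompiler.AbstractCompilerParams end
GPUCompiler.runtime_module(::GPUCompiler.CompilerJob{<:Any,DemoParams}) = DemoRuntime

demo_kernel() = nothing

# Build the job: method instance + configuration.
source = GPUCompiler.methodinstance(typeof(demo_kernel), Tuple{})
target = GPUCompiler.NativeCompilerTarget()
config = GPUCompiler.CompilerConfig(target, DemoParams(); kernel=true)
job = GPUCompiler.CompilerJob(source, config)

# Capture the IR in a buffer instead of printing straight to stdout.
io = IOBuffer()
GPUCompiler.code_llvm(io, job; optimize=true, dump_module=true)
ir = String(take!(io))
print(ir)
```

Passing an IOBuffer as the optional io argument makes it easy to inspect or post-process the IR as a string.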
GPUCompiler.code_native — Method
code_native([io], f, types; cap::VersionNumber, kernel=false, raw=false)
Prints the native assembly generated for the given compiler job to io (default stdout).
The following keyword arguments are supported:
cap: which device to generate code for
kernel: treat the function as an entry-point kernel
raw: return the raw code including all metadata
See also: @device_code_native, InteractiveUtils.code_native
GPUCompiler.compile — Method
compile(target::Symbol, job::CompilerJob; libraries=true, optimize=true, strip=false, ...)
Compile a function f invoked with types tt for device capability cap to one of the following formats as specified by the target argument: :julia for Julia IR, :llvm for LLVM IR and :asm for machine code.
The following keyword arguments are supported:
libraries: link the GPU runtime and libdevice libraries (if required)
optimize: optimize the code (default: true)
cleanup: run cleanup passes on the code (default: true)
strip: strip non-functional metadata and debug information (default: false)
validate: enable optional validation of input and outputs (default: true)
only_entry: only keep the entry function, remove all others (default: false). This option is only for internal use, to implement reflection's dump_module.
Other keyword arguments can be found in the documentation of cufunction.
GPUCompiler.disk_cache_enabled — Method
disk_cache_enabled()
Query if caching to disk is enabled.
GPUCompiler.enable_disk_cache! — Function
enable_disk_cache!(state::Bool=true)
Activate the GPUCompiler disk cache in the current environment. You will need to restart your Julia environment for it to take effect.
The cache functionality requires Julia 1.11.
GPUCompiler.methodinstance — Function
methodinstance(ft::Type, tt::Type, [world::UInt])
Look up the method instance that corresponds to invoking the function with type ft with argument types tt. If the world argument is specified, the look-up is static and will always return the same result. If the world argument is not specified, the look-up is dynamic and the returned method instance will depend on the current world age. If no method is found, a MethodError is thrown.
This function is highly optimized, and results do not need to be cached additionally.
Only use this function with concrete signatures, i.e., using the types of values you would pass at run time. For non-concrete signatures, use generic_methodinstance instead.
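The difference between the dynamic and the static look-up can be sketched as follows, assuming GPUCompiler is installed. The function square is a hypothetical example, and pinning the world with Base.get_world_counter() is just one way to obtain a world age for illustration.

```julia
using GPUCompiler

square(x) = x * x   # hypothetical demo function

# Dynamic look-up: no world argument, so the current world age is used.
mi = GPUCompiler.methodinstance(typeof(square), Tuple{Float64})

# Static look-up: pin the world age explicitly.
world = Base.get_world_counter()
mi_static = GPUCompiler.methodinstance(typeof(square), Tuple{Float64}, world)

# No method of `square` accepts two arguments, so a MethodError is expected.
threw_method_error = try
    GPUCompiler.methodinstance(typeof(square), Tuple{Float64,Float64})
    false
catch err
    err isa MethodError
end

println(mi)
println(threw_method_error)
```

Both look-ups use a concrete signature (Float64), in line with the guidance above to reserve abstract signatures for generic_methodinstance.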
GPUCompiler.@device_code — Macro
@device_code dir::AbstractString=... [...] ex
Evaluates the expression ex and dumps all intermediate forms of code to the directory dir.
GPUCompiler.@device_code_llvm — Macro
@device_code_llvm [io::IO=stdout, ...] ex
Evaluates the expression ex and prints the result of InteractiveUtils.code_llvm to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_llvm.
See also: InteractiveUtils.@code_llvm
GPUCompiler.@device_code_lowered — Macro
@device_code_lowered ex
Evaluates the expression ex and returns the result of InteractiveUtils.code_lowered for every compiled GPU kernel.
See also: InteractiveUtils.@code_lowered
GPUCompiler.@device_code_native — Macro
@device_code_native [io::IO=stdout, ...] ex
Evaluates the expression ex and prints the result of GPUCompiler.code_native to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_native.
GPUCompiler.@device_code_typed — Macro
@device_code_typed ex
Evaluates the expression ex and returns the result of InteractiveUtils.code_typed for every compiled GPU kernel.
See also: InteractiveUtils.@code_typed
GPUCompiler.@device_code_warntype — Macro
@device_code_warntype [io::IO=stdout] ex
Evaluates the expression ex and prints the result of InteractiveUtils.code_warntype to io for every compiled GPU kernel.
See also: InteractiveUtils.@code_warntype