GPUCompiler.CompilerConfig - Type
CompilerConfig(target, params; kernel=true, entry_abi=:specfunc, name=nothing,
                               always_inline=false)

Construct a CompilerConfig that will be used to drive compilation for the given target and params.

Several keyword arguments can be used to customize the compilation process:

  • kernel: specifies if the function should be compiled as a kernel, or as a regular function. This is used to determine the calling convention and for validation purposes.
  • entry_abi: either :specfunc (the default) or :func. With :specfunc, arguments are passed in registers; simple return values are returned in registers as well, while complex return values are returned on the stack using sret. The calling convention is fastcc. The :func ABI is simpler: the first argument is the function itself (to support closures), the second is a pointer to a vector of boxed Julia values, and the third is the number of values; the return value is also boxed. The :func ABI internally calls the :specfunc ABI, but is generally easier to invoke directly.
  • name: the name that will be used for the entrypoint function. If nothing (the default), the name will be generated automatically.
  • always_inline: specifies if the Julia front-end should inline all functions into one if possible.
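
As a minimal sketch of the constructor described above, the following builds a default and a customized configuration. It assumes a recent GPUCompiler release in which NativeCompilerTarget and AbstractCompilerParams are available; SketchParams is a hypothetical parameter type introduced only for illustration, since real back-ends define their own.

```julia
using GPUCompiler

# Hypothetical, empty parameter type (an assumption for this sketch).
struct SketchParams <: GPUCompiler.AbstractCompilerParams end

target = GPUCompiler.NativeCompilerTarget()   # host CPU target
params = SketchParams()

# Defaults: kernel=true, entry_abi=:specfunc, name=nothing, always_inline=false.
default_cfg = GPUCompiler.CompilerConfig(target, params)

# Customized: compile as a regular function with an explicit entry-point name.
custom_cfg = GPUCompiler.CompilerConfig(target, params;
                                        kernel=false, name="my_entry",
                                        always_inline=true)
```

The configuration only records these choices; nothing is compiled until it is combined with a method instance in a CompilerJob.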

GPUCompiler.cached_compilation - Method
cached_compilation(cache::Dict{Any}, src::MethodInstance, cfg::CompilerConfig,
                   compiler, linker)

Compile a method instance src with configuration cfg, by invoking compiler and linker and storing the result in cache.

The cache argument should be a dictionary that can be indexed using any value and store whatever the linker function returns. The compiler function should take a CompilerJob and return data that can be cached across sessions (e.g., LLVM IR). This data is then forwarded, along with the CompilerJob, to the linker function which is allowed to create session-dependent objects (e.g., a CuModule).
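
The division of labor between compiler and linker can be illustrated with a plain-Julia mock. This is not GPUCompiler's implementation: sketch_cached_compilation, sketch_compiler and sketch_linker are hypothetical names, the cache key is a simple symbol rather than a method instance plus configuration, and a string stands in for the session-independent LLVM IR.

```julia
# Counts how often the (expensive) compiler stage actually runs.
const compile_calls = Ref(0)

# "compiler": returns session-independent data (a string standing in for IR).
function sketch_compiler(job)
    compile_calls[] += 1
    return "ir for $(job)"
end

# "linker": receives the job and the compiled data, and builds a
# session-dependent object (here just a named tuple).
sketch_linker(job, ir) = (job = job, ir = ir)

# Look the job up in the cache; compile and link only on a miss.
function sketch_cached_compilation(cache::Dict, job, compiler, linker)
    return get!(cache, job) do
        ir = compiler(job)
        linker(job, ir)
    end
end

cache = Dict{Any,Any}()
a = sketch_cached_compilation(cache, :kernel_a, sketch_compiler, sketch_linker)
b = sketch_cached_compilation(cache, :kernel_a, sketch_compiler, sketch_linker)
```

The second call returns the stored object without invoking the compiler again, which is the behavior the cache argument exists to provide.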

GPUCompiler.code_llvm - Method
code_llvm([io], job; optimize=true, raw=false, dump_module=false)

Prints the device LLVM IR generated for the given compiler job to io (default stdout).

The following keyword arguments are supported:

  • optimize: determines if the code is optimized, which includes kernel-specific optimizations if kernel is true
  • raw: return the raw IR including all metadata
  • dump_module: display the entire module instead of just the function

See also: @device_code_llvm, InteractiveUtils.code_llvm
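
A hedged sketch of calling code_llvm on a job for the host CPU follows. It assumes a recent GPUCompiler release and mirrors the setup of the package's own native example; SketchRuntime, SketchParams and add_one are hypothetical names, and the dummy runtime functions are an assumption about what a runtime module needs to provide.

```julia
using GPUCompiler

# Dummy runtime hooks (assumed set, sufficient for the native target).
module SketchRuntime
    signal_exception() = return
    malloc(sz) = C_NULL
    report_oom(sz) = return
    report_exception(ex) = return
    report_exception_name(ex) = return
    report_exception_frame(idx, func, file, line) = return
end

struct SketchParams <: GPUCompiler.AbstractCompilerParams end
GPUCompiler.runtime_module(::GPUCompiler.CompilerJob{<:Any,SketchParams}) = SketchRuntime

add_one(x) = x + 1

source = GPUCompiler.methodinstance(typeof(add_one), Tuple{Int})
config = GPUCompiler.CompilerConfig(GPUCompiler.NativeCompilerTarget(), SketchParams();
                                    kernel=false)   # regular function, not a kernel
job = GPUCompiler.CompilerJob(source, config)

# Capture the printed IR in a string instead of writing to stdout.
ir = sprint(io -> GPUCompiler.code_llvm(io, job; optimize=true, dump_module=false))
print(ir)
```

Passing dump_module=true instead would print the whole module rather than only the entry function.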

GPUCompiler.code_native - Method
code_native([io], f, types; cap::VersionNumber, kernel=false, raw=false)

Prints the native assembly generated for the given compiler job to io (default stdout).

The following keyword arguments are supported:

  • cap: which device to generate code for
  • kernel: treat the function as an entry-point kernel
  • raw: return the raw code including all metadata

See also: @device_code_native, InteractiveUtils.code_native

GPUCompiler.compile - Method
compile(target::Symbol, job::CompilerJob;
        libraries=true, optimize=true, strip=false, ...)

Compile the function described by job to one of the following formats, as specified by the target argument: :julia for Julia IR, :llvm for LLVM IR, and :asm for machine code.

The following keyword arguments are supported:

  • libraries: link the GPU runtime and libdevice libraries (if required)
  • optimize: optimize the code (default: true)
  • cleanup: run cleanup passes on the code (default: true)
  • strip: strip non-functional metadata and debug information (default: false)
  • validate: enable optional validation of input and outputs (default: true)
  • only_entry: only keep the entry function, remove all others (default: false). This option is only for internal use, to implement reflection's dump_module.

Other keyword arguments can be found in the documentation of cufunction.
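
The following sketch compiles a trivial kernel to assembly for the host CPU. It assumes a recent GPUCompiler release in which compile must run inside a JuliaContext block and returns a tuple whose first element is the generated code; both points are assumptions about the installed version, and SketchRuntime, SketchParams and noop_kernel are hypothetical names.

```julia
using GPUCompiler

# Dummy runtime hooks (assumed set, sufficient for the native target).
module SketchRuntime
    signal_exception() = return
    malloc(sz) = C_NULL
    report_oom(sz) = return
    report_exception(ex) = return
    report_exception_name(ex) = return
    report_exception_frame(idx, func, file, line) = return
end

struct SketchParams <: GPUCompiler.AbstractCompilerParams end
GPUCompiler.runtime_module(::GPUCompiler.CompilerJob{<:Any,SketchParams}) = SketchRuntime

noop_kernel() = nothing   # kernels must return nothing

source = GPUCompiler.methodinstance(typeof(noop_kernel), Tuple{})
config = GPUCompiler.CompilerConfig(GPUCompiler.NativeCompilerTarget(), SketchParams())
job = GPUCompiler.CompilerJob(source, config)

# Compile to machine code; keyword arguments are left at their defaults.
result = GPUCompiler.JuliaContext() do ctx
    GPUCompiler.compile(:asm, job)
end
asm = first(result)   # assumed: the first element holds the assembly text
```

Replacing :asm with :llvm would stop after LLVM IR generation instead of emitting machine code.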

GPUCompiler.methodinstance - Function
methodinstance(ft::Type, tt::Type, [world::UInt])

Look up the method instance that corresponds to invoking the function with type ft with argument types tt. If the world argument is specified, the look-up is static and will always return the same result. If the world argument is not specified, the look-up is dynamic and the returned method instance will depend on the current world age. If no method is found, a MethodError is thrown.

This function is highly optimized, and results do not need to be cached additionally.

Only use this function with concrete signatures, i.e., using the types of values you would pass at run time. For non-concrete signatures, use generic_methodinstance instead.
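
A short sketch of both look-up modes, assuming a recent GPUCompiler release; square is a hypothetical function defined only for the example.

```julia
using GPUCompiler

square(x) = x * x

# Dynamic look-up: depends on the current world age.
mi = GPUCompiler.methodinstance(typeof(square), Tuple{Float32})

# Static look-up: pinned to an explicit world age.
world = Base.get_world_counter()
mi_static = GPUCompiler.methodinstance(typeof(square), Tuple{Float32}, world)

# A signature with no matching method is reported as a MethodError.
threw_method_error = try
    GPUCompiler.methodinstance(typeof(square), Tuple{Int,Int})
    false
catch err
    err isa MethodError
end
```

Note that the signature is concrete (Float32 rather than, say, AbstractFloat), as the text above requires.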

GPUCompiler.@device_code - Macro
@device_code dir::AbstractString=... [...] ex

Evaluates the expression ex and dumps all intermediate forms of code to the directory dir.

GPUCompiler.@device_code_llvm - Macro
@device_code_llvm [io::IO=stdout, ...] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_llvm to io for every compiled GPU kernel. For other supported keywords, see GPUCompiler.code_llvm.

See also: InteractiveUtils.@code_llvm

GPUCompiler.@device_code_lowered - Macro
@device_code_lowered ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_lowered for every compiled GPU kernel.

See also: InteractiveUtils.@code_lowered

GPUCompiler.@device_code_typed - Macro
@device_code_typed ex

Evaluates the expression ex and returns the result of InteractiveUtils.code_typed for every compiled GPU kernel.

See also: InteractiveUtils.@code_typed

GPUCompiler.@device_code_warntype - Macro
@device_code_warntype [io::IO=stdout] ex

Evaluates the expression ex and prints the result of InteractiveUtils.code_warntype to io for every compiled GPU kernel.

See also: InteractiveUtils.@code_warntype