CUDA driver

This section lists the package's public functionality that directly corresponds to functionality of the CUDA driver API. In general, the abstractions stay close to those of the CUDA driver API, so for more information on certain library calls you can consult the CUDA driver API reference.

The documentation is grouped according to the modules of the driver API.

Error Handling

CUDAdrv.CuErrorType
CuError(code)
CuError(code, meta)

Create a CUDA error object with error code code. The optional meta parameter can hold extra information, such as error logs, when known.

CUDAdrv.nameMethod
name(err::CuError)

Gets the string representation of an error code.

This name can often be used as a symbol in source code to get an instance of this error. For example:

julia> using CUDAdrv

julia> err = CuError(1)
CuError(1, ERROR_INVALID_VALUE)

julia> name(err)
"ERROR_INVALID_VALUE"

julia> CUDAdrv.ERROR_INVALID_VALUE
CuError(1, ERROR_INVALID_VALUE)

CUDAdrv.descriptionMethod
description(err::CuError)

Gets the string description of an error code.

Version Management

CUDAdrv.versionMethod
version()

Returns the CUDA version as reported by the driver.

Device Management

CUDAdrv.devicesFunction
devices()

Get an iterator for the compute devices.

CUDAdrv.nameMethod
name(dev::CuDevice)

Returns an identifier string for the device.

CUDAdrv.totalmemMethod
totalmem(dev::CuDevice)

Returns the total amount of memory (in bytes) on the device.

CUDAdrv.attributeFunction
attribute(dev::CuDevice, code)

Returns information about the device.

attribute(X, ptr::Union{Ptr,CuPtr}, attr)

Returns attribute attr about pointer ptr. The type of the returned value depends on the attribute, and as such must be passed as the X parameter.

Certain common attributes are exposed by additional convenience functions:

CUDAdrv.capabilityMethod
capability(dev::CuDevice)

Returns the compute capability of the device.

CUDAdrv.warpsizeMethod
warpsize(dev::CuDevice)

Returns the warp size (in threads) of the device.
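
Putting these queries together, a short sketch that lists every device with some of its properties (assuming a CUDA-capable system):

```julia
using CUDAdrv

for dev in devices()
    # query basic properties of each compute device
    println(name(dev), ": ",
            "capability ", capability(dev), ", ",
            totalmem(dev) ÷ 1024^2, " MiB, ",
            "warp size ", warpsize(dev))
end
```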

Context Management

CUDAdrv.CuContextType
CuContext(dev::CuDevice, flags=CTX_SCHED_AUTO)
CuContext(f::Function, ...)

Create a CUDA context for device dev. A context on the GPU is analogous to a process on the CPU, with its own distinct address space and allocated resources. When a context is destroyed, the system cleans up the resources allocated to it.

When you are done using the context, call unsafe_destroy! to mark it for deletion, or use do-block syntax with this constructor.
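
For example, the do-block form activates the context for the duration of the block and marks it for deletion afterwards (a minimal sketch, assuming at least one CUDA-capable device):

```julia
using CUDAdrv

dev = CuDevice(0)
CuContext(dev) do ctx
    # the context is active within this block;
    # unsafe_destroy! is called automatically when it exits
    @assert CuCurrentContext() == ctx
end
```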

CUDAdrv.CuCurrentContextFunction
CuCurrentContext()

Return the current context, or nothing if there is no active context.

CUDAdrv.activateMethod
activate(ctx::CuContext)

Binds the specified CUDA context to the calling CPU thread.

CUDAdrv.deviceMethod
device()
device(ctx::CuContext)

Returns the device for a context.

Primary Context Management

CUDAdrv.CuPrimaryContextType
CuPrimaryContext(dev::CuDevice)

Create a primary CUDA context for a given device.

Each primary context is unique per device and is shared with the CUDA runtime API. It is meant for interoperability with (applications using) the runtime API.

CUDAdrv.CuContextMethod
CuContext(pctx::CuPrimaryContext)

Retain the primary context on the GPU, returning a context compatible with the driver API. The primary context will be released when the returned driver context is finalized.

As these contexts are refcounted by CUDA, you should not call unsafe_destroy! on them but use unsafe_release! instead (available with do-block syntax as well).
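
A sketch of the retain/release pattern described above, using the do-block form:

```julia
using CUDAdrv

dev = CuDevice(0)
pctx = CuPrimaryContext(dev)

# retain the primary context; it is released again when the block exits
CuContext(pctx) do ctx
    @assert isactive(pctx)
end
```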

CUDAdrv.isactiveMethod
isactive(pctx::CuPrimaryContext)

Query whether a primary context is active.

CUDAdrv.flagsMethod
flags(pctx::CuPrimaryContext)

Query the flags of a primary context.

CUDAdrv.setflags!Method
setflags!(pctx::CuPrimaryContext, flags)

Set the flags of a primary context.

Module Management

CUDAdrv.CuModuleType
CuModule(data, options::Dict{CUjit_option,Any})
CuModuleFile(path, options::Dict{CUjit_option,Any})

Create a CUDA module from data, or from a file containing data. The data may be PTX code, a CUBIN, or a FATBIN.

The optional options argument is a dictionary of JIT options and their respective values.

Function Management

CUDAdrv.CuFunctionType
CuFunction(mod::CuModule, name::String)

Acquires a function handle from a named function in a module.
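
Putting module and function management together, a sketch that loads a module from PTX code and looks up a kernel in it (the PTX below defines a hypothetical empty kernel, for illustration only):

```julia
using CUDAdrv

# minimal PTX source defining an empty kernel named "kernel"
ptx = """
    .version 6.0
    .target sm_30
    .address_size 64
    .visible .entry kernel()
    {
        ret;
    }
    """

md = CuModule(ptx)
f = CuFunction(md, "kernel")
```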

Global Variable Management

CUDAdrv.CuGlobalType
CuGlobal{T}(mod::CuModule, name::String)

Acquires a typed global variable handle from a named global in a module.

Base.eltypeMethod
eltype(var::CuGlobal)

Return the element type of a global variable object.

Base.getindexMethod
Base.getindex(var::CuGlobal)

Return the current value of a global variable.

Base.setindex!Method
Base.setindex!(var::CuGlobal{T}, val::T)

Set the value of a global variable to val.
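
These operations compose as follows; a sketch assuming a loaded module md that defines a hypothetical global variable "counter" of type Int32:

```julia
# acquire a typed handle to the global, then read and write it
gv = CuGlobal{Int32}(md, "counter")
@assert eltype(gv) == Int32

gv[] = Int32(42)   # setindex!: write the global
val = gv[]         # getindex: read it back
```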

Linker

CUDAdrv.CuLinkType
CuLink()

Creates a pending JIT linker invocation.

CUDAdrv.add_data!Function
add_data!(link::CuLink, name::String, code::String)

Add PTX code to a pending link operation.

add_data!(link::CuLink, name::String, data::Vector{UInt8}, type::CUjitInputType)

Add object code to a pending link operation.

CUDAdrv.add_file!Function
add_file!(link::CuLink, path::String, typ::CUjitInputType)

Add data from a file to a link operation. The argument typ indicates the type of the contained data.

CUDAdrv.CuLinkImageType

The result of a linking operation.

This object keeps its parent linker object alive, as destroying a linker destroys linked images too.

CUDAdrv.completeFunction
complete(link::CuLink)

Complete a pending linker invocation, returning an output image.

CUDAdrv.CuModuleMethod
CuModule(img::CuLinkImage, ...)

Create a CUDA module from a completed linking operation. Options from CuModule apply.
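
The full linker workflow then looks as follows (a sketch, assuming ptx holds a string of valid PTX code):

```julia
using CUDAdrv

link = CuLink()
add_data!(link, "kernel", ptx)   # add PTX source to the pending link
img = complete(link)             # finish linking, yielding an output image
md = CuModule(img)               # load the linked image as a module
```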

Memory Management

Three kinds of memory buffers can be allocated: device memory, host memory, and unified memory. Each of these buffers can be allocated by calling alloc with the type of buffer as first argument, and freed by calling free. Certain buffers have specific methods defined.

CUDAdrv.Mem.allocMethod
Mem.alloc(DeviceBuffer, bytesize::Integer)

Allocate bytesize bytes of memory on the device. This memory is only accessible on the GPU, and requires explicit calls to unsafe_copyto!, which wraps cuMemcpy, for access on the CPU.
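
As a sketch of the basic allocation pattern (assuming a working CUDA setup; the Mem.DeviceBuffer qualification assumes the buffer types live in the Mem submodule):

```julia
using CUDAdrv

buf = Mem.alloc(Mem.DeviceBuffer, 1024)   # 1 KiB of device memory
# ... use the buffer, e.g. with unsafe_copyto! ...
Mem.free(buf)
```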

CUDAdrv.Mem.HostBufferType
Mem.HostBuffer
Mem.Host

A buffer of pinned memory on the CPU, possibly accessible on the GPU.

CUDAdrv.Mem.allocMethod
Mem.alloc(HostBuffer, bytesize::Integer, [flags])

Allocate bytesize bytes of page-locked memory on the host. This memory is accessible from the CPU, and makes it possible to perform faster memory copies to the GPU. Furthermore, if flags is set to HOSTALLOC_DEVICEMAP the memory is also accessible from the GPU. These accesses are direct, and go through the PCI bus. If flags is set to HOSTALLOC_PORTABLE, the memory is considered mapped by all CUDA contexts, not just the one that created the memory, which is useful if the memory needs to be accessed from multiple devices. Multiple flags can be set at once using a bitwise OR:

flags = HOSTALLOC_PORTABLE | HOSTALLOC_DEVICEMAP

CUDAdrv.Mem.registerMethod
Mem.register(HostBuffer, ptr::Ptr, bytesize::Integer, [flags])

Page-lock the host memory pointed to by ptr. Subsequent transfers to and from devices will be faster, and can be executed asynchronously. If the HOSTREGISTER_DEVICEMAP flag is specified, the buffer will also be accessible directly from the GPU. These accesses are direct, and go through the PCI bus. If the HOSTREGISTER_PORTABLE flag is specified, any CUDA context can access the memory.

CUDAdrv.Mem.allocMethod
Mem.alloc(UnifiedBuffer, bytesize::Integer, [flags::CUmemAttach_flags])

Allocate bytesize bytes of unified memory. This memory is accessible from both the CPU and GPU, with the CUDA driver automatically copying upon first access.

CUDAdrv.Mem.prefetchMethod
prefetch(::UnifiedBuffer, [bytes::Integer]; [device::CuDevice], [stream::CuStream])

Prefetches memory to the specified destination device.

CUDAdrv.Mem.adviseMethod
advise(::UnifiedBuffer, advice::CUDAdrv.CUmem_advise, [bytes::Integer]; [device::CuDevice])

Advise about the usage of a given memory range.
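
A sketch of working with unified memory, combining allocation and prefetching (assuming a device that supports unified memory, and that the buffer types live in the Mem submodule):

```julia
using CUDAdrv

buf = Mem.alloc(Mem.UnifiedBuffer, 1024)
Mem.prefetch(buf)   # prefetch the range to the current device
# ... access the memory from either CPU or GPU ...
Mem.free(buf)
```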

To work with these buffers, you need to convert them to a Ptr or CuPtr; several methods then operate on these raw pointers.

Memory info

CUDAdrv.available_memoryFunction
available_memory()

Returns the available amount of memory (in bytes) for allocation by the CUDA context.

CUDAdrv.total_memoryFunction
total_memory()

Returns the total amount of memory (in bytes) available for allocation by the CUDA context.

Stream Management

CUDAdrv.CuStreamType
CuStream(; flags=STREAM_DEFAULT, priority=nothing)

Create a CUDA stream.

CUDAdrv.synchronizeMethod
synchronize(s::CuStream)

Wait until a stream's tasks are completed.
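
For example, work can be issued to a dedicated stream and waited upon (a sketch, with the actual kernel launches elided):

```julia
using CUDAdrv

s = CuStream()
# ... launch kernels on this stream, e.g. cudacall(...; stream=s) ...
synchronize(s)   # block until all work on the stream has completed
```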

Event Management

CUDAdrv.recordFunction
record(e::CuEvent, stream=CuDefaultStream())

Record an event on a stream.

CUDAdrv.elapsedFunction
elapsed(start::CuEvent, stop::CuEvent)

Computes the elapsed time between two events (in seconds).

CUDAdrv.@elapsedMacro
@elapsed stream ex
@elapsed ex

A macro to evaluate an expression, discarding the resulting value, instead returning the number of seconds it took to execute on the GPU, as a floating-point number.
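
These primitives compose as follows; a sketch timing GPU work both explicitly with events and with the @elapsed macro (assuming synchronize also accepts a CuEvent, and with the actual GPU work elided):

```julia
using CUDAdrv

start, stop = CuEvent(), CuEvent()
record(start)
# ... launch GPU work ...
record(stop)
synchronize(stop)               # wait for the stop event to complete
println(elapsed(start, stop))   # elapsed time in seconds

# or, more conveniently:
t = CUDAdrv.@elapsed begin
    # ... launch GPU work ...
end
```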

Execution Control

CUDAdrv.CuDim3Type
CuDim3(x)

CuDim3((x,))
CuDim3((x, y))
CuDim3((x, y, z))

A type used to specify dimensions, consisting of 3 integers for respectively the x, y and z dimension. Unspecified dimensions default to 1.

Often accepted as argument through the CuDim type alias, e.g. in the case of cudacall or launch, allowing you to pass dimensions as a plain integer or a tuple without having to construct an explicit CuDim3 object.

CUDAdrv.cudacallFunction
cudacall(f::CuFunction, types, values...; blocks::CuDim, threads::CuDim,
         cooperative=false, shmem=0, stream=CuDefaultStream())

ccall-like interface for launching a CUDA function f on a GPU.

For example:

vadd = CuFunction(md, "vadd")
a = rand(Float32, 10)
b = rand(Float32, 10)
ad = Mem.alloc(DeviceBuffer, 10*sizeof(Float32))
unsafe_copyto!(ad, pointer(a), 10*sizeof(Float32))
bd = Mem.alloc(DeviceBuffer, 10*sizeof(Float32))
unsafe_copyto!(bd, pointer(b), 10*sizeof(Float32))
c = zeros(Float32, 10)
cd = Mem.alloc(DeviceBuffer, 10*sizeof(Float32))

cudacall(vadd, (CuPtr{Cfloat},CuPtr{Cfloat},CuPtr{Cfloat}), ad, bd, cd; threads=10)
unsafe_copyto!(pointer(c), cd, 10*sizeof(Float32))

The blocks and threads arguments control the launch configuration, and should both consist of either an integer, or a tuple of 1 to 3 integers (omitted dimensions default to 1). The types argument can contain both a tuple of types, and a tuple type, the latter being slightly faster.

CUDAdrv.launchFunction
launch(f::CuFunction, args...; blocks::CuDim=1, threads::CuDim=1,
       cooperative=false, shmem=0, stream=CuDefaultStream())

Low-level call to launch a CUDA function f on the GPU, using blocks and threads as respectively the grid and block configuration. Dynamic shared memory is allocated according to shmem, and the kernel is launched on stream stream.

Arguments to a kernel should either be of a bitstype, in which case they will be copied to the internal kernel parameter buffer, or a pointer to device memory.

This is a low-level call, prefer to use cudacall instead.

Profiler Control

CUDAdrv.@profileMacro
@profile ex

Run expressions while activating the CUDA profiler.

Note that this API is used to programmatically control the profiling granularity, by allowing profiling to be done only on selected pieces of code. It does not perform any profiling by itself; you need external tools for that.

CUDAdrv.Profile.startFunction
start()

Enables profile collection by the active profiling tool for the current context. If profiling is already enabled, then this call has no effect.

CUDAdrv.Profile.stopFunction
stop()

Disables profile collection by the active profiling tool for the current context. If profiling is already disabled, then this call has no effect.
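
The macro and the lower-level functions relate as in this sketch (profiling data is only collected when running under an external profiler such as nvprof):

```julia
using CUDAdrv

CUDAdrv.@profile begin
    # ... code to be profiled ...
end

# equivalently, with explicit control:
CUDAdrv.Profile.start()
# ... code to be profiled ...
CUDAdrv.Profile.stop()
```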