ClimaComms

ClimaComms.ClimaComms (Module)
ClimaComms

Abstracts the communications interface for the various CliMA components in order to:

  • support different computational backends (CPUs, GPUs)
  • enable the use of different backends as transports (MPI, SharedArrays, etc.), and
  • transparently support single or double buffering for GPUs, depending on whether the transport has the ability to access GPU memory.

Loading

ClimaComms.cuda_is_required (Function)
cuda_is_required()

Returns a Bool indicating whether CUDA should be loaded, based on the value of ENV["CLIMACOMMS_DEVICE"]. See ClimaComms.device for more information.

cuda_is_required() && using CUDA

ClimaComms.mpi_is_required (Function)
mpi_is_required()

Returns a Bool indicating whether MPI should be loaded, based on the value of ENV["CLIMACOMMS_CONTEXT"]. See ClimaComms.context for more information.

mpi_is_required() && using MPI
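
Taken together, these helpers let a script load only the backends it actually needs before constructing a device and context. A minimal sketch (the overall script structure is illustrative, not mandated by the API):

using ClimaComms
ClimaComms.cuda_is_required() && using CUDA
ClimaComms.mpi_is_required() && using MPI

device = ClimaComms.device()          # selected via CLIMACOMMS_DEVICE
context = ClimaComms.context(device)  # selected via CLIMACOMMS_CONTEXT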

Devices

ClimaComms.device (Function)
ClimaComms.device()

Determine the device to use depending on the CLIMACOMMS_DEVICE environment variable.

Allowed values:

  • CPU, single-threaded or multi-threaded depending on the number of threads;
  • CPUSingleThreaded,
  • CPUMultiThreaded,
  • CUDA.

The default is CPU.
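
For example, a run can be pinned to a particular device by setting the environment variable before the query (the choice of CPUSingleThreaded is illustrative):

ENV["CLIMACOMMS_DEVICE"] = "CPUSingleThreaded"
device = ClimaComms.device()   # expected to return ClimaComms.CPUSingleThreaded()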

ClimaComms.array_type (Function)
ClimaComms.array_type(::AbstractDevice)

The base array type used by the specified device (currently Array or CuArray).
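
A common pattern is to allocate data in whatever array type matches the current device; a sketch:

device = ClimaComms.device()
ArrayType = ClimaComms.array_type(device)
x = ArrayType(zeros(Float64, 100))   # Array on CPU devices, CuArray on CUDA devices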

ClimaComms.allowscalar (Function)
allowscalar(f, ::AbstractDevice, args...; kwargs...)

Device-flexible version of CUDA.@allowscalar.

Lowers to

f(args...)

for CPU devices and

CUDA.@allowscalar f(args...)

for CUDA devices.

This is conveniently written with a closure via

allowscalar(device) do
    f()
end
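
For instance, reading a single element of a device array (a scalar indexing operation that CUDA otherwise disallows) could be written as below; the array x is purely illustrative:

device = ClimaComms.device()
x = ClimaComms.array_type(device)(ones(Float64, 10))
first_value = ClimaComms.allowscalar(device) do
    x[1]   # scalar indexing is permitted inside the closure
end
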
ClimaComms.@threaded (Macro)
@threaded device for ... end

A threading macro that uses Julia's native threading if the device is a CPUMultiThreaded type, and otherwise returns the original expression without Threads.@threads. This avoids the overhead of Threads.@threads when multithreading is not in use, and the device is used (instead of checking Threads.nthreads() == 1) so that the branch can be statically inferred.

References

  • https://discourse.julialang.org/t/threads-threads-with-one-thread-how-to-remove-the-overhead/58435
  • https://discourse.julialang.org/t/overhead-of-threads-threads/53964
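
A minimal sketch of how this might be used (the arrays x and y are illustrative):

device = ClimaComms.device()
x = rand(1000)
y = similar(x)
ClimaComms.@threaded device for i in eachindex(x)
    y[i] = 2 * x[i]   # uses Threads.@threads only on CPUMultiThreaded devices
end
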
ClimaComms.@time (Macro)
@time device expr

Device-flexible @time.

Lowers to

@time expr

for CPU devices and

CUDA.@time expr

for CUDA devices.
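
For example, timing a step in a device-agnostic way might look like this (compute! is a hypothetical user function):

device = ClimaComms.device()
ClimaComms.@time device compute!(y, x)   # CUDA.@time on CUDA devices, Base.@time otherwise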

ClimaComms.@elapsed (Macro)
@elapsed device expr

Device-flexible @elapsed.

Lowers to

@elapsed expr

for CPU devices and

CUDA.@elapsed expr

for CUDA devices.

ClimaComms.@sync (Macro)
@sync device expr

Device-flexible @sync.

Lowers to

@sync expr

for CPU devices and

CUDA.@sync expr

for CUDA devices.

An example use-case of this might be:

BenchmarkTools.@benchmark begin
    if ClimaComms.device() isa ClimaComms.CUDADevice
        CUDA.@sync begin
            launch_cuda_kernels_or_spawn_tasks!(...)
        end
    elseif ClimaComms.device() isa ClimaComms.CPUMultiThreaded
        Base.@sync begin
            launch_cuda_kernels_or_spawn_tasks!(...)
        end
    end
end

If the CPU version of the above example does not leverage spawned tasks (which require Base.@sync or wait to synchronize), then you may want to simply use @cuda_sync.

ClimaComms.@cuda_sync (Macro)
@cuda_sync device expr

Device-flexible CUDA.@sync.

Lowers to

expr

for CPU devices and

CUDA.@sync expr

for CUDA devices.
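
This is handy for timing or benchmarking GPU kernel launches without adding synchronization overhead on the CPU; a sketch (launch_kernel! is a hypothetical user function):

device = ClimaComms.device()
ClimaComms.@cuda_sync device launch_kernel!(y, x)   # CUDA.@sync on CUDA devices, a plain call otherwise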

Contexts

ClimaComms.context (Function)
ClimaComms.context(device=device())

Construct a default communication context.

By default, it will try to determine if it is running inside an MPI environment by checking whether MPI-related environment variables are set; if so, it will return an MPICommsContext; otherwise it will return a SingletonCommsContext.

Behavior can be overridden by setting the CLIMACOMMS_CONTEXT environment variable to either MPI or SINGLETON.
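
For example, to force a single-process context regardless of how the job was launched (the SINGLETON value is taken from the list above):

ENV["CLIMACOMMS_CONTEXT"] = "SINGLETON"
ctx = ClimaComms.context()   # returns a SingletonCommsContext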

ClimaComms.graph_context (Function)
graph_context(context::AbstractCommsContext,
              sendarray, sendlengths, sendpids,
              recvarray, recvlengths, recvpids)

Construct a communication context for exchanging neighbor data via a graph.

Arguments:

  • context: the communication context on which to construct the graph context.
  • sendarray: array containing data to send
  • sendlengths: list of lengths of data to send to each process ID
  • sendpids: list of processor IDs to send to
  • recvarray: array to receive data into
  • recvlengths: list of lengths of data to receive from each process ID
  • recvpids: list of processor IDs to receive from

This should return an AbstractGraphContext object.
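
As a rough sketch, a two-process halo exchange in which each process sends three values to the other might be set up as follows (the buffer sizes and neighbor logic are purely illustrative):

ctx = ClimaComms.context()
pid, nprocs = ClimaComms.init(ctx)

neighbor = pid == 1 ? 2 : 1          # the other process in a two-process job
sendarray = fill(Float64(pid), 3)    # data to send to the neighbor
recvarray = zeros(Float64, 3)        # space for the neighbor's data

gctx = ClimaComms.graph_context(ctx,
                                sendarray, [3], [neighbor],
                                recvarray, [3], [neighbor])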

Context operations

ClimaComms.init (Function)
(pid, nprocs) = init(ctx::AbstractCommsContext)

Perform any necessary initialization for the specified backend. Return a tuple of the processor ID and the number of participating processors.

ClimaComms.mypid (Function)
mypid(ctx::AbstractCommsContext)

Return the processor ID.

ClimaComms.iamroot (Function)
iamroot(ctx::AbstractCommsContext)

Return true if the calling processor is the root processor.

ClimaComms.nprocs (Function)
nprocs(ctx::AbstractCommsContext)

Return the number of participating processors.
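
These operations are typically used together at program start-up; a minimal sketch:

ctx = ClimaComms.context()
pid, nprocs = ClimaComms.init(ctx)

if ClimaComms.iamroot(ctx)
    @info "Running with $nprocs processes; this is process $(ClimaComms.mypid(ctx))"
end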

ClimaComms.abort (Function)
abort(ctx::CC, status::Int) where {CC <: AbstractCommsContext}

Terminate the caller and all participating processors with the specified status.

Collective operations

ClimaComms.barrier (Function)
barrier(ctx::CC) where {CC <: AbstractCommsContext}

Perform a global synchronization across all participating processors.

ClimaComms.reduce (Function)
reduce(ctx::CC, val, op) where {CC <: AbstractCommsContext}

Perform a reduction across all participating processors, using op as the reduction operator and val as this rank's reduction value. Return the result to the first processor only.
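
For example, summing one value per process (only the root process should rely on the returned result):

local_val = Float64(ClimaComms.mypid(ctx))
total = ClimaComms.reduce(ctx, local_val, +)
ClimaComms.iamroot(ctx) && @info "Sum over all processes: $total"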

ClimaComms.reduce! (Function)
reduce!(ctx::CC, sendbuf, recvbuf, op)
reduce!(ctx::CC, sendrecvbuf, op)

Performs an elementwise reduction using the operator op on the buffer sendbuf, storing the result in recvbuf on the root process. If only a single sendrecvbuf buffer is provided, the operation is performed in-place.

ClimaComms.allreduce (Function)
allreduce(ctx::CC, sendbuf, op)

Performs an elementwise reduction using the operator op on the buffer sendbuf, allocating a new array for the result. sendbuf can also be a scalar, in which case the result is a scalar of the same type.
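
A sketch of computing a global sum that every process receives:

partial = ones(Float64, 4)                           # each process contributes a local buffer
global_sum = ClimaComms.allreduce(ctx, partial, +)   # same result on every process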

ClimaComms.allreduce! (Function)
allreduce!(ctx::CC, sendbuf, recvbuf, op)
allreduce!(ctx::CC, sendrecvbuf, op)

Performs elementwise reduction using the operator op on the buffer sendbuf, storing the result in the recvbuf of all processes in the group. Allreduce! is equivalent to a Reduce! operation followed by a Bcast!, but can lead to better performance. If only one sendrecvbuf buffer is provided, then the operation is performed in-place.

ClimaComms.bcast (Function)
bcast(ctx::AbstractCommsContext, object)

Broadcast object from the root process to all other processes. The value of object on non-root processes is ignored.
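
A typical pattern is to read or compute something on the root process only and then share it with everyone (read_config is a hypothetical user function):

config = ClimaComms.iamroot(ctx) ? read_config("run.toml") : nothing
config = ClimaComms.bcast(ctx, config)   # every process now holds the root's value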

Graph exchange

ClimaComms.start (Function)
start(ctx::AbstractGraphContext)

Initiate graph data exchange.

ClimaComms.progress (Function)
progress(ctx::AbstractGraphContext)

Drive communication. Call after start to ensure that communication proceeds asynchronously.

ClimaComms.finish (Function)
finish(ctx::AbstractGraphContext)

Complete the communications step begun by start(). After this returns, data received from all neighbors will be available in the stage areas of each neighbor's receive buffer.
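
Using a graph context gctx such as the one constructed above, an exchange that overlaps communication with local work might look roughly like this (do_interior_work! is a hypothetical user function):

ClimaComms.start(gctx)      # post sends/receives to all neighbors
do_interior_work!()         # work that does not need neighbor data
ClimaComms.progress(gctx)   # let the transport make progress
ClimaComms.finish(gctx)     # block until all neighbor data has arrived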