High level interface to cuDNN functions
Deniz Yuret, Nov 6, 2020
The goal of the high-level interface is to map the low level cuDNN calls to more natural Julia functions. Here are some design choices I followed:
Naming: We try to keep the same function, argument, and type names from the cuDNN
library in the high level interface. The wrappers for descriptors drop the _t suffix,
e.g. cudnnPoolingDescriptor_t => cudnnPoolingDescriptor.
Descriptors: The cuDNN functions take data and operator descriptors. Most of these
descriptors are relatively fast to create (~500 ns for a cudnnTensorDescriptor) so they may
not be worth preallocating for the user, but we provide keyword options anyway. We cache
descriptors (~100 ns) so we can use them as hash keys for memoization, which also saves a
bit of memory and time. All descriptor fields are isbits types with the exception of the
cudnnDropoutDescriptor, which points to a random number generator state and is used as a
field of some other descriptors.
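The caching idea can be sketched in plain Julia without a GPU. This is a minimal illustration, not the package's implementation: ToyPoolingDescriptor, TOY_DESCRIPTOR_CACHE and cached_descriptor are hypothetical names, and a real wrapper would hold a handle created by the library rather than plain fields.

```julia
# Hypothetical stand-in for a descriptor wrapper; all fields are isbits types.
mutable struct ToyPoolingDescriptor
    mode::Cint
    window::NTuple{2,Cint}
    padding::NTuple{2,Cint}
    stride::NTuple{2,Cint}
end

# Cache keyed on the (converted) constructor arguments.
const TOY_DESCRIPTOR_CACHE = Dict{Tuple,ToyPoolingDescriptor}()

function cached_descriptor(mode::Integer, window::NTuple{2,Integer},
                           padding::NTuple{2,Integer}, stride::NTuple{2,Integer})
    key = (Cint(mode), Cint.(window), Cint.(padding), Cint.(stride))
    # get! only runs the constructor when the key is not in the cache yet.
    get!(TOY_DESCRIPTOR_CACHE, key) do
        ToyPoolingDescriptor(key...)
    end
end

d1 = cached_descriptor(0, (2, 2), (0, 0), (2, 2))
d2 = cached_descriptor(0, (2, 2), (0, 0), (2, 2))
d1 === d2   # same object, so the descriptor itself can serve as a hash key
```

Because equal options always yield the identical object, downstream memoization tables can be keyed on the descriptor directly.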
Operator descriptors: Descriptors such as cudnnPoolingDescriptor specify the options
for an operator such as stride and padding. For operators with descriptors we have one
method that takes keyword arguments with reasonable defaults to construct the descriptor and
another method that takes a pre-initialized descriptor as its last argument. This way a
casual user can call the first method without worrying about the descriptor format, only
specifying non-default options, whereas a layer architect can keep a preset descriptor in
the layer that gets passed to the function using the second method. We try to use generic
Julia types for keyword arguments that specify default descriptor fields and convert these
to the appropriate cudnn types during descriptor construction.
Output arrays: The low level cuDNN functions take pre-allocated output arrays. The high
level interface has one Julia function that allocates its own output array
(e.g. cudnnPoolingForward) and another with an exclamation mark that takes a pre-allocated
output array as its first argument (e.g. cudnnPoolingForward!).
Methods: Each cuDNN forward function may have up to four methods depending on whether the descriptor and the output array are specified:
cudnnPoolingForward(x; kwargs...)
cudnnPoolingForward(x, d::cudnnPoolingDescriptor; kwargs...)
cudnnPoolingForward!(y, x; kwargs...)
cudnnPoolingForward!(y, x, d::cudnnPoolingDescriptor; kwargs...)
The conventional order of arguments for these public methods is:
([output], weights, inputs, [descriptor]; kwargs...)
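The four-method pattern can be sketched with a toy 1-D max pooling on ordinary arrays. Everything here (ToyPoolingDescriptor, toyPoolingForward, the window and stride defaults) is hypothetical and only mirrors the shape of the interface; the real methods also accept further keyword arguments that are omitted for brevity.

```julia
# Hypothetical operator descriptor holding the pooling options.
struct ToyPoolingDescriptor
    window::Int
    stride::Int
end

# Keyword constructor with defaults, so casual callers name only what they change.
ToyPoolingDescriptor(; window::Integer=2, stride::Integer=window) =
    ToyPoolingDescriptor(Int(window), Int(stride))

outlen(x, d::ToyPoolingDescriptor) = (length(x) - d.window) ÷ d.stride + 1

# Pre-allocated output and pre-built descriptor: this method does the work.
function toyPoolingForward!(y, x, d::ToyPoolingDescriptor)
    @assert length(y) == outlen(x, d)
    for i in eachindex(y)
        start = (i - 1) * d.stride + 1
        y[i] = maximum(view(x, start:start + d.window - 1))
    end
    return y
end

# Pre-allocated output, descriptor built from keyword arguments.
toyPoolingForward!(y, x; kwargs...) =
    toyPoolingForward!(y, x, ToyPoolingDescriptor(; kwargs...))

# Allocating variant with a pre-built descriptor.
toyPoolingForward(x, d::ToyPoolingDescriptor) =
    toyPoolingForward!(similar(x, outlen(x, d)), x, d)

# Allocating variant with keyword arguments only.
toyPoolingForward(x; kwargs...) = toyPoolingForward(x, ToyPoolingDescriptor(; kwargs...))

x = Float32[1, 5, 2, 4, 3, 6]
toyPoolingForward(x)                                            # defaults
toyPoolingForward(x, ToyPoolingDescriptor(window=3, stride=1))  # preset descriptor
```

All three convenience methods funnel into the single method that takes both the output array and the descriptor, so the option handling lives in one place.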
AD method: Sometimes neither the high level nor the low level interface is
appropriate for gradient definitions, e.g. the low level API may not return a value, and the
high level API may have some gradient target parameters as keyword arguments. To solve this
issue the API exposes an intermediate function with an AD suffix,
e.g. cudnnPoolingForwardAD, that is called by the high level method and that makes
the low level library call. These methods may not seem like they are doing anything useful,
but they should not be removed, so that automatic differentiation packages can make use of them.
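A rough sketch of the three layers, using a toy bias-add operator on plain arrays. The names lowlevel_add_bias!, toyBiasForwardAD and toyBiasForward are hypothetical; the point is only that the intermediate function takes every gradient target as a positional argument and returns the output.

```julia
# Stand-in for a low level call: writes into y and returns nothing.
function lowlevel_add_bias!(y, x, bias, alpha)
    y .= alpha .* x .+ bias
    return nothing
end

# Intermediate AD function: gradient targets (x, bias) are positional and the
# result is returned, so a gradient package can attach a rule to this method.
function toyBiasForwardAD(x, bias; alpha, y)
    lowlevel_add_bias!(y, x, bias, alpha)
    return y
end

# High level method: bias is a convenient keyword argument with a default.
toyBiasForward(x; bias=zero(eltype(x)), alpha::Real=1) =
    toyBiasForwardAD(x, bias; alpha=alpha, y=similar(x))

toyBiasForward([1.0, 2.0]; bias=[0.5, 0.5], alpha=2)
```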
Backward functions: The point of a high level interface is to give the user appropriate defaults for the many options of typical cudnn functions. Backward functions do not have meaningful defaults because they need to copy their options from the corresponding forward function. Therefore we do not need high level APIs for backward functions unless they are useful in some other way. See Knet/src/cudnn for example uses.
Types: Do not specify types for array arguments. Leave the high level functions generic
so they can be called with CuArray, KnetArray, AutoGrad.Param etc. Types can and should be
specified for non-array arguments. In the API we use nothing to indicate unspecified array
argument values and convert these to C_NULL or CU_NULL as appropriate only at the low-level
call. Similarly for numbers the API should accept generic types like Integer or Real and
convert these to the appropriate specific type, e.g. Cint or Cdouble, only at the
low-level call.
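As a small CPU-only sketch of this convention, the functions below accept generic values and convert them only when calling a strictly typed inner function. Here to_c_pointer, lowlevel_call and highlevel_call are hypothetical names, and lowlevel_call merely stands in for a ccall wrapper.

```julia
# Conversion helpers used only at the low-level boundary.
to_c_pointer(::Nothing) = C_NULL
to_c_pointer(a::Array) = convert(Ptr{Cvoid}, pointer(a))

# Stand-in for a low-level wrapper with strict C types; it just echoes its arguments.
lowlevel_call(n::Cint, scale::Cdouble, bias::Ptr{Cvoid}) = (n, scale, bias)

# High-level method: generic Integer/Real, and nothing for an unspecified array.
function highlevel_call(n::Integer, scale::Real; bias=nothing)
    return lowlevel_call(Cint(n), Cdouble(scale), to_c_pointer(bias))
end

highlevel_call(3, 1)                    # bias is passed as C_NULL
highlevel_call(0x03, 1//2; bias=[1.0])  # any Integer or Real is accepted
```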
Workspace: Some functions need a temporarily allocated workspace whose required size is
determined by another cudnn call. Unfortunately, the required size may depend on factors
other than the current inputs (see this issue), so the @workspace macro is used at a point
as close to the library call as possible. One exception to this is cases where the same
workspace will be passed to the backward call, in which case we allocate a regular CuArray.
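The ordering described above (query the size, allocate, and call, all in one place) might look like the following CPU-only sketch. It does not use the @workspace macro; with_workspace and required_workspace_bytes are hypothetical helpers that only illustrate keeping the size query next to the call.

```julia
# Hypothetical size query; in the real library this is a separate cuDNN call.
required_workspace_bytes(x) = sizeof(eltype(x)) * length(x)

# Hypothetical helper: allocate a scratch buffer only for the duration of f.
function with_workspace(f, nbytes::Integer)
    workspace = Vector{UInt8}(undef, nbytes)
    return f(workspace)
end

function toy_forward(x)
    # Descriptor setup and argument conversion would happen before this point.
    nbytes = required_workspace_bytes(x)   # queried immediately before the call
    with_workspace(nbytes) do workspace
        # Stand-in for the low-level library call that consumes the workspace.
        (sum(x), length(workspace))
    end
end

toy_forward(Float32[1, 2, 3])
```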
Training vs Inference: There is no consistent way cuDNN distinguishes training vs inference calls:
- BatchNormalization and Normalization have two separate functions: cudnnNormalizationForwardTraining / Inference
- RNN has an indicator argument: fwdMode in cudnnRNNForward
- MultiHeadAttn looks at the reserveSpace argument to decide: if NULL inference mode, otherwise training mode
- Dropout always runs in training mode with a non-NULL reserveSpace (it doesn't make sense in inference mode)
- Activation, convolution, pooling, softmax, optensor, addtensor, reducetensor do not make a distinction between the two modes
In the high level API we assume inference by default and let the gradient packages override when necessary. See the gradient implementations in Knet/src/cudnn for examples.
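One way to picture the inference-by-default convention is a toy normalization on plain arrays with a training keyword that defaults to false. The function name and its keyword arguments below are assumptions made for illustration, not the package's signatures.

```julia
# Toy normalization: inference uses stored statistics, training uses batch statistics.
function toyNormalizationForward(x, running_mean, running_var;
                                 training::Bool=false, epsilon::Real=1e-5)
    if training
        m = sum(x) / length(x)
        v = sum(abs2, x .- m) / length(x)
    else
        m, v = running_mean, running_var
    end
    return (x .- m) ./ sqrt(v + epsilon)
end

x = [1.0, 3.0]
y_inference = toyNormalizationForward(x, 0.0, 1.0)                # default mode
y_training  = toyNormalizationForward(x, 0.0, 1.0; training=true) # override
```

A gradient package would pass training=true from its own rule, leaving ordinary callers in inference mode.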
TODO:
- Keyword arg descriptor constructors.
- Test forw fns with descriptors: check for desc vs kwarg incompatibility.
- Find out about cudnnRNNSetClip_v8.
- Test with Knet.Ops20.
- Command used to test: julia17 --project -e 'using Pkg; Pkg.API.test(; test_args=`--memcheck --jobs=1 cudnn`)'