Containers

Lux.BranchLayerType
BranchLayer(layers...)

Takes an input x, passes it through all the layers, and returns a tuple of the outputs.

Arguments

  • layers: A list of N Lux layers

Inputs

  • x: Will be directly passed to each of the layers

Returns

  • Tuple: (layer_1(x), layer_2(x), ..., layer_N(x))
  • Updated state of the layers

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

Comparison with Parallel

This is slightly different from Parallel(nothing, layers...)

  • If the input is a tuple, Parallel will pass each element individually to each layer

  • BranchLayer essentially assumes 1 input comes in and is branched out into N outputs

Example

An easy way to replicate an input to an NTuple is to do

l = BranchLayer(NoOpLayer(), NoOpLayer(), NoOpLayer())
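
A hedged sketch of running the layer (assumes Lux.setup and Random.default_rng; not part of the upstream example):

using Lux, Random
rng = Random.default_rng()
l = BranchLayer(NoOpLayer(), NoOpLayer(), NoOpLayer())
ps, st = Lux.setup(rng, l)
x = rand(rng, Float32, 4)
(y1, y2, y3), st = l(x, ps, st)    # y1 == y2 == y3 == x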
Lux.ChainType
Chain(layers...; disable_optimizations::Bool = false)

Collects multiple layers / functions to be called in sequence on a given input.

Arguments

  • layers: A list of N Lux layers

Keyword Arguments

  • disable_optimizations: Prevents any structural optimization

Inputs

Input x is passed sequentially to each layer, and must conform to the input requirements of the internal layers.

Returns

  • Output after sequentially applying all the layers to x
  • Updated model states

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

Optimizations

Performs a few optimizations to generate reasonable architectures. Can be disabled using keyword argument disable_optimizations.

  • All sublayers are recursively optimized.
  • If a function f is passed as a layer and it doesn't take 3 inputs, it is converted to a WrappedFunction(f) which takes only one input.
  • If the layer is a Chain, it is flattened.
  • NoOpLayers are removed.
  • If there is only 1 layer (left after optimizations), then it is returned without the Chain wrapper.
  • If there are no layers (left after optimizations), a NoOpLayer is returned.

Example

c = Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2))
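
Setup and forward pass, as a hedged sketch (assumes Random.default_rng for parameter initialization):

using Lux, Random
rng = Random.default_rng()
c = Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2))
ps, st = Lux.setup(rng, c)
x = randn(rng, Float32, 2, 8)        # 2 features, batch of 8
y, st = Lux.apply(c, x, ps, st)      # y has size (2, 8)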
Lux.PairwiseFusionType
PairwiseFusion(connection, layers...)
x1 → layer1 → y1 ↘
                  connection → layer2 → y2 ↘
              x2 ↗                          connection → layer3 → y3
                                        x3 ↗

Arguments

  • connection: A 2-argument function that combines the next input with the output of the previous layer (see the Inputs section below)
  • layers: A list of N Lux layers

Inputs

Layer behaves differently based on input type:

  1. If the input x is a tuple of length N + 1, then the layers must be a tuple of length N. The computation is as follows

     y = x[1]
     for i in 1:N
         y = connection(x[i + 1], layers[i](y))
     end

  2. Any other kind of input

     y = x
     for i in 1:N
         y = connection(x, layers[i](y))
     end

Returns

  • See Inputs section for how the return value is computed
  • Updated model state for all the contained layers

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N
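
Example

An illustrative sketch (not from the upstream docs): fuse three inputs through two Dense layers with +, assuming Lux.setup and Random.default_rng.

using Lux, Random
rng = Random.default_rng()
l = PairwiseFusion(+, Dense(2 => 2), Dense(2 => 2))
ps, st = Lux.setup(rng, l)
x = Tuple(rand(rng, Float32, 2, 1) for _ in 1:3)
y, st = l(x, ps, st)    # y is the result of the final connection (here a (2, 1) array)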
Lux.ParallelType
Parallel(connection, layers...)

Create a layer which passes an input to each path in layers, before reducing the output with connection.

Arguments

  • layers: A list of N Lux layers
  • connection: An N-argument function that is called after passing the input through each layer. If connection = nothing, we return a tuple Parallel(nothing, f, g)(x, y) = (f(x), g(y))

Inputs

  • x: If x is not a tuple, then the return is computed as connection([l(x) for l in layers]...). Otherwise, each element of the tuple is passed to the corresponding layer, thus Parallel(+, f, g)(x, y) = f(x) + g(y).

Returns

  • See the Inputs section for how the output is computed
  • Updated state of the layers

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

See also SkipConnection which is Parallel with one identity.
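
Example

A hedged sketch (assumes Random.default_rng) showing the two connection modes described above:

using Lux, Random
rng = Random.default_rng()
p = Parallel(+, Dense(2 => 4), Dense(2 => 4))
ps, st = Lux.setup(rng, p)
x = rand(rng, Float32, 2, 3)
y, st = p(x, ps, st)               # first Dense output + second Dense output, size (4, 3)

p2 = Parallel(nothing, NoOpLayer(), NoOpLayer())
ps2, st2 = Lux.setup(rng, p2)
(y1, y2), _ = p2((x, x), ps2, st2) # tuple input: each element goes to its own layer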

Lux.SkipConnectionType
SkipConnection(layer, connection)

Create a skip connection which consists of a layer or Chain of consecutive layers and a shortcut connection linking the block's input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given layer while the second is the unchanged, "skipped" input.

The simplest "ResNet"-type connection is just SkipConnection(layer, +).

Arguments

  • layer: Layer or Chain of layers to be applied to the input
  • connection: A 2-argument function that takes layer(input) and the input

Inputs

  • x: Will be passed directly to layer

Returns

  • Output of connection(layer(input), input)
  • Updated state of layer

Parameters

  • Parameters of layer

States

  • States of layer

See Parallel for a more general implementation.
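
Example

A minimal residual-style sketch (assumes Random.default_rng):

using Lux, Random
rng = Random.default_rng()
sc = SkipConnection(Dense(4 => 4, relu), +)
ps, st = Lux.setup(rng, sc)
x = randn(rng, Float32, 4, 2)
y, st = sc(x, ps, st)    # connection(layer(x), x), i.e. the Dense output plus x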

Convolutional Layers

Lux.ConvType
Conv(k::NTuple{N,Integer}, (in_chs => out_chs)::Pair{<:Integer,<:Integer},
     activation=identity; init_weight=glorot_uniform, init_bias=zeros32, stride=1,
     pad=0, dilation=1, groups=1, use_bias=true)

Standard convolutional layer.

Image data should be stored in WHCN order (width, height, channels, batch). In other words, a 100 x 100 RGB image would be a 100 x 100 x 3 x 1 array, and a batch of 50 would be a 100 x 100 x 3 x 50 array. This has N = 2 spatial dimensions, and needs a kernel size like (5, 5), a 2-tuple of integers. To take convolutions along N feature dimensions, this layer expects as input an array with ndims(x) == N + 2, where size(x, N + 1) == in_chs is the number of input channels, and size(x, ndims(x)) is the number of observations in a batch.

Note

Frameworks like PyTorch perform cross-correlation in their convolution layers.

Arguments

  • k: Tuple of integers specifying the size of the convolutional kernel. Eg, for 2D convolutions length(k) == 2
  • in_chs: Number of input channels
  • out_chs: Number of output channels
  • activation: Activation Function

Keyword Arguments

  • init_weight: Controls the initialization of the weight parameter

  • init_bias: Controls the initialization of the bias parameter

  • stride: Should be either a single integer or a tuple with N integers

  • dilation: Should be either a single integer or a tuple with N integers

  • pad: Specifies the number of elements added to the borders of the data array. It can be

    • a single integer for equal padding all around,
    • a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
    • a tuple of 2*N integers, for asymmetric padding, or
    • the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
  • groups: Expected to be an Int. It specifies the number of groups to divide a convolution into (set groups = in_chs for Depthwise Convolutions). in_chs and out_chs must be divisible by groups.

  • use_bias: Trainable bias can be disabled entirely by setting this to false.

Inputs

  • x: Data satisfying ndims(x) == N + 2 && size(x, N + 1) == in_chs, i.e. size(x) = (I_N, ..., I_1, C_in, N)

Returns

  • Output of the convolution y of size (O_N, ..., O_1, C_out, N) where

\[O_i = floor\left(\frac{I_i + pad[i] + pad[(i + N) \% length(pad)] - dilation[i] \times (k[i] - 1)}{stride[i]} + 1\right)\]

  • Empty NamedTuple()

Parameters

  • weight: Convolution kernel
  • bias: Bias (present if use_bias=true)
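
Example

A hedged sketch (assumes Random.default_rng; pad=1 keeps the 32×32 spatial size for a 3×3 kernel with stride 1):

using Lux, Random
rng = Random.default_rng()
c = Conv((3, 3), 3 => 16, relu; pad=1)
ps, st = Lux.setup(rng, c)
x = randn(rng, Float32, 32, 32, 3, 4)    # WHCN: 32×32 RGB images, batch of 4
y, st = c(x, ps, st)                     # size(y) == (32, 32, 16, 4)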

Dropout Layers

Lux.DropoutType
Dropout(p; dims=:)

Dropout layer.

Arguments

  • p: Probability of Dropout (if p = 0 then NoOpLayer is returned)

Keyword Arguments

  • To apply dropout along certain dimension(s), specify the dims keyword. e.g. Dropout(p; dims = 3) will randomly zero out entire channels on WHCN input (also called 2D dropout).

Inputs

  • x: Must be an AbstractArray

Returns

  • x with dropout mask applied if training=Val(true) else just x
  • State with updated rng

States

  • rng: Pseudo Random Number Generator
  • training: Used to check if training/inference mode

Call Lux.testmode to switch to test mode.

See also VariationalHiddenDropout
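
Example

A hedged sketch (assumes Random.default_rng) contrasting training and test mode:

using Lux, Random
rng = Random.default_rng()
d = Dropout(0.5f0)
ps, st = Lux.setup(rng, d)                  # ps is empty; st carries the rng and training flag
x = ones(Float32, 4, 4)
y, st = d(x, ps, st)                        # roughly half the entries zeroed, the rest scaled by 1/(1 - p)
y_test, _ = d(x, ps, Lux.testmode(st))      # identity in test mode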

Lux.VariationalHiddenDropoutType
VariationalHiddenDropout(p; dims=:)

VariationalHiddenDropout layer. The only difference from Dropout is that the mask is retained until Lux.update_state(l, :update_mask, Val(true)) is called.

Arguments

  • p: Probability of Dropout (if p = 0 then NoOpLayer is returned)

Keyword Arguments

  • To apply dropout along certain dimension(s), specify the dims keyword. e.g. VariationalHiddenDropout(p; dims = 3) will randomly zero out entire channels on WHCN input (also called 2D dropout).

Inputs

  • x: Must be an AbstractArray

Returns

  • x with dropout mask applied if training=Val(true) else just x
  • State with updated rng

States

  • rng: Pseudo Random Number Generator
  • training: Used to check if training/inference mode
  • mask: Dropout mask. Initially set to nothing. After every run, contains the mask applied in that call
  • update_mask: Stores whether new mask needs to be generated in the current call

Call Lux.testmode to switch to test mode.

See also Dropout

Pooling Layers

Lux.AdaptiveMaxPoolType
AdaptiveMaxPool(out::NTuple)

Adaptive Max Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == out.

Arguments

  • out: Size of the first N dimensions for the output

Inputs

  • x: Expects as input an array with ndims(x) == N+2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(out).

Returns

  • Output of size (out..., C, N)
  • Empty NamedTuple()

See also MaxPool, AdaptiveMeanPool.
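
A hedged sketch (assumes Random.default_rng): pooling 16×16 feature maps down to 4×4.

using Lux, Random
amp = AdaptiveMaxPool((4, 4))
ps, st = Lux.setup(Random.default_rng(), amp)
x = randn(Float32, 16, 16, 3, 2)
y, st = amp(x, ps, st)    # size(y) == (4, 4, 3, 2)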

Lux.AdaptiveMeanPoolType
AdaptiveMeanPool(out::NTuple)

Adaptive Mean Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == out.

Arguments

  • out: Size of the first N dimensions for the output

Inputs

  • x: Expects as input an array with ndims(x) == N+2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(out).

Returns

  • Output of size (out..., C, N)
  • Empty NamedTuple()

See also MeanPool, AdaptiveMaxPool.

Lux.GlobalMaxPoolType
GlobalMaxPool()

Global Max Pooling layer. Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing max pooling on the complete (w,h)-shaped feature maps.

Inputs

  • x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (1, ..., 1, C, N)
  • Empty NamedTuple()

See also MaxPool, AdaptiveMaxPool, GlobalMeanPool

Lux.GlobalMeanPoolType
GlobalMeanPool()

Global Mean Pooling layer. Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing mean pooling on the complete (w,h)-shaped feature maps.

Inputs

  • x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (1, ..., 1, C, N)
  • Empty NamedTuple()

See also MeanPool, AdaptiveMeanPool, GlobalMaxPool

Lux.MaxPoolType
MaxPool(window::NTuple; pad=0, stride=window)

Max pooling layer, which replaces all pixels in a block of size window with the maximum value.

Arguments

  • window: Tuple of integers specifying the size of the window. Eg, for 2D pooling length(window) == 2

Keyword Arguments

  • stride: Should be either a single integer or a tuple with N integers

  • pad: Specifies the number of elements added to the borders of the data array. It can be

    • a single integer for equal padding all around,
    • a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
    • a tuple of 2*N integers, for asymmetric padding, or
    • the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.

Inputs

  • x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (O_N, ..., O_1, C, N) where

\[O_i = floor\left(\frac{I_i + pad[i] + pad[(i + N) \% length(pad)] - (window[i] - 1)}{stride[i]} + 1\right)\]

  • Empty NamedTuple()

See also Conv, MeanPool, GlobalMaxPool, AdaptiveMaxPool
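
A hedged sketch (assumes Random.default_rng): a 2×2 window with the default stride halves each spatial dimension.

using Lux, Random
mp = MaxPool((2, 2))
ps, st = Lux.setup(Random.default_rng(), mp)
x = randn(Float32, 8, 8, 3, 2)
y, st = mp(x, ps, st)     # size(y) == (4, 4, 3, 2)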

Lux.MeanPoolType
MeanPool(window::NTuple; pad=0, stride=window)

Mean pooling layer, which replaces all pixels in a block of size window with the mean value.

Arguments

  • window: Tuple of integers specifying the size of the window. Eg, for 2D pooling length(window) == 2

Keyword Arguments

  • stride: Should be either a single integer or a tuple with N integers

  • pad: Specifies the number of elements added to the borders of the data array. It can be

    • a single integer for equal padding all around,
    • a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
    • a tuple of 2*N integers, for asymmetric padding, or
    • the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.

Inputs

  • x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (O_N, ..., O_1, C, N) where

\[O_i = floor\left(\frac{I_i + pad[i] + pad[(i + N) \% length(pad)] - (window[i] - 1)}{stride[i]} + 1\right)\]

  • Empty NamedTuple()

See also Conv, MaxPool, GlobalMeanPool, AdaptiveMeanPool

Recurrent Layers

Warning

Recurrent Layers API should be considered Experimental at this point

Lux.GRUCellType
GRUCell((in_dims, out_dims)::Pair{<:Int,<:Int};
        init_weight::Tuple{Function,Function,Function}=(glorot_uniform, glorot_uniform,
                                                        glorot_uniform),
        init_bias::Tuple{Function,Function,Function}=(zeros32, zeros32, zeros32),
        init_state::Function=zeros32)

Gated Recurrent Unit (GRU) Cell

\[\begin{align} r &= \sigma(W_{ir} \times x + W_{hr} \times h_{prev} + b_{hr})\\ z &= \sigma(W_{iz} \times x + W_{hz} \times h_{prev} + b_{hz})\\ n &= \sigma(W_{in} \times x + b_{in} + r \cdot (W_{hn} \times h_{prev} + b_{hn}))\\ h_{new} &= (1 - z) \cdot n + z \cdot h_{prev} \end{align}\]

Arguments

  • in_dims: Input Dimension
  • out_dims: Output (Hidden State) Dimension
  • init_bias: Initializer for bias. Must be a tuple containing 3 functions
  • init_weight: Initializer for weight. Must be a tuple containing 3 functions
  • init_state: Initializer for hidden state

Inputs

  • Case 1: Only a single input x of shape (in_dims, batch_size) - Creates a hidden state using init_state and proceeds to Case 2.
  • Case 2: Tuple (x, h) is provided, then the updated hidden state is returned.

Returns

  • New hidden state $h_{new}$ of shape (out_dims, batch_size)
  • Updated model state

Parameters

  • weight_i: Concatenated Weights to map from input space $\left\{ W_{ir}, W_{iz}, W_{in} \right\}$.
  • weight_h: Concatenated Weights to map from hidden space $\left\{ W_{hr}, W_{hz}, W_{hn} \right\}$
  • bias_i: Bias vector ($b_{in}$)
  • bias_h: Concatenated Bias vector for the hidden space $\left\{ b_{hr}, b_{hz}, b_{hn} \right\}$

States

  • rng: Controls the randomness (if any) in the initial state generation
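
One possible way to unroll the cell over a sequence, as a hedged sketch (the Recurrent Layers API is experimental; unroll is an illustrative helper, not a Lux function):

using Lux, Random

function unroll(cell, xs, ps, st)
    h, st = cell(xs[1], ps, st)          # Case 1: hidden state created via init_state
    for x in xs[2:end]
        h, st = cell((x, h), ps, st)     # Case 2: carry the hidden state forward
    end
    return h, st
end

rng = Random.default_rng()
cell = GRUCell(3 => 5)
ps, st = Lux.setup(rng, cell)
xs = [rand(rng, Float32, 3, 4) for _ in 1:10]    # sequence of length 10, batch of 4
h, st = unroll(cell, xs, ps, st)                 # h has size (5, 4)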
Lux.LSTMCellType
LSTMCell(in_dims => out_dims; init_weight=(glorot_uniform, glorot_uniform,
                                           glorot_uniform, glorot_uniform),
         init_bias=(zeros32, zeros32, ones32, zeros32), init_state=zeros32)

Long Short-Term Memory (LSTM) Cell

\[\begin{align} i &= \sigma(W_{ii} \times x + W_{hi} \times h_{prev} + b_{i})\\ f &= \sigma(W_{if} \times x + W_{hf} \times h_{prev} + b_{f})\\ g &= tanh(W_{ig} \times x + W_{hg} \times h_{prev} + b_{g})\\ o &= \sigma(W_{io} \times x + W_{ho} \times h_{prev} + b_{o})\\ c_{new} &= f \cdot c_{prev} + i \cdot g\\ h_{new} &= o \cdot tanh(c_{new}) \end{align}\]

Arguments

  • in_dims: Input Dimension
  • out_dims: Output (Hidden State & Memory) Dimension
  • init_bias: Initializer for bias. Must be a tuple containing 4 functions
  • init_weight: Initializer for weight. Must be a tuple containing 4 functions
  • init_state: Initializer for hidden state and memory

Inputs

  • Case 1: Only a single input x of shape (in_dims, batch_size) - Creates a hidden state and memory using init_state and proceeds to Case 2.
  • Case 2: Tuple (x, h, c) is provided, then the updated hidden state and memory are returned.

Returns

  • Tuple Containing

    • New hidden state $h_{new}$ of shape (out_dims, batch_size)
    • Updated Memory $c_{new}$ of shape (out_dims, batch_size)
  • Updated model state

Parameters

  • weight_i: Concatenated Weights to map from input space $\left\{ W_{ii}, W_{if}, W_{ig}, W_{io} \right\}$.
  • weight_h: Concatenated Weights to map from hidden space $\left\{ W_{hi}, W_{hf}, W_{hg}, W_{ho} \right\}$
  • bias: Bias vector

States

  • rng: Controls the randomness (if any) in the initial state generation
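
A similar hedged unrolling sketch for the LSTM cell (unroll_lstm is an illustrative helper, not a Lux function):

using Lux, Random

function unroll_lstm(cell, xs, ps, st)
    (h, c), st = cell(xs[1], ps, st)             # Case 1: state and memory created via init_state
    for x in xs[2:end]
        (h, c), st = cell((x, h, c), ps, st)     # Case 2: carry hidden state and memory forward
    end
    return h, c, st
end

rng = Random.default_rng()
cell = LSTMCell(3 => 5)
ps, st = Lux.setup(rng, cell)
xs = [rand(rng, Float32, 3, 4) for _ in 1:10]
h, c, st = unroll_lstm(cell, xs, ps, st)         # h and c have size (5, 4)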
Lux.RNNCellType
RNNCell(in_dims => out_dims, activation=tanh; bias::Bool=true, init_bias=zeros32,
        init_weight=glorot_uniform, init_state=ones32)

An Elman RNN cell with activation (typically set to tanh or relu).

$h_{new} = activation(weight_{ih} \times x + weight_{hh} \times h_{prev} + bias)$

Arguments

  • in_dims: Input Dimension
  • out_dims: Output (Hidden State) Dimension
  • activation: Activation function
  • bias: Set to false to deactivate bias
  • init_bias: Initializer for bias
  • init_weight: Initializer for weight
  • init_state: Initializer for hidden state

Inputs

  • Case 1: Only a single input x of shape (in_dims, batch_size) - Creates a hidden state using init_state and proceeds to Case 2.
  • Case 2: Tuple (x, h) is provided, then the updated hidden state is returned.

Returns

  • New hidden state $h_{new}$ of shape (out_dims, batch_size)
  • Updated model state

Parameters

  • weight_ih: Maps the input to the hidden state.
  • weight_hh: Maps the hidden state to the hidden state.
  • bias: Bias vector (not present if bias=false)

States

  • rng: Controls the randomness (if any) in the initial state generation

Linear Layers

Lux.DenseType
Dense(in_dims => out_dims, activation=identity; init_weight=glorot_uniform,
      init_bias=zeros32, bias::Bool=true)

Create a traditional fully connected layer, whose forward pass is given by: y = activation.(weight * x .+ bias)

Arguments

  • in_dims: number of input dimensions
  • out_dims: number of output dimensions
  • activation: activation function

Keyword Arguments

  • init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in_dims))
  • init_bias: initializer for the bias vector (ignored if bias=false)
  • bias: whether to include a bias vector

Input

  • x must be a Matrix of size in_dims × B or a Vector of length in_dims

Returns

  • Matrix of size out_dims × B or a Vector of length out_dims
  • Empty NamedTuple()

Parameters

  • weight: Weight Matrix of size out_dims × in_dims
  • bias: Bias of size out_dims × 1 (present if bias=true)
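
A hedged sketch (assumes Random.default_rng):

using Lux, Random
rng = Random.default_rng()
d = Dense(10 => 5, relu)
ps, st = Lux.setup(rng, d)
x = randn(rng, Float32, 10, 32)
y, st = d(x, ps, st)      # size(y) == (5, 32); y == relu.(ps.weight * x .+ ps.bias)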
Lux.ScaleType
Scale(dims, activation=identity; init_weight=ones32, init_bias=zeros32, bias::Bool=true)

Create a Sparsely Connected Layer with a very specific structure (only Diagonal Elements are non-zero). The forward pass is given by: y = activation.(weight .* x .+ bias)

Arguments

  • dims: size of the learnable scale and bias parameters.
  • activation: activation function

Keyword Arguments

  • init_weight: initializer for the weight array of size (dims...)
  • init_bias: initializer for the bias vector (ignored if bias=false)
  • bias: whether to include a bias vector

Input

  • x must be an Array of size (dims..., B) or (dims[1], ..., dims[k]) for k ≤ length(dims)

Returns

  • Array of size (dims..., B) or (dims[1], ..., dims[k]) for k ≤ length(dims)
  • Empty NamedTuple()

Parameters

  • weight: Weight Array of size (dims...)
  • bias: Bias of size (dims...)
Lux 0.4.3

Scale with multiple dimensions requires at least Lux 0.4.3.
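
A hedged sketch (assumes Random.default_rng): an elementwise affine transform over 5 features.

using Lux, Random
rng = Random.default_rng()
s = Scale(5)
ps, st = Lux.setup(rng, s)
x = randn(rng, Float32, 5, 3)
y, st = s(x, ps, st)      # y == ps.weight .* x .+ ps.bias, broadcast over the batch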

Misc. Helper Layers

Lux.ActivationFunctionFunction
ActivationFunction(f)

Broadcast f on the input.

Arguments

  • f: Activation function

Inputs

  • x: Any array type s.t. f can be broadcasted over it

Returns

  • Broadcasted Activation f.(x)
  • Empty NamedTuple()
Warning

This layer is deprecated and will be removed in v0.5. Use WrappedFunction with manual broadcasting

Lux.FlattenLayerType
FlattenLayer()

Flattens the passed array into a matrix.

Inputs

  • x: AbstractArray

Returns

  • AbstractMatrix of size (:, size(x, ndims(x)))
  • Empty NamedTuple()
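
A hedged sketch (assumes Random.default_rng): all leading dimensions are collapsed, the batch dimension is kept.

using Lux, Random
f = FlattenLayer()
ps, st = Lux.setup(Random.default_rng(), f)
x = randn(Float32, 4, 4, 3, 2)
y, st = f(x, ps, st)      # size(y) == (48, 2)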
Lux.NoOpLayerType
NoOpLayer()

As the name suggests, this layer does nothing but allows pretty printing of layers. Whatever input is passed in is returned.

Lux.ReshapeLayerType
ReshapeLayer(dims)

Reshapes the passed array to have a size of (dims..., :)

Arguments

  • dims: The new dimensions of the array (excluding the last dimension).

Inputs

  • x: AbstractArray of any shape which can be reshaped in (dims..., size(x, ndims(x)))

Returns

  • AbstractArray of size (dims..., size(x, ndims(x)))
  • Empty NamedTuple()
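
A hedged sketch (assumes Random.default_rng):

using Lux, Random
r = ReshapeLayer((2, 8))
ps, st = Lux.setup(Random.default_rng(), r)
x = randn(Float32, 16, 5)
y, st = r(x, ps, st)      # size(y) == (2, 8, 5)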
Lux.SelectDimType
SelectDim(dim, i)

Return a view of all the data of the input x where the index for dimension dim equals i. Equivalent to view(x,:,:,...,i,:,:,...) where i is in position dim.

Arguments

  • dim: Dimension for indexing
  • i: Index for dimension dim

Inputs

  • x: AbstractArray that can be indexed with view(x,:,:,...,i,:,:,...)

Returns

  • view(x,:,:,...,i,:,:,...) where i is in position dim
  • Empty NamedTuple()
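
A hedged sketch (assumes Random.default_rng): select index 2 along dimension 3.

using Lux, Random
s = SelectDim(3, 2)
ps, st = Lux.setup(Random.default_rng(), s)
x = randn(Float32, 4, 4, 3, 2)
y, st = s(x, ps, st)      # y == view(x, :, :, 2, :), size (4, 4, 2)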
Lux.WrappedFunctionType
WrappedFunction(f)

Wraps a stateless and parameterless function. Might be used when a function is added to Chain. For example, Chain(x -> relu.(x)) would not work and the right thing to do would be Chain((x, ps, st) -> (relu.(x), st)). An easier alternative is Chain(WrappedFunction(Base.Fix1(broadcast, relu)))

Arguments

  • f::Function: A stateless and parameterless function

Inputs

  • x: such that hasmethod(f, (typeof(x),)) is true

Returns

  • Output of f(x)
  • Empty NamedTuple()
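
A hedged sketch of the broadcasting pattern described above (assumes Random.default_rng):

using Lux, Random
w = WrappedFunction(Base.Fix1(broadcast, relu))
ps, st = Lux.setup(Random.default_rng(), w)
x = randn(Float32, 3, 3)
y, st = w(x, ps, st)      # y == relu.(x)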

Normalization Layers

Lux.BatchNormType
BatchNorm(chs::Integer, activation=identity; init_bias=zeros32, init_scale=ones32,
          affine=true, track_stats=true, epsilon=1f-5, momentum=0.1f0)

Batch Normalization layer.

BatchNorm computes the mean and variance for each D_1 × ... × D_{N-2} × 1 × D_N input slice and normalises the input accordingly.

Arguments

  • chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
  • activation: After normalisation, elementwise activation activation is applied.

Keyword Arguments

  • If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.

    • init_bias: Controls how the bias is initialized
    • init_scale: Controls how the scale is initialized
  • If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase.

  • epsilon: a value added to the denominator for numerical stability

  • momentum: the value used for the running_mean and running_var computation

Inputs

  • x: Array where size(x, N - 1) = chs and ndims(x) > 2

Returns

  • y: Normalized Array
  • Updated model state

Parameters

  • affine=true

    • bias: Bias of shape (chs,)
    • scale: Scale of shape (chs,)
  • affine=false - Empty NamedTuple()

States

  • Statistics if track_stats=true

    • running_mean: Running mean of shape (chs,)
    • running_var: Running variance of shape (chs,)
  • Statistics if track_stats=false

    • running_mean: nothing
    • running_var: nothing
  • training: Used to check if training/inference mode

Use Lux.testmode during inference.

Example

m = Chain(Dense(784 => 64), BatchNorm(64, relu), Dense(64 => 10), BatchNorm(10))
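
A hedged setup/inference sketch extending the example above (assumes Random.default_rng):

using Lux, Random
rng = Random.default_rng()
m = Chain(Dense(784 => 64), BatchNorm(64, relu), Dense(64 => 10), BatchNorm(10))
ps, st = Lux.setup(rng, m)
x = randn(rng, Float32, 784, 16)
y, st = m(x, ps, st)                    # training mode: batch statistics are used and updated
y_test, _ = m(x, ps, Lux.testmode(st))  # inference mode: running statistics are used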

See also GroupNorm

Lux.GroupNormType
GroupNorm(chs::Integer, groups::Integer, activation=identity; init_bias=zeros32,
          init_scale=ones32, affine=true, track_stats=true, epsilon=1f-5,
          momentum=0.1f0)

Group Normalization layer.

Arguments

  • chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
  • groups: The number of groups along which the statistics are computed. The number of channels must be an integer multiple of the number of groups.
  • activation: After normalisation, elementwise activation activation is applied.

Keyword Arguments

  • If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.

    • init_bias: Controls how the bias is initialized
    • init_scale: Controls how the scale is initialized
  • If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase. (This feature has been deprecated and will be removed in v0.5)

  • epsilon: a value added to the denominator for numerical stability

  • momentum: the value used for the running_mean and running_var computation (This feature has been deprecated and will be removed in v0.5)

Inputs

  • x: Array where size(x, N - 1) = chs and ndims(x) > 2

Returns

  • y: Normalized Array
  • Updated model state

Parameters

  • affine=true

    • bias: Bias of shape (chs,)
    • scale: Scale of shape (chs,)
  • affine=false - Empty NamedTuple()

States

  • Statistics if track_stats=true (DEPRECATED)

    • running_mean: Running mean of shape (groups,)
    • running_var: Running variance of shape (groups,)
  • Statistics if track_stats=false

    • running_mean: nothing
    • running_var: nothing
  • training: Used to check if training/inference mode

Use Lux.testmode during inference.

Example

m = Chain(Dense(784 => 64), GroupNorm(64, 4, relu), Dense(64 => 10), GroupNorm(10, 5))
Warning

GroupNorm doesn't have CUDNN support. The GPU fallback is not very efficient.

See also BatchNorm

Lux.WeightNormType
WeightNorm(layer::AbstractExplicitLayer, which_params::NTuple{N,Symbol},
           dims::Union{Tuple,Nothing}=nothing)

Applies weight normalization to a parameter in the given layer.

$w = g\frac{v}{\|v\|}$

Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This updates the parameters in which_params (e.g. weight) using two parameters: one specifying the magnitude (e.g. weight_g) and one specifying the direction (e.g. weight_v).

Arguments

  • layer: The layer whose parameters are being reparameterized
  • which_params: Parameter names for the parameters being reparameterized
  • dims: By default, a norm over the entire array is computed. Pass dims to modify the dimension.

Inputs

  • x: Should be of valid type for input to layer

Returns

  • Output from layer
  • Updated model state of layer

Parameters

  • normalized: Parameters of layer that are being normalized
  • unnormalized: Parameters of layer that are not being normalized

States

  • Same as that of layer
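
Example

A hedged sketch (assumes Random.default_rng): normalize only the weight of a Dense layer.

using Lux, Random
rng = Random.default_rng()
wn = WeightNorm(Dense(3 => 4), (:weight,))
ps, st = Lux.setup(rng, wn)    # ps.normalized holds weight_g and weight_v; ps.unnormalized the rest
x = randn(rng, Float32, 3, 2)
y, st = wn(x, ps, st)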

Upsampling

Lux.UpsampleType
Upsample(mode = :nearest; [scale, size]) 
Upsample(scale, mode = :nearest)

Upsampling Layer.

Layer Construction

Option 1

  • mode: Set to :nearest, :linear, :bilinear or :trilinear

Exactly one of two keywords must be specified:

  • If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
  • Alternatively, keyword size accepts a tuple, to directly specify the leading dimensions of the output.

Option 2

  • If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
  • mode: Set to :nearest, :bilinear or :trilinear

Currently supported upsampling modes and corresponding NNlib's methods are:

  • :nearest -> NNlib.upsample_nearest
  • :bilinear -> NNlib.upsample_bilinear
  • :trilinear -> NNlib.upsample_trilinear

Inputs

  • x: For the input dimensions look into the documentation for the corresponding NNlib function
    • As a rule of thumb, :nearest should work with arrays of arbitrary dimensions
    • :bilinear works with 4D Arrays
    • :trilinear works with 5D Arrays

Returns

  • Upsampled Input of size size or of size (I_1 x scale[1], ..., I_N x scale[N], C, N)
  • Empty NamedTuple()
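
Example

A hedged sketch (assumes Random.default_rng): bilinear upsampling of 4D input by a factor of 2.

using Lux, Random
u = Upsample(:bilinear; scale=2)
ps, st = Lux.setup(Random.default_rng(), u)
x = randn(Float32, 8, 8, 3, 1)
y, st = u(x, ps, st)      # size(y) == (16, 16, 3, 1)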

Index