Containers

Lux.BranchLayerType
BranchLayer(layers...)

Takes an input x, passes it through all the layers, and returns a tuple of the outputs.

Arguments

  • layers: A list of N Lux layers

Inputs

  • x: Will be directly passed to each of the layers

Returns

  • Tuple: (layer_1(x), layer_2(x), ..., layer_N(x))
  • Updated state of the layers

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

Comparison with Parallel

This is slightly different from Parallel(nothing, layers...)

  • If the input is a tuple, Parallel will pass each element individually to each layer

  • BranchLayer essentially assumes 1 input comes in and is branched out into N outputs

Example

An easy way to replicate an input to an NTuple is to do

l = BranchLayer(NoOpLayer(), NoOpLayer(), NoOpLayer())
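
A hedged sketch of running the layer (assumes Lux.setup and Random.default_rng; not part of the upstream example):

using Lux, Random
rng = Random.default_rng()
l = BranchLayer(NoOpLayer(), NoOpLayer(), NoOpLayer())
ps, st = Lux.setup(rng, l)
x = rand(rng, Float32, 4)
(y1, y2, y3), st = l(x, ps, st)    # y1 == y2 == y3 == x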
Lux.ChainType
Chain(layers...; disable_optimizations::Bool = false)

Collects multiple layers / functions to be called in sequence on a given input.

Arguments

  • layers: A list of N Lux layers

Keyword Arguments

  • disable_optimizations: Prevents any structural optimization

Inputs

Input x is passed sequentially to each layer, and must conform to the input requirements of the internal layers.

Returns

  • Output after sequentially applying all the layers to x
  • Updated model states

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

Optimizations

Performs a few optimizations to generate reasonable architectures. Can be disabled using keyword argument disable_optimizations.

  • All sublayers are recursively optimized.
  • If a function f is passed as a layer and it doesn't take 3 inputs, it is converted to a WrappedFunction(f) which takes only one input.
  • If the layer is a Chain, it is flattened.
  • NoOpLayers are removed.
  • If there is only 1 layer (left after optimizations), then it is returned without the Chain wrapper.
  • If there are no layers (left after optimizations), a NoOpLayer is returned.

Example

c = Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2))
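
Setup and forward pass, as a hedged sketch (assumes Random.default_rng for parameter initialization):

using Lux, Random
rng = Random.default_rng()
c = Chain(Dense(2, 3, relu), BatchNorm(3), Dense(3, 2))
ps, st = Lux.setup(rng, c)
x = randn(rng, Float32, 2, 8)        # 2 features, batch of 8
y, st = Lux.apply(c, x, ps, st)      # y has size (2, 8)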
Lux.PairwiseFusionType
PairwiseFusion(connection, layers...)
x1 → layer1 → y1 ↘
                  connection → layer2 → y2 ↘
              x2 ↗                          connection → layer3 → y3
                                        x3 ↗

Arguments

  • connection: A 2-argument function that combines the next input with the output of the previous layer (see the Inputs section below)
  • layers: A list of N Lux layers

Inputs

Layer behaves differently based on input type:

  1. If the input x is a tuple of length N + 1, then the layers must be a tuple of length N. The computation is as follows

     y = x[1]
     for i in 1:N
         y = connection(x[i + 1], layers[i](y))
     end

  2. Any other kind of input

     y = x
     for i in 1:N
         y = connection(x, layers[i](y))
     end

Returns

  • See Inputs section for how the return value is computed
  • Updated model state for all the contained layers

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N
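
Example

An illustrative sketch (not from the upstream docs): fuse three inputs through two Dense layers with +, assuming Lux.setup and Random.default_rng.

using Lux, Random
rng = Random.default_rng()
l = PairwiseFusion(+, Dense(2 => 2), Dense(2 => 2))
ps, st = Lux.setup(rng, l)
x = Tuple(rand(rng, Float32, 2, 1) for _ in 1:3)
y, st = l(x, ps, st)    # y is the result of the final connection (here a (2, 1) array)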
Lux.ParallelType
Parallel(connection, layers...)

Create a layer which passes an input to each path in layers, before reducing the output with connection.

Arguments

  • layers: A list of N Lux layers
  • connection: An N-argument function that is called after passing the input through each layer. If connection = nothing, we return a tuple Parallel(nothing, f, g)(x, y) = (f(x), g(y))

Inputs

  • x: If x is not a tuple, then the return is computed as connection([l(x) for l in layers]...). Otherwise, each element of the tuple is passed to the corresponding layer, thus Parallel(+, f, g)(x, y) = f(x) + g(y).

Returns

  • See the Inputs section for how the output is computed
  • Updated state of the layers

Parameters

  • Parameters of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

States

  • States of each layer wrapped in a NamedTuple with fields = layer_1, layer_2, ..., layer_N

See also SkipConnection which is Parallel with one identity.
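
Example

A hedged sketch (assumes Random.default_rng) showing the two connection modes described above:

using Lux, Random
rng = Random.default_rng()
p = Parallel(+, Dense(2 => 4), Dense(2 => 4))
ps, st = Lux.setup(rng, p)
x = rand(rng, Float32, 2, 3)
y, st = p(x, ps, st)               # first Dense output + second Dense output, size (4, 3)

p2 = Parallel(nothing, NoOpLayer(), NoOpLayer())
ps2, st2 = Lux.setup(rng, p2)
(y1, y2), _ = p2((x, x), ps2, st2) # tuple input: each element goes to its own layer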

Lux.SkipConnectionType
SkipConnection(layer, connection)

Create a skip connection which consists of a layer or Chain of consecutive layers and a shortcut connection linking the block's input to the output through a user-supplied 2-argument callable. The first argument to the callable will be propagated through the given layer while the second is the unchanged, "skipped" input.

The simplest "ResNet"-type connection is just SkipConnection(layer, +).

Arguments

  • layer: Layer or Chain of layers to be applied to the input
  • connection: A 2-argument function that takes layer(input) and the input

Inputs

  • x: Will be passed directly to layer

Returns

  • Output of connection(layer(input), input)
  • Updated state of layer

Parameters

  • Parameters of layer

States

  • States of layer

See Parallel for a more general implementation.
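
Example

A minimal residual-style sketch (assumes Random.default_rng):

using Lux, Random
rng = Random.default_rng()
sc = SkipConnection(Dense(4 => 4, relu), +)
ps, st = Lux.setup(rng, sc)
x = randn(rng, Float32, 4, 2)
y, st = sc(x, ps, st)    # connection(layer(x), x), i.e. the Dense output plus x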

Convolutional Layers

Lux.ConvType
Conv(k::NTuple{N,Integer}, (in_chs => out_chs)::Pair{<:Integer,<:Integer},
     activation=identity; init_weight=glorot_uniform, init_bias=zeros32, stride=1,
     pad=0, dilation=1, groups=1, use_bias=true)

Standard convolutional layer.

Image data should be stored in WHCN order (width, height, channels, batch). In other words, a 100 x 100 RGB image would be a 100 x 100 x 3 x 1 array, and a batch of 50 would be a 100 x 100 x 3 x 50 array. This has N = 2 spatial dimensions, and needs a kernel size like (5, 5), a 2-tuple of integers. To take convolutions along N feature dimensions, this layer expects as input an array with ndims(x) == N + 2, where size(x, N + 1) == in_chs is the number of input channels, and size(x, ndims(x)) is the number of observations in a batch.

Note

Frameworks like PyTorch perform cross-correlation in their convolution layers.

Arguments

  • k: Tuple of integers specifying the size of the convolutional kernel. Eg, for 2D convolutions length(k) == 2
  • in_chs: Number of input channels
  • out_chs: Number of output channels
  • activation: Activation Function

Keyword Arguments

  • init_weight: Controls the initialization of the weight parameter

  • init_bias: Controls the initialization of the bias parameter

  • stride: Should be either a single integer or a tuple with N integers

  • dilation: Should be either a single integer or a tuple with N integers

  • pad: Specifies the number of elements added to the borders of the data array. It can be

    • a single integer for equal padding all around,
    • a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
    • a tuple of 2*N integers, for asymmetric padding, or
    • the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.
  • groups: Expected to be an Int. It specifies the number of groups to divide a convolution into (set groups = in_chs for Depthwise Convolutions). in_chs and out_chs must be divisible by groups.

  • use_bias: Trainable bias can be disabled entirely by setting this to false.

Inputs

  • x: Data satisfying ndims(x) == N + 2 && size(x, N + 1) == in_chs, i.e. size(x) = (I_N, ..., I_1, C_in, N)

Returns

  • Output of the convolution y of size (O_N, ..., O_1, C_out, N) where

\[O_i = floor\left(\frac{I_i + pad[i] + pad[(i + N) \% length(pad)] - dilation[i] \times (k[i] - 1)}{stride[i]} + 1\right)\]

  • Empty NamedTuple()

Parameters

  • weight: Convolution kernel
  • bias: Bias (present if use_bias=true)
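
Example

A hedged sketch (assumes Random.default_rng; pad=1 keeps the 32×32 spatial size for a 3×3 kernel with stride 1):

using Lux, Random
rng = Random.default_rng()
c = Conv((3, 3), 3 => 16, relu; pad=1)
ps, st = Lux.setup(rng, c)
x = randn(rng, Float32, 32, 32, 3, 4)    # WHCN: 32×32 RGB images, batch of 4
y, st = c(x, ps, st)                     # size(y) == (32, 32, 16, 4)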

Dropout Layers

Lux.DropoutType
Dropout(p; dims=:)

Dropout layer.

Arguments

  • p: Probability of Dropout (if p = 0 then NoOpLayer is returned)

Keyword Arguments

  • To apply dropout along certain dimension(s), specify the dims keyword. e.g. Dropout(p; dims = 3) will randomly zero out entire channels on WHCN input (also called 2D dropout).

Inputs

  • x: Must be an AbstractArray

Returns

  • x with dropout mask applied if training=Val(true) else just x
  • State with updated rng

States

  • rng: Pseudo Random Number Generator
  • training: Used to check if training/inference mode

Call Lux.testmode to switch to test mode.

See also VariationalHiddenDropout
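
Example

A hedged sketch (assumes Random.default_rng) contrasting training and test mode:

using Lux, Random
rng = Random.default_rng()
d = Dropout(0.5f0)
ps, st = Lux.setup(rng, d)                  # ps is empty; st carries the rng and training flag
x = ones(Float32, 4, 4)
y, st = d(x, ps, st)                        # roughly half the entries zeroed, the rest scaled by 1/(1 - p)
y_test, _ = d(x, ps, Lux.testmode(st))      # identity in test mode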

Lux.VariationalHiddenDropoutType
VariationalHiddenDropout(p; dims=:)

VariationalHiddenDropout layer. The only difference from Dropout is that the mask is retained until Lux.update_state(l, :update_mask, Val(true)) is called.

Arguments

  • p: Probability of Dropout (if p = 0 then NoOpLayer is returned)

Keyword Arguments

  • To apply dropout along certain dimension(s), specify the dims keyword. e.g. VariationalHiddenDropout(p; dims = 3) will randomly zero out entire channels on WHCN input (also called 2D dropout).

Inputs

  • x: Must be an AbstractArray

Returns

  • x with dropout mask applied if training=Val(true) else just x
  • State with updated rng

States

  • rng: Pseudo Random Number Generator
  • training: Used to check if training/inference mode
  • mask: Dropout mask. Initially set to nothing. After every run, contains the mask applied in that call
  • update_mask: Stores whether new mask needs to be generated in the current call

Call Lux.testmode to switch to test mode.

See also Dropout

Pooling Layers

Lux.AdaptiveMaxPoolType
AdaptiveMaxPool(out::NTuple)

Adaptive Max Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == out.

Arguments

  • out: Size of the first N dimensions for the output

Inputs

  • x: Expects as input an array with ndims(x) == N+2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(out).

Returns

  • Output of size (out..., C, N)
  • Empty NamedTuple()

See also MaxPool, AdaptiveMeanPool.
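
A hedged sketch (assumes Random.default_rng): pooling 16×16 feature maps down to 4×4.

using Lux, Random
amp = AdaptiveMaxPool((4, 4))
ps, st = Lux.setup(Random.default_rng(), amp)
x = randn(Float32, 16, 16, 3, 2)
y, st = amp(x, ps, st)    # size(y) == (4, 4, 3, 2)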

Lux.AdaptiveMeanPoolType
AdaptiveMeanPool(out::NTuple)

Adaptive Mean Pooling layer. Calculates the necessary window size such that its output has size(y)[1:N] == out.

Arguments

  • out: Size of the first N dimensions for the output

Inputs

  • x: Expects as input an array with ndims(x) == N+2, i.e. channel and batch dimensions, after the N feature dimensions, where N = length(out).

Returns

  • Output of size (out..., C, N)
  • Empty NamedTuple()

See also MeanPool, AdaptiveMaxPool.

Lux.GlobalMaxPoolType
GlobalMaxPool()

Global Max Pooling layer. Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing max pooling on the complete (w,h)-shaped feature maps.

Inputs

  • x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (1, ..., 1, C, N)
  • Empty NamedTuple()

See also MaxPool, AdaptiveMaxPool, GlobalMeanPool

Lux.GlobalMeanPoolType
GlobalMeanPool()

Global Mean Pooling layer. Transforms (w,h,c,b)-shaped input into (1,1,c,b)-shaped output, by performing mean pooling on the complete (w,h)-shaped feature maps.

Inputs

  • x: Data satisfying ndims(x) > 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (1, ..., 1, C, N)
  • Empty NamedTuple()

See also MeanPool, AdaptiveMeanPool, GlobalMaxPool

Lux.MaxPoolType
MaxPool(window::NTuple; pad=0, stride=window)

Max pooling layer, which replaces all pixels in a block of size window with the maximum value.

Arguments

  • window: Tuple of integers specifying the size of the window. Eg, for 2D pooling length(window) == 2

Keyword Arguments

  • stride: Should be either a single integer or a tuple with N integers

  • pad: Specifies the number of elements added to the borders of the data array. It can be

    • a single integer for equal padding all around,
    • a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
    • a tuple of 2*N integers, for asymmetric padding, or
    • the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.

Inputs

  • x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (O_N, ..., O_1, C, N) where

\[O_i = floor\left(\frac{I_i + pad[i] + pad[(i + N) \% length(pad)] - (window[i] - 1)}{stride[i]} + 1\right)\]

  • Empty NamedTuple()

See also Conv, MeanPool, GlobalMaxPool, AdaptiveMaxPool
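
A hedged sketch (assumes Random.default_rng): a 2×2 window with the default stride halves each spatial dimension.

using Lux, Random
mp = MaxPool((2, 2))
ps, st = Lux.setup(Random.default_rng(), mp)
x = randn(Float32, 8, 8, 3, 2)
y, st = mp(x, ps, st)     # size(y) == (4, 4, 3, 2)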

Lux.MeanPoolType
MeanPool(window::NTuple; pad=0, stride=window)

Mean pooling layer, which replaces all pixels in a block of size window with the mean value.

Arguments

  • window: Tuple of integers specifying the size of the window. Eg, for 2D pooling length(window) == 2

Keyword Arguments

  • stride: Should be either a single integer or a tuple with N integers

  • pad: Specifies the number of elements added to the borders of the data array. It can be

    • a single integer for equal padding all around,
    • a tuple of N integers, to apply the same padding at begin/end of each spatial dimension,
    • a tuple of 2*N integers, for asymmetric padding, or
    • the singleton SamePad(), to calculate padding such that size(output,d) == size(x,d) / stride (possibly rounded) for each spatial dimension.

Inputs

  • x: Data satisfying ndims(x) == N + 2, i.e. size(x) = (I_N, ..., I_1, C, N)

Returns

  • Output of the pooling y of size (O_N, ..., O_1, C, N) where

\[O_i = floor\left(\frac{I_i + pad[i] + pad[(i + N) \% length(pad)] - (window[i] - 1)}{stride[i]} + 1\right)\]

  • Empty NamedTuple()

See also Conv, MaxPool, GlobalMeanPool, AdaptiveMeanPool

Recurrent Layers

Warning

Recurrent Layers API should be considered Experimental at this point

Lux.GRUCellType
GRUCell((in_dims, out_dims)::Pair{<:Int,<:Int};
        init_weight::Tuple{Function,Function,Function}=(glorot_uniform, glorot_uniform,
                                                        glorot_uniform),
        init_bias::Tuple{Function,Function,Function}=(zeros32, zeros32, zeros32),
        init_state::Function=zeros32)

Gated Recurrent Unit (GRU) Cell

\[\begin{align} r &= \sigma(W_{ir} \times x + W_{hr} \times h_{prev} + b_{hr})\\ z &= \sigma(W_{iz} \times x + W_{hz} \times h_{prev} + b_{hz})\\ n &= \sigma(W_{in} \times x + b_{in} + r \cdot (W_{hn} \times h_{prev} + b_{hn}))\\ h_{new} &= (1 - z) \cdot n + z \cdot h_{prev} \end{align}\]

Arguments

  • in_dims: Input Dimension
  • out_dims: Output (Hidden State) Dimension
  • init_bias: Initializer for bias. Must be a tuple containing 3 functions
  • init_weight: Initializer for weight. Must be a tuple containing 3 functions
  • init_state: Initializer for hidden state

Inputs

  • Case 1: Only a single input x of shape (in_dims, batch_size) - Creates a hidden state using init_state and proceeds to Case 2.
  • Case 2: Tuple (x, h) is provided, then the updated hidden state is returned.

Returns

  • New hidden state $h_{new}$ of shape (out_dims, batch_size)
  • Updated model state

Parameters

  • weight_i: Concatenated Weights to map from input space $\left\{ W_{ir}, W_{iz}, W_{in} \right\}$.
  • weight_h: Concatenated Weights to map from hidden space $\left\{ W_{hr}, W_{hz}, W_{hn} \right\}$
  • bias_i: Bias vector ($b_{in}$)
  • bias_h: Concatenated Bias vector for the hidden space $\left\{ b_{hr}, b_{hz}, b_{hn} \right\}$

States

  • rng: Controls the randomness (if any) in the initial state generation
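
One possible way to unroll the cell over a sequence, as a hedged sketch (the Recurrent Layers API is experimental; unroll is an illustrative helper, not a Lux function):

using Lux, Random

function unroll(cell, xs, ps, st)
    h, st = cell(xs[1], ps, st)          # Case 1: hidden state created via init_state
    for x in xs[2:end]
        h, st = cell((x, h), ps, st)     # Case 2: carry the hidden state forward
    end
    return h, st
end

rng = Random.default_rng()
cell = GRUCell(3 => 5)
ps, st = Lux.setup(rng, cell)
xs = [rand(rng, Float32, 3, 4) for _ in 1:10]    # sequence of length 10, batch of 4
h, st = unroll(cell, xs, ps, st)                 # h has size (5, 4)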
Lux.LSTMCellType
LSTMCell(in_dims => out_dims; init_weight=(glorot_uniform, glorot_uniform,
                                           glorot_uniform, glorot_uniform),
         init_bias=(zeros32, zeros32, ones32, zeros32), init_state=zeros32)

Long Short-Term Memory (LSTM) Cell

\[\begin{align} i &= \sigma(W_{ii} \times x + W_{hi} \times h_{prev} + b_{i})\\ f &= \sigma(W_{if} \times x + W_{hf} \times h_{prev} + b_{f})\\ g &= tanh(W_{ig} \times x + W_{hg} \times h_{prev} + b_{g})\\ o &= \sigma(W_{io} \times x + W_{ho} \times h_{prev} + b_{o})\\ c_{new} &= f \cdot c_{prev} + i \cdot g\\ h_{new} &= o \cdot tanh(c_{new}) \end{align}\]

Arguments

  • in_dims: Input Dimension
  • out_dims: Output (Hidden State & Memory) Dimension
  • init_bias: Initializer for bias. Must be a tuple containing 4 functions
  • init_weight: Initializer for weight. Must be a tuple containing 4 functions
  • init_state: Initializer for hidden state and memory

Inputs

  • Case 1: Only a single input x of shape (in_dims, batch_size) - Creates a hidden state and memory using init_state and proceeds to Case 2.
  • Case 2: Tuple (x, h, c) is provided, then the updated hidden state and memory are returned.

Returns

  • Tuple Containing

    • New hidden state $h_{new}$ of shape (out_dims, batch_size)
    • Updated Memory $c_{new}$ of shape (out_dims, batch_size)
  • Updated model state

Parameters

  • weight_i: Concatenated Weights to map from input space $\left\{ W_{ii}, W_{if}, W_{ig}, W_{io} \right\}$.
  • weight_h: Concatenated Weights to map from hidden space $\left\{ W_{hi}, W_{hf}, W_{hg}, W_{ho} \right\}$
  • bias: Bias vector

States

  • rng: Controls the randomness (if any) in the initial state generation
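
A similar hedged unrolling sketch for the LSTM cell (unroll_lstm is an illustrative helper, not a Lux function):

using Lux, Random

function unroll_lstm(cell, xs, ps, st)
    (h, c), st = cell(xs[1], ps, st)             # Case 1: state and memory created via init_state
    for x in xs[2:end]
        (h, c), st = cell((x, h, c), ps, st)     # Case 2: carry hidden state and memory forward
    end
    return h, c, st
end

rng = Random.default_rng()
cell = LSTMCell(3 => 5)
ps, st = Lux.setup(rng, cell)
xs = [rand(rng, Float32, 3, 4) for _ in 1:10]
h, c, st = unroll_lstm(cell, xs, ps, st)         # h and c have size (5, 4)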
Lux.RNNCellType
RNNCell(in_dims => out_dims, activation=tanh; bias::Bool=true, init_bias=zeros32,
        init_weight=glorot_uniform, init_state=ones32)

An Elman RNN cell with activation (typically set to tanh or relu).

$h_{new} = activation(weight_{ih} \times x + weight_{hh} \times h_{prev} + bias)$

Arguments

  • in_dims: Input Dimension
  • out_dims: Output (Hidden State) Dimension
  • activation: Activation function
  • bias: Set to false to deactivate bias
  • init_bias: Initializer for bias
  • init_weight: Initializer for weight
  • init_state: Initializer for hidden state

Inputs

  • Case 1: Only a single input x of shape (in_dims, batch_size) - Creates a hidden state using init_state and proceeds to Case 2.
  • Case 2: Tuple (x, h) is provided, then the updated hidden state is returned.

Returns

  • New hidden state $h_{new}$ of shape (out_dims, batch_size)
  • Updated model state

Parameters

  • weight_ih: Maps the input to the hidden state.
  • weight_hh: Maps the hidden state to the hidden state.
  • bias: Bias vector (not present if bias=false)

States

  • rng: Controls the randomness (if any) in the initial state generation

Linear Layers

Lux.DenseType
Dense(in_dims => out_dims, activation=identity; init_weight=glorot_uniform,
      init_bias=zeros32, bias::Bool=true)

Create a traditional fully connected layer, whose forward pass is given by: y = activation.(weight * x .+ bias)

Arguments

  • in_dims: number of input dimensions
  • out_dims: number of output dimensions
  • activation: activation function

Keyword Arguments

  • init_weight: initializer for the weight matrix (weight = init_weight(rng, out_dims, in_dims))
  • init_bias: initializer for the bias vector (ignored if bias=false)
  • bias: whether to include a bias vector

Input

  • x must be a Matrix of size in_dims × B or a Vector of length in_dims

Returns

  • Matrix of size out_dims × B or a Vector of length out_dims
  • Empty NamedTuple()

Parameters

  • weight: Weight Matrix of size out_dims × in_dims
  • bias: Bias of size out_dims × 1 (present if bias=true)
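
A hedged sketch (assumes Random.default_rng):

using Lux, Random
rng = Random.default_rng()
d = Dense(10 => 5, relu)
ps, st = Lux.setup(rng, d)
x = randn(rng, Float32, 10, 32)
y, st = d(x, ps, st)      # size(y) == (5, 32); y == relu.(ps.weight * x .+ ps.bias)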
Lux.ScaleType
Scale(dims, activation=identity; init_weight=ones32, init_bias=zeros32, bias::Bool=true)

Create a Sparsely Connected Layer with a very specific structure (only Diagonal Elements are non-zero). The forward pass is given by: y = activation.(weight .* x .+ bias)

Arguments

  • dims: size of the learnable scale and bias parameters.
  • activation: activation function

Keyword Arguments

  • init_weight: initializer for the weight array of size (dims...)
  • init_bias: initializer for the bias vector (ignored if bias=false)
  • bias: whether to include a bias vector

Input

  • x must be an Array of size (dims..., B) or (dims[1], ..., dims[k]) for k ≤ length(dims)

Returns

  • Array of size (dims..., B) or (dims[1], ..., dims[k]) for k ≤ length(dims)
  • Empty NamedTuple()

Parameters

  • weight: Weight Array of size (dims...)
  • bias: Bias of size (dims...)
Lux 0.4.3

Scale with multiple dimensions requires at least Lux 0.4.3.
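
A hedged sketch (assumes Random.default_rng): an elementwise affine transform over 5 features.

using Lux, Random
rng = Random.default_rng()
s = Scale(5)
ps, st = Lux.setup(rng, s)
x = randn(rng, Float32, 5, 3)
y, st = s(x, ps, st)      # y == ps.weight .* x .+ ps.bias, broadcast over the batch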

Misc. Helper Layers

Lux.ActivationFunctionFunction
ActivationFunction(f)

Broadcast f on the input.

Arguments

  • f: Activation function

Inputs

  • x: Any array type s.t. f can be broadcasted over it

Returns

  • Broadcasted Activation f.(x)
  • Empty NamedTuple()
Warning

This layer is deprecated and will be removed in v0.5. Use WrappedFunction with manual broadcasting

Lux.FlattenLayerType
FlattenLayer()

Flattens the passed array into a matrix.

Inputs

  • x: AbstractArray

Returns

  • AbstractMatrix of size (:, size(x, ndims(x)))
  • Empty NamedTuple()
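
A hedged sketch (assumes Random.default_rng): all leading dimensions are collapsed, the batch dimension is kept.

using Lux, Random
f = FlattenLayer()
ps, st = Lux.setup(Random.default_rng(), f)
x = randn(Float32, 4, 4, 3, 2)
y, st = f(x, ps, st)      # size(y) == (48, 2)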
Lux.NoOpLayerType
NoOpLayer()

As the name suggests, this layer does nothing but allows pretty printing of layers. Whatever input is passed in is returned.

Lux.ReshapeLayerType
ReshapeLayer(dims)

Reshapes the passed array to have a size of (dims..., :)

Arguments

  • dims: The new dimensions of the array (excluding the last dimension).

Inputs

  • x: AbstractArray of any shape which can be reshaped in (dims..., size(x, ndims(x)))

Returns

  • AbstractArray of size (dims..., size(x, ndims(x)))
  • Empty NamedTuple()
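
A hedged sketch (assumes Random.default_rng):

using Lux, Random
r = ReshapeLayer((2, 8))
ps, st = Lux.setup(Random.default_rng(), r)
x = randn(Float32, 16, 5)
y, st = r(x, ps, st)      # size(y) == (2, 8, 5)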
Lux.SelectDimType
SelectDim(dim, i)

Return a view of all the data of the input x where the index for dimension dim equals i. Equivalent to view(x,:,:,...,i,:,:,...) where i is in position dim.

Arguments

  • dim: Dimension for indexing
  • i: Index for dimension dim

Inputs

  • x: AbstractArray that can be indexed with view(x,:,:,...,i,:,:,...)

Returns

  • view(x,:,:,...,i,:,:,...) where i is in position dim
  • Empty NamedTuple()
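
A hedged sketch (assumes Random.default_rng): select index 2 along dimension 3.

using Lux, Random
s = SelectDim(3, 2)
ps, st = Lux.setup(Random.default_rng(), s)
x = randn(Float32, 4, 4, 3, 2)
y, st = s(x, ps, st)      # y == view(x, :, :, 2, :), size (4, 4, 2)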
Lux.WrappedFunctionType
WrappedFunction(f)

Wraps a stateless and parameterless function. Might be used when a function is added to Chain. For example, Chain(x -> relu.(x)) would not work and the right thing to do would be Chain((x, ps, st) -> (relu.(x), st)). An easier alternative is Chain(WrappedFunction(Base.Fix1(broadcast, relu)))

Arguments

  • f::Function: A stateless and parameterless function

Inputs

  • x: such that hasmethod(f, (typeof(x),)) is true

Returns

  • Output of f(x)
  • Empty NamedTuple()
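
A hedged sketch of the broadcasting pattern described above (assumes Random.default_rng):

using Lux, Random
w = WrappedFunction(Base.Fix1(broadcast, relu))
ps, st = Lux.setup(Random.default_rng(), w)
x = randn(Float32, 3, 3)
y, st = w(x, ps, st)      # y == relu.(x)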

Normalization Layers

Lux.BatchNormType
BatchNorm(chs::Integer, activation=identity; init_bias=zeros32, init_scale=ones32,
          affine=true, track_stats=true, epsilon=1f-5, momentum=0.1f0)

Batch Normalization layer.

BatchNorm computes the mean and variance for each D_1 × ... × D_{N-2} × 1 × D_N input slice and normalises the input accordingly.

Arguments

  • chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
  • activation: After normalisation, elementwise activation activation is applied.

Keyword Arguments

  • If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.

    • init_bias: Controls how the bias is initialized
    • init_scale: Controls how the scale is initialized
  • If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase.

  • epsilon: a value added to the denominator for numerical stability

  • momentum: the value used for the running_mean and running_var computation

Inputs

  • x: Array where size(x, N - 1) = chs and ndims(x) > 2

Returns

  • y: Normalized Array
  • Updated model state

Parameters

  • affine=true

    • bias: Bias of shape (chs,)
    • scale: Scale of shape (chs,)
  • affine=false - Empty NamedTuple()

States

  • Statistics if track_stats=true

    • running_mean: Running mean of shape (chs,)
    • running_var: Running variance of shape (chs,)
  • Statistics if track_stats=false

    • running_mean: nothing
    • running_var: nothing
  • training: Used to check if training/inference mode

Use Lux.testmode during inference.

Example

m = Chain(Dense(784 => 64), BatchNorm(64, relu), Dense(64 => 10), BatchNorm(10))
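
A hedged setup/inference sketch extending the example above (assumes Random.default_rng):

using Lux, Random
rng = Random.default_rng()
m = Chain(Dense(784 => 64), BatchNorm(64, relu), Dense(64 => 10), BatchNorm(10))
ps, st = Lux.setup(rng, m)
x = randn(rng, Float32, 784, 16)
y, st = m(x, ps, st)                    # training mode: batch statistics are used and updated
y_test, _ = m(x, ps, Lux.testmode(st))  # inference mode: running statistics are used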

See also GroupNorm

Lux.GroupNormType
GroupNorm(chs::Integer, groups::Integer, activation=identity; init_bias=zeros32,
          init_scale=ones32, affine=true, track_stats=true, epsilon=1f-5,
          momentum=0.1f0)

Group Normalization layer.

Arguments

  • chs: Size of the channel dimension in your data. Given an array with N dimensions, call the N-1th the channel dimension. For a batch of feature vectors this is just the data dimension, for WHCN images it's the usual channel dimension.
  • groups: The number of groups along which the statistics are computed. The number of channels must be an integer multiple of the number of groups.
  • activation: After normalisation, elementwise activation activation is applied.

Keyword Arguments

  • If affine=true, it also applies a shift and a rescale to the input through learnable per-channel bias and scale parameters.

    • init_bias: Controls how the bias is initialized
    • init_scale: Controls how the scale is initialized
  • If track_stats=true, accumulates mean and variance statistics in training phase that will be used to renormalize the input in test phase. (This feature has been deprecated and will be removed in v0.5)

  • epsilon: a value added to the denominator for numerical stability

  • momentum: the value used for the running_mean and running_var computation (This feature has been deprecated and will be removed in v0.5)

Inputs

  • x: Array where size(x, N - 1) = chs and ndims(x) > 2

Returns

  • y: Normalized Array
  • Updated model state

Parameters

  • affine=true

    • bias: Bias of shape (chs,)
    • scale: Scale of shape (chs,)
  • affine=false - Empty NamedTuple()

States

  • Statistics if track_stats=true (DEPRECATED)

    • running_mean: Running mean of shape (groups,)
    • running_var: Running variance of shape (groups,)
  • Statistics if track_stats=false

    • running_mean: nothing
    • running_var: nothing
  • training: Used to check if training/inference mode

Use Lux.testmode during inference.

Example

m = Chain(Dense(784 => 64), GroupNorm(64, 4, relu), Dense(64 => 10), GroupNorm(10, 5))
Warning

GroupNorm doesn't have CUDNN support. The GPU fallback is not very efficient.

See also BatchNorm

Lux.WeightNormType
WeightNorm(layer::AbstractExplicitLayer, which_params::NTuple{N,Symbol},
           dims::Union{Tuple,Nothing}=nothing)

Applies weight normalization to a parameter in the given layer.

$w = g\frac{v}{\|v\|}$

Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This updates the parameters in which_params (e.g. weight) using two parameters: one specifying the magnitude (e.g. weight_g) and one specifying the direction (e.g. weight_v).

Arguments

  • layer: The layer whose parameters are being reparameterized
  • which_params: Parameter names for the parameters being reparameterized
  • dims: By default, a norm over the entire array is computed. Pass dims to modify the dimension.

Inputs

  • x: Should be of valid type for input to layer

Returns

  • Output from layer
  • Updated model state of layer

Parameters

  • normalized: Parameters of layer that are being normalized
  • unnormalized: Parameters of layer that are not being normalized

States

  • Same as that of layer
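
Example

A hedged sketch (assumes Random.default_rng): normalize only the weight of a Dense layer.

using Lux, Random
rng = Random.default_rng()
wn = WeightNorm(Dense(3 => 4), (:weight,))
ps, st = Lux.setup(rng, wn)    # ps.normalized holds weight_g and weight_v; ps.unnormalized the rest
x = randn(rng, Float32, 3, 2)
y, st = wn(x, ps, st)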

Upsampling

Lux.UpsampleType
Upsample(mode = :nearest; [scale, size]) 
Upsample(scale, mode = :nearest)

Upsampling Layer.

Layer Construction

Option 1

  • mode: Set to :nearest, :linear, :bilinear or :trilinear

Exactly one of two keywords must be specified:

  • If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
  • Alternatively, keyword size accepts a tuple, to directly specify the leading dimensions of the output.

Option 2

  • If scale is a number, this applies to all but the last two dimensions (channel and batch) of the input. It may also be a tuple, to control dimensions individually.
  • mode: Set to :nearest, :bilinear or :trilinear

Currently supported upsampling modes and corresponding NNlib's methods are:

  • :nearest -> NNlib.upsample_nearest
  • :bilinear -> NNlib.upsample_bilinear
  • :trilinear -> NNlib.upsample_trilinear

Inputs

  • x: For the input dimensions look into the documentation for the corresponding NNlib function
    • As a rule of thumb, :nearest should work with arrays of arbitrary dimensions
    • :bilinear works with 4D Arrays
    • :trilinear works with 5D Arrays

Returns

  • Upsampled Input of size size or of size (I_1 x scale[1], ..., I_N x scale[N], C, N)
  • Empty NamedTuple()
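
Example

A hedged sketch (assumes Random.default_rng): bilinear upsampling of 4D input by a factor of 2.

using Lux, Random
u = Upsample(:bilinear; scale=2)
ps, st = Lux.setup(Random.default_rng(), u)
x = randn(Float32, 8, 8, 3, 1)
y, st = u(x, ps, st)      # size(y) == (16, 16, 3, 1)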

Index