Boltz.ClassTokensType
ClassTokens(dim; init=Lux.zeros32)

Appends class tokens to an input with embedding dimension dim for use in many vision transformer models.
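The shape effect of appending a class token can be sketched in plain Julia. This is an illustrative sketch only, not the layer's implementation: the real layer holds the token as a learnable parameter, initialized via init.

```julia
# Sketch: appending one class token of size `dim` along the patch axis.
dim, npatches, batch = 8, 16, 2
x = rand(Float32, dim, npatches, batch)          # patch embeddings
token = zeros(Float32, dim, 1, 1)                # one token (learnable in the real layer)
y = cat(x, repeat(token, 1, 1, batch); dims=2)   # append along the patch dimension
size(y)  # (8, 17, 2)
```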

Boltz.MultiHeadAttentionType
MultiHeadAttention(in_planes::Int, number_heads::Int; qkv_bias::Bool=false,
                   attention_dropout_rate::T=0.0f0,
                   projection_dropout_rate::T=0.0f0) where {T}

Multi-head self-attention layer.

Boltz.ViPosEmbeddingType
ViPosEmbedding(embedsize, npatches;
               init = (rng, dims...) -> randn(rng, Float32, dims...))

Positional embedding layer used by many vision transformer-like models.
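The default init shown in the signature can be exercised directly; it draws standard-normal Float32 values of the requested shape:

```julia
using Random

# The default initializer for the positional embeddings.
init = (rng, dims...) -> randn(rng, Float32, dims...)
pe = init(Random.default_rng(), 32, 16)  # (embedsize, npatches)
size(pe)    # (32, 16)
eltype(pe)  # Float32
```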

Boltz._fast_chunkMethod
_fast_chunk(x::AbstractArray, ::Val{n}, ::Val{dim})

A type-stable and faster alternative to MLUtils.chunk.
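A minimal plain-Julia sketch of the chunking semantics, assuming the chunked dimension divides evenly into n parts (fast_chunk_sketch is a hypothetical name, not the Boltz implementation):

```julia
# Split `x` into `n` equal views along dimension `dim`.
function fast_chunk_sketch(x::AbstractArray, ::Val{n}, ::Val{dim}) where {n, dim}
    step = size(x, dim) ÷ n
    return [selectdim(x, dim, ((i - 1) * step + 1):(i * step)) for i in 1:n]
end

x = reshape(collect(1:12), 3, 4)
chunks = fast_chunk_sketch(x, Val(2), Val(2))
size.(chunks)  # [(3, 2), (3, 2)]
```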

Boltz._flatten_spatialMethod
_flatten_spatial(x::AbstractArray{T, 4})

Flattens the first two dimensions of x into one, then permutes the result to dimension order (2, 1, 3).
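A plain-Julia sketch of the described transform, assuming a WHCN input layout (flatten_spatial_sketch is a hypothetical name, not the Boltz implementation):

```julia
# Merge the two spatial dimensions, then move channels first.
function flatten_spatial_sketch(x::AbstractArray{T, 4}) where {T}
    y = reshape(x, :, size(x, 3), size(x, 4))  # (W*H, C, N)
    return permutedims(y, (2, 1, 3))           # (C, W*H, N)
end

x = rand(Float32, 4, 4, 3, 2)
size(flatten_spatial_sketch(x))  # (3, 16, 2)
```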

Boltz._vgg_blockMethod
_vgg_block(input_filters, output_filters, depth, batchnorm)

A VGG block of convolution layers.

Arguments

  • input_filters: number of input feature maps
  • output_filters: number of output feature maps
  • depth: number of convolution layers (each followed by batch normalization when batchnorm is true)
  • batchnorm: set to true to include batch normalization after each convolution

Boltz._vgg_classifier_layersMethod
_vgg_classifier_layers(imsize, nclasses, fcsize, dropout)

Create the VGG classifier (fully connected) layers.

Arguments

  • imsize: tuple (width, height, channels) indicating the size after the convolution layers (see Boltz._vgg_convolutional_layers)
  • nclasses: number of output classes
  • fcsize: input and output size of the intermediate fully connected layer
  • dropout: dropout probability applied between the fully connected layers

Boltz._vgg_convolutional_layersMethod
_vgg_convolutional_layers(config, batchnorm, inchannels)

Create the VGG convolution layers.

Arguments

  • config: vector of tuples (output_channels, num_convolutions) for each block (see Boltz._vgg_block)
  • batchnorm: set to true to include batch normalization after each convolution
  • inchannels: number of input channels

Boltz.transformer_encoderMethod
transformer_encoder(in_planes, depth, number_heads; mlp_ratio = 4.0f0, dropout = 0.0f0)

Transformer encoder as used in the base ViT architecture.

Arguments

  • in_planes: number of input channels
  • depth: number of attention blocks
  • number_heads: number of attention heads
  • mlp_ratio: ratio of the hidden dimension of each block's MLP to in_planes
  • dropout: dropout rate
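The effect of mlp_ratio can be shown with a small computation, assuming the standard ViT convention that the MLP hidden width is in_planes scaled by mlp_ratio:

```julia
# Illustrative: how mlp_ratio scales the MLP hidden width in a ViT block.
in_planes, mlp_ratio = 256, 4.0f0
hidden_planes = Int(in_planes * mlp_ratio)  # 1024
```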

Boltz.vggMethod
vgg(imsize; config, inchannels, batchnorm = false, nclasses, fcsize, dropout)

Create a VGG model.

Arguments

  • imsize: input image width and height as a tuple
  • config: the configuration for the convolution layers
  • inchannels: number of input channels
  • batchnorm: set to true to use batch normalization after each convolution
  • nclasses: number of output classes
  • fcsize: intermediate fully connected layer size (see Boltz._vgg_classifier_layers)
  • dropout: dropout level between fully connected layers