# AutoEncoderToolkit.jl

Welcome to the `AutoEncoderToolkit.jl` documentation. This package provides a simple interface for training and using Flux.jl-based autoencoders and variational autoencoders in Julia.

## Installation

You can install `AutoEncoderToolkit.jl` using the Julia package manager. From the Julia REPL, type `]` to enter the Pkg REPL mode and run:

```
add AutoEncoderToolkit
```
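Equivalently, you can install the package from within a script or notebook using the `Pkg` API:

```julia
# Install AutoEncoderToolkit.jl via the Pkg API
# (equivalent to the Pkg REPL command above)
using Pkg
Pkg.add("AutoEncoderToolkit")
```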

## Design

The idea behind `AutoEncoderToolkit.jl` is to take advantage of Julia's multiple dispatch to provide a simple and flexible interface for training and using different types of autoencoders. The package is designed to be modular and to allow the user to easily define and test custom encoder and decoder architectures. Moreover, when it comes to variational autoencoders, `AutoEncoderToolkit.jl` takes a probabilistic perspective, where the type of the encoder and decoder defines (via multiple dispatch) the corresponding distribution used within the loss function.

For example, assume you want to train a variational autoencoder with convolutional layers in the encoder and deconvolutional layers in the decoder on the `MNIST` dataset. You can easily do this as follows:

Let's begin by defining the encoder. For this, we will use the `JointGaussianLogEncoder` type, which is a simple encoder that takes a `Flux.Chain` for the layers shared between the mean and log-variance branches, and two `Flux.Dense` (or `Flux.Chain`) layers for the last layers of the encoder.

```julia
# Define dimensionality of latent space
n_latent = 2
# Define number of initial channels
n_channels_init = 128

# Define convolutional layers
conv_layers = Flux.Chain(
    # First convolutional layer
    Flux.Conv((3, 3), 1 => n_channels_init, Flux.relu; stride=2, pad=1),
    # Second convolutional layer
    Flux.Conv(
        (3, 3), n_channels_init => n_channels_init * 2, Flux.relu;
        stride=2, pad=1
    ),
    # Flatten the output
    AutoEncoderToolkit.Flatten()
)

# Define layers for µ and log(σ)
µ_layer = Flux.Dense(n_channels_init * 2 * 7 * 7, n_latent, Flux.identity)
logσ_layer = Flux.Dense(n_channels_init * 2 * 7 * 7, n_latent, Flux.identity)

# Build encoder
encoder = AutoEncoderToolkit.JointGaussianLogEncoder(
    conv_layers, µ_layer, logσ_layer
)
```
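As a quick sanity check, we can run a dummy batch through the encoder. The sketch below assumes MNIST images are supplied as a `28×28×1×batch` `Float32` array in Flux's WHCN layout; consult the package docs for the exact structure of the encoder's output.

```julia
# Sketch: sanity-check the encoder on a dummy MNIST-shaped batch
# (assumes 28×28 grayscale images in WHCN order: width, height, channel, batch)
x = randn(Float32, 28, 28, 1, 16)

# Two stride-2 convolutions map 28×28 → 14×14 → 7×7, so the flattened
# features have length n_channels_init * 2 * 7 * 7, matching the µ and
# log(σ) Dense layers defined above
output = encoder(x)
```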

The `Flatten` layer is a custom layer defined in `AutoEncoderToolkit.jl` that flattens the output into a 1D vector. This flattening operation is necessary because the output of the convolutional layers is a 4D tensor, while the input to the `µ` and `log(σ)` layers is a 1D vector. A custom layer is needed to be able to save the model and load it later, as `BSON` and `JLD2` do not play well with anonymous functions.

For the decoder, given the binary nature of the `MNIST` dataset, we expect the output to be a `Bernoulli` distribution. We can define the decoder as follows:

```julia
# Define deconvolutional layers
deconv_layers = Flux.Chain(
    # Define linear layer out of latent space
    Flux.Dense(n_latent => n_channels_init * 2 * 7 * 7, Flux.identity),
    # Unflatten input using custom Reshape layer
    AutoEncoderToolkit.Reshape(7, 7, n_channels_init * 2, :),
    # First transposed convolutional layer
    Flux.ConvTranspose(
        (4, 4), n_channels_init * 2 => n_channels_init, Flux.relu;
        stride=2, pad=1
    ),
    # Second transposed convolutional layer
    Flux.ConvTranspose(
        (4, 4), n_channels_init => 1, Flux.relu;
        stride=2, pad=1
    ),
    # Add normalization layer
    Flux.BatchNorm(1, Flux.sigmoid),
)

# Define decoder
decoder = AutoEncoderToolkit.BernoulliDecoder(deconv_layers)
```
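Analogously, we can check that the decoder maps latent codes back to image space. This is a sketch assuming latent codes are given as an `n_latent × batch` matrix; for a `BernoulliDecoder`, the final sigmoid means the output can be read as per-pixel Bernoulli probabilities.

```julia
# Sketch: decode a dummy batch of latent codes
z = randn(Float32, n_latent, 16)

# The Dense + Reshape layers produce a 7×7×(n_channels_init * 2)×16 tensor;
# the two stride-2 transposed convolutions then map 7×7 → 14×14 → 28×28
x̂ = decoder(z)
```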

Again, the custom `Reshape` layer is used to reshape the output of the linear layer to the shape expected by the transposed convolutional layers. This custom layer is needed to be able to save the model and load it later.

By defining the decoder as a `BernoulliDecoder`, `AutoEncoderToolkit.jl` already knows the log-likelihood function to use when training the model. We can then simply define our variational autoencoder by combining the encoder and decoder as

```julia
# Define variational autoencoder
vae = encoder * decoder
```
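With the model assembled, a single training step might look like the sketch below. The `AutoEncoderToolkit.VAEs.loss` name and signature are assumptions here; check the `VAEs` submodule documentation for the actual loss function and its keyword arguments.

```julia
import Flux

# Dummy MNIST-shaped batch (real data would be loaded here instead)
x = randn(Float32, 28, 28, 1, 16)

# Set up optimiser state for the model parameters
opt_state = Flux.setup(Flux.Adam(1e-3), vae)

# Compute the gradient of the (assumed) VAE loss and update the parameters
loss_val, grads = Flux.withgradient(vae) do model
    AutoEncoderToolkit.VAEs.loss(model, x)
end
Flux.update!(opt_state, vae, grads[1])
```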

If for any reason we were curious to explore a different distribution for the decoder, for example, a `Normal` distribution with constant variance, it would be as simple as defining the decoder as a `SimpleGaussianDecoder`.

```julia
# Define decoder with Normal likelihood function
decoder = AutoEncoderToolkit.SimpleGaussianDecoder(deconv_layers)
# Re-define the variational autoencoder
vae = encoder * decoder
```

Everything else in our training pipeline would remain the same thanks to multiple dispatch.

Furthermore, let's say that we would like to use a different flavor of variational autoencoder. In particular, the `InfoVAE` (also known as `MMD-VAE`) includes extra terms in the loss function to maximize mutual information between the latent space and the input data. We can easily take our `vae` model and convert it into an `MMDVAE`-type object from the `MMDVAEs` submodule as follows:

```julia
mmdvae = AutoEncoderToolkit.MMDVAEs.MMDVAE(vae)
```

This is the power of `AutoEncoderToolkit.jl` and Julia's multiple dispatch!

## Implemented Autoencoders

| model | module | description |
| --- | --- | --- |
| Autoencoder | `AEs` | Vanilla deterministic autoencoder |
| Variational Autoencoder | `VAEs` | Vanilla variational autoencoder |
| β-VAE | `VAEs` | β-VAE that weighs reconstruction vs. KL divergence in the ELBO |
| MMD-VAE (InfoVAE) | `MMDVAEs` | Maximum-Mean Discrepancy variational autoencoder |
| InfoMax-VAE | `InfoMaxVAEs` | Information-maximization variational autoencoder |
| Hamiltonian VAE | `HVAEs` | Hamiltonian variational autoencoder |
| Riemannian Hamiltonian VAE | `RHVAEs` | Riemannian-Hamiltonian variational autoencoder |

If you are interested in contributing a new model to the package, please check the GitHub repository. We are always looking to expand the list of available models, and `AutoEncoderToolkit.jl`'s modular structure should make it relatively easy.

## GPU support

`AutoEncoderToolkit.jl` supports GPU training out of the box for `CUDA.jl`-compatible GPUs. The `CUDA` functionality is provided as an extension. Therefore, to train a model on the GPU, simply import `CUDA` into the current environment, then move the model and data to the GPU. The rest of the training pipeline remains the same.
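As a sketch, assuming a CUDA-capable device and the `vae` model defined above:

```julia
# Sketch: move model and data to the GPU (assumes a working CUDA setup)
using CUDA
import Flux

# `Flux.gpu` moves models and arrays to the GPU when a backend is loaded
vae_gpu = Flux.gpu(vae)
x_gpu = Flux.gpu(randn(Float32, 28, 28, 1, 16))

# From here, the training loop is identical to the CPU version
```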