shinkansen!(loss, model, data...; state, epochs=1, [batchsize, keywords...])

This is a re-design of train!:

  • The loss function must accept the remaining arguments: loss(model, data...)
  • The optimiser state from setup must be passed to the keyword state.

By default it calls gradient(loss, model, data...), with the arguments in the same order as given. If you specify epochs = 100, then it will do this 100 times.

But if you specify batchsize = 32, then it first constructs DataLoader(data...; batchsize), and uses that to generate smaller arrays to feed to gradient. All other keywords are passed to DataLoader, e.g. shuffle to shuffle the batches.
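The batching behaviour described above can be sketched with Flux's own DataLoader. This is a hypothetical sketch of what one epoch might do internally (the helper name one_epoch! is illustrative, not part of any API); it uses the real Flux.DataLoader, Flux.withgradient, and Flux.update! functions:

```julia
using Flux  # provides DataLoader, withgradient, update!

# Hypothetical sketch of one epoch when batchsize is given:
function one_epoch!(loss, model, data...; state, batchsize, kw...)
    losses = Float32[]
    for batch in Flux.DataLoader(data; batchsize, kw...)
        l, grads = Flux.withgradient(loss, model, batch...)
        Flux.update!(state, model, grads[1])
        push!(losses, l)  # collect the loss from every call
    end
    return losses
end
```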

Returns the loss from every call.


X = repeat(hcat(digits.(0:3, base=2, pad=2)...), 1, 32)
Y = Flux.onehotbatch(xor.(eachrow(X)...), 0:1)

model = Chain(Dense(2 => 3, sigmoid), BatchNorm(3), Dense(3 => 2))
state = Flux.setup(Adam(0.1, (0.7, 0.95)), model)
# state = Optimisers.setup(Optimisers.Adam(0.1, (0.7, 0.95)), model)  # for now

shinkansen!(model, X, Y; state, epochs=100, batchsize=16, shuffle=true) do m, x, y
    Flux.logitcrossentropy(m(x), y)
end

all((softmax(model(X)) .> 0.5) .== Y)
@compact(forward::Function; name=nothing, parameters...)

Creates a layer by specifying some parameters (as keywords) and a function for the forward pass (usually as a do block). You may think of @compact as a specialized let block creating local variables that are trainable in Flux. Declared variable names may be used within the body of the forward function.

Here is a linear model:

r = @compact(w = rand(3)) do x
  w .* x
end
r([1, 1, 1])  # x is set to [1, 1, 1].

Here is a linear model with bias and activation:

d = @compact(in=5, out=7, W=randn(out, in), b=zeros(out), act=relu) do x
  y = W * x
  act.(y .+ b)
end
d(ones(5, 10))  # 7×10 Matrix as output.

Finally, here is a simple MLP:

using Flux

n_in = 1
n_out = 1
nlayers = 3

model = @compact(
  w1=Dense(n_in, 128),
  w2=[Dense(128, 128) for i=1:nlayers],
  w3=Dense(128, n_out),
  act=relu,
) do x
  embed = act(w1(x))
  for w in w2
    embed = act(w(embed))
  end
  out = w3(embed)
  return out
end

model(randn(n_in, 32))  # 1×32 Matrix as output.

We can train this model just like any Chain:

data = [([x], 2x-x^3) for x in -2:0.1f0:2]
optim = Flux.setup(Adam(), model)

for epoch in 1:1000
  Flux.train!((m,x,y) -> (only(m(x)) - y)^2, model, data, optim)
end

You may also specify a name for the model, to be used instead of the default printout (which gives a verbatim representation of the code used to construct the model):

model = @compact(w=rand(3), name="Linear(3 => 1)") do x
  sum(w .* x)
end
println(model)  # "Linear(3 => 1)"

This can be useful when using @compact to hierarchically construct complex models to be used inside a Chain.
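As a hypothetical sketch of that hierarchical use (the helper name dense is illustrative, not part of any API), a named @compact layer can be reused as a building block inside a Chain:

```julia
using Flux

# Illustrative helper: a custom dense layer with a readable printed name
dense(n_in, n_out) = @compact(
    W=randn(Float32, n_out, n_in), b=zeros(Float32, n_out),
    name="CustomDense($n_in => $n_out)",
) do x
    W * x .+ b
end

model = Chain(dense(3, 8), x -> relu.(x), dense(8, 1))
model(ones(Float32, 3))  # 1-element output vector
```

Each CustomDense then prints under its given name, rather than as the verbatim @compact expression.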