`Fluxperimental.shinkansen!` — Method

`shinkansen!(loss, model, data...; state, epochs=1, [batchsize, keywords...])`

This is a re-design of `train!`:

- The loss function must accept the remaining arguments: `loss(model, data...)`.
- The optimiser state from `setup` must be passed to the keyword `state`.

By default it calls `gradient(loss, model, data...)` just like that, with the arguments in the same order. If you specify `epochs = 100`, then it will do this 100 times.

But if you specify `batchsize = 32`, then it first makes `DataLoader(data...; batchsize)`, and uses that to generate smaller arrays to feed to `gradient`. All other keywords are passed to `DataLoader`, e.g. to shuffle batches.

Returns the loss from every call.
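
When no `batchsize` is given, each epoch is roughly one call to `gradient` followed by one optimiser update. Below is a minimal sketch of that default behaviour, written from the description above; it is an illustration, not the package's actual implementation, and the name `shinkansen_sketch!` is made up:

```
using Flux

# Hypothetical sketch of the default (no `batchsize`) behaviour described above.
function shinkansen_sketch!(loss, model, data...; state, epochs=1)
    losses = Float32[]
    for _ in 1:epochs
        l, grads = Flux.withgradient(loss, model, data...)  # same argument order
        Flux.update!(state, model, grads[1])                # apply the optimiser state
        push!(losses, l)
    end
    return losses  # the loss from every call
end
```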

**Example**

```
using Flux, Fluxperimental

X = repeat(hcat(digits.(0:3, base=2, pad=2)...), 1, 32)
Y = Flux.onehotbatch(xor.(eachrow(X)...), 0:1)

model = Chain(Dense(2 => 3, sigmoid), BatchNorm(3), Dense(3 => 2))
state = Flux.setup(Adam(0.1, (0.7, 0.95)), model)
# state = Optimisers.setup(Optimisers.Adam(0.1, (0.7, 0.95)), model) # for now

shinkansen!(model, X, Y; state, epochs=100, batchsize=16, shuffle=true) do m, x, y
    Flux.logitcrossentropy(m(x), y)
end

all((softmax(model(X)) .> 0.5) .== Y)
```

`Fluxperimental.@compact` — Macro

`@compact(forward::Function; name=nothing, parameters...)`

Creates a layer by specifying some `parameters`, in the form of keywords, and (usually as a `do` block) a function for the forward pass. You may think of `@compact` as a specialized `let` block creating local variables that are trainable in Flux. Declared variable names may be used within the body of the `forward` function.

Here is a linear model:

```
r = @compact(w = rand(3)) do x
    w .* x
end
r([1, 1, 1])  # x is set to [1, 1, 1].
```
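
As a quick sanity check (not part of the original example), the declared `w` behaves like any other Flux parameter: you can differentiate through the layer and update it with an optimiser.

```
using Flux, Fluxperimental

grads = Flux.gradient(m -> sum(m([1, 1, 1])), r)  # gradient with respect to the layer
opt_state = Flux.setup(Descent(0.1), r)
Flux.update!(opt_state, r, grads[1])              # updates the trainable `w` in place
```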

Here is a linear model with bias and activation:

```
d = @compact(in=5, out=7, W=randn(out, in), b=zeros(out), act=relu) do x
    y = W * x
    act.(y .+ b)
end
d(ones(5, 10))  # 7×10 Matrix as output.
```

Finally, here is a simple MLP:

```
using Flux

n_in = 1
n_out = 1
nlayers = 3

model = @compact(
    w1=Dense(n_in, 128),
    w2=[Dense(128, 128) for i=1:nlayers],
    w3=Dense(128, n_out),
    act=relu
) do x
    embed = act.(w1(x))
    for w in w2
        embed = act.(w(embed))
    end
    out = w3(embed)
    return out
end

model(randn(n_in, 32))  # 1×32 Matrix as output.
```

We can train this model just like any `Chain`:

```
data = [([x], 2x-x^3) for x in -2:0.1f0:2]
optim = Flux.setup(Adam(), model)

for epoch in 1:1000
    Flux.train!((m,x,y) -> (only(m(x)) - y)^2, model, data, optim)  # scalar loss
end
```
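
As a quick follow-up check (an addition, not from the original example), you can compare the trained model against the target `2x - x^3` on the same grid:

```
using Statistics

# Mean absolute error on the training grid; it should be small after training.
mean(abs(only(model([x])) - (2x - x^3)) for x in -2:0.1f0:2)
```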

You may also specify a `name` for the model, which will be used instead of the default printout (a verbatim representation of the code used to construct the model):

```
model = @compact(w=rand(3), name="Linear(3 => 1)") do x
    sum(w .* x)
end
println(model)  # "Linear(3 => 1)"
```

This can be useful when using `@compact` to hierarchically construct complex models to be used inside a `Chain`.
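
For example, here is a hypothetical sketch (layer sizes and names are made up for illustration) of two named `@compact` layers composed inside a `Chain`:

```
using Flux, Fluxperimental

feature = @compact(w=Dense(2 => 8, relu), name="Featurizer(2 => 8)") do x
    w(x)
end
head = @compact(w=Dense(8 => 1), name="Head(8 => 1)") do x
    w(x)
end

model = Chain(feature, head)
model([0.5, -0.5])  # 1-element output vector
println(model)      # the custom names should appear in place of the verbatim code
```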