Fluxperimental.shinkansen! — Method

shinkansen!(loss, model, data...; state, epochs=1, [batchsize, keywords...])
This is a redesign of train!:

- The loss function must accept the remaining arguments: loss(model, data...)
- The optimiser state from setup must be passed to the keyword state.
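For instance, the calling convention looks like this (a minimal sketch with invented toy data; the model, sizes, and loss here are purely illustrative):

using Flux, Fluxperimental

# Toy model and data, invented purely to illustrate the signature:
model = Dense(2 => 1)
state = Flux.setup(Adam(), model)
x = rand(Float32, 2, 8)
y = rand(Float32, 1, 8)

# The loss receives the model first, then the remaining data arguments:
loss(m, x, y) = Flux.mse(m(x), y)

# The optimiser state from setup goes to the state keyword:
shinkansen!(loss, model, x, y; state, epochs=3)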
By default it calls gradient(loss, model, data...) just like that, with the arguments in the same order. If you specify epochs = 100, then it will do this 100 times.

But if you specify batchsize = 32, then it first makes DataLoader(data...; batchsize), and uses that to generate smaller arrays to feed to gradient. All other keywords are passed to DataLoader, e.g. to shuffle batches.

Returns the loss from every call.
Example
X = repeat(hcat(digits.(0:3, base=2, pad=2)...), 1, 32)
Y = Flux.onehotbatch(xor.(eachrow(X)...), 0:1)
model = Chain(Dense(2 => 3, sigmoid), BatchNorm(3), Dense(3 => 2))
state = Flux.setup(Adam(0.1, (0.7, 0.95)), model)
# state = Optimisers.setup(Optimisers.Adam(0.1, (0.7, 0.95)), model) # for now
shinkansen!(model, X, Y; state, epochs=100, batchsize=16, shuffle=true) do m, x, y
    Flux.logitcrossentropy(m(x), y)
end
all((softmax(model(X)) .> 0.5) .== Y)
Fluxperimental.@compact — Macro

@compact(forward::Function; name=nothing, parameters...)

Creates a layer by specifying some parameters, in the form of keywords, and (usually as a do block) a function for the forward pass. You may think of @compact as a specialized let block creating local variables that are trainable in Flux. Declared variable names may be used within the body of the forward function.
Here is a linear model:
r = @compact(w = rand(3)) do x
    w .* x
end
r([1, 1, 1]) # x is set to [1, 1, 1].
Here is a linear model with bias and activation:
d = @compact(in=5, out=7, W=randn(out, in), b=zeros(out), act=relu) do x
    y = W * x
    act.(y .+ b)
end
d(ones(5, 10)) # 7×10 Matrix as output.
Finally, here is a simple MLP:
using Flux
n_in = 1
n_out = 1
nlayers = 3
model = @compact(
    w1=Dense(n_in, 128),
    w2=[Dense(128, 128) for i=1:nlayers],
    w3=Dense(128, n_out),
    act=relu
) do x
    embed = act(w1(x))
    for w in w2
        embed = act(w(embed))
    end
    out = w3(embed)
    return out
end
model(randn(n_in, 32)) # 1×32 Matrix as output.
We can train this model just like any Chain:
data = [([x], 2x-x^3) for x in -2:0.1f0:2]
optim = Flux.setup(Adam(), model)
for epoch in 1:1000
    Flux.train!((m, x, y) -> sum(abs2, m(x) .- y), model, data, optim)
end
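After training, a quick sanity check on the fit (this check is illustrative and not part of the API; only extracts the single output value):

# Compare model output on the training grid to the target 2x - x^3:
xs = -2:0.1f0:2
preds = [only(model([x])) for x in xs]
maximum(abs.(preds .- (2 .* xs .- xs .^ 3)))  # small if training converged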
You may also specify a name for the model, which will be used instead of the default printout, which gives a verbatim representation of the code used to construct the model:
model = @compact(w=rand(3), name="Linear(3 => 1)") do x
    sum(w .* x)
end
println(model) # "Linear(3 => 1)"
This can be useful when using @compact to hierarchically construct complex models to be used inside a Chain.
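For example, one might build named @compact blocks and compose them in a Chain (a sketch; the helper branch and all layer sizes here are invented for illustration):

using Flux, Fluxperimental

# Hypothetical helper: a named dense block built with @compact.
branch(in, out) = @compact(W=randn(Float32, out, in), b=zeros(Float32, out),
                           name="Branch($in => $out)") do x
    relu.(W * x .+ b)
end

model = Chain(branch(4, 8), branch(8, 2))
println(model)              # prints the custom names rather than the raw code
model(rand(Float32, 4, 3))  # 2×3 Matrix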