Integrating with Hyperopt.jl
JsonGrinder.jl and Mill.jl go a long way toward simplifying the creation of classifiers from data stored in JSONs, but they purposefully leave out optimization of the architecture, as we, the authors, believe this should be handled by other special-purpose libraries. Below, we show how to use Hyperopt.jl to do this optimization for us. Even though the result is far from optimal, we can run it and forget about it. The example is based on the DeviceID example, but it is largely independent of it. We start by explaining the core concepts, and at the end we include them in a full-fledged example.
First, we create a simple function which builds a feed-forward neural network with input dimension `idim`, `nneurons` neurons in the hidden and output layers, `nlayers` hidden layers, `fun` nonlinearity, and `bnfun` batch-normalization nonlinearity (`nothing` means batch normalization is disabled).
```julia
function ffnn(idim, nneurons, nlayers, fun, bnfun)
    c = []
    for i in 1:nlayers
        idim = i == 1 ? idim : nneurons
        push!(c, Dense(idim, nneurons, fun))
        if !isnothing(bnfun)
            push!(c, BatchNorm(nneurons, bnfun))
        end
    end
    Chain(c...)
end
```
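Assuming Flux is loaded (the example relies on it throughout), the helper can be exercised directly; the dimensions and the batch of random inputs below are arbitrary, chosen only to illustrate the shapes:

```julia
using Flux

m = ffnn(4, 8, 2, relu, nothing)      # two Dense layers, no batch norm
size(m(rand(Float32, 4, 10)))         # (8, 10): nneurons outputs per sample
```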
In our example, we use the Hyperband algorithm for its simplicity (and hopefully good results). It requires us to define two functions: the first initializes the model, the second trains it for a predefined number of iterations while supporting warm starts.
```julia
function evaluatemodel(specimen, nneurons, nlayers, fun, bnfun, η)
    model = reflectinmodel(specimen,
        d -> ffnn(d, nneurons, nlayers, fun, bnfun),
        SegmentedMeanMax,
        fsm = Dict("" => d -> Chain(ffnn(d, nneurons, nlayers, fun, bnfun)..., Dense(nneurons, 2))),
    )
    opt = ADAM(η)
    evaluatemodel(2000, model, opt)
end
```
```julia
function evaluatemodel(iterations, model, opt)
    ps = Flux.params(model)
    # `loss`, `train!`, `minibatch`, `error`, and `validation_set`
    # are defined in the surrounding DeviceID example
    train!((x...) -> loss(model, x...), ps, minibatch, opt, iterations)
    e = error(validation_set)
    (e, (model, opt))   # objective value and warm-start state
end
```
Now we call Hyperband from Hyperopt.jl, prescribing the possible values of each hyperparameter (for further details, see the docs of Hyperopt.jl). Hyperband does not use the parameter `i`, therefore we set it to zero. The parameter `R` determines the amount of resources, which corresponds to the number of tried configurations created by `RandomSampler`, and `η` determines the fraction of discarded solutions (which means that 18 solutions will be discarded in the second step). The invocation of Hyperband looks like
```julia
ho = @hyperopt for i = 0,
        sampler = Hyperband(R = 27, η = 3, inner = RandomSampler()),
        nneurons = [8, 16, 32, 64, 128],
        nlayers = [1, 2, 3],
        fun = [relu, tanh],
        # bnfun = [nothing, identity, relu, tanh],
        bnfun = [nothing],
        η = [1e-2, 1e-3, 1e-4]

    if state === nothing
        @show (nneurons, nlayers, fun, bnfun, η)
        res = evaluatemodel(specimen, nneurons, nlayers, fun, bnfun, η)
    else
        res = evaluatemodel(3000, state...)
    end
    res
end
model, opt = ho.minimizer
```
and we can fine-tune the model:

```julia
final = evaluatemodel(20000, model, opt)
trn = accuracy(model, trnfiles)
```
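The claim that 18 solutions are discarded in the second step follows from the successive-halving schedule of Hyperband's most exploratory bracket. The sketch below is back-of-the-envelope arithmetic for that one bracket (keep the best `n ÷ η` configurations at each rung), not Hyperopt.jl internals:

```julia
# Successive halving in Hyperband's widest bracket with R = 27, η = 3:
# start with R configurations and keep the best n ÷ η after each rung.
function halving_schedule(R, η)
    n, survivors = R, Int[]
    while n >= 1
        push!(survivors, n)
        n ÷= η
    end
    survivors
end

halving_schedule(27, 3)   # [27, 9, 3, 1]: the first rung discards 27 - 9 = 18
```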