It is recommended to read the Motivation section first to understand the crucial ideas behind hierarchical multiple instance learning.
Nodes
Mill.jl enables representation of arbitrarily complex tree-like hierarchies and appropriate models for these hierarchies. It defines two core abstract types:
- AbstractNode, which stores data on any level of abstraction and its subtypes can be further nested
- AbstractMillModel, which helps to define a corresponding model. For each specific implementation of AbstractNode we have one or more specific AbstractMillModels for processing it.
Below we go through the implementation of ArrayNode, BagNode and ProductNode together with their corresponding models. It is possible to define data and model nodes for more complex behaviors (see Adding custom nodes); however, these three core types are already sufficient for many tasks, for instance, representing any JSON document and using appropriate models to convert it to a vector representation or classify it (see Processing JSONs).
ArrayNode and ArrayModel
ArrayNode thinly wraps an array of features (specifically any subtype of AbstractArray):
julia> X = Float32.([1 2 3 4; 5 6 7 8])
2×4 Array{Float32,2}:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
julia> AN = ArrayNode(X)
2×4 ArrayNode{Array{Float32,2},Nothing}:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
Data carried by any AbstractNode can be accessed with the Mill.data function as follows:
julia> Mill.data(AN)
2×4 Array{Float32,2}:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
Similarly, ArrayModel wraps any function that operates on this array. In the example below, we wrap a Dense model from Flux.jl and later apply it to the feature matrix X wrapped in AN:
using Flux: Dense
julia> f = Dense(2, 3)
Dense(2, 3)
julia> AM = ArrayModel(f)
ArrayModel(Dense(2, 3))
We can now apply the model with AM(AN) to get another ArrayNode and verify that the feedforward layer f is really applied:
julia> AM(AN)
3×4 ArrayNode{Array{Float32,2},Nothing}:
-0.6329817 -1.2827497 -1.9325174 -2.5822854
-0.56179756 -1.2174246 -1.8730516 -2.528679
0.047799364 0.41558656 0.7833737 1.151161
julia> f(X) == AM(AN) |> Mill.data
true
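To make the shapes concrete, here is a minimal sketch in plain Julia (no Mill.jl or Flux.jl calls) of what applying a dense layer to a feature matrix amounts to, assuming the usual affine form W * X .+ b with an identity activation. The names W, b and dense_sketch are hypothetical, and the weights are made up for illustration; they are not the randomly initialized weights of f above.

```julia
# Hypothetical stand-in for a dense layer with 2 inputs and 3 outputs.
W = Float32[1 0; 0 1; 1 1]      # 3×2 weight matrix (made-up values)
b = Float32[0, 0, 1]            # bias, one entry per output row

# Each column of X is one sample, so the layer acts on every column independently.
dense_sketch(X) = W * X .+ b

X = Float32.([1 2 3 4; 5 6 7 8])
Y = dense_sketch(X)

size(Y)    # (3, 4): three output features for each of the four samples
```

The number of columns is preserved, which is why the result of AM(AN) still carries four samples.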
The most common interpretation of the data inside ArrayNodes is that each column contains the features of one sample, and therefore the node AN carries size(Mill.data(AN), 2) samples. In this sense, ArrayNodes wrap the standard machine learning problem, where each sample is represented with a vector, a matrix or a more general tensor of features. Alternatively, one can obtain the number of samples of any AbstractNode with the nobs function from the StatsBase.jl package:
using StatsBase: nobs
julia> nobs(AN)
4
BagNode
BagNode represents the standard multiple instance learning problem, that is, each sample is a bag containing an arbitrary number of instances. In the simplest case, each instance is a vector:
julia> BN = BagNode(AN, [1:1, 2:3, 4:4])
BagNode with 3 obs
└── ArrayNode(2×4 Array with Float32 elements) with 4 obs
where for simplicity we used AN from the previous example. Each BagNode carries data and bags fields:
julia> Mill.data(BN)
2×4 ArrayNode{Array{Float32,2},Nothing}:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
julia> BN.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 2:3, 4:4])
Here, data can be an arbitrary AbstractNode storing the representation of instances (an ArrayNode in this case), and the bags field records which instances belong to which bag. In this specific case, BN stores three bags (samples). The first one consists of a single instance {[1.0, 5.0]} (the first column of AN), the second one of two instances {[2.0, 6.0], [3.0, 7.0]}, and the last one of a single instance {[4.0, 8.0]}. We can see that we deal with three top-level samples (bags):
julia> nobs(BN)
3
whereas they are formed using four instances:
julia> nobs(AN)
4
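The mapping from bags to instances can be sketched in plain Julia without any Mill.jl calls. In this sketch, bags is an ordinary Vector of ranges (a hypothetical stand-in for the bags field, not the AlignedBags type itself), and each bag is obtained by selecting the corresponding columns of X.

```julia
X = Float32.([1 2 3 4; 5 6 7 8])      # four instances, one per column
bags = [1:1, 2:3, 4:4]                # three bags as ranges of column indices

# Collect the instance vectors belonging to each bag.
instances = [[X[:, j] for j in bag] for bag in bags]

length(bags)        # number of bags (top-level samples)
size(X, 2)          # number of instances in total
instances[2]        # the two instances of the second bag
```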
In Mill.jl, there are two ways to store the indices of a bag's instances:
- in the AlignedBags structure, which accepts a Vector of UnitRanges and requires all instances of a bag to be stored contiguously:
julia> AlignedBags([1:3, 4:4, 5:6])
AlignedBags{Int64}(UnitRange{Int64}[1:3, 4:4, 5:6])
- and in the ScatteredBags structure, which accepts a Vector of Vectors storing not necessarily contiguous indices:
julia> ScatteredBags([[3, 2, 1], [4], [6, 5]])
ScatteredBags{Int64}([[3, 2, 1], [4], [6, 5]])
The two examples above are semantically equivalent, as bags are unordered collections of instances. An empty bag with no instances is specified in AlignedBags as the empty range 0:-1 and in ScatteredBags as an empty vector Int[]. The constructor of BagNode accepts either of these two structures directly and, in other cases, tries to automatically decide which type fits better.
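The equivalence claimed above can be checked with a small sketch in plain Julia. It uses ordinary vectors of ranges and vectors of indices as hypothetical stand-ins for the two structures; it does not call the AlignedBags or ScatteredBags constructors.

```julia
aligned   = [1:3, 4:4, 5:6]                 # contiguous ranges
scattered = [[3, 2, 1], [4], [6, 5]]        # explicit, possibly unordered indices

# Bags are unordered, so compare the index sets bag by bag.
same_bags = all(Set(a) == Set(s) for (a, s) in zip(aligned, scattered))

# Both encodings of an empty bag select no instances.
empty_ok = isempty(0:-1) && isempty(Int[])
```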
BagModel
Each BagNode is processed by a BagModel, which contains two (sub)models and an aggregation operator:
julia> im = ArrayModel(Dense(2, 3))
ArrayModel(Dense(2, 3))
julia> a = max_aggregation(3)
Aggregation{Float32}:
SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])
julia> bm = ArrayModel(Dense(4, 4))
ArrayModel(Dense(4, 4))
julia> BM = BagModel(im, a, bm)
BagModel … ↦ ⟨SegmentedMax(3)⟩ ↦ ArrayModel(Dense(4, 4))
└── ArrayModel(Dense(2, 3))
The first network submodel (called instance model im) is responsible for converting the instance representation to a vector form:
julia> y = im(AN)
3×4 ArrayNode{Array{Float32,2},Nothing}:
-4.411283 -4.800618 -5.189953 -5.5792885
1.6697277 1.557303 1.4448783 1.3324537
4.13834 4.187142 4.235944 4.284745
Note that because of the property mentioned above, the output of the instance model im will always be an ArrayNode wrapping a matrix. We get four columns, one for each instance. This result is then passed to the Aggregation (a), which takes the vector representations of all instances and produces a single vector per bag:
julia> y = a(y, BN.bags)
4×3 ArrayNode{Array{Float32,2},Nothing}:
-4.411283 -4.800618 -5.5792885
1.6697277 1.557303 1.3324537
4.13834 4.235944 4.284745
0.6931472 1.0986123 0.6931472
To read more about aggregation operators and to find out why there are four rows instead of three after applying the operator, see the Bag aggregation section.
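The first three rows of the aggregated output can be reproduced with a short sketch in plain Julia, assuming that max aggregation is an elementwise maximum over the instances of each bag. The helper segmented_max is hypothetical and is not Mill.jl's implementation. The fourth row of the output above numerically matches log(1 + bag size); computing it that way here is an inference from the printed numbers, and the Bag aggregation section remains the reference for what Mill.jl actually does.

```julia
# Instance representations copied from the output of im(AN) above.
Y = Float32[-4.411283 -4.800618 -5.189953 -5.5792885;
             1.6697277  1.557303  1.4448783  1.3324537;
             4.13834    4.187142  4.235944   4.284745]
bags = [1:1, 2:3, 4:4]

# Elementwise maximum over the columns of each bag, one output column per bag.
# Assumes every bag is non-empty.
segmented_max(Y, bags) = reduce(hcat, [maximum(Y[:, bag], dims=2) for bag in bags])

A = segmented_max(Y, bags)                       # 3×3 matrix, one column per bag
sizes = log.(1f0 .+ Float32.(length.(bags)))     # log(1 + bag size) for each bag
```

The second column of A takes the maximum of instances 2 and 3 in every row, which is why its entries mix values from both columns of Y.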
Finally, y is then passed to a feed forward model (called bag model bm) producing the final output per bag. In our example we therefore get a matrix with three columns:
julia> y = bm(y)
4×3 ArrayNode{Array{Float32,2},Nothing}:
0.08765207 0.31226996 -0.015378637
1.7760211 1.923082 2.0592246
1.0429261 1.3282981 1.4331931
-0.3872499 -0.16527262 -0.8316975
However, the best way to use a bag model node is to simply apply it, which results in the same output:
julia> BM(BN) == y
true
The whole procedure is depicted in the following picture:
Three instances of the BagNode, represented by red subtrees, are first mapped with the instance model im, then aggregated (the aggregation operator here is a concatenation of two different operators $a_1$ and $a_2$), and the results of aggregation are transformed with the bag model bm.
ProductNodes and ProductModels
ProductNode can be thought of as a Cartesian Product or a Dictionary. It holds a Tuple or NamedTuple of nodes (not necessarily of the same type). For example, a ProductNode with a BagNode and an ArrayNode as children would look like this:
julia> PN = ProductNode((a=ArrayNode(Float32.([1 2 3; 4 5 6])), b=BN))
ProductNode with 3 obs
├── a: ArrayNode(2×3 Array with Float32 elements) with 3 obs
└── b: BagNode with 3 obs
└── ArrayNode(2×4 Array with Float32 elements) with 4 obs
Analogously, the ProductModel contains a (Named)Tuple of (sub)models processing each of its children (stored in the ms field, standing for models), as well as one more (sub)model m:
julia> ms = (a=AM, b=BM)
(a = ArrayModel(Dense(2, 3)), b = BagModel … ↦ ⟨SegmentedMax(3)⟩ ↦ ArrayModel(Dense(4, 4)))
julia> m = ArrayModel(Dense(7, 2))
ArrayModel(Dense(7, 2))
julia> PM = ProductModel(ms, m)
ProductModel … ↦ ArrayModel(Dense(7, 2))
├── a: ArrayModel(Dense(2, 3))
└── b: BagModel … ↦ ⟨SegmentedMax(3)⟩ ↦ ArrayModel(Dense(4, 4))
└── ArrayModel(Dense(2, 3))
Again, since the library is based on the property that the output of each model is an ArrayNode, the product model applies the models from ms to the appropriate children and vertically concatenates the outputs, which are then processed by the model m. Processing the above sample step by step looks like this:
julia> y = PM.m(vcat(PM[:a](PN[:a]), PM[:b](PN[:b])))
2×3 ArrayNode{Array{Float32,2},Nothing}:
0.14728111 -0.46532086 -0.5963221
0.5292632 1.1747541 1.2010721
which is equivalent to:
julia> PM(PN) == y
true
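The dimensions in this example follow from the vertical concatenation: child a yields 3 rows and child b yields 4 rows, so the final model must accept 3 + 4 = 7 input features. The following plain-Julia sketch illustrates only this shape bookkeeping; ya, yb and Wm are hypothetical placeholders with made-up values, not the outputs or weights of the models above.

```julia
# Hypothetical child outputs for 3 samples: 3 rows from model a, 4 rows from model b.
ya = ones(Float32, 3, 3)
yb = 2 .* ones(Float32, 4, 3)

# Vertical concatenation stacks the features of each sample: 3 + 4 = 7 rows.
y_cat = vcat(ya, yb)

# Stand-in for the final model m: a linear map from 7 features to 2 (made-up weights).
Wm = fill(0.5f0, 2, 7)
y_out = Wm * y_cat

size(y_cat), size(y_out)    # ((7, 3), (2, 3))
```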
Application of another product model (this time with four subtrees (keys)) can be visualized as follows:
In general, we recommend using NamedTuples, because the keys can be used for indexing both ProductNodes and ProductModels.
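The convenience of shared keys can be illustrated with ordinary NamedTuples in plain Julia. In this sketch, children and models are hypothetical stand-ins for the contents of a ProductNode and a ProductModel; no Mill.jl types are involved.

```julia
# Hypothetical children and per-child functions sharing the same keys.
children = (a = [1, 2, 3], b = [10, 20])
models   = (a = sum, b = maximum)

# The same key selects matching entries from both structures.
out_a = models[:a](children[:a])      # index with a Symbol
out_b = models.b(children.b)          # field syntax works too

# Apply every model to the child stored under the same key.
outputs = map((m, x) -> m(x), models, children)
```

With plain Tuples the same pairing would rely on positions only, which is easier to get wrong when children are added or reordered.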