Tip

We recommend reading the Motivation section first to understand the key ideas behind hierarchical multiple instance learning.

Nodes

Mill.jl makes it possible to represent arbitrarily complex tree-like hierarchies of data and to build appropriate models for these hierarchies. It defines two core abstract types:

  1. AbstractNode, whose subtypes store data at any level of abstraction and can be further nested
  2. AbstractMillModel, whose subtypes define the corresponding models. For each specific implementation of AbstractNode, there are one or more specific AbstractMillModels for processing it.

Below we will go through the implementations of ArrayNode, BagNode and ProductNode together with their corresponding models. It is possible to define data and model nodes for more complex behaviors (see Adding custom nodes); however, these three core types are already sufficient for many tasks, for instance, representing any JSON document and using appropriate models to convert it to a vector representation or to classify it (see Processing JSONs).
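
For instance, a sample consisting of a fixed-size feature vector together with a bag of parts can be expressed by nesting these nodes (a preview sketch using randomly generated data; all three node types are explained in detail below):

using Mill

# two samples, each described by three features and a bag of parts,
# where each part is described by four features
ProductNode((features = ArrayNode(rand(Float32, 3, 2)),
             parts    = BagNode(ArrayNode(rand(Float32, 4, 5)), [1:2, 3:5])))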

ArrayNode and ArrayModel

ArrayNode thinly wraps an array of features (specifically any subtype of AbstractArray):

julia> X = Float32.([1 2 3 4; 5 6 7 8])
2×4 Array{Float32,2}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0

julia> AN = ArrayNode(X)
2×4 ArrayNode{Array{Float32,2},Nothing}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0

Data carried by any AbstractNode can be accessed with the Mill.data function as follows:

julia> Mill.data(AN)
2×4 Array{Float32,2}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0

Similarly, ArrayModel wraps any function performing an operation over this array. In the example below, we wrap a Dense layer from Flux.jl:

using Flux: Dense
julia> f = Dense(2, 3)
Dense(2, 3)

julia> AM = ArrayModel(f)
ArrayModel(Dense(2, 3))

We can now apply the model with AM(AN) to get another ArrayNode and verify that the feedforward layer f was indeed applied:

julia> AM(AN)
3×4 ArrayNode{Array{Float32,2},Nothing}:
 -0.6329817    -1.2827497   -1.9325174  -2.5822854
 -0.56179756   -1.2174246   -1.8730516  -2.528679
  0.047799364   0.41558656   0.7833737   1.151161

julia> f(X) == AM(AN) |> Mill.data
true

Model outputs

A convenient property of all Mill.jl models is that after applying them to a corresponding data node we always obtain an ArrayNode as output regardless of the type and complexity of the model. This becomes important later.
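
For instance, reusing the array model AM and the data node AN from above:

AM(AN) isa ArrayNode   # true — the result of applying a Mill.jl model is again an ArrayNode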

The most common interpretation of the data inside ArrayNodes is that each column contains the features of one sample, and therefore the node AN carries size(Mill.data(AN), 2) samples. In this sense, ArrayNodes wrap the standard machine learning problem, where each sample is represented with a vector, a matrix, or a more general tensor of features. Alternatively, one can obtain the number of samples of any AbstractNode with the nobs function from the StatsBase.jl package:

using StatsBase: nobs
julia> nobs(AN)
4

BagNode

BagNode represents the standard multiple instance learning problem, that is, each sample is a bag containing an arbitrary number of instances. In the simplest case, each instance is a vector:

julia> BN = BagNode(AN, [1:1, 2:3, 4:4])
BagNode with 3 obs
  └── ArrayNode(2×4 Array with Float32 elements) with 4 obs

where, for simplicity, we reused AN from the previous example. Each BagNode carries data and bags fields:

julia> Mill.data(BN)
2×4 ArrayNode{Array{Float32,2},Nothing}:
 1.0  2.0  3.0  4.0
 5.0  6.0  7.0  8.0

julia> BN.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 2:3, 4:4])

Here, data can be an arbitrary AbstractNode storing the representation of instances (an ArrayNode in this case), and the bags field specifies which instances belong to which bag. In this specific case, BN stores three bags (samples). The first one consists of a single instance {[1.0, 5.0]} (the first column of AN), the second one of two instances {[2.0, 6.0], [3.0, 7.0]}, and the last one of a single instance {[4.0, 8.0]}. We can see that we deal with three top-level samples (bags):

julia> nobs(BN)
3

whereas they are formed using four instances:

julia> nobs(AN)
4
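
Since data can be an arbitrary AbstractNode, bags can also be nested to represent bags of bags. A minimal sketch reusing BN from above (the name BNN is introduced here only for illustration):

# two top-level samples: the first contains the first two bags of BN, the second the remaining one
BNN = BagNode(BN, [1:2, 3:3])
nobs(BNN)   # 2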

In Mill.jl, there are two ways to store the indices of instances belonging to each bag:

  • in the AlignedBags structure, which accepts a Vector of UnitRanges and requires all instances of each bag to be stored contiguously:
julia> AlignedBags([1:3, 4:4, 5:6])
AlignedBags{Int64}(UnitRange{Int64}[1:3, 4:4, 5:6])
  • and in the ScatteredBags structure, which accepts a Vector of Vectors storing not necessarily contiguous indices:
julia> ScatteredBags([[3, 2, 1], [4], [6, 5]])
ScatteredBags{Int64}([[3, 2, 1], [4], [6, 5]])

The two examples above are semantically equivalent, as bags are unordered collections of instances. An empty bag with no instances is specified in AlignedBags as the empty range 0:-1 and in ScatteredBags as an empty vector Int[]. The BagNode constructor directly accepts either of these two structures and tries to automagically decide the better type in other cases.
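
For example, the BagNode constructor can be given either representation directly; a small sketch reusing AN from above, with an empty second bag in both cases:

BagNode(AN, [1:1, 0:-1, 2:4])          # ranges, the empty bag specified as 0:-1
BagNode(AN, [[1], Int[], [4, 3, 2]])   # vectors of indices, the empty bag specified as Int[]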

BagModel

Each BagNode is processed by a BagModel, which contains two (sub)models and an aggregation operator:

julia> im = ArrayModel(Dense(2, 3))
ArrayModel(Dense(2, 3))

julia> a = max_aggregation(3)
Aggregation{Float32}:
 SegmentedMax(ψ = Float32[0.0, 0.0, 0.0])

julia> bm = ArrayModel(Dense(4, 4))
ArrayModel(Dense(4, 4))

julia> BM = BagModel(im, a, bm)
BagModel … ↦ ⟨SegmentedMax(3)⟩ ↦ ArrayModel(Dense(4, 4))
  └── ArrayModel(Dense(2, 3))

The first submodel (called the instance model im) is responsible for converting the instance representations to a vector form:

julia> y = im(AN)
3×4 ArrayNode{Array{Float32,2},Nothing}:
 -4.411283   -4.800618  -5.189953   -5.5792885
  1.6697277   1.557303   1.4448783   1.3324537
  4.13834     4.187142   4.235944    4.284745

Note that because of the property mentioned above, the output of the instance model im is always an ArrayNode wrapping a matrix. We get four columns, one for each instance. This result is then passed to the aggregation operator a, which takes the vector representations of all instances and produces a single vector per bag:

julia> y = a(y, BN.bags)
4×3 ArrayNode{Array{Float32,2},Nothing}:
 -4.411283   -4.800618   -5.5792885
  1.6697277   1.557303    1.3324537
  4.13834     4.235944    4.284745
  0.6931472   1.0986123   0.6931472

More about aggregation

To read more about aggregation operators and to find out why there are four rows instead of three after applying the operator, see the Bag aggregation section.

Finally, y is passed to a feedforward model (called the bag model bm) producing the final output per bag. In our example, we therefore get a matrix with three columns:

julia> y = bm(y)
4×3 ArrayNode{Array{Float32,2},Nothing}:
  0.08765207   0.31226996  -0.015378637
  1.7760211    1.923082     2.0592246
  1.0429261    1.3282981    1.4331931
 -0.3872499   -0.16527262  -0.8316975

However, the best way to use a BagModel is to simply apply it to the whole BagNode, which results in the same output:

julia> BM(BN) == y
true
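
Putting the three steps together, applying BM corresponds to composing its parts:

BM(BN) == bm(a(im(AN), BN.bags))   # true: instance model, then aggregation, then bag model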

The whole procedure is depicted in the following picture:

Three instances of the BagNode, represented by red subtrees, are first mapped with the instance model im, then aggregated (the aggregation operator here is a concatenation of two different operators $a_1$ and $a_2$), and the results of the aggregation are finally transformed with the bag model bm.

Musk example

Another handy feature of Mill.jl models is that they are completely differentiable and therefore fit into the Flux.jl framework. Nodes for processing arrays and bags are already sufficient to solve the classical Musk problem.
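
The general recipe can be sketched as follows: wrap the instances in a BagNode, build a BagModel producing one logit per class, and train it with standard Flux.jl tools. The snippet below is only a hypothetical illustration using randomly generated data in place of the real Musk dataset:

using Flux, Mill
using Flux: onehotbatch
using Flux.Losses: logitcrossentropy
using Base.Iterators: repeated

# 10 instances with 5 features each, grouped into 4 bags with binary labels
x = BagNode(ArrayNode(randn(Float32, 5, 10)), [1:2, 3:5, 6:6, 7:10])
y = onehotbatch([1, 2, 2, 1], 1:2)

# instance model, aggregation, and bag model ending in two logits per bag
# (the extra input dimension of the last layer accounts for the additional
# row produced by the aggregation, as in the example above)
model = BagModel(
    ArrayModel(Dense(5, 10, relu)),
    max_aggregation(10),
    ArrayModel(Dense(11, 2)))

loss(ds, y) = logitcrossentropy(Mill.data(model(ds)), y)

Flux.train!(loss, Flux.params(model), repeated((x, y), 100), ADAM())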

ProductNodes and ProductModels

ProductNode can be thought of as a Cartesian product or a dictionary. It holds a Tuple or NamedTuple of nodes (not necessarily of the same type). For example, a ProductNode with an ArrayNode and a BagNode as children looks like this:

julia> PN = ProductNode((a=ArrayNode(Float32.([1 2 3; 4 5 6])), b=BN))
ProductNode with 3 obs
  ├── a: ArrayNode(2×3 Array with Float32 elements) with 3 obs
  └── b: BagNode with 3 obs
           └── ArrayNode(2×4 Array with Float32 elements) with 4 obs

Analogously, ProductModel contains a (Named)Tuple of (sub)models processing each of its children (stored in the ms field, short for models), as well as one more (sub)model m:

julia> ms = (a=AM, b=BM)
(a = ArrayModel(Dense(2, 3)), b = BagModel … ↦ ⟨SegmentedMax(3)⟩ ↦ ArrayModel(Dense(4, 4)))

julia> m = ArrayModel(Dense(7, 2))
ArrayModel(Dense(7, 2))

julia> PM = ProductModel(ms, m)
ProductModel … ↦ ArrayModel(Dense(7, 2))
  ├── a: ArrayModel(Dense(2, 3))
  └── b: BagModel … ↦ ⟨SegmentedMax(3)⟩ ↦ ArrayModel(Dense(4, 4))
           └── ArrayModel(Dense(2, 3))

Again, since the library relies on the property that the output of each model is an ArrayNode, the product model applies the models from ms to the appropriate children, vertically concatenates their outputs, and processes the result with the model m. Manually processing the sample above would look as follows:

julia> y = PM.m(vcat(PM[:a](PN[:a]), PM[:b](PN[:b])))
2×3 ArrayNode{Array{Float32,2},Nothing}:
 0.14728111  -0.46532086  -0.5963221
 0.5292632    1.1747541    1.2010721

which is equivalent to:

julia> PM(PN) == y
true

The application of another product model (this time with four subtrees, i.e. keys) can be visualized as follows:

Indexing in product nodes

In general, we recommend using NamedTuples, because the keys can be used for indexing both ProductNodes and ProductModels.
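
For example, with the NamedTuple keys used above, both data and model subtrees can be retrieved and combined directly:

PN[:b]           # the BagNode subtree of the product node
PM[:b](PN[:b])   # the corresponding submodel applied to it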