Data nodes

API

Mill.AbstractNode - Type
AbstractNode

Supertype for any structure representing a data node.

Mill.AbstractProductNode - Type
AbstractProductNode <: AbstractNode

Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.

Mill.AbstractBagNode - Type
AbstractBagNode <: AbstractNode

Supertype for any data node structure representing a multi-instance learning problem.

Mill.data - Function
Mill.data(n::AbstractNode)

Return data stored in node n.

Examples

julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Array{Int64,2}:
 1  2
 3  4

julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), bags([1:3, 4:4]), "metadata"))
2×2 ArrayNode{Array{Int64,2},Nothing}:
 1  2
 3  4

See also: Mill.metadata

Mill.metadata - Function
Mill.metadata(n::AbstractNode)

Return metadata stored in node n.

Examples

julia> Mill.metadata(ArrayNode([1 2; 3 4], "metadata"))
"metadata"

julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), bags([1:3, 4:4]), "metadata"))
"metadata"

See also: Mill.data

Mill.dropmeta - Function
dropmeta(n::AbstractNode)

Drop metadata stored in data node n.

Examples

julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String,Array{String,1},Int64},Array{String,1}}:
 "foo"
 "bar"

julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String,Array{String,1},Int64},Nothing}:
 "foo"
 "bar"

julia> isnothing(Mill.metadata(n2))
true

See also: Mill.metadata.

Mill.catobs - Function
catobs(ns...)

Merge multiple nodes storing samples (observations) into one, promoting types suitably in the process where possible.

Similar to Base.cat but concatenates along the abstract "axis" where samples are stored.

When catobs is called repeatedly with a varying number of arguments or varying argument types, use reduce(catobs, [ns...]) to save compilation time.

Examples

julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Array{Float64,2},Nothing}:
 0.0  0.0  1.0  2.0
 0.0  0.0  3.0  4.0

julia> n = ProductNode((t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8]))))
ProductNode with 3 obs
  ├── t1: ArrayNode(2×3 Array with Float64 elements) with 3 obs
  └── t2: BagNode with 3 obs
            └── ArrayNode(3×8 Array with Float64 elements) with 8 obs

julia> catobs(n[1], n[3])
ProductNode with 2 obs
  ├── t1: ArrayNode(2×2 Array with Float64 elements) with 2 obs
  └── t2: BagNode with 2 obs
            └── ArrayNode(3×6 Array with Float64 elements) with 6 obs
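
The reduce form recommended above can be sketched as follows. This is a minimal illustration, assuming a Mill.jl version matching this reference; the `nodes` vector and its contents are made up for the example.

```julia
using Mill

# Hypothetical mini-batch: four single-observation nodes with two features each.
nodes = [ArrayNode(fill(Float64(i), 2, 1)) for i in 1:4]

# Reduce over a vector instead of splatting it into catobs(nodes...).
batch = reduce(catobs, nodes)

# The i-th column of the merged data comes from the i-th node.
Mill.data(batch) == [1.0 2.0 3.0 4.0; 1.0 2.0 3.0 4.0]
```

Passing a vector keeps the call signature the same regardless of how many nodes are merged, which is presumably why the docstring recommends it when batch sizes vary.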

See also: Mill.subset.

Mill.subset - Function
subset(n, i)

Extract a subset i of samples (observations) stored in node n.

Similar to Base.getindex or MLDataPattern.getobs, but defined for all Mill.jl-compatible data.

Examples

julia> Mill.subset(ArrayNode(NGramMatrix(["Hello", "world"])), 2)
2053×1 ArrayNode{NGramMatrix{String,Array{String,1},Int64},Nothing}:
 "world"

julia> Mill.subset(BagNode(ArrayNode(randn(2, 8)), [1:2, 3:3, 4:7, 8:8]), 1:3)
BagNode with 3 obs
  └── ArrayNode(2×7 Array with Float64 elements) with 7 obs
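
Since subset and catobs are cross-referenced, a small round trip may help show how they relate. The sketch below assumes a Mill.jl version matching this reference; the variable names are illustrative only.

```julia
using Mill

x = ArrayNode([1 2 3; 4 5 6])          # three observations stored in columns

first_two = Mill.subset(x, 1:2)        # observations 1 and 2
last_one  = Mill.subset(x, 3:3)        # observation 3, selected with a one-element range

# Concatenating the pieces in order restores the original data.
restored = catobs(first_two, last_one)
Mill.data(restored) == Mill.data(x)
```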

See also: catobs.

Mill.mapdata - Function
mapdata(f, x)

Recursively apply f to data in all leaves of x.

Examples

julia> n1 = ProductNode((a=ArrayNode(zeros(2,2)), b=ArrayNode(ones(2,2))))
ProductNode with 2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements) with 2 obs
  └── b: ArrayNode(2×2 Array with Float64 elements) with 2 obs

julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode with 2 obs
  ├── a: ArrayNode(2×2 Array with Float64 elements) with 2 obs
  └── b: ArrayNode(2×2 Array with Float64 elements) with 2 obs

julia> Mill.data(n2).a
2×2 ArrayNode{Array{Float64,2},Nothing}:
 1.0  1.0
 1.0  1.0

julia> Mill.data(n2).b
2×2 ArrayNode{Array{Float64,2},Nothing}:
 2.0  2.0
 2.0  2.0
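
Because f is applied recursively, it also reaches leaves nested inside bag nodes. As a further sketch (assuming a Mill.jl version matching this reference; the tree below is made up for illustration), mapdata can convert every leaf array to Float32:

```julia
using Mill

# Hypothetical tree: one plain leaf and one leaf nested inside a BagNode.
n = ProductNode((a = ArrayNode([1 2; 3 4]),
                 b = BagNode(ArrayNode([5 6 7; 8 9 10]), bags([1:2, 3:3]))))

# Convert the data in every leaf; the tree structure and bags stay untouched.
n32 = Mill.mapdata(x -> Float32.(x), n)

leaf_a = Mill.data(Mill.data(n32).a)              # matrix stored in leaf a
leaf_b = Mill.data(Mill.data(Mill.data(n32).b))   # matrix stored under the BagNode
(eltype(leaf_a), eltype(leaf_b))                  # (Float32, Float32)
```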

Mill.removeinstances - Function
removeinstances(n::AbstractBagNode, mask)

Remove instances from n using mask and remap bag indices accordingly.

Examples

julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode with 3 obs
  └── ArrayNode(2×3 Array with Int64 elements) with 3 obs

julia> b2 = removeinstances(b1, [false, true, true])
BagNode with 3 obs
  └── ArrayNode(2×2 Array with Int64 elements) with 2 obs

julia> b2.data
2×2 ArrayNode{Array{Int64,2},Nothing}:
 2  3
 5  6

julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])
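
In practice the mask is often computed from the instances themselves rather than written by hand. The sketch below assumes a Mill.jl version matching this reference and, as in the example above, that true marks an instance to keep; the threshold is arbitrary.

```julia
using Mill

b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))

# Keep only instances whose first feature is at least 2.
first_feature = Mill.data(b1.data)[1, :]
mask = [v >= 2 for v in first_feature]     # [false, true, true]

b2 = removeinstances(b1, mask)
Mill.data(b2.data)                         # [2 3; 5 6]
```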

Mill.ArrayNode - Type
ArrayNode{A <: AbstractArray, C} <: AbstractNode

Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in columns of a matrix.

See also: AbstractNode, ArrayModel.

Mill.ArrayNode - Method
ArrayNode(d::AbstractArray, m=nothing)

Construct a new ArrayNode with data d and metadata m.

Examples

julia> a = ArrayNode([1 2; 3 4; 5 6])
3×2 ArrayNode{Array{Int64,2},Nothing}:
 1  2
 3  4
 5  6
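
When samples arrive as separate feature vectors, the last-axis convention means they have to be stacked as columns before wrapping. A minimal sketch, with made-up sample values:

```julia
using Mill

# Two hypothetical samples with three features each.
samples = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Stack the samples as columns so that each column is one observation.
x = ArrayNode(reduce(hcat, samples))

size(Mill.data(x))   # (3, 2): three features, two observations
```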

See also: AbstractNode, ArrayModel.

Mill.BagNode - Type
BagNode{T <: Union{AbstractNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode

Data node that represents a multi-instance learning problem. Contains instances stored in a subtree of type T, bag indices of type B and optional metadata of type C.

See also: WeightedBagNode, AbstractBagNode, AbstractNode, BagModel.

Mill.BagNode - Method
BagNode(d::Union{AbstractNode, Missing}, b::AbstractBags, m=nothing)
BagNode(d::Union{AbstractNode, Missing}, b::AbstractVector, m=nothing)

Construct a new BagNode with data d, bags b, and metadata m. If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode with 2 obs
  └── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements) with 3 obs

julia> BagNode(ArrayNode(randn(2, 5)), [1, 2, 2, 1, 1])
BagNode with 2 obs
  └── ArrayNode(2×5 Array with Float64 elements) with 5 obs
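
The two examples above pass bags in different forms. The following sketch puts both forms side by side for the same assignment of instances to bags. It assumes a Mill.jl version matching this reference and that length is defined on the bags field; the variable names are illustrative.

```julia
using Mill

instances = ArrayNode([1 2 3; 4 5 6])    # three instances

# Explicit ranges: bag 1 owns instances 1 and 2, bag 2 owns instance 3.
by_ranges = BagNode(instances, [1:2, 3:3])

# Per-instance bag ids describing the same assignment.
by_ids = BagNode(instances, [1, 1, 2])

(length(by_ranges.bags), length(by_ids.bags))   # two bags in both cases
```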

See also: WeightedBagNode, AbstractBagNode, AbstractNode, BagModel.

Mill.WeightedBagNode - Method
WeightedBagNode(d::Union{AbstractNode, Missing}, b::AbstractBags, w::Vector, m=nothing)
WeightedBagNode(d::Union{AbstractNode, Missing}, b::AbstractVector, w::Vector, m=nothing)

Construct a new WeightedBagNode with data d, bags b, weights w and metadata m. If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode with 2 obs
  └── ArrayNode(2053×2 NGramMatrix with Int64 elements) with 2 obs

julia> WeightedBagNode(ArrayNode(zeros(2, 2)), [1, 2], [1, 2])
WeightedBagNode with 2 obs
  └── ArrayNode(2×2 Array with Float64 elements) with 2 obs

See also: BagNode, AbstractBagNode, AbstractNode, BagModel.

Mill.ProductNode - Method
ProductNode(ds, m=nothing)

Construct a new ProductNode with data ds and metadata m. ds should be an iterable (preferably a Tuple or NamedTuple), and all of its elements must contain the same number of observations.

Examples

julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode with 2 obs
  ├── ArrayNode(2×2 Array with Float64 elements) with 2 obs
  └── ArrayNode(2×2 OneHotMatrix with Bool elements) with 2 obs

julia> ProductNode((x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                    x2 = BagNode(ArrayNode([1 2; 3 4]), [1:3, 4:4])))
ProductNode with 2 obs
  ├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements) with 2 obs
  └── x2: BagNode with 2 obs
            └── ArrayNode(2×2 Array with Int64 elements) with 2 obs

julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1; 3])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]

See also: AbstractProductNode, AbstractNode, ProductModel.

Mill.LazyNode - Method
LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)

Construct a new LazyNode with name Name, data d, and metadata m.

Examples

julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{Codons} with 2 obs

See also: AbstractNode, LazyModel, Mill.unpack2mill.

Mill.unpack2mill - Function
Mill.unpack2mill(x::LazyNode)

Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should define its own method of this function, because LazyModel calls it.

Examples

function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")
    x = NGramMatrix(reduce(vcat, s))
    BagNode(ArrayNode(x), Mill.length2bags(length.(s)))
end

julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode with 2 obs
  └── ArrayNode(2053×3 NGramMatrix with Int64 elements) with 3 obs

See also: LazyNode, LazyModel.