Data nodes

Index

Mill.AbstractBagNode
Mill.AbstractNode
Mill.AbstractProductNode
Mill.ArrayNode
Mill.BagNode
Mill.LazyNode
Mill.ProductNode
Mill.WeightedBagNode
Mill.catobs
Mill.data
Mill.dropmeta
Mill.mapdata
Mill.metadata
Mill.removeinstances
Mill.subset
Mill.unpack2mill

API

Mill.AbstractNode — Type

AbstractNode

Supertype for any structure representing a data node.

Mill.AbstractProductNode — Type

AbstractProductNode <: AbstractNode

Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.

Mill.AbstractBagNode — Type

AbstractBagNode <: AbstractNode

Supertype for any data node structure representing a multi-instance learning problem.
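
The subtype relations stated in the signatures on this page can be checked directly. The following is a minimal sketch, assuming Mill.jl is installed; names are qualified with `Mill.` so the check does not depend on which of them are exported.

```julia
using Mill

# Abstract layer: both specialised supertypes sit under AbstractNode.
@assert Mill.AbstractBagNode <: Mill.AbstractNode
@assert Mill.AbstractProductNode <: Mill.AbstractNode

# Concrete node types, as declared in their signatures on this page.
@assert Mill.BagNode <: Mill.AbstractBagNode
@assert Mill.WeightedBagNode <: Mill.AbstractBagNode
@assert Mill.ProductNode <: Mill.AbstractProductNode
@assert Mill.LazyNode <: Mill.AbstractNode
```

Dispatching on these supertypes is one way to write a function that handles every bag-like or product-like node without listing each concrete type.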

Mill.data — Function

Mill.data(n::AbstractNode)

Return data stored in node n.

Examples

julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Array{Int64,2}:
 1  2
 3  4

julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), bags([1:3, 4:4]), "metadata"))
2×2 ArrayNode{Array{Int64,2},Nothing}:
 1  2
 3  4

See also: AbstractNode, ArrayModel.
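
Because Mill.data returns whatever the node stores, it can be applied repeatedly to walk down a tree. The sketch below assumes Mill.jl is installed; the variable names `bn`, `inner`, and `raw` are illustrative only.

```julia
using Mill

# Two instances (columns), one instance per bag.
# The vector of ranges is converted by the BagNode constructor.
bn = BagNode(ArrayNode([1 2; 3 4]), [1:1, 2:2])

inner = Mill.data(bn)     # the instance-level ArrayNode
raw = Mill.data(inner)    # the matrix stored in that ArrayNode

@assert inner isa ArrayNode
@assert raw == [1 2; 3 4]
```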

Mill.BagNode — Type

BagNode{T <: Union{AbstractNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode

Data node that represents a multi-instance learning problem. Contains instances stored in a subtree of type T, bag indices of type B and optional metadata of type C.

Mill.BagNode — Method

BagNode(d::Union{AbstractNode, Missing}, b::AbstractBags, m=nothing)
BagNode(d::Union{AbstractNode, Missing}, b::AbstractVector, m=nothing)

Construct a new BagNode with data d, bags b, and metadata m. If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode with 2 obs
  └── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements) with 3 obs

julia> BagNode(ArrayNode(randn(2, 5)), [1, 2, 2, 1, 1])
BagNode with 2 obs
  └── ArrayNode(2×5 Array with Float64 elements) with 5 obs

Mill.WeightedBagNode — Type

WeightedBagNode{T <: Union{AbstractNode, Missing}, B <: AbstractBags, W, C} <: AbstractBagNode

Structure like BagNode that additionally allows specifying weights of type W for each instance.

Mill.WeightedBagNode — Method

WeightedBagNode(d::Union{AbstractNode, Missing}, b::AbstractBags, w::Vector, m=nothing)
WeightedBagNode(d::Union{AbstractNode, Missing}, b::AbstractVector, w::Vector, m=nothing)

Construct a new WeightedBagNode with data d, bags b, weights w and metadata m. If b is an AbstractVector, Mill.bags is applied first.

Examples

julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode with 2 obs
  └── ArrayNode(2053×2 NGramMatrix with Int64 elements) with 2 obs

julia> WeightedBagNode(ArrayNode(zeros(2, 2)), [1, 2], [1, 2])
WeightedBagNode with 2 obs
  └── ArrayNode(2×2 Array with Float64 elements) with 2 obs

Mill.ProductNode — Type

ProductNode{T, C} <: AbstractProductNode

Data node representing a Cartesian product of several spaces, each represented by a subtree stored in an iterable of type T. May store metadata of type C.

Mill.ProductNode — Method

ProductNode(ds, m=nothing)

Construct a new ProductNode with data ds, and metadata m. ds should be an iterable (preferably Tuple or NamedTuple) and all its elements must contain the same number of observations.

Examples

julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode with 2 obs
  ├── ArrayNode(2×2 Array with Float64 elements) with 2 obs
  └── ArrayNode(2×2 OneHotMatrix with Bool elements) with 2 obs

julia> ProductNode((x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                    x2 = BagNode(ArrayNode([1 2; 3 4]), [1:3, 4:4])))
ProductNode with 2 obs
  ├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements) with 2 obs
  └── x2: BagNode with 2 obs
            └── ArrayNode(2×2 Array with Int64 elements) with 2 obs

julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1; 3])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]

Mill.LazyNode — Type

LazyNode{Name, D, C} <: AbstractNode

Data node storing data of type D in a lazy manner and optional metadata of type C.

The source of the data, or its type, is specified in Name.

Mill.LazyNode — Method

LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)

Construct a new LazyNode with name Name, data d, and metadata m.

Examples

julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{Codons} with 2 obs
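
The two constructor forms listed above should yield nodes of the same parametrized type. A minimal sketch, assuming Mill.jl is installed and that the stored data is available through the `data` field (as the `unpack2mill` example on this page does); the variable names are illustrative.

```julia
using Mill

codons = ["GGGCGGCGA", "CCTCGCGGG"]

a = LazyNode(:Codons, codons)    # name passed as a Symbol argument
b = LazyNode{:Codons}(codons)    # name passed as a type parameter

# Both nodes carry the name in their type, so methods can dispatch on it.
@assert a isa LazyNode{:Codons}
@assert b isa LazyNode{:Codons}
@assert a.data == b.data
```

Carrying the name in the type parameter is what lets a method such as `Mill.unpack2mill(ds::LazyNode{:Codons})` be selected by dispatch.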

Mill.unpack2mill — Function

Mill.unpack2mill(x::LazyNode)

Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode should define its own method, because LazyModel relies on it.

Examples

function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")
    x = NGramMatrix(reduce(vcat, s))
    BagNode(ArrayNode(x), Mill.length2bags(length.(s)))
end

julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode with 2 obs
  └── ArrayNode(2053×3 NGramMatrix with Int64 elements) with 3 obs