Data nodes
Index
Mill.AbstractBagNode
Mill.AbstractNode
Mill.AbstractProductNode
Mill.ArrayNode
Mill.BagNode
Mill.LazyNode
Mill.ProductNode
Mill.WeightedBagNode
Mill.catobs
Mill.data
Mill.dropmeta
Mill.mapdata
Mill.metadata
Mill.removeinstances
Mill.subset
Mill.unpack2mill
API
Mill.AbstractNode
Type: AbstractNode
Supertype for any structure representing a data node.
Mill.AbstractProductNode
Type: AbstractProductNode <: AbstractNode
Supertype for any structure representing a data node implementing a Cartesian product of data in subtrees.
Mill.AbstractBagNode
Type: AbstractBagNode <: AbstractNode
Supertype for any data node structure representing a multi-instance learning problem.
Mill.data
Function: Mill.data(n::AbstractNode)
Return the data stored in node n.
Examples
julia> Mill.data(ArrayNode([1 2; 3 4], "metadata"))
2×2 Array{Int64,2}:
1 2
3 4
julia> Mill.data(BagNode(ArrayNode([1 2; 3 4]), bags([1:1, 2:2]), "metadata"))
2×2 ArrayNode{Array{Int64,2},Nothing}:
1 2
3 4
See also: Mill.metadata
Mill.metadata
Function: Mill.metadata(n::AbstractNode)
Return the metadata stored in node n.
Examples
julia> Mill.metadata(ArrayNode([1 2; 3 4], "metadata"))
"metadata"
julia> Mill.metadata(BagNode(ArrayNode([1 2; 3 4]), bags([1:1, 2:2]), "metadata"))
"metadata"
See also: Mill.data
Mill.dropmeta
Function: dropmeta(n::AbstractNode)
Drop the metadata stored in data node n.
Examples
julia> n1 = ArrayNode(NGramMatrix(["foo", "bar"]), ["metafoo", "metabar"])
2053×2 ArrayNode{NGramMatrix{String,Array{String,1},Int64},Array{String,1}}:
"foo"
"bar"
julia> n2 = dropmeta(n1)
2053×2 ArrayNode{NGramMatrix{String,Array{String,1},Int64},Nothing}:
"foo"
"bar"
julia> isnothing(Mill.metadata(n2))
true
See also: Mill.metadata.
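The printed type of n2 ends in Nothing because the metadata type is a type parameter of the node. A minimal sketch of that mechanism, using a hypothetical ToyNode struct (not Mill's ArrayNode) whose second parameter plays the role of C; toydata, toymetadata and toydropmeta are illustrative names only.

```julia
# Hypothetical toy node (NOT Mill's ArrayNode); it only mimics the
# data/metadata split and the metadata type parameter C.
struct ToyNode{A, C}
    data::A
    metadata::C
end

# Metadata defaults to `nothing`, as in ArrayNode(d, m=nothing).
ToyNode(d) = ToyNode(d, nothing)

# Accessors in the spirit of Mill.data / Mill.metadata.
toydata(n::ToyNode) = n.data
toymetadata(n::ToyNode) = n.metadata

# Rebuild the node with the same data and no metadata; the second type
# parameter therefore changes from C to Nothing.
toydropmeta(n::ToyNode) = ToyNode(n.data)

n1 = ToyNode([1 2; 3 4], ["metafoo", "metabar"])
n2 = toydropmeta(n1)
```

The data is shared, not copied; only the wrapper (and its type) changes.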
Mill.catobs
Function: catobs(ns...)
Merge multiple nodes storing samples (observations) into one, promoting types in the process if possible. Similar to Base.cat, but concatenates along the abstract "axis" where samples are stored.
In case of repeated calls with a varying number of arguments or varying argument types, use reduce(catobs, [ns...]) to save compilation time.
Examples
julia> catobs(ArrayNode(zeros(2, 2)), ArrayNode([1 2; 3 4]))
2×4 ArrayNode{Array{Float64,2},Nothing}:
0.0 0.0 1.0 2.0
0.0 0.0 3.0 4.0
julia> n = ProductNode((t1=ArrayNode(randn(2, 3)), t2=BagNode(ArrayNode(randn(3, 8)), bags([1:3, 4:5, 6:8]))))
ProductNode with 3 obs
├── t1: ArrayNode(2×3 Array with Float64 elements) with 3 obs
└── t2: BagNode with 3 obs
└── ArrayNode(3×8 Array with Float64 elements) with 8 obs
julia> catobs(n[1], n[3])
ProductNode with 2 obs
├── t1: ArrayNode(2×2 Array with Float64 elements) with 2 obs
└── t2: BagNode with 2 obs
└── ArrayNode(3×6 Array with Float64 elements) with 6 obs
See also: Mill.subset.
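For bag nodes, concatenating along the sample axis also means shifting the bag indices of later nodes past the instances of earlier ones. A sketch under an assumed plain representation (instances as matrix columns, bags as a vector of UnitRanges, 0:-1 for an empty bag); toy_catbags is a hypothetical helper, not Mill's implementation. Note that hcat promotes Int and Float64 columns to Float64, much like the promotion in the first catobs example.

```julia
# Hypothetical helper, not part of Mill.jl. A "bag node" is modeled as a
# matrix of instances (one instance per column) plus a vector of UnitRanges
# giving each bag's columns; 0:-1 marks an empty bag.
function toy_catbags(x1::AbstractMatrix, b1::Vector{UnitRange{Int}},
                     x2::AbstractMatrix, b2::Vector{UnitRange{Int}})
    offset = size(x1, 2)                      # columns already used by x1
    shift(r) = isempty(r) ? (0:-1) : (r .+ offset)
    x = hcat(x1, x2)                          # concatenate along the sample axis
    b = vcat(b1, map(shift, b2))              # bags of x2 now point past x1
    return x, b
end

# Two single-bag nodes with three instances each, like t2 of n[1] and n[3].
x, b = toy_catbags(zeros(3, 3), [1:3], ones(3, 3), [1:3])
```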
Mill.subset
Function: subset(n, i)
Extract a subset i of samples (observations) stored in node n.
Similar to Base.getindex or MLDataPattern.getobs, but defined for all Mill.jl-compatible data as well.
Examples
julia> Mill.subset(ArrayNode(NGramMatrix(["Hello", "world"])), 2)
2053×1 ArrayNode{NGramMatrix{String,Array{String,1},Int64},Nothing}:
"world"
julia> Mill.subset(BagNode(ArrayNode(randn(2, 8)), [1:2, 3:3, 4:7, 8:8]), 1:3)
BagNode with 3 obs
└── ArrayNode(2×7 Array with Float64 elements) with 7 obs
See also: catobs.
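Selecting bags means gathering the instances of the chosen bags and renumbering their ranges so they index into the smaller instance matrix. A sketch assuming instances are matrix columns and bags are UnitRanges (0:-1 for an empty bag); toy_subset is a hypothetical helper, not Mill's code. The call at the end mirrors the BagNode example above, where three bags keep 7 of 8 instances.

```julia
# Hypothetical helper, not part of Mill.jl. Instances are matrix columns,
# each bag is a UnitRange of column indices and 0:-1 marks an empty bag.
function toy_subset(x::AbstractMatrix, bags::Vector{UnitRange{Int}}, idx)
    sel = bags[vcat(idx)]                      # vcat turns a single index into a vector
    cols = reduce(vcat, collect.(sel); init = Int[])  # columns of the chosen bags, in order
    newbags = UnitRange{Int}[]
    pos = 1                                    # next free column in the result
    for r in sel
        if isempty(r)
            push!(newbags, 0:-1)
        else
            push!(newbags, pos:(pos + length(r) - 1))
            pos += length(r)
        end
    end
    return x[:, cols], newbags
end

xs, bs = toy_subset(randn(2, 8), [1:2, 3:3, 4:7, 8:8], 1:3)
```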
Mill.mapdata
Function: mapdata(f, x)
Recursively apply f to the data in all leaves of x.
Examples
julia> n1 = ProductNode((a=ArrayNode(zeros(2,2)), b=ArrayNode(ones(2,2))))
ProductNode with 2 obs
├── a: ArrayNode(2×2 Array with Float64 elements) with 2 obs
└── b: ArrayNode(2×2 Array with Float64 elements) with 2 obs
julia> n2 = Mill.mapdata(x -> x .+ 1, n1)
ProductNode with 2 obs
├── a: ArrayNode(2×2 Array with Float64 elements) with 2 obs
└── b: ArrayNode(2×2 Array with Float64 elements) with 2 obs
julia> Mill.data(n2).a
2×2 ArrayNode{Array{Float64,2},Nothing}:
1.0 1.0
1.0 1.0
julia> Mill.data(n2).b
2×2 ArrayNode{Array{Float64,2},Nothing}:
2.0 2.0
2.0 2.0
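The recursion behind mapdata can be sketched with two methods: one for leaves and one for inner nodes. This assumes a tree modeled as nested Tuples/NamedTuples with arrays as leaves instead of Mill nodes; toy_mapdata is a hypothetical name.

```julia
# Hypothetical sketch, not Mill's implementation: a tree is modeled as nested
# Tuples/NamedTuples whose leaves are arrays.
toy_mapdata(f, x::AbstractArray) = f(x)                     # leaf: apply f
toy_mapdata(f, x::Union{Tuple, NamedTuple}) =
    map(child -> toy_mapdata(f, child), x)                  # inner node: recurse, keep keys

t1 = (a = zeros(2, 2), b = ones(2, 2))
t2 = toy_mapdata(x -> x .+ 1, t1)
```

As in the example above, the input tree is left unchanged and a new tree with the same structure is returned.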
Mill.removeinstances
Function: removeinstances(n::AbstractBagNode, mask)
Remove instances from n using a Boolean mask (instances whose mask entry is false are removed, those marked true are kept) and remap bag indices accordingly.
Examples
julia> b1 = BagNode(ArrayNode([1 2 3; 4 5 6]), bags([1:2, 0:-1, 3:3]))
BagNode with 3 obs
└── ArrayNode(2×3 Array with Int64 elements) with 3 obs
julia> b2 = removeinstances(b1, [false, true, true])
BagNode with 3 obs
└── ArrayNode(2×2 Array with Int64 elements) with 2 obs
julia> b2.data
2×2 ArrayNode{Array{Int64,2},Nothing}:
2 3
5 6
julia> b2.bags
AlignedBags{Int64}(UnitRange{Int64}[1:1, 0:-1, 2:2])
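The remapping can be sketched with a cumulative sum over the mask: cumsum(mask)[i] is the new column of a kept instance i, and since bags are contiguous ranges, the kept instances of a bag stay contiguous. This assumes instances as matrix columns and bags as UnitRanges (0:-1 for empty); toy_removeinstances is a hypothetical helper, not Mill's code. Applied to the data and bags of b1, it yields the same bags as b2.

```julia
# Hypothetical helper, not part of Mill.jl: keep the columns where `mask` is
# true and remap contiguous bags (UnitRanges, 0:-1 = empty) to new indices.
function toy_removeinstances(x::AbstractMatrix, bags::Vector{UnitRange{Int}},
                             mask::AbstractVector{Bool})
    newidx = cumsum(mask)                    # new column index of each kept column
    newbags = map(bags) do r
        kept = filter(i -> mask[i], r)       # surviving instances of this bag
        isempty(kept) ? (0:-1) : (newidx[first(kept)]:newidx[last(kept)])
    end
    return x[:, mask], newbags
end

x2, b2 = toy_removeinstances([1 2 3; 4 5 6], [1:2, 0:-1, 3:3], [false, true, true])
```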
Mill.ArrayNode
Type: ArrayNode{A <: AbstractArray, C} <: AbstractNode
Data node for storing array-like data of type A and metadata of type C. The convention is that samples are stored along the last axis, e.g. in the columns of a matrix.
See also: AbstractNode, ArrayModel.
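The last-axis convention fits in two lines of Base Julia; toy_nobs and toy_getobs are hypothetical helpers, not Mill functions. Under this convention a Vector is a collection of scalar samples, so [1, 3] holds two observations.

```julia
# Hypothetical helpers, not Mill's API: the number of samples of an array and
# the selection of samples, both along the last axis.
toy_nobs(a::AbstractArray) = size(a, ndims(a))
toy_getobs(a::AbstractArray, i) = selectdim(a, ndims(a), i)   # returns a view

m = [1 2; 3 4; 5 6]        # 3 features, 2 samples (columns)
```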
Mill.ArrayNode
Method: ArrayNode(d::AbstractArray, m=nothing)
Construct a new ArrayNode with data d and metadata m.
Examples
julia> a = ArrayNode([1 2; 3 4; 5 6])
3×2 ArrayNode{Array{Int64,2},Nothing}:
1 2
3 4
5 6
See also: AbstractNode, ArrayModel.
Mill.BagNode
Type: BagNode{T <: Union{AbstractNode, Missing}, B <: AbstractBags, C} <: AbstractBagNode
Data node that represents a multi-instance learning problem. Contains instances stored in a subtree of type T, bag indices of type B, and optional metadata of type C.
See also: WeightedBagNode, AbstractBagNode, AbstractNode, BagModel.
Mill.BagNode
Method: BagNode(d::Union{AbstractNode, Missing}, b::AbstractBags, m=nothing)
BagNode(d::Union{AbstractNode, Missing}, b::AbstractVector, m=nothing)
Construct a new BagNode with data d, bags b, and metadata m. If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> BagNode(ArrayNode(maybehotbatch([1, missing, 2], 1:2)), AlignedBags([1:1, 2:3]))
BagNode with 2 obs
└── ArrayNode(2×3 MaybeHotMatrix with Union{Missing, Bool} elements) with 3 obs
julia> BagNode(ArrayNode(randn(2, 5)), [1, 2, 2, 1, 1])
BagNode with 2 obs
└── ArrayNode(2×5 Array with Float64 elements) with 5 obs
See also: WeightedBagNode, AbstractBagNode, AbstractNode, BagModel.
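When b is a vector of bag ids such as [1, 2, 2, 1, 1], the instances of one bag need not be adjacent, so they cannot always be described by ranges. A sketch of grouping ids into per-bag index lists, assuming bags are ordered by the first appearance of their id; toy_scatteredbags and iscontiguous are hypothetical helpers, not Mill.bags itself.

```julia
# Hypothetical helper, not Mill's `bags`: group instance indices by bag id.
# Bags are ordered by first appearance of their id (an assumption of this
# sketch); each bag lists the columns of its instances.
function toy_scatteredbags(ids::AbstractVector)
    order = unique(ids)                          # distinct ids, first-seen order
    return [findall(==(k), ids) for k in order]
end

# A bag can be written as a UnitRange only if its instances are adjacent.
iscontiguous(b::Vector{Int}) = isempty(b) || b == first(b):last(b)

b = toy_scatteredbags([1, 2, 2, 1, 1])
```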
Mill.WeightedBagNode
Type: WeightedBagNode{T <: Union{AbstractNode, Missing}, B <: AbstractBags, W, C} <: AbstractBagNode
Structure like BagNode, but additionally allows specifying weights of type W for the individual instances.
See also: BagNode, AbstractBagNode, AbstractNode, BagModel.
Mill.WeightedBagNode
Method: WeightedBagNode(d::Union{AbstractNode, Missing}, b::AbstractBags, w::Vector, m=nothing)
WeightedBagNode(d::Union{AbstractNode, Missing}, b::AbstractVector, w::Vector, m=nothing)
Construct a new WeightedBagNode with data d, bags b, weights w, and metadata m. If b is an AbstractVector, Mill.bags is applied first.
Examples
julia> WeightedBagNode(ArrayNode(NGramMatrix(["s1", "s2"])), bags([1:2, 0:-1]), [0.2, 0.8])
WeightedBagNode with 2 obs
└── ArrayNode(2053×2 NGramMatrix with Int64 elements) with 2 obs
julia> WeightedBagNode(ArrayNode(zeros(2, 2)), [1, 2], [1, 2])
WeightedBagNode with 2 obs
└── ArrayNode(2×2 Array with Float64 elements) with 2 obs
See also: BagNode, AbstractBagNode, AbstractNode, BagModel.
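The docstring does not say how the weights are consumed downstream. As one illustrative assumption, per-instance weights could feed a weighted mean over each bag's instances; toy_weighted_mean is hypothetical and not necessarily what BagModel computes. Instances are assumed to be matrix columns and bags UnitRanges (0:-1 for empty).

```julia
# Hypothetical sketch: a weighted mean over the instances (columns) of each
# bag. Not necessarily what Mill's BagModel computes. Assumes the weights of
# a non-empty bag do not sum to zero.
function toy_weighted_mean(x::AbstractMatrix, bags::Vector{UnitRange{Int}},
                           w::AbstractVector{<:Real})
    out = zeros(float(eltype(x)), size(x, 1), length(bags))
    for (j, r) in enumerate(bags)
        isempty(r) && continue               # empty bag: leave a zero column
        ws = w[r]
        out[:, j] = x[:, r] * ws ./ sum(ws)  # weighted average of the columns
    end
    return out
end

x = [1.0 3.0 10.0; 2.0 4.0 20.0]
m = toy_weighted_mean(x, [1:2, 0:-1, 3:3], [1, 3, 5])
```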
Mill.ProductNode
Type: ProductNode{T, C} <: AbstractProductNode
Data node representing a Cartesian product of several spaces, each represented by a subtree stored in an iterable of type T. May store metadata of type C.
See also: AbstractProductNode, AbstractNode, ProductModel.
Mill.ProductNode
Method: ProductNode(ds, m=nothing)
Construct a new ProductNode with data ds and metadata m. ds should be an iterable (preferably a Tuple or NamedTuple) and all its elements must contain the same number of observations.
Examples
julia> ProductNode((ArrayNode(zeros(2, 2)), ArrayNode(Flux.onehotbatch([1, 2], 1:2))))
ProductNode with 2 obs
├── ArrayNode(2×2 Array with Float64 elements) with 2 obs
└── ArrayNode(2×2 OneHotMatrix with Bool elements) with 2 obs
julia> ProductNode((x1 = ArrayNode(NGramMatrix(["Hello", "world"])),
                    x2 = BagNode(ArrayNode([1 2; 3 4]), [1:1, 2:2])))
ProductNode with 2 obs
├── x1: ArrayNode(2053×2 NGramMatrix with Int64 elements) with 2 obs
└── x2: BagNode with 2 obs
└── ArrayNode(2×2 Array with Int64 elements) with 2 obs
julia> ProductNode((ArrayNode([1 2; 3 4]), ArrayNode([1 2 3; 4 5 6])))
ERROR: AssertionError: All subtrees must have an equal amount of instances!
[...]
See also: AbstractProductNode, AbstractNode, ProductModel.
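The equal-observations requirement amounts to an assertion over all subtrees. A sketch with subtrees modeled as plain arrays (samples along the last axis); toy_product and toy_nobs are hypothetical helpers, and the message simply reuses the error text from the example above.

```julia
# Hypothetical sketch of the invariant checked by ProductNode: every subtree
# must hold the same number of observations (samples along the last axis).
toy_nobs(a::AbstractArray) = size(a, ndims(a))

function toy_product(ds::Union{Tuple, NamedTuple})
    n = map(toy_nobs, values(ds))              # observation count per subtree
    @assert all(==(first(n)), n) "All subtrees must have an equal amount of instances!"
    return ds
end
```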
Mill.LazyNode
Type: LazyNode{Name, D, C} <: AbstractNode
Data node storing data of type D in a lazy manner and optional metadata of type C.
The source of the data or its type is specified by Name.
See also: AbstractNode, LazyModel, Mill.unpack2mill.
Mill.LazyNode
Method: LazyNode([Name::Symbol], d, m=nothing)
LazyNode{Name}(d, m=nothing)
Construct a new LazyNode with name Name, data d, and metadata m.
Examples
julia> LazyNode(:Codons, ["GGGCGGCGA", "CCTCGCGGG"])
LazyNode{Codons} with 2 obs
See also: AbstractNode, LazyModel, Mill.unpack2mill.
Mill.unpack2mill
Function: Mill.unpack2mill(x::LazyNode)
Return a representation of LazyNode x using Mill.jl structures. Every custom LazyNode (one per Name) should have its own method, as it is used in LazyModel.
Examples
function Mill.unpack2mill(ds::LazyNode{:Sentence})
    s = split.(ds.data, " ")                              # split each sentence into words
    x = NGramMatrix(reduce(vcat, s))                      # one column per word
    BagNode(ArrayNode(x), Mill.length2bags(length.(s)))   # one bag per sentence
end
julia> LazyNode{:Sentence}(["foo bar", "baz"]) |> Mill.unpack2mill
BagNode with 2 obs
└── ArrayNode(2053×3 NGramMatrix with Int64 elements) with 3 obs
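Mill.length2bags turns bag sizes into consecutive index ranges, one range per sentence in the example above. A re-implementation of that idea, assuming empty bags are encoded as 0:-1 (as in the bags printed for removeinstances); toy_length2bags is a hypothetical name, not the library function.

```julia
# Hypothetical re-implementation for illustration (Mill.jl ships its own
# Mill.length2bags): turn bag sizes into consecutive ranges of instance
# indices, using 0:-1 for empty bags.
function toy_length2bags(lens::AbstractVector{<:Integer})
    ends = cumsum(lens)                        # last instance index of each bag
    return [l == 0 ? (0:-1) : ((e - l + 1):e) for (l, e) in zip(lens, ends)]
end

words = split.(["foo bar", "baz"], " ")        # [["foo", "bar"], ["baz"]]
instances = reduce(vcat, words)                # 3 instances in total
wordbags = toy_length2bags(length.(words))     # [1:2, 3:3]
```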