Home
Mill.jl is a library built on top of Flux.jl for flexibly prototyping hierarchical multi-instance learning models, as described in Tomáš Pevný, Petr Somol (2017) and Tomáš Pevný, Petr Somol (2016). It is developed to be:
- flexible and versatile
- as general as possible
- fast
- dependent on only a handful of other packages
Installation
Run the following in the Julia REPL (`]` switches to the package manager mode):
] add Mill
Go to
- Motivation for a brief introduction to the philosophy of Mill.jl
- Architecture of Mill
- Examples
- Helper tools
- References
- TODO finish this
Mill._addmattvec! — Method
_addmattvec!(o, i, W, x, j)
Adds the product of the transposed matrix `W` and the `j`-th column of `x` to the `i`-th column of `o`.
Mill._addmatvec! — Method
_addmatvec!(o, i, W, x, j)
Adds the product of the matrix `W` and the `j`-th column of `x` to the `i`-th column of `o`.
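The two column updates described above can be written in plain Julia as follows. This is only a sketch of the arithmetic: the names `addmatvec_sketch!` and `addmattvec_sketch!` are hypothetical stand-ins, and Mill's internal implementations may differ.

```julia
# Hypothetical stand-ins for the documented helpers, not Mill's own code.

# o[:, i] += W * x[:, j]
function addmatvec_sketch!(o, i, W, x, j)
    o[:, i] .+= W * view(x, :, j)
    return o
end

# o[:, i] += W' * x[:, j]
function addmattvec_sketch!(o, i, W, x, j)
    o[:, i] .+= W' * view(x, :, j)
    return o
end

W = [1.0 2.0; 3.0 4.0]
x = [1.0 0.0; 0.0 1.0]
o = zeros(2, 2)
addmatvec_sketch!(o, 1, W, x, 2)    # adds W * [0, 1] to column 1 of o
addmattvec_sketch!(o, 2, W, x, 1)   # adds W' * [1, 0] to column 2 of o
```

Both functions mutate only one column of `o`, which is presumably why the originals take the column indices `i` and `j` explicitly.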
Mill._addvecvect! — Method
_outeradd!(W, Δ, i, x, j)
Adds the outer product of the `i`-th column of `Δ` and the transposed `j`-th column of `x` to `W`.
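The outer-product update can be illustrated in plain Julia. The function name `outeradd_sketch!` is hypothetical and the code shows only the arithmetic described in the docstring, not Mill's implementation.

```julia
# Hypothetical stand-in, not Mill's own code: W += Δ[:, i] * x[:, j]'
function outeradd_sketch!(W, Δ, i, x, j)
    W .+= view(Δ, :, i) * view(x, :, j)'
    return W
end

W = zeros(2, 3)
Δ = [1.0 2.0; 3.0 4.0]             # 2×2, columns have length 2
x = [1.0 0.0; 2.0 1.0; 3.0 0.0]    # 3×2, columns have length 3
outeradd_sketch!(W, Δ, 2, x, 1)    # adds [2, 4] * [1 2 3] to W
```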
Mill.catobs — Function
catobs(as...)
Concatenates `as...` into a single datanode while preserving their structure.
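To give an idea of what preserving structure can mean for bags, here is a minimal sketch with a toy container. `ToyBag` and `toy_catobs` are hypothetical names introduced for illustration; Mill's datanodes are richer and `catobs` may work differently.

```julia
# Hypothetical minimal bag container, not a Mill type.
struct ToyBag
    data::Matrix{Float64}          # one column per instance
    bags::Vector{UnitRange{Int}}   # column range of each bag
end

# Concatenate instances column-wise and shift bag ranges accordingly.
function toy_catobs(as::ToyBag...)
    data = reduce(hcat, [a.data for a in as])
    bags = UnitRange{Int}[]
    offset = 0
    for a in as
        append!(bags, [r .+ offset for r in a.bags])
        offset += size(a.data, 2)
    end
    return ToyBag(data, bags)
end

a = ToyBag(ones(2, 3), [1:2, 3:3])
b = ToyBag(zeros(2, 2), [1:2])
c = toy_catobs(a, b)
```

The bag ranges of the second argument are shifted by the number of instances in the first, so every bag still points at its own instances.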
Mill.countngrams! — Method
function countngrams!(o,x,n::Int,b::Int)
Counts the number of `n`-grams of `x` with base `b` and stores the counts in `o`.
Mill.countngrams — Method
function countngrams(x,n::Int,b::Int)
Counts the number of `n`-grams of `x` with base `b`.
Mill.data — Method
data(x::AbstractNode)
Returns the data held by the datanode.
Mill.metadata — Method
metadata(x::AbstractNode)
Returns the metadata held by the datanode.
Mill.ngrams! — Method
ngrams!(o,x,n::Int,b::Int)
Stores the indexes of the `n`-grams of `x` with base `b` in `o`.
Mill.ngrams — Method
ngrams(x,n::Int,b::Int)
Returns the indexes of the `n`-grams of `x` with base `b`.
Mill.remapbag — Method
function remapbag(b::Bags,idcs::Vector{Int})
Returns the bags of `b` corresponding to the indices `idcs`, together with the collected indices.
Mill.sparsify — Method
sparsify(x,nnzrate)
Replaces matrices whose fraction of non-zeros is at most `nnzrate` with a SparseMatrixCSC.
julia> x = ProductNode((
           ProductNode((
               MatrixNode(randn(5,5)),
               MatrixNode(zeros(5,5))
           )),
           MatrixNode(zeros(5,5))
       ))
julia> mapdata(i -> sparsify(i,0.05),x)
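The rule applied to each matrix can be sketched with the standard library alone. `sparsify_sketch` is a hypothetical name, and the exact threshold comparison used by Mill is an assumption here.

```julia
using SparseArrays

# Hypothetical helper, not Mill's own code: switch to sparse storage when the
# fraction of non-zero entries is at most `nnzrate`, otherwise keep the input.
function sparsify_sketch(x::AbstractMatrix, nnzrate)
    return count(!iszero, x) / length(x) <= nnzrate ? sparse(x) : x
end

mostly_zero = sparsify_sketch(zeros(5, 5), 0.05)   # no non-zeros, becomes sparse
dense_kept  = sparsify_sketch(ones(5, 5), 0.05)    # all non-zeros, stays dense
```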
Mill.ArrayModel — Type
struct ArrayModel{T <: MillFunction} <: AbstractMillModel
m::T
end
Uses a Chain, Dense, or any other function on an ArrayNode.
Mill.ArrayNode — Type
Mill.BagChain — Type
BagChain(layers...)
Chains multiple layers / functions together so that they are called in sequence on a given input supported by bags.
Mill.BagConv — Type
struct BagConv{T, F}
W::T
σ::F
end
Convolution over a matrix `X` that correctly handles borders between bags. The convolution is a little bit special, as it
assumes that the input is always a matrix (never a tensor) and that the kernel always spans the full dimension of the vector.
BagConv(d::Int, o::Int, n::Int, σ = identity)
`d` --- input dimension
`o` --- output dimension (number of channels)
`n` --- size of convolution
`σ` --- transfer function used after the convolution
Note that if `n` equals one, the convolution boils down to multiplying the input data `x` by a matrix `W`.
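The idea of a convolution that respects bag borders can be sketched in plain Julia. Everything below is an assumption made for illustration: the name `bagconv_sketch`, the kernel stored as a vector of `o × d` matrices, and the alignment in which kernel slice `k` is applied to instance `t + k - 1`. Instances that would fall outside the current bag simply contribute nothing.

```julia
# Hypothetical sketch, not Mill's BagConv.
# Ws   : vector of n matrices, each o × d (one per kernel position)
# x    : d × (number of instances) matrix
# bags : column range of each bag
function bagconv_sketch(Ws, x, bags, σ = identity)
    y = zeros(size(Ws[1], 1), size(x, 2))
    for bag in bags, t in bag
        for (k, W) in enumerate(Ws)
            s = t + k - 1
            s in bag || continue          # never read across a bag border
            y[:, t] .+= W * view(x, :, s)
        end
    end
    return σ.(y)
end

x = [1.0 2.0 3.0 4.0]                     # four one-dimensional instances
bags = [1:2, 3:4]
y2 = bagconv_sketch([ones(1, 1), ones(1, 1)], x, bags)   # n = 2

W = fill(3.0, 2, 1)
y1 = bagconv_sketch([W], x, bags)         # n = 1 reduces to W * x
```

In `y2`, the second instance of the first bag does not see the first instance of the second bag, which is the border handling the description refers to.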
Mill.BagModel — Type
struct BagModel{T <: AbstractMillModel, U <: AbstractMillModel} <: AbstractMillModel
im::T
a::Aggregation
bm::U
end
Uses the `im` model on the data in a `BagNode`, then uses the function `a` to aggregate individual bags, and finally uses the `bm` model on the output.
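The three stages can be sketched with ordinary functions standing in for the models. The name `bagmodel_sketch`, the toy functions, and the choice of the mean as aggregation are assumptions for illustration, not Mill's API.

```julia
using Statistics

# Hypothetical sketch of the three stages, not Mill's BagModel.
function bagmodel_sketch(instance_model, aggregate, bag_model, x, bags)
    h = instance_model(x)                                    # per-instance features
    pooled = reduce(hcat, [aggregate(view(h, :, bag)) for bag in bags])
    return bag_model(pooled)                                 # one column per bag
end

instance_model = x -> 2 .* x
aggregate      = h -> vec(mean(h, dims = 2))                 # mean over instances in a bag
bag_model      = z -> z .+ 1

x = [1.0 3.0 5.0; 2.0 4.0 6.0]    # three instances with two features each
bags = [1:2, 3:3]                 # first bag holds two instances, second holds one
y = bagmodel_sketch(instance_model, aggregate, bag_model, x, bags)
```

The output has one column per bag regardless of how many instances each bag contains.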
Mill.NGramIterator — Type
struct NGramIterator{T}
s::T
n::Int
b::Int
end
Iterates over and enumerates the n-grams of a collection of integers `s::T` with zero padding. The enumeration is computed as in positional number systems, where the items of `s` are digits and `b` is the base.
To reduce collisions when mixing n-grams of different orders, avoid zeros and negative integers in `s` and set the base `b` to the expected number of unique tokens in `s`.
Examples
julia> it = Mill.NGramIterator(collect(1:9), 3, 10)
NGramIterator{Array{Int64,1}}([1, 2, 3, 4, 5, 6, 7, 8, 9], 3, 10, 9223372036854775807)
julia> Mill.string_start_code!(0); Mill.string_end_code!(0); collect(it)
11-element Array{Int64,1}:
1
12
123
234
345
456
567
678
789
890
900
julia> sit = Mill.NGramIterator(codeunits("deadbeef"), 3, 256) # creates collisions as codeunits returns tokens from 0x00:0xff
NGramIterator{Base.CodeUnits{UInt8,String}}(UInt8[0x64, 0x65, 0x61, 0x64, 0x62, 0x65, 0x65, 0x66], 3, 256, 9223372036854775807)
julia> collect(sit)
10-element Array{Int64,1}:
100
25701
6579553
6644068
6382690
6578789
6448485
6645094
6645248
6684672
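The enumeration shown in the examples above can be reproduced with a few lines of plain Julia. `padded_ngram_codes` is a hypothetical helper written for this illustration, and it assumes zero padding on both sides, matching the start and end codes set to 0 in the example.

```julia
# Hypothetical helper, not part of Mill.jl: codes of zero-padded n-grams of `s`,
# where each n-gram is read as a number with digits from `s` in base `b`.
function padded_ngram_codes(s, n::Int, b::Int)
    padded = vcat(zeros(Int, n - 1), Int.(s), zeros(Int, n - 1))
    codes = Int[]
    for start in 1:(length(s) + n - 1)
        code = 0
        for k in 0:(n - 1)
            code = code * b + padded[start + k]   # positional number system
        end
        push!(codes, code)
    end
    return codes
end

digits_codes = padded_ngram_codes(collect(1:9), 3, 10)
string_codes = padded_ngram_codes(codeunits("deadbeef"), 3, 256)
```

With base 10 and digits 1 to 9 the codes read like the n-grams themselves, which makes the positional encoding easy to check by eye.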
Mill.NGramMatrix — Type
struct NGramMatrix{T}
s::Vector{T}
n::Int
b::Int
m::Int
end
Represents the strings stored in the array `s` as n-grams of cardinality `n`. The strings are stored internally as strings, and multiplication with a dense matrix is overloaded. `b` is the base used to compute the n-gram indexes, and `m` is the modulo applied to those indexes.
The structure essentially represents a modulo one-hot representation of strings, where each column contains one observation (string). It can therefore be viewed as a matrix with `m` rows and `length(s)` columns.
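The matrix view described above can be made concrete with a dense sketch. The helper names are hypothetical, and two details are assumptions: n-grams are zero-padded on both sides, and each column counts how often every n-gram index (taken modulo `m`) occurs. The real type avoids materializing this matrix.

```julia
# Hypothetical helpers for illustration; these are not Mill.jl functions.

# Codes of zero-padded n-grams of `s`, read as base-`b` numbers.
function padded_ngram_codes(s, n::Int, b::Int)
    padded = vcat(zeros(Int, n - 1), Int.(s), zeros(Int, n - 1))
    return [foldl((acc, d) -> acc * b + d, view(padded, t:t + n - 1); init = 0)
            for t in 1:(length(s) + n - 1)]
end

# Dense m × length(strings) matrix counting n-gram codes modulo m.
function ngram_count_matrix(strings, n::Int, b::Int, m::Int)
    X = zeros(Int, m, length(strings))
    for (col, str) in enumerate(strings)
        for code in padded_ngram_codes(codeunits(str), n, b)
            X[mod(code, m) + 1, col] += 1   # +1 because Julia arrays are 1-based
        end
    end
    return X
end

X = ngram_count_matrix(["hello", "mill"], 3, 256, 20)
W = ones(4, 20)
Y = W * X    # dense product, one output column per string
```

Multiplying by a dense `W` amounts to summing the columns of `W` selected by the n-gram indexes, which is why the product can be computed without building `X`.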
Mill.ProductModel — Type
struct ProductModel{N, T <: MillFunction} <: AbstractMillModel
ms::NTuple{N, AbstractMillModel}
m::ArrayModel{T}
end
Applies each model in `ms` to the corresponding data in a `ProductNode`, concatenates the outputs, and passes the result to the chain model `m`.
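The data flow can be sketched with plain functions and matrices. `productmodel_sketch` and the toy models are hypothetical, and vertical concatenation of the sub-model outputs is an assumption about how the outputs are joined.

```julia
# Hypothetical sketch, not Mill's ProductModel.
# ms : tuple of functions, one per child
# m  : function applied to the concatenated outputs
# xs : tuple of child data, one matrix per child (columns are observations)
function productmodel_sketch(ms, m, xs)
    hs = map((f, x) -> f(x), ms, xs)   # apply each model to its own child
    return m(reduce(vcat, hs))         # stack the features, then apply m
end

ms = (x -> 2 .* x, x -> x .+ 1)
m  = z -> sum(z, dims = 1)
xs = ([1.0 2.0], [10.0 20.0; 30.0 40.0])   # two observations in each child
y = productmodel_sketch(ms, m, xs)
```

All children must describe the same observations, so every matrix in `xs` has the same number of columns.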