Home

Mill.jl is a library built on top of Flux.jl, aimed at flexible prototyping of hierarchical multi-instance learning models as described in Tomáš Pevný, Petr Somol (2017) and Tomáš Pevný, Petr Somol (2016). It is designed to be:

  • flexible and versatile
  • as general as possible
  • fast
  • and dependent on only a handful of other packages

Installation

Run the following in the Julia REPL (typing `]` switches to package mode):

] add Mill


Mill._addmattvec! (Method)
_addmattvec!(o, i, W, x, j)

Add the product of the transposed matrix `W` and the `j`-th column of `x` to the `i`-th column of `o`.
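In plain Julia this corresponds to `o[:, i] .+= W' * x[:, j]`. The sketch below spells the operation out with explicit loops so that no temporary vector is allocated; `addmattvec_sketch!` is a hypothetical name used only for illustration, not Mill's internal implementation.

```julia
# Hypothetical sketch of what Mill._addmattvec! is documented to do:
# o[:, i] += transpose(W) * x[:, j], computed without allocating temporaries.
function addmattvec_sketch!(o::AbstractMatrix, i::Int, W::AbstractMatrix,
                            x::AbstractMatrix, j::Int)
    # transpose(W) has size (size(W, 2), size(W, 1)), so entry r of the
    # product is the dot product of column r of W with x[:, j]
    @inbounds for r in axes(W, 2)
        acc = zero(eltype(o))
        for k in axes(W, 1)
            acc += W[k, r] * x[k, j]
        end
        o[r, i] += acc
    end
    return o
end

W = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3×2, so W' is 2×3
x = [1.0 0.0; 1.0 1.0; 1.0 2.0]   # 3×2
o = zeros(2, 3)
addmattvec_sketch!(o, 2, W, x, 1)
o[:, 2] == W' * x[:, 1]           # true
```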
Mill._addmatvec! (Method)
_addmatvec!(o, i, W, x, j)

Add the product of the matrix `W` and the `j`-th column of `x` to the `i`-th column of `o`.
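The non-transposed variant is `o[:, i] .+= W * x[:, j]`. A minimal loop-based sketch follows; `addmatvec_sketch!` is a hypothetical helper, and iterating over the columns of `W` in the outer loop is simply a cache-friendly choice for Julia's column-major arrays.

```julia
# Hypothetical sketch of Mill._addmatvec!: o[:, i] += W * x[:, j]
function addmatvec_sketch!(o::AbstractMatrix, i::Int, W::AbstractMatrix,
                           x::AbstractMatrix, j::Int)
    # outer loop over columns of W so memory is read column by column
    @inbounds for k in axes(W, 2)
        xkj = x[k, j]
        for r in axes(W, 1)
            o[r, i] += W[r, k] * xkj
        end
    end
    return o
end

W = [1.0 2.0; 3.0 4.0]
x = [1.0 5.0; 1.0 6.0]
o = ones(2, 2)
addmatvec_sketch!(o, 1, W, x, 2)   # o[:, 1] becomes 1 .+ W * [5.0, 6.0]
```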
Mill._addvecvect! (Method)
_outeradd!(W, Δ, i, x, j)

Add the outer product of the `i`-th column of `Δ` and the transposed `j`-th column of `x` to `W`.
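This is the rank-one update `W .+= Δ[:, i] * x[:, j]'`, the same accumulation that appears in the gradient of `W * x` with respect to `W`. The sketch below uses a hypothetical name, `outeradd_sketch!`, and is not taken from Mill's source.

```julia
# Hypothetical sketch of the documented update: W += Δ[:, i] * transpose(x[:, j])
function outeradd_sketch!(W::AbstractMatrix, Δ::AbstractMatrix, i::Int,
                          x::AbstractMatrix, j::Int)
    # rows of W follow Δ, columns of W follow x
    @inbounds for c in axes(W, 2), r in axes(W, 1)
        W[r, c] += Δ[r, i] * x[c, j]
    end
    return W
end

Δ = [1.0 2.0; 3.0 4.0]
x = reshape([1.0, 10.0, 100.0], 3, 1)   # 3×1 matrix
W = zeros(2, 3)
outeradd_sketch!(W, Δ, 2, x, 1)         # adds [2.0, 4.0] * [1.0 10.0 100.0]
```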
Mill.catobs (Function)
catobs(as...)

Concatenate `as...` into a single data node while preserving their structure.
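To illustrate what "preserving structure" means for bag-like data, the Mill-free sketch below concatenates several nodes represented as named tuples of an instance matrix and a vector of bag ranges. The representation and the name `catbags_sketch` are assumptions made for this example; the key point is that the bag ranges of later nodes must be shifted by the number of instances that precede them.

```julia
# Hypothetical, Mill-free illustration: each node is a named tuple holding
# an instance matrix (one column per instance) and a vector of ranges that
# say which columns belong to which bag.
function catbags_sketch(as...)
    data = hcat(map(a -> a.data, as)...)
    bags = UnitRange{Int}[]
    offset = 0
    for a in as
        for r in a.bags
            push!(bags, r .+ offset)   # shift by instances already concatenated
        end
        offset += size(a.data, 2)
    end
    return (data = data, bags = bags)
end

a = (data = [1.0 2.0 3.0], bags = [1:2, 3:3])
b = (data = [4.0 5.0], bags = [1:2])
c = catbags_sketch(a, b)   # c.bags == [1:2, 3:3, 4:5]
```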
Mill.countngrams! (Method)

countngrams!(o, x, n::Int, b::Int)

Count the n-grams of order `n` of `x`, enumerated with base `b`, and store the counts in `o`.
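A minimal sketch of counting n-grams into a preallocated vector is shown below. Several details are assumptions rather than documented behaviour: the sequence is padded with `n - 1` zeros on both sides (as in the `NGramIterator` examples on this page), each n-gram is enumerated positionally with base `b`, and the resulting index is folded into `o` with `mod(idx, length(o)) + 1`. The name `countngrams_sketch!` is hypothetical.

```julia
# Hypothetical sketch of counting zero-padded n-grams into o.
function countngrams_sketch!(o::AbstractVector{<:Integer}, x::AbstractVector{<:Integer},
                             n::Int, b::Int)
    padded = vcat(zeros(Int, n - 1), x, zeros(Int, n - 1))
    for start in 1:(length(padded) - n + 1)
        idx = 0
        for k in start:(start + n - 1)
            idx = idx * b + padded[k]   # positional number system with base b
        end
        o[mod(idx, length(o)) + 1] += 1  # assumed folding of indices into o
    end
    return o
end

o = zeros(Int, 7)
countngrams_sketch!(o, [1, 2, 1, 2], 2, 10)   # bigram indices 1, 12, 21, 12, 20
```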

Mill.countngrams (Method)

countngrams(x, n::Int, b::Int)

Count the n-grams of order `n` of `x`, enumerated with base `b`, and return the counts.

Mill.data (Method)
data(x::AbstractNode)

Return the data held by the data node.
Mill.metadata (Method)
metadata(x::AbstractNode)

Return the metadata held by the data node.
Mill.ngrams! (Method)

ngrams!(o, x, n::Int, b::Int)

Store the indices of the n-grams of order `n` of `x`, enumerated with base `b`, in `o`.

Mill.ngrams (Method)

ngrams(x, n::Int, b::Int)

Return the indices of the n-grams of order `n` of `x`, enumerated with base `b`.
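The index computation can be sketched in a few lines, assuming zero padding of `n - 1` on each side and positional enumeration with base `b`. With these assumptions the hypothetical `ngrams_sketch` reproduces the output of the `NGramIterator` example on this page, where the start and end codes are set to 0.

```julia
# Hypothetical sketch: indices of the zero-padded n-grams of x with base b.
function ngrams_sketch(x::AbstractVector{<:Integer}, n::Int, b::Int)
    padded = vcat(zeros(Int, n - 1), x, zeros(Int, n - 1))
    # each window of n items is read as a number whose digits are the items
    return [foldl((acc, d) -> acc * b + d, view(padded, s:(s + n - 1)); init = 0)
            for s in 1:(length(padded) - n + 1)]
end

ngrams_sketch(collect(1:9), 3, 10)
# [1, 12, 123, 234, 345, 456, 567, 678, 789, 890, 900]
```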

Mill.remapbag (Method)
remapbag(b::Bags, idcs::Vector{Int})

Return the bags selected by `idcs`, remapped so that they index the collected instances, together with the collected instance indices.
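The sketch below shows one plausible reading of this operation, under the assumption that bags are stored as ranges of instance (column) indices: the selected bags' instances are collected in order, and each selected bag gets a new contiguous range into that collection. `remapbag_sketch` is a hypothetical helper, not Mill's code.

```julia
# Hypothetical sketch of selecting bags and remapping them to the collected instances.
function remapbag_sketch(bags::Vector{UnitRange{Int}}, idcs::Vector{Int})
    newbags = UnitRange{Int}[]
    collected = Int[]
    for i in idcs
        r = bags[i]
        start = length(collected) + 1
        append!(collected, r)                            # instances of the selected bag
        push!(newbags, start:(start + length(r) - 1))    # their new contiguous range
    end
    return newbags, collected
end

bags = [1:2, 3:5, 6:6]
remapbag_sketch(bags, [3, 1])   # ([1:1, 2:3], [6, 1, 2])
```

Indexing the instance matrix as `data[:, collected]` then yields data that the new ranges describe.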
Mill.sparsify (Method)
sparsify(x, nnzrate)

Replace matrices whose fraction of non-zero entries is at most `nnzrate` with a `SparseMatrixCSC`.

julia> x = ProductNode((
           ProductNode((
               MatrixNode(randn(5, 5)),
               MatrixNode(zeros(5, 5))
           )),
           MatrixNode(zeros(5, 5))
       ))
julia> mapdata(i -> sparsify(i, 0.05), x)
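The decision rule for a single matrix can be sketched with the `SparseArrays` standard library. `sparsify_sketch` is a hypothetical stand-in, not Mill's implementation; in the example above, the two all-zero matrices would be converted while the dense `randn` matrix would be left as-is.

```julia
using SparseArrays

# Hypothetical sketch: convert to SparseMatrixCSC when the fraction of
# non-zeros is at most nnzrate, otherwise return the matrix unchanged.
function sparsify_sketch(x::AbstractMatrix, nnzrate::Real)
    nzfrac = count(!iszero, x) / length(x)
    return nzfrac <= nnzrate ? sparse(x) : x
end
# anything that is not a matrix is passed through untouched
sparsify_sketch(x, nnzrate::Real) = x

sparsify_sketch(zeros(5, 5), 0.05)   # SparseMatrixCSC with no stored entries
sparsify_sketch(randn(5, 5), 0.05)   # dense Matrix{Float64}, returned as-is
```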
Mill.ArrayModel (Type)

struct ArrayModel{T <: MillFunction} <: AbstractMillModel
    m::T
end

Apply a `Chain`, `Dense`, or any other function to the data of an `ArrayNode`.

Mill.BagChain (Type)

BagChain(layers...)

Chain multiple layers or functions together so that they are called in sequence on a given input supported by bags.

Mill.BagConv (Type)
struct BagConv{T, F}
    W::T
    σ::F
end

Convolution over a matrix `X` that correctly handles the borders between bags. The convolution is slightly special: it assumes that the input is always a matrix (never a tensor) and that the kernel always spans the full dimension of the vector.

BagConv(d::Int, o::Int, n::Int, σ = identity)
`d` --- input dimension
`o` --- output dimension (number of channels)
`n` --- size of convolution
`σ` --- transfer function used after the convolution

Note that if `n` equals one, the convolution reduces to multiplying the input data `x` by the matrix `W`.
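The following sketch illustrates a bag-aware convolution over the columns of `x`. Its details are assumptions, not Mill's layout: the kernel is stored as a vector of `n` matrices of size `o×d`, it is centred on the current instance, and neighbours outside the current bag are treated as zeros, so no information leaks across bag borders. With `n = 1` it reduces to `W[1] * x`, matching the note above. `bagconv_sketch` is a hypothetical name.

```julia
# Hypothetical sketch of a 1-D convolution that respects bag borders.
function bagconv_sketch(x::AbstractMatrix, bags::Vector{UnitRange{Int}},
                        W::Vector{<:AbstractMatrix}, σ = identity)
    n = length(W)
    c = (n + 1) ÷ 2                       # kernel tap aligned with the current instance
    y = zeros(promote_type(eltype(x), eltype(W[1])), size(W[1], 1), size(x, 2))
    for r in bags, p in r
        for k in 1:n
            q = p + k - c                 # neighbouring instance used by tap k
            q in r || continue            # outside the bag: treated as zero padding
            view(y, :, p) .+= W[k] * view(x, :, q)
        end
    end
    return σ.(y)
end

x = [1.0 2.0 3.0 4.0]                     # d = 1, four instances
bags = [1:2, 3:4]
W = [fill(1.0, 1, 1), fill(10.0, 1, 1), fill(100.0, 1, 1)]
bagconv_sketch(x, bags, W)                # [210.0 21.0 430.0 43.0]
```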
Mill.BagModel (Type)

struct BagModel{T <: AbstractMillModel, U <: AbstractMillModel} <: AbstractMillModel
    im::T
    a::Aggregation
    bm::U
end

Apply the instance model `im` to the data in a `BagNode`, aggregate each bag with the aggregation `a`, and finally apply the bag model `bm` to the aggregated output.
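The three-stage forward pass can be sketched with plain functions. This is a Mill-free illustration: bags are assumed to be ranges of instance columns, the mean over instances stands in for the aggregation, and `bagmodel_sketch` together with the toy components are hypothetical.

```julia
using Statistics: mean

# Hypothetical sketch: instance model -> per-bag aggregation -> bag model.
function bagmodel_sketch(im, a, bm, x::AbstractMatrix, bags::Vector{UnitRange{Int}})
    h = im(x)                                          # embed every instance (column)
    pooled = reduce(hcat, [a(h[:, r]) for r in bags])  # one column per bag
    return bm(pooled)
end

instance_model = x -> 2 .* x
aggregate = H -> vec(mean(H; dims = 2))                # mean over the bag's instances
bag_model = z -> sum(z; dims = 1)
x = [1.0 2.0 3.0; 4.0 5.0 6.0]
bagmodel_sketch(instance_model, aggregate, bag_model, x, [1:2, 3:3])   # [12.0 18.0]
```

Keeping the aggregation as a separate argument mirrors the `a::Aggregation` field: the same instance and bag models can be reused with a different pooling.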

Mill.NGramIterator (Type)

struct NGramIterator{T}
    s::T
    n::Int
    b::Int
end

Iterates over and enumerates the n-grams of a collection of integers `s::T` with zero padding. The enumeration is computed as in a positional number system, where the items of `s` are digits and `b` is the base.

To reduce collisions when mixing n-grams of different orders, avoid zeros and negative integers in `s`, and set the base `b` equal to the expected number of unique tokens in `s`.
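The advice above can be checked with a one-line positional enumeration. `ngram_index` is a hypothetical helper that reads an n-gram as a number in base `b`; it shows how a base that is too small, zero tokens, and negative tokens each make different n-grams share an index.

```julia
# Hypothetical helper: read an n-gram as a number with digits `gram` in base b.
ngram_index(gram, b) = foldl((acc, d) -> acc * b + d, gram; init = 0)

# base smaller than the largest token: different bigrams collide
ngram_index([1, 11], 10) == ngram_index([2, 1], 10)   # true, both equal 21
# a zero token behaves like padding: a bigram collides with a unigram
ngram_index([0, 5], 10) == ngram_index([5], 10)       # true, both equal 5
# negative tokens also collide across orders
ngram_index([1, -5], 10) == ngram_index([5], 10)      # true, 10 - 5 == 5
# a base larger than every token separates the first pair
ngram_index([1, 11], 12) == ngram_index([2, 1], 12)   # false, 23 vs 25
```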

Examples

julia> it = Mill.NGramIterator(collect(1:9), 3, 10)
NGramIterator{Array{Int64,1}}([1, 2, 3, 4, 5, 6, 7, 8, 9], 3, 10, 9223372036854775807)

julia> Mill.string_start_code!(0); Mill.string_end_code!(0); collect(it)
11-element Array{Int64,1}:
   1
  12
 123
 234
 345
 456
 567
 678
 789
 890
 900
julia> sit = Mill.NGramIterator(codeunits("deadbeef"), 3, 256)    # creates collisions as codeunits returns tokens from 0x00:0xff
NGramIterator{Base.CodeUnits{UInt8,String}}(UInt8[0x64, 0x65, 0x61, 0x64, 0x62, 0x65, 0x65, 0x66], 3, 256, 9223372036854775807)

julia> collect(sit)
10-element Array{Int64,1}:
     100
   25701
 6579553
 6644068
 6382690
 6578789
 6448485
 6645094
 6645248
 6684672
Mill.NGramMatrix (Type)

struct NGramMatrix{T}
    s::Vector{T}
    n::Int
    b::Int
    m::Int
end

Represents the strings stored in the array `s` as n-grams of cardinality `n`. The strings are stored internally as strings, and multiplication with a dense matrix is overloaded. `b` is the base used to enumerate the n-grams, and `m` is the modulus applied to the n-gram indices.

The structure essentially represents a modulo one-hot encoding of the strings, where each column contains one observation (string). The structure can therefore be viewed as a matrix with `m` rows and `length(s)` columns.
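Viewing the structure as an `m × length(s)` count matrix explains why multiplication by a dense matrix `W` can be done without materialising it: column `j` of the product is the sum of the columns of `W` selected by the n-grams of string `j`. The sketch below assumes that strings are tokenised by `codeunits`, padded with zeros, enumerated with base `b`, and folded into rows with `mod(idx, m) + 1`. These are illustrative choices, and `ngram_rows`, `mul_ngrams`, and `ngram_counts` are hypothetical names.

```julia
# Hypothetical: row indices (1..m) of the zero-padded n-grams of one string.
function ngram_rows(s::AbstractString, n::Int, b::Int, m::Int)
    x = Int.(codeunits(s))
    padded = vcat(zeros(Int, n - 1), x, zeros(Int, n - 1))
    return [mod(foldl((acc, d) -> acc * b + d, view(padded, i:(i + n - 1)); init = 0), m) + 1
            for i in 1:(length(padded) - n + 1)]
end

# W * (implicit count matrix): add column r of W once per n-gram occurrence.
function mul_ngrams(W::AbstractMatrix, s::Vector{String}, n::Int, b::Int, m::Int)
    size(W, 2) == m || throw(DimensionMismatch("W must have m = $m columns"))
    out = zeros(eltype(W), size(W, 1), length(s))
    for (j, str) in enumerate(s), r in ngram_rows(str, n, b, m)
        view(out, :, j) .+= view(W, :, r)
    end
    return out
end

# Reference: the explicit m × length(s) count matrix.
function ngram_counts(s::Vector{String}, n::Int, b::Int, m::Int)
    C = zeros(Int, m, length(s))
    for (j, str) in enumerate(s), r in ngram_rows(str, n, b, m)
        C[r, j] += 1
    end
    return C
end

W = randn(4, 2053)
s = ["hello", "world"]
mul_ngrams(W, s, 3, 256, 2053) ≈ W * ngram_counts(s, 3, 256, 2053)   # true
```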

Mill.ProductModel (Type)
struct ProductModel{N, T <: MillFunction} <: AbstractMillModel
    ms::NTuple{N, AbstractMillModel}
    m::ArrayModel{T}
end

Apply each model in `ms` to the corresponding data in a `ProductNode`, concatenate the outputs, and pass the result to the model `m`.
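A Mill-free sketch of this forward pass: the `i`-th model is applied to the `i`-th component, the outputs (one column per observation) are stacked vertically, and the final model is applied to the result. Representing the product data as a plain tuple of matrices and the name `productmodel_sketch` are assumptions made for illustration.

```julia
# Hypothetical sketch of the ProductModel forward pass.
function productmodel_sketch(ms::Tuple, m, xs::Tuple)
    length(ms) == length(xs) || throw(ArgumentError("need one model per component"))
    h = reduce(vcat, map((f, x) -> f(x), ms, xs))   # stack outputs feature-wise
    return m(h)
end

ms = (x -> 2 .* x, x -> x .+ 1)
m = h -> sum(h; dims = 1)
xs = ([1.0 2.0], [10.0 20.0; 30.0 40.0])            # two components, two observations
productmodel_sketch(ms, m, xs)                      # [44.0 66.0]
```

Because the outputs are concatenated row-wise, every component must produce the same number of columns, i.e. describe the same observations.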