Special arrays
Index
Mill.MaybeHotMatrix
Mill.MaybeHotVector
Mill.NGramIterator
Mill.NGramIterator
Mill.NGramMatrix
Mill.NGramMatrix
Mill.PostImputingMatrix
Mill.PostImputingMatrix
Mill.PreImputingMatrix
Mill.PreImputingMatrix
Mill.countngrams
Mill.countngrams!
Mill.maybehot
Mill.maybehotbatch
Mill.ngrams
Mill.ngrams!
Mill.postimputing_dense
Mill.preimputing_dense
API
Mill.MaybeHotVector — Type
MaybeHotVector{T, U, V} <: AbstractVector{V}
A vector-like structure for representing one-hot encoded variables. Like Flux.OneHotVector, but supports missing values.
Construct with the maybehot function.
See also: MaybeHotMatrix, maybehotbatch.
Mill.maybehot — Function
maybehot(l, labels)
Return a MaybeHotVector in which the first occurrence of l in labels is set to 1 and all other elements are set to 0.
Examples
julia> maybehot(:b, [:a, :b, :c])
3-element MaybeHotVector{Int64,Int64,Bool}:
0
1
0
julia> maybehot(missing, 1:3)
3-element MaybeHotVector{Missing,Int64,Missing}:
missing
missing
missing
See also: maybehotbatch, MaybeHotVector, MaybeHotMatrix.
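The behaviour described for maybehot can be sketched in plain Julia without Mill. The helper name onehot_or_missing is hypothetical, it returns an ordinary dense vector rather than a MaybeHotVector, and throwing an error for a label that is absent from labels is an assumption rather than documented behaviour.

```julia
# Hypothetical helper, not part of Mill: a dense stand-in for `maybehot`.
function onehot_or_missing(l, labels)
    # A missing label makes every entry missing.
    ismissing(l) && return Union{Missing, Bool}[missing for _ in labels]
    # Only the first occurrence of `l` in `labels` is marked.
    i = findfirst(isequal(l), labels)
    # Assumption: an unknown label is treated as an error.
    i === nothing && throw(ArgumentError("label not found in labels"))
    v = Union{Missing, Bool}[false for _ in labels]
    v[i] = true
    return v
end

onehot_or_missing(:b, [:a, :b, :c])   # false, true, false
onehot_or_missing(missing, 1:3)       # missing, missing, missing
```

Storing the result as Union{Missing, Bool} keeps both cases in one element type; the real MaybeHotVector is presumably more compact, since it only needs to remember the index.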
Mill.MaybeHotMatrix — Type
MaybeHotMatrix{T, U, V} <: AbstractMatrix{V}
A matrix-like structure for representing one-hot encoded variables. Like Flux.OneHotMatrix, but supports missing values.
Construct with the maybehotbatch function.
See also: MaybeHotVector, maybehot.
Mill.maybehotbatch — Function
maybehotbatch(ls, labels)
Return a MaybeHotMatrix in which each column corresponds to one element of ls. Each column contains 1 at the position of that element's first occurrence in labels, with all other elements set to 0.
Examples
julia> maybehotbatch([:c, :a], [:a, :b, :c])
3×2 MaybeHotMatrix{Int64,Int64,Bool}:
0 1
0 0
1 0
julia> maybehotbatch([missing, 2], 1:3)
3×2 MaybeHotMatrix{Union{Missing, Int64},Int64,Union{Missing, Bool}}:
missing false
missing true
missing false
See also: maybehot, MaybeHotMatrix, MaybeHotVector.
Mill.NGramIterator — Type
NGramIterator{T}
Iterates over the ngram codes of a collection of integers s, using Mill.string_start_code() and Mill.string_end_code() for padding. NGram codes are computed as in positional number systems, where the items of s are the digits, b is the base, and m is the modulus.
To reduce collisions when mixing ngrams of different orders, avoid zeros and negative integers in s and set the base b to the expected number of unique tokens in s.
See also: NGramMatrix, ngrams, ngrams!, countngrams, countngrams!.
Mill.NGramIterator — Method
NGramIterator(s, n=3, b=256, m=typemax(Int))
Construct an NGramIterator. If s is an AbstractString, it is first converted to integers with Base.codeunits.
Examples
julia> NGramIterator("deadbeef", 3, 256, 17) |> collect
10-element Array{Int64,1}:
2
16
9
9
6
10
11
15
2
6
julia> NGramIterator(collect(1:9), 3, 10, 1009) |> collect
11-element Array{Int64,1}:
221
212
123
234
345
456
567
678
789
893
933
julia> Mill.string_start_code()
0x02
julia> Mill.string_end_code()
0x03
See also: NGramMatrix, ngrams, ngrams!, countngrams, countngrams!.
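The positional-number-system computation can be sketched in plain Julia. This is an illustration, not Mill's implementation: the helper name ngram_codes and its keyword arguments are hypothetical, and padding with n - 1 start codes and n - 1 end codes is an assumption inferred from the examples above, whose outputs this sketch reproduces.

```julia
# Hypothetical helper, not Mill's implementation.
function ngram_codes(s::AbstractVector{<:Integer}, n=3, b=256, m=typemax(Int);
                     start_code=0x02, end_code=0x03)
    # Assumption: pad with n - 1 start codes and n - 1 end codes.
    padded = vcat(fill(Int(start_code), n - 1), Int.(s), fill(Int(end_code), n - 1))
    codes = Int[]
    for i in 1:(length(padded) - n + 1)
        code = 0
        for d in view(padded, i:i + n - 1)
            code = code * b + d      # treat the window as digits in base b
        end
        push!(codes, code % m)       # reduce modulo m
    end
    return codes
end

# Strings are first converted to integers with codeunits.
ngram_codes(s::AbstractString, args...; kwargs...) =
    ngram_codes(codeunits(s), args...; kwargs...)

ngram_codes(collect(1:9), 3, 10, 1009)   # starts with 221, 212, 123, ...
ngram_codes("deadbeef", 3, 256, 17)      # 10 codes, all below 17
```

With base 10 and digits 1 to 9 the codes read like decimal numbers, which makes the role of the padding codes 2 and 3 visible in the first and last entries. For large n and b the intermediate value may overflow Int in this sketch.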
Mill.ngrams — Function
ngrams(x, n=3, b=256)
Return the codes of the n-grams of x, computed using base b.
Examples
julia> ngrams("foo", 3, 256)
5-element Array{Int64,1}:
131686
157295
6713199
7302915
7275267
See also: ngrams!, countngrams, countngrams!, NGramMatrix, NGramIterator.
Mill.ngrams! — Function
ngrams!(o, x, n=3, b=256)
Store the codes of the n-grams of x, computed using base b, in o.
Examples
julia> o = zeros(Int, 5)
5-element Array{Int64,1}:
0
0
0
0
0
julia> ngrams!(o, "foo", 3, 256)
5-element Array{Int64,1}:
131686
157295
6713199
7302915
7275267
See also: ngrams, countngrams, countngrams!, NGramMatrix, NGramIterator.
Mill.countngrams — Function
countngrams(x, n, b, m)
Count the n-grams of x, using base b and modulo m. The counts are returned in a vector of length m if x is a single sequence, or in a matrix with m rows if x is an iterable of sequences.
Examples
julia> countngrams("foo", 3, 256, 5)
5-element Array{Int64,1}:
2
1
1
0
1
julia> countngrams(["foo", "bar"], 3, 256, 5)
5×2 Array{Int64,2}:
2 1
1 0
1 2
0 0
1 2
See also: countngrams!, ngrams, ngrams!, NGramMatrix, NGramIterator.
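Counting can be read as building a histogram of n-gram codes reduced modulo m. The following plain-Julia sketch is not Mill's implementation: the helper name count_ngram_codes is hypothetical, and both the padding codes 2 and 3 and the mapping of code k to row k % m + 1 are assumptions, chosen because they reproduce the outputs shown in the examples above.

```julia
# Hypothetical helper, not Mill's implementation.
function count_ngram_codes(s::AbstractString, n, b, m)
    # Assumption: pad with n - 1 start codes (2) and n - 1 end codes (3).
    padded = vcat(fill(2, n - 1), Int.(codeunits(s)), fill(3, n - 1))
    counts = zeros(Int, m)
    for i in 1:(length(padded) - n + 1)
        # Interpret the window as digits of a number in base b.
        code = foldl((acc, d) -> acc * b + d, view(padded, i:i + n - 1); init = 0)
        counts[code % m + 1] += 1    # codes 0..m-1 map to rows 1..m
    end
    return counts
end

# An iterable of sequences yields one column per sequence.
count_ngram_codes(xs::AbstractVector{<:AbstractString}, n, b, m) =
    reduce(hcat, [count_ngram_codes(x, n, b, m) for x in xs])

count_ngram_codes("foo", 3, 256, 5)            # vector of length 5
count_ngram_codes(["foo", "bar"], 3, 256, 5)   # 5×2 matrix
```

Each column sums to the number of n-grams in its sequence (length plus n - 1 with this padding), so a small m trades memory for more collisions.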
Mill.countngrams! — Function
countngrams!(o, x, n, b, m=length(o))
Count the n-grams of x, using base b and modulo m, and store the result in o.
Examples
julia> o = zeros(Int, 5)
5-element Array{Int64,1}:
0
0
0
0
0
julia> countngrams!(o, "foo", 3, 256)
5-element Array{Int64,1}:
2
1
1
0
1
See also: countngrams, ngrams, ngrams!, NGramMatrix, NGramIterator.
Mill.NGramMatrix — Type
NGramMatrix{T, U} <: AbstractMatrix{U}
A matrix-like structure for lazily representing sequences such as strings as ngrams of cardinality n, using b as the base for calculations and m as the modulo. The matrix therefore has m rows and one column for each sequence. Missing sequences are supported.
See also: NGramIterator, ngrams, ngrams!, countngrams, countngrams!.
Mill.NGramMatrix — Method
NGramMatrix(s, n=3, b=256, m=2053)
Construct an NGramMatrix. s can be either a single sequence or any AbstractVector.
Examples
julia> NGramMatrix([1,2,3])
2053×1 NGramMatrix{Array{Int64,1},Array{Array{Int64,1},1},Int64}:
[1, 2, 3]
julia> NGramMatrix(["a", missing, "c"], 2, 128)
2053×3 NGramMatrix{Union{Missing, String},Array{Union{Missing, String},1},Union{Missing, Int64}}:
"a"
missing
"c"
See also: NGramIterator, ngrams, ngrams!, countngrams, countngrams!.
Mill.PostImputingMatrix — Type
PostImputingMatrix{T <: Number, U <: AbstractMatrix{T}, V <: AbstractVector{T}} <: AbstractMatrix{T}
A parametrized matrix that fills in a default vector of parameters whenever a "missing" column is encountered during multiplication.
Supports multiplication with NGramMatrix, MaybeHotMatrix, and MaybeHotVector. For any other AbstractMatrix, it falls back to standard multiplication.
Examples
julia> A = PostImputingMatrix(ones(2, 2), -ones(2))
2×2 PostImputingMatrix{Float64,Array{Float64,2},Array{Float64,1}}:
W:
1.0 1.0
1.0 1.0
ψ:
-1.0
-1.0
julia> A * maybehotbatch([1, missing], 1:2)
2×2 Array{Float64,2}:
1.0 -1.0
1.0 -1.0
See also: PreImputingMatrix.
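The post-imputing rule can be sketched in plain Julia as follows. The helper name postimpute_mul is hypothetical and the input is modelled as a vector of columns, each either a numeric vector or missing; Mill itself works on NGramMatrix and MaybeHot inputs instead, so this only illustrates the arithmetic.

```julia
# Hypothetical helper, not Mill's implementation.
# `columns` holds one entry per observation: a vector, or `missing`.
function postimpute_mul(W::AbstractMatrix, ψ::AbstractVector, columns)
    # A missing column skips the multiplication and yields ψ directly,
    # so ψ must have size(W, 1) elements.
    out = [ismissing(x) ? ψ : W * x for x in columns]
    return reduce(hcat, out)
end

W = ones(2, 2)
ψ = -ones(2)
# First column is the one-hot encoding of label 1, second is missing.
Y = postimpute_mul(W, ψ, [[1.0, 0.0], missing])
```

Here the imputation happens after the multiplication, in the output space, which is why ψ has as many elements as W has rows.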
Mill.PostImputingMatrix — Method
PostImputingMatrix(W::AbstractMatrix{T}, ψ=zeros(T, size(W, 1))) where T
Construct a PostImputingMatrix with multiplication parameters W and default parameters ψ.
Examples
julia> PostImputingMatrix([1 2; 3 4])
2×2 PostImputingMatrix{Int64,Array{Int64,2},Array{Int64,1}}:
W:
1 2
3 4
ψ:
0
0
See also: PreImputingMatrix.
Mill.postimputing_dense — Function
postimputing_dense(d_in, d_out, σ)
Like Flux.Dense, but uses a PostImputingMatrix instead of a standard matrix.
Examples
julia> d = postimputing_dense(2, 3)
[post_imputing]Dense(2, 3)
julia> typeof(d.W)
PostImputingMatrix{Float32,Array{Float32,2},Array{Float32,1}}
julia> typeof(d.b)
Array{Float32,1}
See also: PostImputingMatrix, preimputing_dense, PreImputingMatrix.
Mill.PreImputingMatrix — Type
PreImputingMatrix{T <: Number, U <: AbstractMatrix{T}, V <: AbstractVector{T}} <: AbstractMatrix{T}
A parametrized matrix that fills in elements from a default vector of parameters whenever a missing element is encountered during multiplication.
Examples
julia> A = PreImputingMatrix(ones(2, 2), -ones(2))
2×2 PreImputingMatrix{Float64,Array{Float64,2},Array{Float64,1}}:
W:
1.0 1.0
1.0 1.0
ψ:
-1.0 -1.0
julia> A * [0 1; missing -1]
2×2 Array{Float64,2}:
-1.0 0.0
-1.0 0.0
See also: PostImputingMatrix.
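The pre-imputing rule can be sketched in plain Julia as follows. The helper name preimpute_mul is hypothetical and this is not Mill's implementation; it assumes that a missing entry in row i of the input is replaced by ψ[i] before an ordinary multiplication, which matches the example above.

```julia
# Hypothetical helper, not Mill's implementation.
function preimpute_mul(W::AbstractMatrix, ψ::AbstractVector, X::AbstractMatrix)
    T = eltype(W)
    # Replace a missing entry in row i with ψ[i]; ψ needs size(W, 2) elements.
    filled = T[ismissing(X[i, j]) ? ψ[i] : X[i, j]
               for i in axes(X, 1), j in axes(X, 2)]
    # After filling, this is a standard matrix product.
    return W * filled
end

W = ones(2, 2)
ψ = -ones(2)
Y = preimpute_mul(W, ψ, [0 1; missing -1])
```

In contrast to the post-imputing variant, the defaults live in the input space, so individual missing elements can be filled while the rest of the column is still used.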
Mill.PreImputingMatrix — Method
PreImputingMatrix(W::AbstractMatrix{T}, ψ=zeros(T, size(W, 2))) where T
Construct a PreImputingMatrix with multiplication parameters W and default parameters ψ.
Examples
julia> PreImputingMatrix([1 2; 3 4])
2×2 PreImputingMatrix{Int64,Array{Int64,2},Array{Int64,1}}:
W:
1 2
3 4
ψ:
0 0
See also: PostImputingMatrix.
Mill.preimputing_dense — Function
preimputing_dense(in, out, σ)
Like Flux.Dense, but uses a PreImputingMatrix instead of a standard matrix.
Examples
julia> d = preimputing_dense(2, 3)
[pre_imputing]Dense(2, 3)
julia> typeof(d.W)
PreImputingMatrix{Float32,Array{Float32,2},Array{Float32,1}}
julia> typeof(d.b)
Array{Float32,1}
See also: PreImputingMatrix, postimputing_dense, PostImputingMatrix.