API
Types
SequencerJ.SequencerResult
— Type
SequencerResult
EOSeg::Dict{Tuple,Any} # list of elongation and orderings (BFS,DFS), one per Segment
EOAlgScale::Dict{Tuple,Any} # elongation and orderings for the cumulated weighted distances
D::AbstractMatrix # final distance matrix
mst::LightGraphs.AbstractGraph # final mst
η::Real # final elongation
order::AbstractVector # final ordering from BFS
Type containing the results of a Sequencer run.
Call accessor functions on the SequencerResult object. For example, to get the final column ordering of the data, call order:
A = rand(100,100);
r = sequence(A);
order(r)
100-element Array{Int64,1}:
 100
  71
  95
  80
  98
  22
  10
  70
  66
  62
   ⋮
  84
  33
  78
  55
  50
  89
  19
  77
  94
Functions
SequencerJ.sequence
— Function
sequence(A::VecOrMat{T};
scales=nothing,
metrics=ALL_METRICS,
grid=nothing,
) where {T <: Real}
Analyze the provided m x n matrix (or an equivalent vector of vectors) by applying one or more 1-dimensional statistical metrics to each column of data, possibly after dividing the data into smaller row sets. Columns are compared pairwise for each combination of metric and scale to create an n x n distance matrix, which is then analyzed as a graph using a novel algorithm. Details of the algorithm are provided in the paper by D. Baron and B. Ménard cited below.
sequence(A, metrics=ALL_METRICS, scales=nothing)
or equivalently:
sequence(A)
as metrics=ALL_METRICS and scales=nothing are the defaults. To specify only one metric, you must wrap it in a 1-tuple. For example, to use only KL divergence, write:
sequence(A, metrics=(KLD,), scales=nothing)
Use the scales keyword to specify, as a tuple, the number of "chunks" into which the data should be divided, e.g.
sequence(A, scales=(1,3,5))
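To make the idea of a scale concrete, here is a minimal sketch of dividing the rows of a matrix into k contiguous chunks. The helper `row_chunks` is hypothetical and is not part of SequencerJ; how the package actually partitions rows (contiguous or not, how remainders are handled) is an assumption here.

```julia
# Hypothetical helper, not part of SequencerJ: split the row indices 1:m into
# k contiguous chunks whose lengths differ by at most one.
function row_chunks(m::Integer, k::Integer)
    1 <= k <= m || throw(ArgumentError("need 1 <= k <= m"))
    base, extra = divrem(m, k)          # the first `extra` chunks get one more row
    chunks = UnitRange{Int}[]
    start = 1
    for i in 1:k
        len = base + (i <= extra ? 1 : 0)
        push!(chunks, start:(start + len - 1))
        start += len
    end
    return chunks
end

A = rand(10, 4)
for k in (1, 3, 5)
    println(k, " => ", row_chunks(size(A, 1), k))
end
```

Under this reading, each scale in the tuple produces its own set of row segments, and the metrics are evaluated per segment.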
In autoscale mode (the default, selected with scales=nothing), SequencerJ finds its own "best scale" by running the Sequencer against a sample of columns (currently 10%) and picking the scale that results in the greatest elongation.
A 1-D grid may be provided. The grid, whose deltas figure in the distance calculations, must consist of non-negative real numbers (Float16, Float32, or Float64). The grid length must equal the number of rows in A.
julia> sequence(A; metrics=(WASS1D,L2), grid=collect(range(0.5, step=0.5, length=size(A,1)))) # grid length must equal size(A,1)
The paper that describes the Sequencer algorithm and its applications can be found on arXiv.
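The pairwise column comparison described for sequence can be sketched in plain Julia as follows. This is only an illustration of building an n x n distance matrix for a single metric and a single scale; `pairwise_columns` and `euclid` are hypothetical names, and the Euclidean norm stands in for the package's own metrics.

```julia
using LinearAlgebra: norm

# Hypothetical helper, not part of SequencerJ: build the symmetric n x n
# matrix of pairwise distances between the columns of A.
function pairwise_columns(dist, A::AbstractMatrix)
    n = size(A, 2)
    D = zeros(Float64, n, n)
    for j in 1:n, i in 1:(j - 1)
        d = dist(view(A, :, i), view(A, :, j))
        D[i, j] = d     # fill both triangles so that D is symmetric
        D[j, i] = d
    end
    return D
end

euclid(u, v) = norm(u .- v)   # stand-in metric, for illustration only

A = [1.0 1.0 4.0;
     0.0 0.0 4.0]
D = pairwise_columns(euclid, A)
```

Columns 1 and 2 are identical, so their distance is zero, while column 3 lies at distance 5 from both.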
SequencerJ.elong
— Function
elong(r::SequencerResult)
Return the elongation coefficient of the final, weighted graph.
SequencerJ.order
— Function
order(r::SequencerResult)
Return the result column indices, as determined by the Sequencer algorithm.
SequencerJ.mst
— Function
mst(r::SequencerResult)
Return the final minimum spanning tree graph from a Sequencer run.
SequencerJ.D
— Function
D(r::SequencerResult)
Return the final distance matrix from the Sequencer run.
SequencerJ.EMD
— Method
EMD distance with no grid provided. Grids default to axes(u,1) and axes(v,1).
SequencerJ.EMD
— Method
EMD(u::AbstractVector{T}, v::AbstractVector{T}) where {T <: Real}
Calculate the Earth Mover Distance (EMD), a.k.a. the 1-Wasserstein distance, between the two given vectors, accepting a default grid. u and v are treated as weights on the grid. The default grid is equal to the first axis of u and v.
u = rand(10);
u .= u / sum(u);
sum(u)
EMD implements the Distances package convention of runnable types:
v = rand(10);
v .= v / sum(v);
d = EMD()(u,v)
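For intuition about what the default-grid EMD measures, the following sketch computes the 1-Wasserstein distance between two weight vectors on a unit-spaced grid as the summed absolute difference of their cumulative sums. The helper `emd_unit_grid` is hypothetical and is not SequencerJ's implementation; the package's normalization and grid handling may differ.

```julia
# Hypothetical helper, not SequencerJ's implementation: 1-Wasserstein distance
# between two weight vectors on the unit-spaced grid 1, 2, ..., n.
function emd_unit_grid(u::AbstractVector{<:Real}, v::AbstractVector{<:Real})
    length(u) == length(v) || throw(DimensionMismatch("u and v must have equal length"))
    # Normalize so both vectors sum to 1, then compare their running sums (CDFs).
    cu = cumsum(u ./ sum(u))
    cv = cumsum(v ./ sum(v))
    return sum(abs, cu .- cv)   # grid spacing is 1, so no extra factor is needed
end

u = [1.0, 0.0, 0.0]
v = [0.0, 0.0, 1.0]
emd_unit_grid(u, v)   # all of the mass moves two grid steps
```

Here all of the weight sits at grid point 1 in u and at grid point 3 in v, so the distance is 2.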
SequencerJ.EMD
— Method
Same as EMD(u,v,uw,vw) but using Distances-style runnable type syntax.
SequencerJ.emd
— Function
Convenience method for EMD(u,v). See EMD(u,v).
Convenience method for EMD(u,v,uw,vw). See EMD(u,v,uw,vw).
SequencerJ.Energy
— Method
Default constructor with no specified grid.
SequencerJ.Energy
— Method
Energy(u,v)
Calculate Székely's energy distance between the two given vectors, accepting a default grid. u and v are treated as weights on the grid. The default grid is equal to the first axis of u and v.
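As a rough illustration of the default-grid case, the sketch below uses the one-dimensional identity that the squared energy distance equals twice the integrated squared difference of the two CDFs. The helper `energy_unit_grid` is hypothetical, assumes a unit-spaced grid, and is not SequencerJ's implementation; the package may scale or normalize its result differently.

```julia
# Hypothetical helper, not SequencerJ's implementation: energy distance between
# two weight vectors on the unit-spaced grid 1, 2, ..., n, computed from the
# 1-D identity D^2 = 2 * sum((F_u - F_v)^2) with grid spacing 1.
function energy_unit_grid(u::AbstractVector{<:Real}, v::AbstractVector{<:Real})
    length(u) == length(v) || throw(DimensionMismatch("u and v must have equal length"))
    cu = cumsum(u ./ sum(u))    # CDF of the normalized weights u
    cv = cumsum(v ./ sum(v))    # CDF of the normalized weights v
    return sqrt(2 * sum(abs2, cu .- cv))
end

u = [1.0, 0.0, 0.0]
v = [0.0, 0.0, 1.0]
energy_unit_grid(u, v)
```

For two point masses two grid steps apart, this agrees with the definition 2E|X-Y| - E|X-X'| - E|Y-Y'| = 4, whose square root is 2.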
SequencerJ.Energy
— Method
Energy(u,v,uw,vw)
Calculate Székely's energy distance between the two given vectors, u and v, whose weights uw, vw are treated as an empirical cumulative distribution function (CDF). u and v must have the same length as uw and vw, respectively.
SequencerJ.energy
— Function
Convenience function for Energy with a supplied grid.
Convenience function for Energy with a default unit grid.
SequencerJ.elongation
— Function
elongation(g, startidx)
Returns the ratio of the graph g's half-length (the mean of path distances) to its half-width, defined as the mean count of shortest paths from the center node of a minimum spanning tree over the graph. This function calls the dijkstra_shortest_paths function in LightGraphs.
SequencerJ.ensuregrid!
— Function
Ensure that the size of the 1-D grid, if provided, is compatible with the data in A. Create a grid if one was not provided.
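The kind of check described here might look like the following sketch. The helper `ensure_grid` is hypothetical and non-mutating, unlike ensuregrid!, whose actual signature is not shown in this page. The default grid 1, 2, ..., m is an assumption based on the "first axis" default mentioned for EMD and Energy.

```julia
# Hypothetical helper, not SequencerJ's ensuregrid!: validate a 1-D grid
# against the rows of A, or build a default unit-spaced grid.
function ensure_grid(A::AbstractMatrix, grid=nothing)
    m = size(A, 1)
    # Assumed default: one grid point per row, at 1.0, 2.0, ..., m.
    grid === nothing && return collect(1.0:m)
    length(grid) == m ||
        throw(DimensionMismatch("grid has $(length(grid)) entries but A has $m rows"))
    all(x -> x >= 0, grid) ||
        throw(ArgumentError("grid values must be non-negative"))
    return collect(float.(grid))
end

A = zeros(3, 2)
ensure_grid(A)                  # default grid for three rows
ensure_grid(A, 0.5:0.5:1.5)     # user-supplied grid of matching length
```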
SequencerJ.unroll
— Function
Walk the given graph, starting from the given vertex, returning the list of all outbound vertices that are visited.
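A walk of this kind can be sketched as a breadth-first traversal over an adjacency list. The helper `bfs_order` is hypothetical and uses plain vectors rather than a LightGraphs graph; whether unroll includes the start vertex in its result, and in what order it visits neighbors, are assumptions.

```julia
# Hypothetical helper, not SequencerJ's unroll: breadth-first walk over an
# adjacency list, returning vertices in the order they are first visited.
function bfs_order(adj::Vector{Vector{Int}}, start::Int)
    visited = falses(length(adj))
    visit_order = Int[]
    queue = [start]
    visited[start] = true
    while !isempty(queue)
        v = popfirst!(queue)
        push!(visit_order, v)
        for w in adj[v]
            if !visited[w]          # enqueue each vertex only once
                visited[w] = true
                push!(queue, w)
            end
        end
    end
    return visit_order
end

# A small tree: 1 - 2 - 3, with vertex 4 also attached to vertex 2.
adj = [[2], [1, 3, 4], [2], [2]]
bfs_order(adj, 1)
```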
SequencerJ.prettyp
— Function
Print the sequence of outbound nodes of a graph starting from the given index. pystyle implies reversing the final order and subtracting 1 from the vertex number.
Return the first and last len elements of the vector as a string. Default is to print 3 elements at head and tail.
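The head-and-tail formatting can be sketched in a few lines. The helper `head_tail` is hypothetical and is not the package's prettyp; its handling of vectors shorter than 2 × len is an assumption.

```julia
# Hypothetical helper, not SequencerJ's prettyp: join the first and last `len`
# elements of a vector, separated by an ellipsis.
function head_tail(v::AbstractVector, len::Integer=3)
    # Assumed behavior: if head and tail would overlap, print everything.
    2len > length(v) && return join(v, ",")
    return join(v[1:len], ",") * "..." * join(v[end-len+1:end], ",")
end

head_tail(collect(1:10))      # three elements at each end
head_tail(collect(1:10), 5)   # five elements at each end
```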
julia> v = collect(1:10);
julia> prettyp(v)
"1,2,3...8,9,10"
Example with 5 elements visible:
julia> v = collect(1:10);
julia> prettyp(v, 5)
"1,2,3,4,5...6,7,8,9,10"