API

Types

SequencerJ.SequencerResultType
SequencerResult
    EOSeg::Dict{Tuple,Any} # list of elongation and orderings (BFS,DFS), one per Segment    
    EOAlgScale::Dict{Tuple,Any} # elongation and orderings for the cumulated weighted distances
    D::AbstractMatrix # final distance matrix
    mst::LightGraphs.AbstractGraph # final mst
    η::Real # final elongation
    order::AbstractVector #final ordering from bfs

Type containing the results of a Sequencer run.

Call the accessor functions on the SequencerResult object. For example, to get the final column ordering of the data, call order:

A = rand(100,100);
r = sequence(A);
order(r)
100-element Array{Int64,1}:
 100
  71
  95
  80
  98
  22
  10
  70
  66
  62
   ⋮
  84
  33
  78
  55
  50
  89
  19
  77
  94
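The ordering is a permutation of column indices, so it can be used to index the data directly. The following is a minimal sketch in plain Julia that does not call SequencerJ; the vector `ord` is a hypothetical stand-in for the value returned by `order(r)`.

```julia
# Small data matrix with three columns.
A = [10 20 30;
     11 21 31]

# Hypothetical ordering, standing in for the result of order(r).
ord = [3, 1, 2]

# Rearrange the columns of A into the sequenced order.
A_sorted = A[:, ord]

# The inverse permutation restores the original column layout.
inv_ord = invperm(ord)
A_restored = A_sorted[:, inv_ord]
```

Keeping `invperm(ord)` around is useful when results computed on the sequenced data need to be mapped back to the original column positions.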

Functions

SequencerJ.sequenceFunction
sequence(A::VecOrMat{T}; 
    scales=nothing,
    metrics=ALL_METRICS,
    grid=nothing,
    ) where {T <: Real}

Analyze the provided m x n matrix (or an equivalent vector of vectors) by applying one or more 1-dimensional statistical metrics to each column of data, possibly after dividing the data into smaller sets of rows. Columns are compared pairwise for each combination of metric and scale to create an n x n distance matrix, which is then analyzed as a graph using a novel algorithm. Details of the algorithm are provided in the paper by D. Baron and B. Ménard cited below.

sequence(A, metrics=ALL_METRICS, scales=nothing)

or equivalently:

sequence(A)

as metrics=ALL_METRICS and scales=nothing are the defaults. To specify only one metric, you must wrap it in a 1-tuple. For example, to use only KL divergence, write:

sequence(A, metrics=(KLD,), scales=nothing)

Use the scales keyword to specify the number of "chunks" into which the data should be divided, as a tuple, e.g.

sequence(A, scales=(1,3,5))
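To make the idea of "chunks" concrete, here is a minimal sketch of splitting the rows of a matrix into k contiguous, nearly equal segments. The helper `chunk_rows` is hypothetical and written only for illustration; it is not part of SequencerJ, and the package may divide rows differently.

```julia
# Hypothetical helper: split row indices 1:m into k contiguous,
# nearly equal chunks using integer arithmetic.
function chunk_rows(m::Integer, k::Integer)
    1 <= k <= m || throw(ArgumentError("need 1 <= k <= m"))
    edges = [div(i * m, k) for i in 0:k]   # chunk boundaries, from 0 to m
    return [(edges[i] + 1):edges[i + 1] for i in 1:k]
end

A = reshape(collect(1.0:24.0), 6, 4)   # 6 rows, 4 columns

# At scale 3, each column is compared segment by segment.
for rows in chunk_rows(size(A, 1), 3)
    segment = A[rows, :]
    println(rows, " => ", size(segment))
end
```

A scale of 1 corresponds to a single chunk covering every row, so the whole column is compared at once.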

In autoscale mode (enabled by default with scales=nothing), SequencerJ finds its own "best scale" by running the Sequencer against a sample of columns (currently 10%) and picking the scale that results in the greatest elongation.

A 1-D grid may be provided. The grid values, whose deltas figure in the distance calculations, must be non-negative real numbers (Float16, Float32, or Float64). The grid length must equal the number of rows in A.

julia> sequence(A; metrics=(WASS1D,L2), grid=collect(range(0.5, step=0.5, length=size(A,1)))) # grid length must equal the size of A along dim 1
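The two grid requirements (one point per row of A, non-negative values) can be checked before calling sequence. This is a sketch only: `grid_ok` is a hypothetical helper, not a SequencerJ function, and the package performs its own validation.

```julia
# Hypothetical pre-check mirroring the documented grid requirements.
function grid_ok(A::AbstractMatrix, grid::AbstractVector{<:Real})
    return length(grid) == size(A, 1) && all(>=(0), grid)
end

A = rand(100, 100)

# One grid point per row, spaced 0.5 apart.
good = collect(range(0.5, step=0.5, length=size(A, 1)))

# A range that stops at size(A, 1) with step 0.5 has twice as many points.
bad = collect(0.5:0.5:size(A, 1))

grid_ok(A, good), grid_ok(A, bad)
```

Building the grid from `range(...; length=size(A, 1))` ties its length to the data, which avoids off-by-a-factor mistakes when the step is not 1.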

The paper that describes the Sequencer algorithm and its applications can be found on arXiv.

SequencerJ.elongFunction
elong(r::SequencerResult)

Return the elongation coefficient of the final, weighted graph.

SequencerJ.orderFunction
order(r::SequencerResult)

Return the result column indices, as determined by the Sequencer algorithm.

SequencerJ.mstFunction
mst(r::SequencerResult)

Return the final minimum spanning tree graph from a Sequencer run.

SequencerJ.DFunction
D(r::SequencerResult)

Return the final distance matrix from the Sequencer run.

SequencerJ.EMDMethod

EMD distance with no grid provided. Grids default to axes(u,1) and axes(v,1).

SequencerJ.EMDMethod
EMD(u::AbstractVector{T}, v::AbstractVector{T}) where {T <: Real}

Calculate the Earth Mover's Distance (EMD), also known as the 1-Wasserstein distance, between the two given vectors, using a default grid. u and v are treated as weights on the grid. The default grid is equal to the first axis of u and v.

u = rand(10);
u .= u / sum(u);
sum(u)

EMD implements the Distances package convention of runnable types:

v = rand(10);
v .= v / sum(v);
d = EMD()(u,v)

SequencerJ.EMDMethod

Same as EMD(u,v,uw,vw) but using Distances-style runnable type syntax.
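For intuition, the 1-D Earth Mover's Distance between two weight vectors on a shared grid equals the area between their cumulative distributions. The sketch below implements that textbook formula in plain Julia; `emd_1d` is a hypothetical name, and this is not claimed to be how SequencerJ computes EMD internally.

```julia
# Sketch of the 1-D EMD as the area between two CDFs on a shared grid.
# u and v are weights; both are normalized to sum to 1 before comparison.
function emd_1d(u::AbstractVector{<:Real}, v::AbstractVector{<:Real},
                grid::AbstractVector{<:Real}=collect(1:length(u)))
    length(u) == length(v) == length(grid) ||
        throw(DimensionMismatch("u, v, and grid must have equal length"))
    cdf_diff = cumsum(u ./ sum(u)) .- cumsum(v ./ sum(v))
    deltas = diff(grid)                      # spacing between grid points
    return sum(abs.(cdf_diff[1:end-1]) .* deltas)
end

u = [1.0, 0.0, 0.0]   # all mass on the first grid point
v = [0.0, 0.0, 1.0]   # all mass on the last grid point

emd_1d(u, v)                     # mass moves two unit steps
emd_1d(u, v, [0.0, 0.5, 1.0])    # same weights on a finer grid
```

The second call shows why the grid deltas matter: the same weights are half as far apart when the grid spacing is halved.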

SequencerJ.EnergyMethod
Energy(u,v)

Calculate Székely's energy distance between the two given vectors, accepting a default grid. u and v are treated as weights on the grid. The default grid is equal to the first axis of u and v.
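As an illustration, one standard formulation of the 1-D energy distance is the square root of twice the integrated squared difference between the two cumulative distributions. The sketch below applies it to weights on a shared grid; `energy_1d` is a hypothetical name, and SequencerJ's Energy may differ in normalization or other details.

```julia
# Sketch of the 1-D energy distance: sqrt(2 * integral of (F - G)^2),
# where F and G are the CDFs of the normalized weights u and v.
function energy_1d(u::AbstractVector{<:Real}, v::AbstractVector{<:Real},
                   grid::AbstractVector{<:Real}=collect(1:length(u)))
    length(u) == length(v) == length(grid) ||
        throw(DimensionMismatch("u, v, and grid must have equal length"))
    cdf_diff = cumsum(u ./ sum(u)) .- cumsum(v ./ sum(v))
    deltas = diff(grid)                      # spacing between grid points
    return sqrt(2 * sum(cdf_diff[1:end-1] .^ 2 .* deltas))
end

u = [1.0, 0.0, 0.0]   # point mass at grid position 1
v = [0.0, 0.0, 1.0]   # point mass at grid position 3

energy_1d(u, v)
```

Unlike the EMD, which integrates the absolute CDF difference, this form squares the difference, so it weights large local discrepancies more heavily.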

SequencerJ.EnergyMethod
Energy(u,v,uw,vw)

Calculate Székely's energy distance between the two given vectors, u and v, whose weights uw, vw are treated as an empirical cumulative distribution function (CDF). u and v must have the same length, respectively, as uw and vw.

SequencerJ.energyFunction

Convenience function for Energy with a supplied grid.

Convenience function for Energy with a default unit grid.

SequencerJ.elongationFunction

elongation(g, startidx)

Return the ratio of the half-length of graph g (the mean of the path distances) to its half-width, defined as the mean count of shortest paths from the center node of a minimum spanning tree over the graph. This function calls the dijkstra_shortest_paths function in LightGraphs.
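The sketch below illustrates one reading of this ratio on an unweighted tree stored as an adjacency list. It is an assumption-laden illustration, not the package's code: it uses a plain breadth-first search instead of LightGraphs, takes the half-length as the mean depth from the start node, and takes the half-width as half the mean number of nodes per depth level. The names `bfs_depths` and `elongation_sketch` are hypothetical.

```julia
# Breadth-first depths from `start` in an unweighted, connected tree
# given as an adjacency list.
function bfs_depths(adj::Vector{Vector{Int}}, start::Int)
    depth = fill(-1, length(adj))   # -1 marks unvisited vertices
    depth[start] = 0
    queue = [start]
    while !isempty(queue)
        v = popfirst!(queue)
        for w in adj[v]
            if depth[w] == -1
                depth[w] = depth[v] + 1
                push!(queue, w)
            end
        end
    end
    return depth
end

# Assumed definitions: half-length = mean depth,
# half-width = half the mean number of nodes per depth level.
function elongation_sketch(adj::Vector{Vector{Int}}, start::Int)
    depth = bfs_depths(adj, start)
    half_length = sum(depth) / length(depth)
    levels = [count(==(d), depth) for d in 0:maximum(depth)]
    half_width = (sum(levels) / length(levels)) / 2
    return half_length / half_width
end

path = [[2], [1, 3], [2, 4], [3, 5], [4]]    # chain 1-2-3-4-5
star = [[2, 3, 4, 5], [1], [1], [1], [1]]    # hub 1 with four leaves

elongation_sketch(path, 1), elongation_sketch(star, 1)
```

Under these assumptions a chain scores much higher than a star of the same size, which matches the intuition that an elongated tree indicates a clear one-dimensional sequence.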

SequencerJ.ensuregrid!Function

Ensure that the size of the 1-D grid, if provided, is compatible with the data in A. Create a grid if one was not provided.

SequencerJ.unrollFunction

Walk the given graph, starting from the given vertex, returning the list of all outbound vertices that are visited.

SequencerJ.prettypFunction

Print the sequence of outbound nodes of a graph starting from the given index. pystyle implies reversing the final order and subtracting 1 from each vertex number.

Return the first and last len elements of the vector as a string.

Default is to print 3 elements at head and tail.

julia> v = collect(1:10);
julia> prettyp(v)

"1,2,3...8,9,10"

Example with 5 elements visible:

julia> v = collect(1:10);
julia> prettyp(v, 5)

"1,2,3,4,5...6,7,8,9,10"
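The head-and-tail formatting shown above can be reproduced in a few lines of plain Julia. This is a sketch under the assumption that the vector has at least len elements; `head_tail` is a hypothetical name and not the package's prettyp implementation.

```julia
# Join the first and last `len` elements of a vector, separated by "...".
# Assumes length(v) >= len.
function head_tail(v::AbstractVector, len::Integer=3)
    head = join(v[1:len], ",")
    tail = join(v[end-len+1:end], ",")
    return head * "..." * tail
end

v = collect(1:10)
head_tail(v)      # three elements at each end
head_tail(v, 5)   # five elements at each end
```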

Metrics