GroupNumbers

Installation

Install this package with Pkg.add("GroupNumbers")

Description

A family of iterators for grouping adjecent elements of the given iterator xs.

compare functionemits the grouped elementsemits the grouped indices
isequalgroupby2groupby2_indices
groupby2_dictgroupby2_dict_indicesalso emits key
isapproxgroupby_numbersgroupby_numbers_indices
groupby_numbers_dictgroupby_numbers_dict_indicesalso emits key
accepts optional emit parameter
  • groupby2YYYZZZ(xs; keyfunc=identity, compare=isequal)
  • groupby_numbersYYYZZZ(xs; keyfunc=identity, compare=isapprox, kwargs)

Here, "YYY" = "" or "_dict", and "ZZZ" = "" or "_indices".

Apply keyfunc function to each element of xs to compute the key for comparison. For default, keyfunc is identity, so the key is each element itself.

Compare the adjacent keys by compare function. While groupby2YYYZZZ family adopt isequal as the default compare function, groupby_numbersYYYZZZ family adopt isapprox with accompanying kwargs being supplied to the keyword parameters of the default isapprox function, allowing the control of the tolerance.

  • Unbranded iterators ("ZZZ" = "") emit the grouped elements.

  • The _indices alternatives ("ZZZ" = "_indices" ) emit the indices of the grouped elements.

  • Unbranded iterators ("YYY" = "") emit only the grouped elements or their indices. Optional keyword parameter emit may be specified to apply further transfomation to each element, although it will induce runtime dispatch and lower performance.

  • The _dict alternatives ("YYY" = "_dict" ) emit also the first keys.

Examples

Example 1: Groups characters in a string

groupby2(xs) is equivalent to IterTools.groupby(identity, xs).

Simple case

julia> collect(groupby2("AAAABBBCCD"))
4-element Vector{Vector{Char}}:
 ['A', 'A', 'A', 'A']
 ['B', 'B', 'B']
 ['C', 'C']
 ['D']
julia> using IterTools
julia> collect(IterTools.groupby(identity, "AAAABBBCCD")); # => same result

Emits keys

Use groupby2_dict(xs) if you need the keys.

julia> collect(groupby2_dict("AAAABBBCCD"))
4-element Vector{Tuple{Any, Vector{Char}}}:
 ('A', ['A', 'A', 'A', 'A'])
 ('B', ['B', 'B', 'B'])
 ('C', ['C', 'C'])
 ('D', ['D'])

Groups without case sensitive

Specify keyfunc optional parameter to a function that computes a key.

julia> collect(groupby2_dict("AaAABbBcCD"; keyfunc=uppercase))
4-element Vector{Tuple{Any, Vector{Char}}}:
 ('A', ['A', 'a', 'A', 'A'])
 ('B', ['B', 'b', 'B'])
 ('C', ['c', 'C'])
 ('D', ['D'])

Groups without case sensitive. Emits the grouped indices rather than the grouped elements.

julia> collect(groupby2_dict_indices("AaAABbBcCD", keyfunc=uppercase))
4-element Vector{Tuple{Any, Vector{Int64}}}:
 ('A', [1, 2, 3, 4])
 ('B', [5, 6, 7])
 ('C', [8, 9])
 ('D', [10])

Example 2: Groups integer numbers

Simple case

groupby2 and groupby_numbers can be used to group integer numbers.

julia> collect(groupby2([10,20,20,30]))
3-element Vector{Vector{Int64}}:
 [10]
 [20, 20]
 [30]
julia> collect(groupby_numbers([10,20,20,30])); # => same result

Emits keys

julia> collect(groupby2_dict([10,20,20,30]))
3-element Vector{Tuple{Any, Vector{Int64}}}:
 (10, [10])
 (20, [20, 20])
 (30, [30])
julia> collect(groupby_numbers_dict([10,20,20,30])); # => same result

Groups by absolute values

julia> collect(groupby2_dict([10,-20,20,30]; keyfunc=abs))
3-element Vector{Tuple{Any, Vector{Int64}}}:
 (10, [10])
 (20, [-20, 20])
 (30, [30])
julia> collect(groupby_numbers_dict([10,-20,20,30]; keyfunc=abs)); # => same result

Groups by absolute values. Emits the grouped indices rather than the grouped elements.

julia> collect(groupby2_dict_indices([10,-20,20,30]; keyfunc=abs))
3-element Vector{Tuple{Any, Vector{Int64}}}:
 (10, [1])
 (20, [2, 3])
 (30, [4])
julia> collect(groupby_numbers_dict_indices([10,-20,20,30]; keyfunc=abs)); # => same result

Example 3: Groups floating point numbers

Use groupby_numbersYYYZZZ rather than groupby2YYYZZZ to group floating point numbers.

Simple case.

groupby_numbersYYYZZZ groups floating point numbers with isapprox function by default.

julia> collect(groupby_numbers([ 2e-10, 2e-9, 2e-8, 2e-7 ] .+ 1))
3-element Vector{Vector{Float64}}:
 [1.0000000002, 1.000000002]
 [1.00000002]
 [1.0000002]

Adjusts tolerance with atol and rtol parameters.

Consult the manual of Base.isapprox for its keyword parameters such as atol and rtol.

julia> collect(groupby_numbers([ 2e-8, 2e-7, 2e-6, 2e-5 ] .+ 1; atol=1e-6))
3-element Vector{Vector{Float64}}:
 [1.00000002, 1.0000002]
 [1.000002]
 [1.00002]
julia> collect(groupby_numbers([ 16, 17, 19, 20 ]* 1e-4 .+ 1; atol=2e-4))
2-element Vector{Vector{Float64}}:
 [1.0016, 1.0017]
 [1.0019, 1.002]

Groups by their absolute values

julia> collect(groupby_numbers([ 1+2e-6, -1+2e-5, 1+2e-4, 1-2e-3 ]; 
        keyfunc=abs, rtol=1e-4))
3-element Vector{Vector{Float64}}:
 [1.000002, -0.99998]
 [1.0002]
 [0.998]

Emits the grouped indices rather than the grouped elements.

julia> collect(groupby_numbers_indices([ 1+2e-6, -1+2e-5, 1+2e-4, 1-2e-3 ]; 
        keyfunc=abs, rtol=1e-4))
3-element Vector{Vector{Int64}}:
 [1, 2]
 [3]
 [4]

Example 4: Groups noisy vectors

groupby_numbersYYYZZZ can be used to group an array of floating point numbers.

Groups array of vectors

Rotation preserves norm.

julia> using LinearAlgebra
julia> # Rotation matrix
       t=15; r15 = [ cosd(t) -sind(t); sind(t) cosd(t)]
2×2 Matrix{Float64}:
 0.965926  -0.258819
 0.258819   0.965926

julia> using IterTools
julia> vs1 = collect( Iterators.take(
                iterated(v -> (1+rand()*1e-8)*r15*v, [1, 0]), 5) )
5-element Vector{Vector}:
 [1, 0]
 [0.9659258323666292, 0.25881904673099826]
 [0.8660254177031013, 0.5000000080359436]
 [0.7071067969544697, 0.7071067969544694]
 [0.5000000112991584, 0.8660254233551546]

julia> # group by norm
       collect( groupby_numbers_indices(vs1; keyfunc=norm, atol=1e-6))
1-element Vector{Vector{Int64}}:
 [1, 2, 3, 4, 5]

Groups array of tuple consisting of vector and its norm

Calculate the vectors and their norms to avoid recalculate the latters.

julia> using LinearAlgebra

julia> vs1=vec( [ begin 
            v= [i1,i2] *(1+(rand()-0.5)*1e-8);
            (norm(v),v)
        end for i1 in -2:2, i2 in -2:2] );

julia> # sort by norm
       vs2=sort(vs1; by=first);

julia> # group by norm
       collect(groupby_numbers_dict_indices(vs2; keyfunc=first))
6-element Vector{Tuple{Any, Vector{Int64}}}:
 (0.0, [1])
 (0.9999999976242439, [2, 3, 4, 5])
 (1.4142135561654923, [6, 7, 8, 9])
 (1.999999991951223, [10, 11, 12, 13])
 (2.2360679691661827, [14, 15, 16, 17, 18, 19, 20, 21])
 (2.828427114159456, [22, 23, 24, 25])

See also