BitSetTuple

Build Status codecov

This package exports the BitSetTuple data structure and functions to manipulate them. A BitSetTuple is a tuple of BitSet objects that can be used to efficiently track multiple sets of low cardinality to which we have to concurrently apply set operations.

Usage

We can create a BitSetTuple using:

n = 5
BitSetTuple(n)

This will create a tuple of length 5 where all elements are BitSets containing integers 1 through 5. BitSetTuples can also be created by directly passing vectors or tuples of integers:

BitSetTuple([[1,2,3], [2,3,4], [4,5,6]])

BitSetTuple(((1,2,3), (2,3,4), (4,5,6)))

Example Usage

Say we want to track, for a set of graphs over the same vertex set, which edges exist in all graphs. First, we will generate a collection of graphs:

using Graphs

n_graphs = 20
n_vertices = 30
e_prob = 0.8

function random_graph(n, e_prob)
    g = Graph(n)
    for i in 1:n
        for j in i:n
            if rand() < e_prob
                add_edge!(g, i, j)
            end
        end
    end
    return g
end
graphs = []
for _ in 1:n_graphs
    g = random_graph(n_vertices, e_prob)
    push!(graphs, g)
end

Next, we will write a function that uses a BitSetTuple to track the edges that exist in all graphs:

using BitSetTuples

function edges_in_all_graphs(graphs, n_vertices)
    edges = BitSetTuple(n_vertices)
    for graph in graphs
        g_edges = BitSetTuple(Graphs.neighbors.(Ref(graph), 1:n_vertices))
        intersect!(edges, g_edges)
    end
    return edges
end

We can use BenchmarkTools to benchmark our BitSetTuple implementation against a naive implementation that uses a Set under the hood:

using BenchmarkTools

# naive implementation
edges_in_all_graphs(graphs) = intersect(Set.(collect.(edges.(graphs)))...)

@benchmark edges_in_all_graphs(graphs, n_vertices)
@benchmark edges_in_all_graphs(graphs)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):   80.500 μs …   6.431 ms  ┊ GC (min … max): 0.00% … 93.84%
 Time  (median):      82.375 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   120.328 μs ± 246.504 μs  ┊ GC (mean ± σ):  8.47% ±  5.09%

  █▄▂▃▂▂▄▃▁           ▃▁                                        ▁
  ██████████▇▇▇▇▇▇█▇▇████▇▇▇▆▇▆▆▅▅▄▅▅▅▆▅▄▄▅▄▄▃▃▅▁▃▄▄▄▄▃▃▄▃▁▃▄▄▄ █
  80.5 μs       Histogram: log(frequency) by time        487 μs <

 Memory estimate: 132.28 KiB, allocs estimate: 2091.


BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  151.834 μs …   8.156 ms  ┊ GC (min … max):  0.00% … 96.01%
 Time  (median):     188.458 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   280.975 μs ± 386.313 μs  ┊ GC (mean ± σ):  12.22% ±  9.57%

  █▆▅▅▃▃▂▂▁▃▂▁                                                  ▂
  ██████████████▆▆▆▆▆▅▆▄▅▅▄▅▅▄▅▁▄▃▄▄▁▃▁▄▁▃▃▃▃▁▄▃▃▁▄▁▄▁▄▄▁▁▄▁▄▁▅ █
  152 μs        Histogram: log(frequency) by time       2.52 ms <

 Memory estimate: 501.38 KiB, allocs estimate: 371.

We see that our implementation using BitSetTuples is faster and more memory efficient. However, we do allocate significantly more often than the naive implementation, suggesting there might be additional optimizations possible in this package.