FastGroupBy.fastby! - Method
fastby(fn, b[, v])

Group by b, then apply fn to v within each group. If v is not provided, fn is applied to b.
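The group-then-apply semantics described above can be sketched in plain Julia. This is only an illustration of the idea, not FastGroupBy's implementation: the name naive_by is hypothetical, and returning a Dict is an assumption made for the sketch.

```julia
# Hypothetical reference implementation; not FastGroupBy's actual code.
# Assumption: the result is returned as a Dict keyed by group.
function naive_by(fn, b::AbstractVector, v::AbstractVector = b)
    length(b) == length(v) || throw(DimensionMismatch("b and v must have equal length"))
    groups = Dict{eltype(b), Vector{eltype(v)}}()
    for (key, val) in zip(b, v)
        # Collect the values of each group, creating the bucket on first sight.
        push!(get!(() -> eltype(v)[], groups, key), val)
    end
    # Apply fn once per group, after grouping is complete.
    return Dict(key => fn(vals) for (key, vals) in groups)
end

naive_by(sum, [1, 2, 1, 2, 3], [10, 20, 30, 40, 50])  # Dict(1 => 40, 2 => 60, 3 => 50)
naive_by(length, ["a", "b", "a"])                     # v defaults to b: Dict("a" => 2, "b" => 1)
```

Because fn receives the whole group at once, it can be any function of a vector (sum, length, maximum, and so on), at the cost of materializing every group first.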

FastGroupBy.fgroupreduce - Method
fgroupreduce(fn, byvec, valvec, init)

Group by byvec and apply reduce(fn, valvec, init = init) within each group of byvec
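As a rough illustration of a grouped reduce, the sketch below folds each value into a per-group accumulator that starts at init. The name naive_groupreduce is hypothetical, the Dict return type is an assumption, and the sketch also assumes fn returns a value of the same type as init.

```julia
# Hypothetical reference implementation; not FastGroupBy's actual code.
# Assumptions: the result is a Dict, and fn(acc, x) has the same type as init.
function naive_groupreduce(fn, byvec::AbstractVector, valvec::AbstractVector, init)
    length(byvec) == length(valvec) ||
        throw(DimensionMismatch("byvec and valvec must have equal length"))
    acc = Dict{eltype(byvec), typeof(init)}()
    for (key, val) in zip(byvec, valvec)
        # Start each group from init, then fold in one value at a time.
        acc[key] = fn(get(acc, key, init), val)
    end
    return acc
end

naive_groupreduce(+, ["a", "b", "a"], [1, 2, 3], 0)         # Dict("a" => 4, "b" => 2)
naive_groupreduce(max, [1, 1, 2], [3.0, 5.0, 1.0], -Inf)    # Dict(1 => 5.0, 2 => 1.0)
```

Unlike the group-then-apply form, a reduce never needs to hold a whole group in memory: one accumulator per distinct key is enough.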

FastGroupBy.sumby - Function

Perform sum by group

sumby(df::Union{AbstractDataFrame,NDSparse}, by::Symbol, val::Symbol)
sumby(by::AbstractVector, val::AbstractVector)

Arguments

  • df : an AbstractDataFrame/NDSparse from which to extract the by and val columns
  • by : data table column to group by
  • val: data table column to sum

Returns

  • ::Dict : A Dict that maps unique values of by to the sum of val

Examples

using FastGroupBy
using DataFrames, IndexedTables, Compat, BenchmarkTools
import DataFrames.DataFrame

const N = 10_000_000; const K = 100

# sumby is faster for DataFrame without missings
srand(1);
@time df = DataFrame(id = rand(1:Int(round(N/K)), N), val = rand(round.(rand(K)*100,4), N));
@belapsed DataFrames.aggregate(df, :id, sum)
@belapsed sumby(df, :id, :val)

FastGroupBy._contiguousby_dict - Method
_contiguousby(fn, byvec, valvec)

Apply a by-operation assuming that the vector is already grouped, i.e. elements that belong to the same group are stored contiguously.
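When groups are stored contiguously, a single linear scan that detects where the key changes is enough, and no hashing of every element is needed. The sketch below illustrates that idea under stated assumptions: naive_contiguousby is a hypothetical name, the inputs are ordinary 1-based vectors, and the Dict return type is a guess based on the method name.

```julia
# Hypothetical reference implementation; not FastGroupBy's actual code.
# Assumptions: 1-based vectors, Dict result, groups stored contiguously.
function naive_contiguousby(fn, byvec::AbstractVector, valvec::AbstractVector)
    length(byvec) == length(valvec) ||
        throw(DimensionMismatch("byvec and valvec must have equal length"))
    result = Dict{eltype(byvec), Any}()
    isempty(byvec) && return result
    start = 1
    for i in 1:length(byvec)
        # A run ends at the last element or where the next key differs.
        if i == length(byvec) || !isequal(byvec[i + 1], byvec[i])
            # view avoids copying the run before handing it to fn.
            result[byvec[i]] = fn(view(valvec, start:i))
            start = i + 1
        end
    end
    return result
end

naive_contiguousby(sum, [1, 1, 2, 2, 2, 3], [1, 2, 3, 4, 5, 6])  # Dict(1 => 3, 2 => 12, 3 => 6)
```

If the same key appears in two separate runs, this sketch keeps only the result of the last run, which is why the contiguity assumption matters.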

FastGroupBy._contiguousby_vec - Method

Apply a by-operation assuming that the vector is already grouped, i.e. elements that belong to the same group are stored contiguously, and return a vector.

FastGroupBy._contiguousreduce - Method
_contiguousreduce(fn, byvec, valvec)

Apply a reduce by-operation assuming that the vector is already grouped, i.e. elements that belong to the same group are stored contiguously.
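A reduce over contiguous groups can be done in a streaming fashion, keeping a single running accumulator and emitting it whenever the key changes. The following is a minimal sketch of that technique only: naive_contiguousreduce is a hypothetical name, seeding each run with its first value and returning a vector of key => value pairs are both assumptions, and fn is assumed to return the element type of valvec.

```julia
# Hypothetical reference implementation; not FastGroupBy's actual code.
# Assumptions: 1-based vectors, each run is seeded with its first value,
# and the result is a vector of key => value pairs in run order.
function naive_contiguousreduce(fn, byvec::AbstractVector, valvec::AbstractVector)
    length(byvec) == length(valvec) ||
        throw(DimensionMismatch("byvec and valvec must have equal length"))
    out = Pair{eltype(byvec), eltype(valvec)}[]
    isempty(byvec) && return out
    key, acc = byvec[1], valvec[1]
    for i in 2:length(byvec)
        if isequal(byvec[i], key)
            acc = fn(acc, valvec[i])          # same run: fold the value in
        else
            push!(out, key => acc)            # run ended: emit and restart
            key, acc = byvec[i], valvec[i]
        end
    end
    push!(out, key => acc)                    # emit the final run
    return out
end

naive_contiguousreduce(+, [:a, :a, :b, :c, :c], [1, 2, 3, 4, 5])  # [:a => 3, :b => 3, :c => 9]
```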

FastGroupBy.select - Method

select(:col)

Return a function that obtains the column with the given symbol from an AbstractDataFrame or NDSparse.
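The pattern of returning a column-extracting function can be illustrated with a closure. In this sketch, colselector is a hypothetical stand-in for select, and a NamedTuple of vectors stands in for a table; whether the real method uses property access on AbstractDataFrame or NDSparse is not stated in the source.

```julia
# Hypothetical stand-in for select; works on any object whose columns are properties.
colselector(col::Symbol) = tbl -> getproperty(tbl, col)

# A NamedTuple of vectors plays the role of a table here (an assumption for illustration).
tbl = (id = [1, 1, 2], val = [0.5, 1.5, 2.0])

getval = colselector(:val)   # build the accessor once
getval(tbl)                  # [0.5, 1.5, 2.0]
```

Returning a function rather than the column itself lets the same accessor be reused across several tables or passed to higher-order functions.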