FastGroupBy.fastby!
— Method
fastby(fn, b[, v])
Group by b, then apply fn to v once grouped. If v is not provided, fn is applied to b.
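The grouping semantics described above can be illustrated with a minimal sketch in Base Julia. The name fastby_sketch is hypothetical and not part of FastGroupBy. A Dict is used only for clarity, and this is not necessarily how fastby groups internally. The sketch returns a Dict mapping each key to fn applied to that key's values.

```julia
# Hypothetical sketch (Base Julia only) of the fastby idea: group the values
# in v by the keys in b, then apply fn to each group's values.
# `fastby_sketch` is an illustrative name, not part of FastGroupBy.
function fastby_sketch(fn, b::AbstractVector, v::AbstractVector = b)
    length(b) == length(v) || throw(DimensionMismatch("b and v must have equal length"))
    groups = Dict{eltype(b), Vector{eltype(v)}}()
    for (k, x) in zip(b, v)
        push!(get!(() -> Vector{eltype(v)}(), groups, k), x)  # collect values per key
    end
    # apply fn once per group; the result maps each key to fn(values)
    return Dict(k => fn(xs) for (k, xs) in groups)
end

fastby_sketch(sum, [1, 2, 1, 2], [10, 20, 30, 40])  # == Dict(1 => 40, 2 => 60)
fastby_sketch(length, [:a, :b, :a])                  # v omitted: == Dict(:a => 2, :b => 1)
```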
FastGroupBy.fastby
— Method
Fast group-by algorithm.
FastGroupBy.fastby
— Method
fastby(fn, df, bycol; name = [])
FastGroupBy.fgroupreduce
— Method
fgroupreduce(fn, byvec, valvec, init)
Group by byvec and apply reduce(fn, valvec, init = init) within each group of byvec.
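A per-group reduction with an initial value can be sketched in Base Julia as a single pass that folds each value into its key's accumulator. The name fgroupreduce_sketch is hypothetical. The sketch folds left to right in input order, which matches reduce(fn, group; init = init) when fn is associative. It also assumes fn returns a value of the same type as init.

```julia
# Hypothetical sketch of fgroupreduce-style semantics in Base Julia.
# Each key's accumulator starts at `init` and is updated as acc = fn(acc, x).
# Assumes fn(init, x) returns a value of type typeof(init).
function fgroupreduce_sketch(fn, byvec::AbstractVector, valvec::AbstractVector, init)
    length(byvec) == length(valvec) || throw(DimensionMismatch("length mismatch"))
    acc = Dict{eltype(byvec), typeof(init)}()
    for (k, x) in zip(byvec, valvec)
        acc[k] = fn(get(acc, k, init), x)  # fold x into this key's running result
    end
    return acc
end

fgroupreduce_sketch(+, [1, 2, 1], [1.5, 2.0, 3.0], 0.0)               # == Dict(1 => 4.5, 2 => 2.0)
fgroupreduce_sketch(max, ["x", "y", "x"], [3, 7, 5], typemin(Int))   # == Dict("x" => 5, "y" => 7)
```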
FastGroupBy.sumby
— Function
Perform sum by group.
sumby(df::Union{AbstractDataFrame,NDSparse}, by::Symbol, val::Symbol)
sumby(by::AbstractVector, val::AbstractVector)
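Arguments
- df: an AbstractDataFrame/NDSparse from which to extract the by and val columns
- by: data table column to group by (for the vector method, the grouping vector itself)
- val: data table column to sum (for the vector method, the vector of values itself)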
Returns
- ::Dict: a Dict that maps the unique values of by to the sum of val within each group
Examples
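```julia
using FastGroupBy
using DataFrames, IndexedTables, Compat, BenchmarkTools
import DataFrames.DataFrame
const N = 10_000_000; const K = 100
# sumby is faster for a DataFrame without missings
# Note: srand and round(x, 4) are Julia 0.6-era APIs; from Julia 0.7 on use
# `using Random; Random.seed!(1)` and round(x; digits = 4)
srand(1);
# N rows; id takes about N/K distinct values, val is drawn from K distinct values
@time df = DataFrame(id = rand(1:Int(round(N/K)), N), val = rand(round.(rand(K)*100,4), N));
# baseline: DataFrames' grouped aggregation (aggregate exists only in older DataFrames releases)
@belapsed DataFrames.aggregate(df, :id, sum)
# FastGroupBy's sumby on the same data
@belapsed sumby(df, :id, :val)
```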
FastGroupBy.sumby_contiguous
— Method
sumby assuming that the elements of by are organised contiguously (equal values are adjacent); it does not check this assumption.
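When equal keys are adjacent, a group sum needs only one linear pass and no hashing. The following is a minimal sketch in Base Julia, and sumby_contiguous_sketch is a hypothetical name. Like the documented function, it does not verify contiguity. If a key reappears after a different key, the later run overwrites the earlier one in the returned Dict, which is why the assumption matters.

```julia
# Hypothetical sketch of a contiguous sum-by: equal keys in `by` are assumed
# to be adjacent (e.g. [1, 1, 3, 2, 2]); this is NOT checked.
function sumby_contiguous_sketch(by::AbstractVector, val::AbstractVector)
    length(by) == length(val) || throw(DimensionMismatch("length mismatch"))
    keys_out = Vector{eltype(by)}()
    sums_out = Vector{eltype(val)}()
    for i in eachindex(by, val)
        if i == firstindex(by) || by[i] != by[i - 1]
            push!(keys_out, by[i])      # a new run of equal keys starts here
            push!(sums_out, val[i])
        else
            sums_out[end] += val[i]     # continue summing the current run
        end
    end
    return Dict(zip(keys_out, sums_out))
end

sumby_contiguous_sketch([1, 1, 3, 2, 2], [1, 2, 3, 4, 5])  # == Dict(1 => 3, 3 => 3, 2 => 9)
```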
FastGroupBy.sumby_dict
— Method
sumby using a Dict; this can be quite slow because of the cost of hash-table operations.
FastGroupBy.sumby_radixgroup!
— Method
sumby that groups by using radix and counting sort; this is only a partial sort. It is faster when by is large.
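Counting sort can bring equal keys together without any comparisons. The following Base Julia sketch is a simplified stand-in, not FastGroupBy's radix code. It assumes the keys are integers in a known small range 1:K and does a single counting-sort pass. That pass builds a histogram of keys, turns it into bucket start offsets, and scatters the values into bucket order, after which each bucket is summed as a contiguous slice. The name sumby_countingsort_sketch and the parameter K are hypothetical.

```julia
# Hypothetical sketch: group values by integer keys in 1:K with one
# counting-sort pass, then sum each contiguous bucket.
# Simplified stand-in; not FastGroupBy's multi-pass radix implementation.
function sumby_countingsort_sketch(by::AbstractVector{<:Integer}, val::AbstractVector, K::Integer)
    length(by) == length(val) || throw(DimensionMismatch("length mismatch"))
    counts = zeros(Int, K)
    for k in by
        1 <= k <= K || throw(ArgumentError("key $k outside 1:$K"))
        counts[k] += 1                        # histogram of keys
    end
    starts = cumsum(counts) .- counts .+ 1    # first slot of each bucket
    grouped = similar(val, length(val))
    pos = copy(starts)
    for (k, x) in zip(by, val)
        grouped[pos[k]] = x                   # stable scatter into bucket k
        pos[k] += 1
    end
    result = Dict{Int, eltype(val)}()
    for k in 1:K
        counts[k] == 0 && continue            # skip keys that never occur
        result[k] = sum(@view grouped[starts[k]:starts[k] + counts[k] - 1])
    end
    return result
end

sumby_countingsort_sketch([3, 1, 3, 2, 1], [10, 20, 30, 40, 50], 3)  # == Dict(1 => 70, 2 => 40, 3 => 40)
```

A multi-pass radix sort applies the same histogram-and-scatter step to successive digits of the key, which removes the need for a small key range.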
FastGroupBy.sumby_radixsort!
— Method
sumby that sorts the by column using radix sort.
FastGroupBy.sumby_sortperm
— Method
sumby using sortperm. This is faster for a smaller by, and it does not modify the input.
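The sortperm approach can be sketched in Base Julia. sortperm(by) gives an ordering in which equal keys are adjacent, and walking that ordering while summing runs only reads the inputs, so nothing is modified. The name sumby_sortperm_sketch is hypothetical. For simplicity it returns the sorted keys and their sums as two vectors rather than the Dict documented for sumby.

```julia
# Hypothetical sketch of a sortperm-based sum-by. The inputs are only read;
# sortperm returns a permutation instead of sorting `by` in place.
function sumby_sortperm_sketch(by::AbstractVector, val::AbstractVector)
    length(by) == length(val) || throw(DimensionMismatch("length mismatch"))
    p = sortperm(by)                      # indices of `by` in sorted order
    keys_out = Vector{eltype(by)}()
    sums_out = Vector{eltype(val)}()
    for (j, i) in enumerate(p)
        if j == 1 || by[i] != by[p[j - 1]]
            push!(keys_out, by[i])        # first element of a new key
            push!(sums_out, val[i])
        else
            sums_out[end] += val[i]       # same key as the previous element
        end
    end
    return keys_out, sums_out             # keys come out in sorted order
end

by = [3, 1, 3, 2]
keys_out, sums_out = sumby_sortperm_sketch(by, [1, 2, 3, 4])  # ([1, 2, 3], [2, 4, 4]); `by` is unchanged
```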
IndexedTables.column
— Method
column(df, :colname)
Extract a column from an AbstractDataFrame.
FastGroupBy._contiguousby_dict
— Method
_contiguousby(fn, byvec, valvec)
Apply a by-operation assuming that byvec is grouped, i.e. elements that belong to the same group are stored contiguously.
FastGroupBy._contiguousby_vec
— Method
Apply a by-operation assuming that byvec is grouped, i.e. elements that belong to the same group are stored contiguously, and return a vector.
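Applying an arbitrary fn over contiguous groups comes down to finding run boundaries and passing each run of values to fn. The following Base Julia sketch returns the run keys and the results as vectors. The name contiguousby_vec_sketch is hypothetical, and the sketch assumes 1-based vectors. Contiguity is not checked, so a key that appears in two separate runs is reported twice.

```julia
# Hypothetical sketch: apply fn to each contiguous run of equal keys in byvec
# and return (run keys, fn results) as vectors. Assumes 1-based indexing;
# contiguity is assumed, not checked.
function contiguousby_vec_sketch(fn, byvec::AbstractVector, valvec::AbstractVector)
    length(byvec) == length(valvec) || throw(DimensionMismatch("length mismatch"))
    n = length(byvec)
    keys_out = Vector{eltype(byvec)}()
    results = Any[]
    start = 1
    for i in 1:n
        # close the current run at the last element or when the next key differs
        if i == n || byvec[i + 1] != byvec[i]
            push!(keys_out, byvec[i])
            push!(results, fn(view(valvec, start:i)))
            start = i + 1
        end
    end
    return keys_out, map(identity, results)  # narrow Any[] to a concrete eltype
end

contiguousby_vec_sketch(sum, [1, 1, 2, 2, 2, 1], [1, 2, 3, 4, 5, 6])  # ([1, 2, 1], [3, 12, 6])
```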
FastGroupBy._contiguousreduce
— Method
_contiguousreduce(fn, byvec, valvec)
Apply a by-operation assuming that byvec is grouped, i.e. elements that belong to the same group are stored contiguously.
FastGroupBy._fastby!
— Method
Internal: single-function fastby with one by column and one val column.
FastGroupBy.genca
— Method
Generate CategoricalArrays.
FastGroupBy.select
— Method
select(:col)
Return a function that obtains the column with the named symbol from an AbstractDataFrame or NDSparse.
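The idea of returning a column getter rather than a column can be sketched as a closure over the column name. In the following sketch, select_sketch is a hypothetical name, and a NamedTuple of vectors stands in for an AbstractDataFrame or NDSparse. The real select targets those types, which this sketch does not use.

```julia
# Hypothetical sketch: build a function that extracts a named column.
# A NamedTuple of vectors stands in for an AbstractDataFrame/NDSparse here.
select_sketch(col::Symbol) = table -> getproperty(table, col)

tbl = (id = [1, 2, 1], val = [0.5, 1.5, 2.0])
getval = select_sketch(:val)   # a reusable getter for the :val column
getval(tbl)                    # == [0.5, 1.5, 2.0]
```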