FreqTables

Build status Coverage Status

This package allows computing one- or multi-way frequency tables (a.k.a. contingency or pivot tables) from any type of vector or array. It includes support for CategoricalArray and Tables.jl compliant objects, as well as for weighted counts.

Tables are represented as NamedArray objects.

julia> using FreqTables

julia> x = repeat(["a", "b", "c", "d"], outer=[100]);

julia> y = repeat(["A", "B", "C", "D"], inner=[10], outer=[10]);

julia> tbl = freqtable(x)
4-element Named Array{Int64,1}
Dim1  
──────┼────
a      100
b      100
c      100
d      100

julia> prop(tbl)
4-element Named Array{Float64,1}
Dim1  
──────┼─────
a      0.25
b      0.25
c      0.25
d      0.25

julia> freqtable(x, y)
4×4 Named Array{Int64,2}
Dim1  Dim2   A   B   C   D
────────────┼───────────────
a            30  20  30  20
b            30  20  30  20
c            20  30  20  30
d            20  30  20  30

julia> tbl2 = freqtable(x, y, subset=1:20)
4×2 Named Array{Int64,2}
Dim1  Dim2  A  B
────────────┼─────
a            3  2
b            3  2
c            2  3
d            2  3

julia> prop(tbl2, margins=2)
4×2 Named Array{Float64,2}
Dim1  Dim2    A    B
────────────┼─────────
a            0.3  0.2
b            0.3  0.2
c            0.2  0.3
d            0.2  0.3

julia> freqtable(x, y, subset=1:20, weights=repeat([1, .5], outer=[10]))
4×2 Named Array{Float64,2}
Dim1  Dim2    A    B
────────────┼─────────
a            3.0  2.0
b            1.5  1.0
c            2.0  3.0
d            1.0  1.5

For convenience, when working with tables (like e.g. a DataFrame) one can pass a table object and columns as symbols:

julia> using DataFrames, CSV

julia> iris = DataFrame(CSV.File(joinpath(dirname(pathof(DataFrames)), "../docs/src/assets/iris.csv")));

julia> iris.LongSepal = iris.SepalLength .> 5.0;

julia> freqtable(iris, :Species, :LongSepal)
3×2 Named Array{Int64,2}
Species  LongSepal  false   true
────────────────────┼─────────────
setosa                  28     22
versicolor               3     47
virginica                1     49

julia> freqtable(iris, :Species, :LongSepal, subset=iris.PetalLength .< 4.0)
2×2 Named Array{Int64,2}
Species  LongSepal  false   true
────────────────────┼─────────────
setosa                  28     22
versicolor               3      8

Note that when one of the input variables contains integers, Name(i) has to be used when indexing into the table to prevent i to be interpreted as a numeric index:

julia> df = DataFrame(A = 101:103, B = ["x","y","y"]);

julia> ft = freqtable(df, :A, :B)
3×2 Named Array{Int64,2}
Dim1  Dim2  x  y
────────────┼─────
101          1  0
102          0  1
103          0  1

julia> ft[Name(101), "x"]
1

julia> ft[101,"x"]
ERROR: BoundsError: attempt to access 10×2 Array{Int64,2} at index [101, 1]