StandardizedMatrices

Build StatusBuild statuscodecov

Statisticians often work with standardized matrices. If x is a data matrix with observations in rows, we want to work with z = StatsBase.zscore(x, 1). This package defines a StandardizedMatrix type that treats a matrix as standardized without copying or changing data in place.

A Motivating Example

Suppose our original matrix is sparse and we want to perform matrix-vector multiplication with a standardized version. Typically, standardizing a sparse matrix destroys the sparsity.

using StatsBase, BenchmarkTools, StandardizedMatrices

# generate some data
n, p = 100_000, 1000
x = sprandn(n, p, .01)
β = randn(p)

xdense = zscore(x, 1)		# this destroys the sparsity
z = StandardizedMatrix(x)	# this acts as standardized, but keeps sparse benefits

b1 = @benchmark xdense * β
b2 = @benchmark z * β
ratio(median(b1), median(b2))  # StandardizedMatrix is roughly 13 times faster

Methods implemented:

  • *()
  • mul!(Y, A::StandardizedMatrix, B)