GLFixedEffectModels.jl
This package estimates generalized linear models with high-dimensional categorical variables (fixed effects). It builds on Matthieu Gomez's FixedEffects.jl and Amrei Stammann's Alpaca R package.
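Concretely, with two categorical variables the model has the form below. Here g is the link function, x_n holds the regressors of observation n, β their coefficients, and α and γ contain one parameter per category of each categorical variable (the fixed effects), with i(n) and j(n) denoting the categories observation n belongs to. The notation is ours and is meant only as an illustration; the same idea extends to more categorical variables.

```latex
g\big(\mathbb{E}[\,y_n \mid x_n\,]\big) = x_n^{\top}\beta + \alpha_{i(n)} + \gamma_{j(n)}
```

In the example that follows, fe(Species) adds one such set of parameters: one intercept per species.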
Installation
] add GLFixedEffectModels
Example use
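using GLFixedEffectModels, GLM, Distributions
using RDatasets

# Load the iris data and build a binary outcome: 1.0 if sepal length exceeds 5.0
df = dataset("datasets", "iris")
df.binary = zeros(Float64, size(df,1))
df[df.SepalLength .> 5.0,:binary] .= 1.0
# String copy of the species column, used below as a clustering variable
df.SpeciesStr = string.(df.Species)
# Random group labels "A", "B" or "C", used as a second clustering variable
df.Random = rand(["A","B","C"], size(df,1))

# Logit of binary on SepalWidth with species fixed effects
m = @formula binary ~ SepalWidth + fe(Species)
x = nlreg(df, m, Binomial(), LogitLink(), start = [0.2])

# Two regressors; standard errors clustered on both SpeciesStr and Random
m = @formula binary ~ SepalWidth + PetalLength + fe(Species)
nlreg(df, m, Binomial(), LogitLink(), Vcov.cluster(:SpeciesStr,:Random), start = [0.2, 0.2])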
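The second call clusters standard errors on two variables at once. As a rough guide to what that means, the usual two-way cluster-robust estimator combines three one-way estimates by inclusion-exclusion. We assume, without having checked the implementation, that a two-variable Vcov.cluster follows this standard approach, possibly with different small-sample corrections. Here s_c is the score of cluster c summed over its observations, H is the negative Hessian of the log-likelihood with respect to β, and the last term clusters on the cells defined jointly by both variables.

```latex
\hat V_{G} = H^{-1}\Big(\sum_{c \in G} s_c\, s_c^{\top}\Big) H^{-1},
\qquad
\hat V_{\text{two-way}} = \hat V_{\text{SpeciesStr}} + \hat V_{\text{Random}} - \hat V_{\text{SpeciesStr} \times \text{Random}}
```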
Documentation
The main function is nlreg(), which returns a GLFixedEffectModel <: RegressionModel.
nlreg(df, formula::FormulaTerm,
distribution::Distribution,
link::GLM.Link,
vcov::CovarianceEstimator; ...)
The required arguments are:
- df: a Table.
- formula: a formula created using @formula.
- distribution: a Distribution. See the documentation of GLM.jl for valid distributions.
- link: a Link function. See the documentation of GLM.jl for valid link functions.
- vcov: a CovarianceEstimator to compute the variance-covariance matrix.
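To make the role of the link argument concrete, here is a hand-written version of the logit link used in the example: the link maps a mean μ in (0, 1) to the linear predictor η, and its inverse maps back. The names logit_link, logit_inverse and logit_inverse_deriv are hypothetical helpers for this sketch only; nlreg expects a GLM.jl link object such as LogitLink(), not these functions.

```julia
# Hypothetical helpers that spell out the logit link by hand.
# They are not part of GLM.jl or GLFixedEffectModels, which use LogitLink() instead.

# Link function g: maps a mean μ in (0, 1) to the linear predictor η = log(μ / (1 - μ)).
logit_link(μ::Real) = log(μ / (1 - μ))

# Inverse link: maps any real η back to a mean in (0, 1).
logit_inverse(η::Real) = 1 / (1 + exp(-η))

# Derivative of the inverse link, dμ/dη = μ(1 - μ); Newton-type fitting
# routines use this when they linearize the model around the current fit.
function logit_inverse_deriv(η::Real)
    μ = logit_inverse(η)
    return μ * (1 - μ)
end

η = logit_link(0.75)    # log(3), about 1.0986
μ = logit_inverse(η)    # recovers 0.75 up to rounding
```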
The optional arguments are:
- save::Union{Bool, Symbol} = false: Should residuals and any estimated fixed effects be saved in a dataframe? Use save = :residuals to only save residuals. Use save = :fe to only save fixed effects.
- method::Symbol: The computation method. The default is :cpu. The alternative, :gpu, requires CuArrays; in that case, use the option double_precision = false to compute in Float32. This option is the same as in the FixedEffectModels.jl package.
- contrasts::Dict = Dict(): An optional Dict of contrast codings for each categorical variable in the formula. Any unspecified variables will have DummyCoding.
- maxiter::Integer = 1000: Maximum number of iterations in the Newton-Raphson routine.
- maxiter_center::Integer = 10000: Maximum number of iterations for the centering procedure.
- double_precision::Bool: Should the demeaning operation use Float64 rather than Float32? Defaults to true.
- dev_tol::Real: Tolerance level for the first stopping condition of the maximization routine.
- rho_tol::Real: Tolerance level for the step-halving in the maximization routine.
- step_tol::Real: Tolerance level that accounts for rounding errors inside the step-halving routine.
- center_tol::Real: Tolerance level for the stopping condition of the centering algorithm. Defaults to 1e-8 if double_precision = true, 1e-6 otherwise.
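The maxiter, dev_tol, rho_tol and step_tol options belong to a Newton-Raphson routine with step-halving. As a rough, self-contained illustration of that general technique (not the package's code), the sketch below fits a logit in which a categorical variable enters as dummy columns. nlreg instead removes the fixed effects with its centering step, and its tolerances may be defined differently. The names newton_logit, logit_deviance and the min_step keyword are hypothetical, and the relative-deviance stopping rule is a common criterion chosen for this sketch.

```julia
using LinearAlgebra

# Deviance of a 0/1 logit model, i.e. -2 times the log-likelihood.
# log(1 + exp(η)) is evaluated as max(η, 0) + log1p(exp(-|η|)) to avoid overflow.
function logit_deviance(X::Matrix{Float64}, y::Vector{Float64}, β::Vector{Float64})
    η = X * β
    ll = sum(y .* η .- max.(η, 0.0) .- log1p.(exp.(-abs.(η))))
    return -2 * ll
end

# Hypothetical Newton-Raphson fit of a logit model with step-halving.
# maxiter and dev_tol mirror the nlreg options by name only; min_step is a
# made-up stand-in for the step-halving tolerances, not a package option.
function newton_logit(X::Matrix{Float64}, y::Vector{Float64};
                      maxiter::Int = 100, dev_tol::Float64 = 1e-10,
                      min_step::Float64 = 1e-10)
    β = zeros(size(X, 2))
    dev = logit_deviance(X, y, β)
    for iter in 1:maxiter
        μ = 1 ./ (1 .+ exp.(-(X * β)))   # fitted probabilities
        w = μ .* (1 .- μ)                # Binomial variance weights
        score = X' * (y .- μ)            # gradient of the log-likelihood
        H = X' * (w .* X)                # negative Hessian
        Δ = H \ score                    # full Newton step
        # Step-halving: shrink the step until the deviance no longer increases.
        s = 1.0
        new_dev = logit_deviance(X, y, β .+ s .* Δ)
        while new_dev > dev && s > min_step
            s /= 2
            new_dev = logit_deviance(X, y, β .+ s .* Δ)
        end
        β .+= s .* Δ
        # Stop when the relative change in deviance falls below dev_tol.
        converged = abs(dev - new_dev) / (abs(new_dev) + 0.1) < dev_tol
        dev = new_dev
        if converged
            return β, iter
        end
    end
    return β, maxiter
end

# Usage: one regressor x and a categorical variable g with three levels,
# entered as dummy columns so that each level gets its own intercept.
g = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
x = [0.5, -1.0, 1.5, 0.0, 2.0, -0.5, 1.0, -1.5, 0.3, 1.2, -0.7, 0.8]
y = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
D = Float64.(g .== (1:3)')   # 12×3 dummy matrix
X = hcat(x, D)
β, iters = newton_logit(X, y)
```

With dummy intercepts, the first-order conditions force the fitted probabilities in each category to sum to the number of ones observed in that category, which gives a quick check on the fit.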
Things that still need to be implemented
- Better default starting values
- Bias correction
- Weights
- Better StatsBase interface & prediction
- Better benchmarking
- Integration with RegressionTables.jl
Related Julia packages
- FixedEffectModels.jl estimates linear models with high-dimensional categorical variables (with or without endogenous regressors).
- FixedEffects.jl is a package for fast pseudo-demeaning operations using LSMR. Both this package and FixedEffectModels.jl build on it.
- Alpaca.jl is a wrapper to the Alpaca R package, which solves the same tasks as this package.
- GLM.jl estimates generalized linear models, but without explicit support for high-dimensional categorical regressors (fixed effects).
- Econometrics.jl provides routines to estimate multinomial logit and other models.
- RegressionTables.jl will, in the future, support pretty printing of results from this package.
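FixedEffects.jl, which this package builds on, performs the centering (demeaning) step that maxiter_center, center_tol and double_precision control, using LSMR. The sketch below swaps in a simpler technique, the method of alternating projections, only to show what centering computes: each variable is replaced by its residual after removing one parameter per category of every categorical variable. The function demean_ap is hypothetical; it is unweighted and assumes category labels 1 to K with every level present. Its default tolerances copy the center_tol defaults stated above, and the element type plays the role of double_precision.

```julia
# Hypothetical sketch of centering by alternating projections (not LSMR, and
# unweighted): sweep out the group means of each categorical variable in turn
# until a full sweep changes the vector by less than `tol`.
# Category labels must be 1..K with every level present.
function demean_ap(v::AbstractVector{T}, groups::Vector{Vector{Int}};
                   tol::Real = T === Float64 ? 1e-8 : 1e-6,
                   maxiter::Int = 10_000) where {T<:AbstractFloat}
    r = copy(v)
    for iter in 1:maxiter
        old = copy(r)
        for g in groups
            sums = zeros(T, maximum(g))
            counts = zeros(Int, maximum(g))
            for (i, k) in enumerate(g)
                sums[k] += r[i]
                counts[k] += 1
            end
            # Subtract each observation's current group mean.
            for (i, k) in enumerate(g)
                r[i] -= sums[k] / counts[k]
            end
        end
        if maximum(abs.(r .- old)) < tol
            return r, iter
        end
    end
    return r, maxiter
end

# Usage: two crossed categorical variables (say firm and year), unbalanced.
firm = [1, 1, 2, 2, 2, 3, 3, 4]
year = [1, 2, 1, 2, 3, 2, 3, 3]
v = [1.0, 3.0, 2.0, 5.0, 0.5, 2.5, 4.0, 4.5]
r64, it64 = demean_ap(v, [firm, year])            # Float64, tol defaults to 1e-8
r32, it32 = demean_ap(Float32.(v), [firm, year])  # Float32, tol defaults to 1e-6
```

Alternating projections is easy to follow but can converge slowly when categories are only weakly connected, which is one reason a Krylov method such as LSMR is attractive for this step.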
References
Fong, D. C. and Saunders, M. (2011). LSMR: An Iterative Algorithm for Sparse Least-Squares Problems. SIAM Journal on Scientific Computing.
Stammann, A. (2018). Fast and Feasible Estimation of Generalized Linear Models with High-Dimensional k-way Fixed Effects. Mimeo, Heinrich-Heine University Düsseldorf.