GLFixedEffectModels.jl
This package estimates generalized linear models with high dimensional categorical variables. It builds on Matthieu Gomez's FixedEffects.jl and Amrei Stammann's Alpaca.
Installation
```
] add GLFixedEffectModels
```
Example use
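```julia
using GLFixedEffectModels, GLM, Distributions
using RDatasets
using Random

Random.seed!(1234)  # make the random grouping column reproducible

df = dataset("datasets", "iris")
# Binary outcome: 1.0 if SepalLength > 5.0, else 0.0
df.binary = zeros(Float64, size(df, 1))
df[df.SepalLength .> 5.0, :binary] .= 1.0
df.SpeciesStr = string.(df.Species)
# Random grouping variable with levels "A", "B", "C", used for two-way clustering below
df.Random = rand(["A", "B", "C"], size(df, 1))

# Logit with one regressor and Species fixed effects
m = @formula binary ~ SepalWidth + fe(Species)
x = nlreg(df, m, Binomial(), LogitLink(), start = [0.2])

# Two regressors, standard errors clustered by SpeciesStr and Random
m = @formula binary ~ SepalWidth + PetalLength + fe(Species)
nlreg(df, m, Binomial(), LogitLink(), Vcov.cluster(:SpeciesStr, :Random), start = [0.2, 0.2])
```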
Documentation
The main function is nlreg(), which returns a GLFixedEffectModel <: RegressionModel.
```julia
nlreg(df, formula::FormulaTerm,
      distribution::Distribution,
      link::GLM.Link,
      vcov::CovarianceEstimator; ...)
```
The required arguments are:
- df: a Table, such as a DataFrame.
- formula: a formula created using @formula.
- distribution: a Distribution. See the documentation of GLM.jl for valid distributions.
- link: a GLM.Link function. See the documentation of GLM.jl for valid link functions.
- vcov: a CovarianceEstimator to compute the variance-covariance matrix.
The optional arguments are:
- save::Union{Bool, Symbol} = false: Should residuals and any estimated fixed effects be saved in a dataframe? Use save = :residuals to save only residuals, or save = :fe to save only fixed effects.
- method::Symbol: A symbol for the method. Default is :cpu. Alternatively, :gpu requires CuArrays; in this case, use the option double_precision = false to use Float32. This option is the same as for the FixedEffectModels.jl package.
- double_precision::Bool = true: Should the demeaning operation use Float64 rather than Float32? If true, 64-bit floats are used, otherwise 32-bit.
- drop_singletons = true: Drop observations that are perfectly classified.
- contrasts::Dict = Dict(): An optional Dict of contrast codings for each categorical variable in the formula. Any unspecified variables will have DummyCoding.
- maxiter::Integer = 1000: Maximum number of iterations in the Newton-Raphson routine.
- maxiter_center::Integer = 10000: Maximum number of iterations for the centering procedure.
- dev_tol::Real: Tolerance level for the first stopping condition of the maximization routine.
- rho_tol::Real: Tolerance level for the step-halving in the maximization routine.
- step_tol::Real: Tolerance level that accounts for rounding errors inside the step-halving routine.
- center_tol::Real: Tolerance level for the stopping condition of the centering algorithm. Defaults to 1e-8 if double_precision = true, 1e-6 otherwise.
- separation::Symbol = :ignore: Method to detect or deal with separation. Currently supported values are :none, :ignore and :mu. :none checks for observations that are outside [separation_mu_lbound, separation_mu_ubound] and gives a warning, but does not do anything else. :ignore does not check (and may therefore be slightly faster than the other options). :mu truncates mu at separation_mu_lbound or separation_mu_ubound.
- separation_mu_lbound::Real = -Inf: Lower bound for the separation detection/correction heuristic (on mu). What a reasonable value would be depends on the model that you're trying to fit.
- separation_mu_ubound::Real = Inf: Upper bound for the separation detection/correction heuristic.
- verbose::Bool = false: If true, prints output on each iteration.
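The three separation modes can be illustrated in plain Julia on a vector of fitted means mu. apply_separation is a hypothetical helper written only for this illustration, not a function of GLFixedEffectModels; it mirrors the documented behavior of :ignore, :none and :mu and is not the package's internal code.

```julia
# Hypothetical helper mirroring the documented separation modes.
# NOT part of GLFixedEffectModels; for illustration only.
function apply_separation(mu::AbstractVector{<:Real}, separation::Symbol;
                          lbound::Real = -Inf, ubound::Real = Inf)
    if separation == :ignore
        # No check at all, so this is the cheapest option.
        return mu
    elseif separation == :none
        # Check the bounds and warn, but leave mu unchanged.
        n_out = count(m -> m < lbound || m > ubound, mu)
        n_out > 0 && @warn "$n_out observations have mu outside [$lbound, $ubound]"
        return mu
    elseif separation == :mu
        # Truncate mu at the lower and upper bounds.
        return clamp.(mu, lbound, ubound)
    else
        throw(ArgumentError("unsupported separation method: $separation"))
    end
end

# Fitted probabilities from a logit model, two of them numerically at 0 or 1
mu = [1e-15, 0.3, 0.7, 1 - 1e-15]
out = apply_separation(mu, :mu; lbound = 1e-10, ubound = 1 - 1e-10)
```

With :mu, the two extreme values are pulled back to 1e-10 and 1 - 1e-10, while the interior values are unchanged; the bounds used here are illustrative choices for a logit model, not package defaults.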
The function returns a GLFixedEffectModel object which supports the StatsBase.RegressionModel abstraction. It can be displayed in table form by using RegressionTables.jl.
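The sketch below combines several of the optional arguments with StatsBase-style accessors (coef and stderror, re-exported by GLM.jl). Assumptions: Vcov.robust() (from Vcov.jl, as used by FixedEffectModels.jl) is accepted by nlreg, and the separation bounds are illustrative values for a logit model rather than recommended settings.

```julia
using GLFixedEffectModels, GLM, Distributions
using RDatasets

df = dataset("datasets", "iris")
df.binary = Float64.(df.SepalLength .> 5.0)  # 1.0 if SepalLength > 5.0, else 0.0

m = @formula binary ~ SepalWidth + PetalLength + fe(Species)

# Assumption: Vcov.robust() is supported, as in FixedEffectModels.jl.
# Separation bounds are illustrative for a logit model, where 0 < mu < 1.
x = nlreg(df, m, Binomial(), LogitLink(), Vcov.robust();
          start = [0.2, 0.2],
          separation = :mu,
          separation_mu_lbound = 1e-10,
          separation_mu_ubound = 1 - 1e-10,
          maxiter = 100)

coef(x)       # estimates for SepalWidth and PetalLength (Species is absorbed)
stderror(x)   # standard errors from the chosen vcov estimator
```

For a formatted table, RegressionTables.jl provides regtable(x).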
Bias correction methods
The package experimentally supports bias correction methods for the following models:
- Binomial regression, Logit link, Two-way, Classic (Fernández-Val and Weidner (2016, 2018))
- Binomial regression, Probit link, Two-way, Classic (Fernández-Val and Weidner (2016, 2018))
- Binomial regression, Logit link, Two-way, Network (Hinz, Stammann and Wanner (2020) & Fernández-Val and Weidner (2016))
- Binomial regression, Probit link, Two-way, Network (Hinz, Stammann and Wanner (2020) & Fernández-Val and Weidner (2016))
- Binomial regression, Logit link, Three-way, Network (Hinz, Stammann and Wanner (2020))
- Binomial regression, Probit link, Three-way, Network (Hinz, Stammann and Wanner (2020))
- Poisson regression, Log link, Three-way, Network (Weidner and Zylkin (2021))
- Poisson regression, Log link, Two-way, Network (Weidner and Zylkin (2021))
Things that still need to be implemented
- Better default starting values
- Weights
- More options of dealing with separation
- Better StatsBase interface & prediction
- Better benchmarking
Related Julia packages
- FixedEffectModels.jl estimates linear models with high dimensional categorical variables (and with or without endogenous regressors).
- FixedEffects.jl is a package for fast pseudo-demeaning operations using LSMR. Both this package and FixedEffectModels.jl build on this.
- Alpaca.jl is a wrapper for the Alpaca R package, which solves the same tasks as this package.
- GLM.jl estimates generalized linear models, but without explicit support for categorical regressors.
- Econometrics.jl provides routines to estimate multinomial logit and other models.
- RegressionTables.jl supports pretty printing of results from this package.
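The idea behind demeaning with two fixed effects can be illustrated on a tiny unbalanced panel. This is a minimal sketch under stated assumptions: it swaps in the simple method of alternating projections (repeatedly subtracting group means) instead of the LSMR solver that FixedEffects.jl uses, it ignores the observation weights a GLM fit would typically apply, and the helper names group_demean! and demean_twoway are hypothetical. Its tol and maxiter arguments play a role analogous to center_tol and maxiter_center.

```julia
# Subtract group means in place. Assumes group ids are integers 1..ngroups.
function group_demean!(y::Vector{Float64}, g::Vector{Int})
    ngroups = maximum(g)
    sums = zeros(ngroups)
    counts = zeros(Int, ngroups)
    for i in eachindex(y, g)
        sums[g[i]] += y[i]
        counts[g[i]] += 1
    end
    for i in eachindex(y, g)
        y[i] -= sums[g[i]] / counts[g[i]]
    end
    return y
end

# Alternating projections: demean by g1, then by g2, until the change is below tol.
function demean_twoway(y::Vector{Float64}, g1::Vector{Int}, g2::Vector{Int};
                       tol::Float64 = 1e-8, maxiter::Int = 10_000)
    r = copy(y)
    for iter in 1:maxiter
        prev = copy(r)
        group_demean!(r, g1)
        group_demean!(r, g2)
        if maximum(abs.(r .- prev)) < tol
            return r, iter
        end
    end
    return r, maxiter
end

# Small unbalanced two-way example
y  = [1.0, 2.0, 4.0, 7.0, 3.0, 5.0, 6.0]
g1 = [1, 1, 2, 2, 3, 3, 3]
g2 = [1, 2, 1, 2, 1, 2, 2]
r, iters = demean_twoway(y, g1, g2)
```

After convergence, the residual r has (approximately) zero mean within every level of both g1 and g2, which is what removing two sets of fixed effects means.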
References
Fernández-Val, I. and Weidner, M., 2016. Individual and time effects in nonlinear panel models with large N, T. Journal of Econometrics, 192(1), pp.291-312.
Fernández-Val, I. and Weidner, M., 2018. Fixed effects estimation of large-T panel data models. Annual Review of Economics, 10, pp.109-138.
Fong, D.C. and Saunders, M., 2011. LSMR: An iterative algorithm for sparse least-squares problems. SIAM Journal on Scientific Computing.
Hinz, J., Stammann, A. and Wanner, J., 2021. State dependence and unobserved heterogeneity in the extensive margin of trade.
Stammann, A., 2018. Fast and feasible estimation of generalized linear models with high-dimensional k-way fixed effects. Mimeo, Heinrich-Heine University Düsseldorf.
Weidner, M. and Zylkin, T., 2021. Bias and consistency in three-way gravity models. Journal of International Economics, 132, p.103513.