AnovaGLM
To use anova
on GLM objects
, we need AnovaGLM.jl
.
using AnovaGLM
This function will export all functions from GLM
and related function in this package, including anova
, anova_lm
, anova_glm
.
Ordinary linear model
We first import the well-known iris
dataset from RDatasets
.
iris = dataset("datasets", "iris")
There's two way to perform ANOVA. anova_lm
accepts a formula and data like GLM.lm
.
anova_lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Species), iris)
Analysis of Variance
Type 1 test / F test
SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species
Table:
──────────────────────────────────────────────────────────────
DOF Exp.SS Mean Square F value Pr(>|F|)
──────────────────────────────────────────────────────────────
(Intercept) 1 5121.68 5121.68 54403.6419 <1e-99
SepalWidth 1 1.4122 1.4122 15.0011 0.0002
PetalLength 1 84.43 84.43 896.8059 <1e-62
PetalWidth 1 1.8834 1.8834 20.0055 <1e-04
Species 2 0.8889 0.4445 4.7212 0.0103
(Residuals) 144 13.56 0.0941
──────────────────────────────────────────────────────────────
We can specify the type of sum of squares by keyword argument type
. Let's use type II SS.
anova_lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Species), iris; type = 2)
Analysis of Variance
Type 2 test / F test
SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species
Table:
──────────────────────────────────────────────────────────────
DOF Exp.SS Mean Square F value Pr(>|F|)
──────────────────────────────────────────────────────────────
(Intercept) 1 5121.68 5121.68 54403.6419 <1e-99
SepalWidth 1 3.1250 3.1250 33.1945 <1e-07
PetalLength 1 13.79 13.79 146.4310 <1e-22
PetalWidth 1 0.4090 0.4090 4.3448 0.0389
Species 2 0.8889 0.4445 4.7212 0.0103
(Residuals) 144 13.56 0.0941
──────────────────────────────────────────────────────────────
A StatsModels.TableRegressionModel
object is fitted and stored in the output of anova_lm
.
We can fit a model first and call anova
instead. anova
stores the model as well.
It doesn't create a copy, so any in-place change of the original model should be noticed.
lm1 = lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Species), iris)
anova(lm1; type = 3)
Analysis of Variance
Type 3 test / F test
SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species
Table:
──────────────────────────────────────────────────────────
DOF Exp.SS Mean Square F value Pr(>|F|)
──────────────────────────────────────────────────────────
(Intercept) 1 5.6694 5.6694 60.2211 <1e-11
SepalWidth 1 3.1250 3.1250 33.1945 <1e-07
PetalLength 1 13.79 13.79 146.4310 <1e-22
PetalWidth 1 0.4090 0.4090 4.3448 0.0389
Species 2 0.8889 0.4445 4.7212 0.0103
(Residuals) 144 13.56 0.0941
──────────────────────────────────────────────────────────
Multiple models can be compared through the same function.
The checker for nested models is not implemented now, so it should be ensured that the later model is more complex than the previous one.
lms = nestedmodels(LinearModel, @formula(SepalLength ~ SepalWidth * Species), iris; dropcollinear = false)
anova(lms)
Analysis of Variance
Type 1 test / F test
Model 1: SepalLength ~ 0
Model 2: SepalLength ~ 1
Model 3: SepalLength ~ 1 + SepalWidth
Model 4: SepalLength ~ 1 + SepalWidth + Species
Model 5: SepalLength ~ 1 + SepalWidth + Species + SepalWidth & Species
Table:
─────────────────────────────────────────────────────────────────────────────────
DOF ΔDOF Res.DOF R² ΔR² Res.SS Exp.SS F value Pr(>|F|)
─────────────────────────────────────────────────────────────────────────────────
1 1 144 0 5223.85
2 2 1 144 -0.0000 -0.0000 102.17 5121.68 26485.3010 <1e-99
3 3 1 144 0.0138 0.0138 100.76 1.4122 7.3030 0.0077
4 5 2 144 0.7259 0.7121 28.00 72.75 188.1091 <1e-40
5 7 2 144 0.7274 0.0015 27.85 0.1572 0.4064 0.6668
─────────────────────────────────────────────────────────────────────────────────
This result is a little bit different from GLM.ftest
:
ftest(getproperty.(lms.model[2:end], :model)...)
F-test: 4 models fitted on 150 observations
────────────────────────────────────────────────────────────────────
DOF ΔDOF SSR ΔSSR R² ΔR² F* p(>F)
────────────────────────────────────────────────────────────────────
[1] 2 102.1683 0.0000
[2] 3 1 100.7561 -1.4122 0.0138 0.0138 2.0744 0.1519
[3] 5 2 28.0037 -72.7524 0.7259 0.7121 189.6512 <1e-40
[4] 7 2 27.8465 -0.1572 0.7274 0.0015 0.4064 0.6668
────────────────────────────────────────────────────────────────────
In anova
, the F value is calculated by dividing MSR (mean of ΔDeviance) with mean of RSS of the most complex model just like anova
in R, while in GLM.ftest
, the denominator is replaced by RSS of subsequent model.
Generalized linear models
quine = dataset("MASS", "quine")
We fit a negative binomial regression on quine
dataset from MASS
.
nbm = glm(@formula(Days ~ Eth + Sex + Age + Lrn), quine, NegativeBinomial(2.0), LogLink())
anova(nbm)
Analysis of Variance
Type 1 test / F test
Days ~ 1 + Eth + Sex + Age + Lrn
Table:
───────────────────────────────────────────────────────────
DOF ΔDeviance Mean ΔDev F value Pr(>|F|)
───────────────────────────────────────────────────────────
(Intercept) 1 3667.69 3667.69 2472.0306 <1e-89
Eth 1 19.92 19.92 13.4246 0.0004
Sex 1 2.8515 2.8515 1.9219 0.1679
Age 3 14.42 4.8074 3.2402 0.0241
Lrn 1 3.8781 3.8781 2.6139 0.1082
(Residuals) 139 206.23 1.4837
───────────────────────────────────────────────────────────
There's also anova_glm
similar to anova_lm
.
anova
will automatically select test from F-test or likelihood-ratio test depending on the type of distribution. For distribution of Bernoulli()
, Binomial()
, Poisson()
that have fixed dispersion, likelihood-ratio test is preferred. For other distribution, F-test is preferred.
We can specify test by keyword arguments test
or putting test in the first argument.
anova(nbm; test = FTest) # == anova(FTest, nbm)
Analysis of Variance
Type 1 test / F test
Days ~ 1 + Eth + Sex + Age + Lrn
Table:
───────────────────────────────────────────────────────────
DOF ΔDeviance Mean ΔDev F value Pr(>|F|)
───────────────────────────────────────────────────────────
(Intercept) 1 3667.69 3667.69 2472.0306 <1e-89
Eth 1 19.92 19.92 13.4246 0.0004
Sex 1 2.8515 2.8515 1.9219 0.1679
Age 3 14.42 4.8074 3.2402 0.0241
Lrn 1 3.8781 3.8781 2.6139 0.1082
(Residuals) 139 206.23 1.4837
───────────────────────────────────────────────────────────
The next one is an axample of logistic regression.
mtcars = dataset("datasets", "mtcars")
We want to predict if the AM
is 0 or 1. Let's use logistic regression with and without interaction terms, and compare this two models by likelihood-ratio test.
glm1 = glm(@formula(AM ~ Cyl + HP + WT), mtcars, Binomial(), LogitLink())
glm2 = glm(@formula(AM ~ Cyl * HP * WT), mtcars, Binomial(), LogitLink())
anova(glm1, glm2) # == anova(LRT, glm1, glm2)
Analysis of Variance
Type 1 test / Likelihood-ratio test
Model 1: AM ~ 1 + Cyl + HP + WT
Model 2: AM ~ 1 + Cyl + HP + WT + Cyl & HP + Cyl & WT + HP & WT + Cyl & HP & WT
Table:
──────────────────────────────────────────────────
DOF ΔDOF Res.DOF Deviance χ² Pr(>|χ²|)
──────────────────────────────────────────────────
1 4 29 9.8415
2 8 4 25 <1e-06 9.8415 0.0432
──────────────────────────────────────────────────
lrtest(glm1, glm2)
Likelihood-ratio test: 2 models fitted on 32 observations
────────────────────────────────────────────────────
DOF ΔDOF LogLik Deviance Chisq p(>Chisq)
────────────────────────────────────────────────────
[1] 4 -4.9207 9.8415
[2] 8 4 -0.0000 0.0000 9.8415 0.0432
────────────────────────────────────────────────────
This function works identically as StatsModels.lrtest
.