AnovaGLM

To use anova on GLM objects , we need AnovaGLM.jl.

using AnovaGLM

This function will export all functions from GLM and related function in this package, including anova, anova_lm, anova_glm.

Ordinary linear model

We first import the well-known iris dataset from RDatasets.

iris = dataset("datasets", "iris")

There's two way to perform ANOVA. anova_lm accepts a formula and data like GLM.lm.

anova_lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Species), iris)

Analysis of Variance

Type 1 test / F test

SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species

Table:
──────────────────────────────────────────────────────────────
             DOF     Exp.SS  Mean Square     F value  Pr(>|F|)
──────────────────────────────────────────────────────────────
(Intercept)    1  5121.68      5121.68    54403.6419    <1e-99
SepalWidth     1     1.4122       1.4122     15.0011    0.0002
PetalLength    1    84.43        84.43      896.8059    <1e-62
PetalWidth     1     1.8834       1.8834     20.0055    <1e-04
Species        2     0.8889       0.4445      4.7212    0.0103
(Residuals)  144    13.56         0.0941                
──────────────────────────────────────────────────────────────

We can specify the type of sum of squares by keyword argument type. Let's use type II SS.

anova_lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Species), iris; type = 2)

Analysis of Variance

Type 2 test / F test

SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species

Table:
──────────────────────────────────────────────────────────────
             DOF     Exp.SS  Mean Square     F value  Pr(>|F|)
──────────────────────────────────────────────────────────────
(Intercept)    1  5121.68      5121.68    54403.6419    <1e-99
SepalWidth     1     3.1250       3.1250     33.1945    <1e-07
PetalLength    1    13.79        13.79      146.4310    <1e-22
PetalWidth     1     0.4090       0.4090      4.3448    0.0389
Species        2     0.8889       0.4445      4.7212    0.0103
(Residuals)  144    13.56         0.0941                
──────────────────────────────────────────────────────────────

A StatsModels.TableRegressionModel object is fitted and stored in the output of anova_lm.

We can fit a model first and call anova instead. anova stores the model as well.

Warn

It doesn't create a copy, so any in-place change of the original model should be noticed.

lm1 = lm(@formula(SepalLength ~ SepalWidth + PetalLength + PetalWidth + Species), iris)
anova(lm1; type = 3)

Analysis of Variance

Type 3 test / F test

SepalLength ~ 1 + SepalWidth + PetalLength + PetalWidth + Species

Table:
──────────────────────────────────────────────────────────
             DOF   Exp.SS  Mean Square   F value  Pr(>|F|)
──────────────────────────────────────────────────────────
(Intercept)    1   5.6694       5.6694   60.2211    <1e-11
SepalWidth     1   3.1250       3.1250   33.1945    <1e-07
PetalLength    1  13.79        13.79    146.4310    <1e-22
PetalWidth     1   0.4090       0.4090    4.3448    0.0389
Species        2   0.8889       0.4445    4.7212    0.0103
(Residuals)  144  13.56         0.0941              
──────────────────────────────────────────────────────────

Multiple models can be compared through the same function.

Note

The checker for nested models is not implemented now, so it should be ensured that the later model is more complex than the previous one.

lms = nestedmodels(LinearModel, @formula(SepalLength ~ SepalWidth * Species), iris; dropcollinear = false)
anova(lms)

Analysis of Variance

Type 1 test / F test

Model 1: SepalLength ~ 0
Model 2: SepalLength ~ 1
Model 3: SepalLength ~ 1 + SepalWidth
Model 4: SepalLength ~ 1 + SepalWidth + Species
Model 5: SepalLength ~ 1 + SepalWidth + Species + SepalWidth & Species

Table:
─────────────────────────────────────────────────────────────────────────────────
   DOF  ΔDOF  Res.DOF       R²      ΔR²   Res.SS     Exp.SS     F value  Pr(>|F|)
─────────────────────────────────────────────────────────────────────────────────
1    1            144   0                5223.85                           
2    2     1      144  -0.0000  -0.0000   102.17  5121.68    26485.3010    <1e-99
3    3     1      144   0.0138   0.0138   100.76     1.4122      7.3030    0.0077
4    5     2      144   0.7259   0.7121    28.00    72.75      188.1091    <1e-40
5    7     2      144   0.7274   0.0015    27.85     0.1572      0.4064    0.6668
─────────────────────────────────────────────────────────────────────────────────

This result is a little bit different from GLM.ftest:

ftest(getproperty.(lms.model[2:end], :model)...)

F-test: 4 models fitted on 150 observations
────────────────────────────────────────────────────────────────────
     DOF  ΔDOF       SSR      ΔSSR      R²     ΔR²        F*   p(>F)
────────────────────────────────────────────────────────────────────
[1]    2        102.1683            0.0000                          
[2]    3     1  100.7561   -1.4122  0.0138  0.0138    2.0744  0.1519
[3]    5     2   28.0037  -72.7524  0.7259  0.7121  189.6512  <1e-40
[4]    7     2   27.8465   -0.1572  0.7274  0.0015    0.4064  0.6668
────────────────────────────────────────────────────────────────────

In anova, the F value is calculated by dividing MSR (mean of ΔDeviance) with mean of RSS of the most complex model just like anova in R, while in GLM.ftest, the denominator is replaced by RSS of subsequent model.

Generalized linear models

quine = dataset("MASS", "quine")

We fit a negative binomial regression on quine dataset from MASS.

nbm = glm(@formula(Days ~ Eth + Sex + Age + Lrn), quine, NegativeBinomial(2.0), LogLink())
anova(nbm)

Analysis of Variance

Type 1 test / F test

Days ~ 1 + Eth + Sex + Age + Lrn

Table:
───────────────────────────────────────────────────────────
             DOF  ΔDeviance  Mean ΔDev    F value  Pr(>|F|)
───────────────────────────────────────────────────────────
(Intercept)    1  3667.69    3667.69    2472.0306    <1e-89
Eth            1    19.92      19.92      13.4246    0.0004
Sex            1     2.8515     2.8515     1.9219    0.1679
Age            3    14.42       4.8074     3.2402    0.0241
Lrn            1     3.8781     3.8781     2.6139    0.1082
(Residuals)  139   206.23       1.4837               
───────────────────────────────────────────────────────────

There's also anova_glm similar to anova_lm.

anova will automatically select test from F-test or likelihood-ratio test depending on the type of distribution. For distribution of Bernoulli(), Binomial(), Poisson() that have fixed dispersion, likelihood-ratio test is preferred. For other distribution, F-test is preferred.

We can specify test by keyword arguments test or putting test in the first argument.

anova(nbm; test = FTest) # == anova(FTest, nbm)

Analysis of Variance

Type 1 test / F test

Days ~ 1 + Eth + Sex + Age + Lrn

Table:
───────────────────────────────────────────────────────────
             DOF  ΔDeviance  Mean ΔDev    F value  Pr(>|F|)
───────────────────────────────────────────────────────────
(Intercept)    1  3667.69    3667.69    2472.0306    <1e-89
Eth            1    19.92      19.92      13.4246    0.0004
Sex            1     2.8515     2.8515     1.9219    0.1679
Age            3    14.42       4.8074     3.2402    0.0241
Lrn            1     3.8781     3.8781     2.6139    0.1082
(Residuals)  139   206.23       1.4837               
───────────────────────────────────────────────────────────

The next one is an axample of logistic regression.

mtcars = dataset("datasets", "mtcars")

We want to predict if the AM is 0 or 1. Let's use logistic regression with and without interaction terms, and compare this two models by likelihood-ratio test.

glm1 = glm(@formula(AM ~ Cyl + HP + WT), mtcars, Binomial(), LogitLink())
glm2 = glm(@formula(AM ~ Cyl * HP * WT), mtcars, Binomial(), LogitLink())
anova(glm1, glm2) # == anova(LRT, glm1, glm2)

Analysis of Variance

Type 1 test / Likelihood-ratio test

Model 1: AM ~ 1 + Cyl + HP + WT
Model 2: AM ~ 1 + Cyl + HP + WT + Cyl & HP + Cyl & WT + HP & WT + Cyl & HP & WT

Table:
──────────────────────────────────────────────────
   DOF  ΔDOF  Res.DOF  Deviance      χ²  Pr(>|χ²|)
──────────────────────────────────────────────────
1    4             29    9.8415             
2    8     4       25   <1e-06   9.8415     0.0432
──────────────────────────────────────────────────

lrtest(glm1, glm2)

Likelihood-ratio test: 2 models fitted on 32 observations
────────────────────────────────────────────────────
     DOF  ΔDOF   LogLik  Deviance   Chisq  p(>Chisq)
────────────────────────────────────────────────────
[1]    4        -4.9207    9.8415                   
[2]    8     4  -0.0000    0.0000  9.8415     0.0432
────────────────────────────────────────────────────

This function works identically as StatsModels.lrtest.