Frequentist Regression Models

CRRao.FrequentistRegression - Type
FrequentistRegression{RegressionType}

Type representing the frequentist regression models returned by the fit functions. The package uses it internally for every frequentist model. The type parameter RegressionType is a Symbol identifying the model class, for example :LinearRegression or :LogisticRegression.
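
A Symbol type parameter lets methods specialise on the model class through ordinary dispatch. The following is a minimal sketch of that pattern only; `ToyRegression`, its `coefficients` field, and `describe` are hypothetical names made up for illustration and are not the package's actual definitions.

```julia
# Hypothetical stand-in for a container type parameterised by a Symbol.
# The field name is an assumption made for this sketch.
struct ToyRegression{RegressionType}
    coefficients::Vector{Float64}
end

# Convenience constructor: lift a Symbol value into the type parameter.
ToyRegression(kind::Symbol, coefficients::Vector{Float64}) =
    ToyRegression{kind}(coefficients)

# Methods can specialise on the model class via the Symbol parameter.
describe(::ToyRegression{:LinearRegression}) = "Linear Regression"
describe(::ToyRegression{:LogisticRegression}) = "Logistic Regression"
describe(::ToyRegression) = "Unknown model class"   # generic fallback

linear = ToyRegression(:LinearRegression, [32.0, -0.04])
println(typeof(linear))
println(describe(linear))
```

Because the model class lives in the type, a single function name can carry one method per model class without any runtime branching on a field.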

Linear Regression

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LinearRegression)

Fit an OLS Linear Regression model on the input data. Uses the lm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LinearRegression}.

Example

julia> using CRRao, RDatasets, StatsPlots, StatsModels
julia> df = dataset("datasets", "mtcars")
32×12 DataFrame
 Row │ Model              MPG      Cyl    Disp     HP     DRat     WT       QSec     VS     AM     Gear   Carb  
     │ String31           Float64  Int64  Float64  Int64  Float64  Float64  Float64  Int64  Int64  Int64  Int64 
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Mazda RX4             21.0      6    160.0    110     3.9     2.62     16.46      0      1      4      4
   2 │ Mazda RX4 Wag         21.0      6    160.0    110     3.9     2.875    17.02      0      1      4      4
   3 │ Datsun 710            22.8      4    108.0     93     3.85    2.32     18.61      1      1      4      1
   4 │ Hornet 4 Drive        21.4      6    258.0    110     3.08    3.215    19.44      1      0      3      1
   5 │ Hornet Sportabout     18.7      8    360.0    175     3.15    3.44     17.02      0      0      3      2
   6 │ Valiant               18.1      6    225.0    105     2.76    3.46     20.22      1      0      3      1
  ⋮  │         ⋮             ⋮       ⋮       ⋮       ⋮       ⋮        ⋮        ⋮       ⋮      ⋮      ⋮      ⋮
  27 │ Porsche 914-2         26.0      4    120.3     91     4.43    2.14     16.7       0      1      5      2
  28 │ Lotus Europa          30.4      4     95.1    113     3.77    1.513    16.9       1      1      5      2
  29 │ Ford Pantera L        15.8      8    351.0    264     4.22    3.17     14.5       0      1      5      4
  30 │ Ferrari Dino          19.7      6    145.0    175     3.62    2.77     15.5       0      1      5      6
  31 │ Maserati Bora         15.0      8    301.0    335     3.54    3.57     14.6       0      1      5      8
  32 │ Volvo 142E            21.4      4    121.0    109     4.11    2.78     18.6       1      1      4      2
                                                                                                 20 rows omitted
julia> container = fit(@formula(MPG ~ HP + WT + Gear), df, LinearRegression())
Model Class: Linear Regression
Likelihood Mode: Gauss
Link Function: Identity
Computing Method: Optimization
────────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      t  Pr(>|t|)   Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)  32.0137     4.63226      6.91    <1e-06  22.5249     41.5024
HP           -0.0367861  0.00989146  -3.72    0.0009  -0.0570478  -0.0165243
WT           -3.19781    0.846546    -3.78    0.0008  -4.93188    -1.46374
Gear          1.01998    0.851408     1.20    0.2410  -0.72405     2.76401
────────────────────────────────────────────────────────────────────────────
julia> coeftable(container)
────────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      t  Pr(>|t|)   Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)  32.0137     4.63226      6.91    <1e-06  22.5249     41.5024
HP           -0.0367861  0.00989146  -3.72    0.0009  -0.0570478  -0.0165243
WT           -3.19781    0.846546    -3.78    0.0008  -4.93188    -1.46374
Gear          1.01998    0.851408     1.20    0.2410  -0.72405     2.76401
────────────────────────────────────────────────────────────────────────────
julia> sigma(container)
2.5741691724978972
julia> aic(container)
157.05277871921942
julia> predict(container)
32-element Vector{Float64}:
 23.668849952338718
 22.85340824320634
 25.253556140740894
 20.746171762311384
 17.635570543830177
 20.14663845388644
 14.644831040166633
 23.61182872351372
  ⋮
 16.340457241090512
 27.47793682112109
 26.922715039574857
 28.11844900519874
 17.264981908248554
 21.818065399379595
 13.374047477198516
 23.193986311384343
julia> residuals(container)
32-element Vector{Float64}:
 -2.668849952338718
 -1.8534082432063386
 -2.4535561407408935
  0.6538282376886144
  1.0644294561698224
 -2.0466384538864375
 -0.3448310401666319
  0.7881712764862776
  ⋮
  2.8595427589094875
 -0.1779368211210901
 -0.9227150395748573
  2.2815509948012576
 -1.4649819082485536
 -2.1180653993795957
  1.6259525228014837
 -1.7939863113843444
julia> plot(cooksdistance(container))
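
To make the quantities in this example concrete, here is a rough sketch of how the same statistics can be computed by hand for an ordinary least-squares fit, using only the standard library. The data are made up for illustration (they are not taken from mtcars), all variable names are assumptions, and this is not how the package computes them internally, since it delegates to GLM.

```julia
using LinearAlgebra

# Toy data, invented for this sketch.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.8]

X = hcat(ones(length(x)), x)   # design matrix with an intercept column
n, p = size(X)

beta = X \ y                   # least-squares coefficients
fitted = X * beta              # in-sample predictions
resid = y .- fitted            # residuals

rss = sum(abs2, resid)                             # residual sum of squares
tss = sum(abs2, y .- sum(y) / n)                   # total sum of squares
sigma_hat = sqrt(rss / (n - p))                    # residual standard error
r2_val = 1 - rss / tss                             # coefficient of determination
adjr2_val = 1 - (1 - r2_val) * (n - 1) / (n - p)   # adjusted R²

# Gaussian log-likelihood at the maximum-likelihood variance rss / n,
# with k = p + 1 estimated parameters (coefficients plus the variance).
loglik = -n / 2 * (log(2π) + log(rss / n) + 1)
k = p + 1
aic_val = 2k - 2loglik
bic_val = k * log(n) - 2loglik

println((sigma_hat, r2_val, adjr2_val, aic_val, bic_val))
```

With 32 observations and 4 coefficients, the same formulas appear consistent with the sigma and aic values printed in the example above.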

Logistic Regression

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Logit)

Fit a Logistic Regression model on the input data using the Logit link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}.

Example

julia> using CRRao, RDatasets, StatsModels
julia> turnout = dataset("Zelig", "turnout")
2000×5 DataFrame
  Row │ Race   Age    Educate  Income   Vote  
      │ Cat…   Int32  Float64  Float64  Int32 
──────┼───────────────────────────────────────
    1 │ white     60     14.0   3.3458      1
    2 │ white     51     10.0   1.8561      0
    3 │ white     24     12.0   0.6304      0
    4 │ white     38      8.0   3.4183      1
    5 │ white     25     12.0   2.7852      1
    6 │ white     67     12.0   2.3866      1
  ⋮   │   ⋮      ⋮       ⋮        ⋮       ⋮
 1995 │ white     22      7.0   0.2364      0
 1996 │ white     26     16.0   3.3834      0
 1997 │ white     34     12.0   2.917       1
 1998 │ white     51     16.0   7.8949      1
 1999 │ white     22     10.0   2.4811      0
 2000 │ white     59     10.0   0.5523      0
                             1988 rows omitted
julia> container = fit(@formula(Vote ~ Age + Race + Income + Educate), turnout, LogisticRegression(), Logit())
Model Class: Logistic Regression
Likelihood Mode: Binomial
Link Function: Identity
Computing Method: Optimization
────────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      z  Pr(>|z|)   Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)  -3.03426    0.325927    -9.31    <1e-19  -3.67307    -2.39546
Age           0.0283543  0.00346034   8.19    <1e-15   0.0215722   0.0351365
Race: white   0.250798   0.146457     1.71    0.0868  -0.0362521   0.537847
Income        0.177112   0.0271516    6.52    <1e-10   0.123896    0.230328
Educate       0.175634   0.0203308    8.64    <1e-17   0.135786    0.215481
────────────────────────────────────────────────────────────────────────────
julia> coeftable(container)
────────────────────────────────────────────────────────────────────────────
                  Coef.  Std. Error      z  Pr(>|z|)   Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────────
(Intercept)  -3.03426    0.325927    -9.31    <1e-19  -3.67307    -2.39546
Age           0.0283543  0.00346034   8.19    <1e-15   0.0215722   0.0351365
Race: white   0.250798   0.146457     1.71    0.0868  -0.0362521   0.537847
Income        0.177112   0.0271516    6.52    <1e-10   0.123896    0.230328
Educate       0.175634   0.0203308    8.64    <1e-17   0.135786    0.215481
────────────────────────────────────────────────────────────────────────────
julia> loglikelihood(container)
-1011.9906318515575
julia> aic(container)
2033.981263703115
julia> bic(container)
2061.9857760008254
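
The three values above are linked by the usual definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L. As a quick check, the sketch below recomputes both criteria from the printed log-likelihood, assuming k = 5 estimated coefficients and n = 2000 rows; the variable names are made up for illustration.

```julia
# Values copied from the logistic regression output above.
loglik = -1011.9906318515575
n_obs = 2000      # rows in the turnout data
n_params = 5      # intercept plus four slope coefficients

aic_by_hand = 2 * n_params - 2 * loglik
bic_by_hand = n_params * log(n_obs) - 2 * loglik

println(aic_by_hand)
println(bic_by_hand)
```

Both results should agree with the reported aic and bic values up to floating-point rounding.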

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Probit)

Fit a Logistic Regression model on the input data using the Probit link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}.

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Cloglog)

Fit a Logistic Regression model on the input data using the Cloglog link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}.

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::LogisticRegression, Link::Cauchit)

Fit a Logistic Regression model on the input data using the Cauchit link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:LogisticRegression}.
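
The Probit, Cloglog and Cauchit methods are called the same way as the Logit method, with only the last argument changing. The sketch below fits the turnout formula from the Logit example under each link and prints the AIC of each fit. It assumes the link types are available after `using CRRao`, as `Logit` is in the example above; `vote_formula`, `links` and `aic_by_link` are names made up for illustration.

```julia
using CRRao, RDatasets, StatsModels

turnout = dataset("Zelig", "turnout")
vote_formula = @formula(Vote ~ Age + Race + Income + Educate)

# Link objects named in the method signatures above.
links = ["Logit" => Logit(), "Probit" => Probit(),
         "Cloglog" => Cloglog(), "Cauchit" => Cauchit()]

# Fit the same formula under each link and collect the AIC of each fit.
aic_by_link = Dict(name => aic(fit(vote_formula, turnout, LogisticRegression(), link))
                   for (name, link) in links)

# Print the links from lowest to highest AIC.
for (name, value) in sort(collect(aic_by_link); by = last)
    println(rpad(name, 8), " AIC = ", value)
end
```

Since all four fits use the same response and the same number of coefficients, comparing their AIC values is one simple way to choose between links.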

Negative Binomial Regression

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::NegBinomRegression)

Fit a Negative Binomial Regression model on the input data, using the default Log link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:NegativeBinomialRegression}.

Example

julia> using CRRao, RDatasets, StatsModels
julia> sanction = dataset("Zelig", "sanction")
78×8 DataFrame
 Row │ Mil    Coop   Target  Import  Export  Cost   Num    NCost         
     │ Int32  Int32  Int32   Int32   Int32   Int32  Int32  Cat…          
─────┼───────────────────────────────────────────────────────────────────
   1 │     1      4       3       1       1      4     15  major loss
   2 │     0      2       3       0       1      3      4  modest loss
   3 │     0      1       3       1       0      2      1  little effect
   4 │     1      1       3       1       1      2      1  little effect
   5 │     0      1       3       1       1      2      1  little effect
   6 │     0      1       3       0       1      2      1  little effect
  ⋮  │   ⋮      ⋮      ⋮       ⋮       ⋮       ⋮      ⋮          ⋮
  73 │     1      3       1       1       1      2     14  little effect
  74 │     0      2       1       0       0      1      2  net gain
  75 │     0      1       3       0       1      2      1  little effect
  76 │     0      4       3       1       0      2     13  little effect
  77 │     0      1       2       0       0      1      1  net gain
  78 │     1      3       1       1       1      2     10  little effect
                                                          66 rows omitted
julia> container = fit(@formula(Num ~ Target + Coop + NCost), sanction, NegBinomRegression())
Model Class: Count Regression
Likelihood Mode: Negative Binomial
Link Function: Log
Computing Method: Optimization
──────────────────────────────────────────────────────────────────────────────────
                          Coef.  Std. Error      z  Pr(>|z|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────────────────────────
(Intercept)         -1.14517       0.480887  -2.38    0.0172  -2.0877    -0.202652
Target               0.00862527    0.145257   0.06    0.9527  -0.276074   0.293324
Coop                 1.06397       0.115995   9.17    <1e-19   0.836621   1.29131
NCost: major loss   -0.23511       0.511443  -0.46    0.6457  -1.23752    0.7673
NCost: modest loss   1.30767       0.276012   4.74    <1e-05   0.766698   1.84865
NCost: net gain      0.183453      0.275387   0.67    0.5053  -0.356296   0.723202
──────────────────────────────────────────────────────────────────────────────────

Poisson Regression

StatsAPI.fit - Method
fit(formula::FormulaTerm, data::DataFrame, modelClass::PoissonRegression)

Fit a Poisson Regression model on the input data, using the default Log link. Uses the glm method from the GLM package under the hood. Returns an object of type FrequentistRegression{:PoissonRegression}.

Example

julia> using CRRao, RDatasets, StatsModels
julia> sanction = dataset("Zelig", "sanction")
78×8 DataFrame
 Row │ Mil    Coop   Target  Import  Export  Cost   Num    NCost         
     │ Int32  Int32  Int32   Int32   Int32   Int32  Int32  Cat…          
─────┼───────────────────────────────────────────────────────────────────
   1 │     1      4       3       1       1      4     15  major loss
   2 │     0      2       3       0       1      3      4  modest loss
   3 │     0      1       3       1       0      2      1  little effect
   4 │     1      1       3       1       1      2      1  little effect
   5 │     0      1       3       1       1      2      1  little effect
   6 │     0      1       3       0       1      2      1  little effect
  ⋮  │   ⋮      ⋮      ⋮       ⋮       ⋮       ⋮      ⋮          ⋮
  73 │     1      3       1       1       1      2     14  little effect
  74 │     0      2       1       0       0      1      2  net gain
  75 │     0      1       3       0       1      2      1  little effect
  76 │     0      4       3       1       0      2     13  little effect
  77 │     0      1       2       0       0      1      1  net gain
  78 │     1      3       1       1       1      2     10  little effect
                                                          66 rows omitted
julia> container = fit(@formula(Num ~ Target + Coop + NCost), sanction, PoissonRegression())
Model Class: Poisson Regression
Likelihood Mode: Poison
Link Function: Log
Computing Method: Optimization
─────────────────────────────────────────────────────────────────────────────────
                        Coef.  Std. Error      z  Pr(>|z|)   Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────────────────────
(Intercept)         -1.91392    0.261667   -7.31    <1e-12  -2.42678    -1.40106
Target               0.157769   0.0653822   2.41    0.0158   0.0296218   0.285915
Coop                 1.15127    0.0561861  20.49    <1e-92   1.04114     1.26139
NCost: major loss   -0.324051   0.230055   -1.41    0.1590  -0.774951    0.126848
NCost: modest loss   1.71973    0.100518   17.11    <1e-64   1.52272     1.91674
NCost: net gain      0.463907   0.16992     2.73    0.0063   0.13087     0.796944
─────────────────────────────────────────────────────────────────────────────────
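
The Poisson and Negative Binomial examples use the same formula on the same data, so the two fits can be compared directly. The sketch below does this with aic and bic, which are documented on this page for any FrequentistRegression; `count_formula`, `poisson_fit` and `negbin_fit` are names chosen for illustration.

```julia
using CRRao, RDatasets, StatsModels

sanction = dataset("Zelig", "sanction")
count_formula = @formula(Num ~ Target + Coop + NCost)

# Fit both count models on identical inputs.
poisson_fit = fit(count_formula, sanction, PoissonRegression())
negbin_fit = fit(count_formula, sanction, NegBinomRegression())

# Lower values indicate a better trade-off between fit and complexity.
println("Poisson   AIC = ", aic(poisson_fit), "  BIC = ", bic(poisson_fit))
println("NegBinom  AIC = ", aic(negbin_fit), "  BIC = ", bic(negbin_fit))
```

In the coefficient tables above, the Negative Binomial standard errors are larger than the Poisson ones (for example 0.145257 versus 0.0653822 for Target), which is the pattern one would typically expect when the counts are overdispersed.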

Extended functions from StatsAPI.jl

StatsAPI.coeftable - Method
coeftable(container::FrequentistRegression)

Table of coefficients and other statistics of the model. Extends the coeftable method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get table of coefficients
coeftable(container)

StatsAPI.r2 - Method
r2(container::FrequentistRegression)

Coefficient of determination. Extends the r2 method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get r2
r2(container)

StatsAPI.adjr2 - Method
adjr2(container::FrequentistRegression)

Adjusted coefficient of determination. Extends the adjr2 method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get adjr2
adjr2(container)

StatsAPI.loglikelihood - Method
loglikelihood(container::FrequentistRegression)

Log-likelihood of the model. Extends the loglikelihood method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get loglikelihood
loglikelihood(container)

StatsAPI.aic - Method
aic(container::FrequentistRegression)

Akaike's Information Criterion. Extends the aic method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get aic
aic(container)

StatsAPI.bic - Method
bic(container::FrequentistRegression)

Bayesian Information Criterion. Extends the bic method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get bic
bic(container)

CRRao.sigma - Method
sigma(container::FrequentistRegression)

Compute the residual standard error of the model.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get sigma
sigma(container)

StatsAPI.predict - Method
predict(container::FrequentistRegression)

Predicted response of the model. Extends the predict method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get predicted response
predict(container)

StatsAPI.residuals - Method
residuals(container::FrequentistRegression)

Residuals of the model. Extends the residuals method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get residuals
residuals(container)
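
For the linear model, each residual is the observed response minus the predicted response. The short check below illustrates this with the first six MPG values, predictions and residuals printed in the linear regression example on this page; the variable names are made up for illustration.

```julia
# First six rows, copied from the linear regression example on this page.
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1]
predicted = [23.668849952338718, 22.85340824320634, 25.253556140740894,
             20.746171762311384, 17.635570543830177, 20.14663845388644]
reported = [-2.668849952338718, -1.8534082432063386, -2.4535561407408935,
            0.6538282376886144, 1.0644294561698224, -2.0466384538864375]

# Residual = observed response minus predicted response.
by_hand = mpg .- predicted

# Largest absolute gap between the hand-computed and reported residuals.
println(maximum(abs.(by_hand .- reported)))
```

The printed gap should be at the level of floating-point rounding.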

StatsAPI.cooksdistance - Method
cooksdistance(container::FrequentistRegression)

Compute Cook's distance for each observation in a linear model. Extends the cooksdistance method from StatsAPI.jl.

Example

using CRRao, RDatasets, StatsModels

# Get the dataset
mtcars = dataset("datasets", "mtcars")

# Train the model
container = fit(@formula(MPG ~ HP + WT + Gear), mtcars, LinearRegression())

# Get vector of Cook's distances
cooksdistance(container)
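
As background on what the returned vector measures, the sketch below computes Cook's distance for a small least-squares fit in two ways: with the textbook closed form based on residuals and leverages, and by refitting the model without each observation in turn. The data and all names are made up for illustration, and this is not presented as the package's implementation.

```julia
using LinearAlgebra

# Toy data; the last point is deliberately pulled away from the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 14.5]
X = hcat(ones(length(x)), x)
n, p = size(X)

beta = X \ y
resid = y .- X * beta
s2 = sum(abs2, resid) / (n - p)          # residual variance estimate
leverage = diag(X * inv(X' * X) * X')    # diagonal of the hat matrix

# Closed form: D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2)
cooks = @. resid^2 * leverage / (p * s2 * (1 - leverage)^2)

# Definition: scaled squared shift in all fitted values when row i is dropped.
function cooks_by_refit(i)
    keep = setdiff(1:n, i)
    beta_i = X[keep, :] \ y[keep]
    shift = X * (beta .- beta_i)
    return sum(abs2, shift) / (p * s2)
end

for i in 1:n
    println(i, "  ", cooks[i], "  ", cooks_by_refit(i))
end
```

The two columns should agree up to rounding. Observations with both a large residual and high leverage get the largest distances, which is why plotting the vector, as in the linear regression example, is a common way to spot influential points.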