Singleton type indicating that the inverse link should be automatically determined from the model type.
Automatic inverse link determination is implemented using package extensions, which are available beginning in Julia 1.9.
An error is thrown if the inverse link cannot be determined. This will always occur with Julia versions prior to 1.9, and will otherwise occur when no extension has been loaded that specifies the link function for the model type.
Currently, this is only implemented for GLM.jl and MixedModels.jl
eff_col=nothing, err_col=:err, typical=mean, invlink=identity,
effects as specified in
Effects are the model predictions made using values given via the reference grid. For terms present in the model, but not in the reference grid, then the typical value of those predictors is used. (In other words, effects are conditional on the typical value.) The function for computing typical values is specified via
typical. Note that this is also applied to categorical contrasts, thus yielding an average of the contrast, weighted by the balance of levels in the data set used to fit the model.
typical can either be a single, scalar-valued function (e.g.
mean) or a dictionary matching term symbols to scalar-valued functions. The use of a dictionary allows specifying different
typical functions for different input variables. In this case,
typical functions must be provided for all term variables except the intercept. If there is a single term that should be "typified" differently than others, then the use of
DataStructures.DefaultDict may be useful to create a default
typical value with only the exception explicitly specified. For example:
typical = DefaultDict(() -> mean) # default to x -> mean(x)
typical[:sex] = v -> 0.0 # typical value for :sex
By default, the column corresponding to the response variable in the formula is overwritten with the effects, but an alternative column for the effects can be specified by
eff_col. Note that
eff_col is determined first by trying
StatsBase.responsename and then falling back to the
string representation of the model's formula's left-hand side. For models with a transformed response, whether in the original formula specification or via hints/contrasts, the name will be the display name of the resulting term and not the original variable. This convention also has the advantage of highlighting another aspect of the underlying method: effects are computed on the scale of the transformed response. If this column does not exist, it is created. Pointwise standard errors are written into the column specified by
By default (
invlink=identity), effects are computed on the scale of the transformed response. For models with an explicit transformation, that transformation is the scale of the effects. For models with a link function, the scale of the effects is the link scale, i.e. after application of the link function. For example, effects for logitistic regression models are on the logit and not the probability scale.
If the inverse link is specified via
invlink, then effects and errors are computed on the original, untransformed scale via the delta method using automatic differentiation. This means that the
invlink function must be differentiable and should not involve inplace operations.
On Julia versions 1.9 or later, the special singleton value
AutoInvLink() can be used to specify that the appropriate inverse link should be determined automatically. In that case, a direct or analytic computation of the derivative is used when possible.
Effects are computed using the model's variance-covariance matrix, which is computed by default using
StatsBas.vcov. Alternative methods such as the sandwich estimator or robust estimators can be used by specifying
vcov, which should be a function of a single argument (the model) returning the estimated variance-covariance matrix. Vcov.jl and CovarianceMatrices.jl provide several possibilities as functions of multiple arguments and so it is necessary to curry when using these functions. For example
myvcov(x) = Base.Fix2(vcov, Vcov.robust())
The reference grid must contain columns for all predictors in the formula. (Interactions are computed automatically.) Contrasts must match the contrasts used to fit the model; using other contrasts will lead to undefined behavior.
Interaction terms are computed in the same way for any regression model: as the product of the lower-order terms. Typical values of lower terms thus propagate up into the interaction term in the same way that any value would.
The use of typical values for excluded effects differs from other approaches such as "partial effects" used in R packages like
remef. The R package
effects is similar in approach, but due to differing languages and licenses, no source code was inspected and there is no attempt at API compatibility or even similarity.
The approach for computing effect is based on the effects plots described here:
Fox, John (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software. Vol. 8, No. 15
eff_col=nothing, err_col=:err, typical=mean,
lower_col=:lower, upper_col=:upper, invlink=identity,
effects as specified by the
This is a convenience wrapper for
effects!. Instead of specifying a reference grid, a dictionary containing the levels/values of each predictor is specified. This is then expanded into a reference grid representing a fully-crossed design. Additionally, two extra columns are created representing the lower and upper edges of the confidence interval. The default
level=nothing corresponds to [resp-err, resp+err], while
level=0.95 corresponds to the 95% confidence interval.
emmeans(model::RegressionModel; eff_col=nothing, err_col=:err,
invlink=identity, levels=Dict(), dof=nothing)
Compute estimated marginal means, a.k.a. least-square (LS) means for a model.
By default, emmeans are computed for each level of each categorical variable along with the means of continuous variables. For centered terms, the center is used instead of the mean. Alternative levels can be specified with
dof is used to compute the associated degrees of freedom for a particular margin. For regression models, the appropriate degrees of freedom are the same degrees of freedom as used for the Wald tests on the coefficients. These are typically the residual degrees of freedom (e.g., via
dof is a function, then it is called on
model and filled elementwise into the
dof column. Alternatively,
dof can be specified as a single number or vector of appropriate length. For example, in a mixed effect model with a large number of observations,
dof=Inf may be appropriate.
err_col work exactly as in
Estimated marginal means are closely related to effects and are also known as least-square means. The functionality here is a convenience wrapper for
effects and maps onto the concept of least-square means as presented in e.g. the SAS documentation. There are several extensions available to estimated marginal means, related to when the marginalization occurs and how cells are weighted, but these are not currently supported. The documentation for the R package emmeans explains the background in more depth.
empairs(model::RegressionModel; eff_col=nothing, err_col=:err,
invlink=identity, levels=Dict(), dof=nothing, padjust=identity)
empairs(df::AbstractDataFrame; eff_col, err_col=:err, padjust=identity)
Compute pairwise differences of estimated marginal means.
The method for
AbstractDataFrame acts on the results of
emmeans, while the method for
RegressionModel is a convenience wrapper that calls
The keyword arguments are generally the same as
By default, pairs are computed for all combinations of categorical variables and the means/centers of continuous variables. The contrast for a pair "a" vs "b" is represented in the column for the contrasted variable with
a > b. For variables that are the same within a pair (e.g., a continuous variable), the sole value is displayed as is.
dof is not
nothing, then p-values are also computed using a t-distribution with the resulting degrees of freedom. The results for
dof=Inf correspond to using a z-distribution, but the column names in the returned dataframe remain
padjust is provided, then it is used to compute adjust the p-values for multiple comparisons.
MultipleTesting.jl provides a number of useful possibilities for this.
padjust is silently ignored if
dof is not provided.
This feature is experimental and the precise column names and presentation of contrasts/differences may change without being considered breaking.
The use of
invlink is subject to a number of interpretation subtleties. The EM means are computed on the scale of the linear predictor, then transformed to the scale of
invlink. The associated errors on the transformed scale are computed via the difference method. These estimates and errors are then used to compute the pairs. Test statistics are computed on the scale of these pairs. In general, these will not be the same as test statistics computed on the original scale. These subleties are discussed in the documentation for the R package
Compute a fully crossed reference grid.
design can be either a
NamedTuple or a
Dict, where the keys represent the variables, i.e., column names in the resulting grid and the values are vectors of possible values that each variable can take on.
julia> expand_grid(Dict(:x => ["a", "b"], :y => 1:3))
Row │ y x
│ Int64 String
1 │ 1 a
2 │ 2 a
3 │ 3 a
4 │ 1 b
5 │ 2 b
6 │ 3 b
Compute the pooled standard error of the mean.
This corresponds to the square root of the sum of the the squared SEMs, which in turn corresponds to the weighted root-mean-square of the underlying standard deviations.