# API

`Effects.AutoInvLink`

`Effects.effects`

`Effects.effects!`

`Effects.emmeans`

`Effects.empairs`

`Effects.expand_grid`

`Effects.pooled_sem`

`Effects.AutoInvLink`

— Type`AutoInvLink`

Singleton type indicating that the inverse link should be automatically determined from the model type.

Automatic inverse link determination is implemented using package extensions, which are available beginning in Julia 1.9.

An error is thrown if the inverse link cannot be determined. This will always occur with Julia versions prior to 1.9, and will otherwise occur when no extension has been loaded that specifies the link function for the model type.

Currently, this is only implemented for GLM.jl and MixedModels.jl

`Effects.effects!`

— Method```
effects!(reference_grid::DataFrame, model::RegressionModel;
eff_col=nothing, err_col=:err, typical=mean, invlink=identity,
vcov=StatsBase.vcov)
```

Compute the `effects`

as specified in `formula`

.

Effects are the model predictions made using values given via the reference grid. For terms present in the model, but not in the reference grid, then the typical value of those predictors is used. (In other words, effects are conditional on the typical value.) The function for computing typical values is specified via `typical`

. Note that this is also applied to categorical contrasts, thus yielding an average of the contrast, weighted by the balance of levels in the data set used to fit the model.

`typical`

can either be a single, scalar-valued function (e.g. `mean`

) or a dictionary matching term symbols to scalar-valued functions. The use of a dictionary allows specifying different `typical`

functions for different input variables. In this case, `typical`

functions must be provided for all term variables except the intercept. If there is a single term that should be "typified" differently than others, then the use of `DataStructures.DefaultDict`

may be useful to create a default `typical`

value with only the exception explicitly specified. For example:

```
using DataStructures
typical = DefaultDict(() -> mean) # default to x -> mean(x)
typical[:sex] = v -> 0.0 # typical value for :sex
```

By default, the column corresponding to the response variable in the formula is overwritten with the effects, but an alternative column for the effects can be specified by `eff_col`

. Note that `eff_col`

is determined first by trying `StatsBase.responsename`

and then falling back to the `string`

representation of the model's formula's left-hand side. For models with a transformed response, whether in the original formula specification or via hints/contrasts, the name will be the display name of the resulting term and not the original variable. This convention also has the advantage of highlighting another aspect of the underlying method: effects are computed on the scale of the transformed response. If this column does not exist, it is created. Pointwise standard errors are written into the column specified by `err_col`

.

By default (`invlink=identity`

), effects are computed on the scale of the transformed response. For models with an explicit transformation, that transformation is the scale of the effects. For models with a link function, the scale of the effects is the *link* scale, i.e. after application of the link function. For example, effects for logitistic regression models are on the logit and not the probability scale.

If the inverse link is specified via `invlink`

, then effects and errors are computed on the original, untransformed scale via the delta method using automatic differentiation. This means that the `invlink`

function must be differentiable and should not involve inplace operations.

On Julia versions 1.9 or later, the special singleton value `AutoInvLink()`

can be used to specify that the appropriate inverse link should be determined automatically. In that case, a direct or analytic computation of the derivative is used when possible.

Effects are computed using the model's variance-covariance matrix, which is computed by default using `StatsBas.vcov`

. Alternative methods such as the sandwich estimator or robust estimators can be used by specifying `vcov`

, which should be a function of a single argument (the model) returning the estimated variance-covariance matrix. Vcov.jl and CovarianceMatrices.jl provide several possibilities as functions of multiple arguments and so it is necessary to curry when using these functions. For example

```
using Vcov
myvcov(x) = Base.Fix2(vcov, Vcov.robust())
```

The reference grid must contain columns for all predictors in the formula. (Interactions are computed automatically.) Contrasts must match the contrasts used to fit the model; using other contrasts will lead to undefined behavior.

Interaction terms are computed in the same way for any regression model: as the product of the lower-order terms. Typical values of lower terms thus propagate up into the interaction term in the same way that any value would.

The use of typical values for excluded effects differs from other approaches such as "partial effects" used in R packages like `remef`

. The R package `effects`

is similar in approach, but due to differing languages and licenses, no source code was inspected and there is no attempt at API compatibility or even similarity.

The approach for computing effect is based on the effects plots described here:

Fox, John (2003). Effect Displays in R for Generalised Linear Models. Journal of Statistical Software. Vol. 8, No. 15

See also `AutoInvLink`

.

`Effects.effects`

— Method```
effects(design::AbstractDict, model::RegressionModel;
eff_col=nothing, err_col=:err, typical=mean,
lower_col=:lower, upper_col=:upper, invlink=identity,
vcov=StatsBase.vcov, level=nothing)
```

Compute the `effects`

as specified by the `design`

.

This is a convenience wrapper for `effects!`

. Instead of specifying a reference grid, a dictionary containing the levels/values of each predictor is specified. This is then expanded into a reference grid representing a fully-crossed design. Additionally, two extra columns are created representing the lower and upper edges of the confidence interval. The default `level=nothing`

corresponds to [resp-err, resp+err], while `level=0.95`

corresponds to the 95% confidence interval.

See also `AutoInvLink`

.

`Effects.emmeans`

— Method```
emmeans(model::RegressionModel; eff_col=nothing, err_col=:err,
invlink=identity, levels=Dict(), dof=nothing)
```

Compute estimated marginal means, a.k.a. least-square (LS) means for a model.

By default, emmeans are computed for each level of each categorical variable along with the means of continuous variables. For centered terms, the center is used instead of the mean. Alternative levels can be specified with `levels`

.

`dof`

is used to compute the associated degrees of freedom for a particular margin. For regression models, the appropriate degrees of freedom are the same degrees of freedom as used for the Wald tests on the coefficients. These are typically the residual degrees of freedom (e.g., via `dof_residual`

). If `dof`

is a function, then it is called on `model`

and filled elementwise into the `dof`

column. Alternatively, `dof`

can be specified as a single number or vector of appropriate length. For example, in a mixed effect model with a large number of observations, `dof=Inf`

may be appropriate.

`invlink`

, `eff_col`

and `err_col`

work exactly as in `effects!`

.

Estimated marginal means are closely related to effects and are also known as least-square means. The functionality here is a convenience wrapper for `effects`

and maps onto the concept of least-square means as presented in e.g. the SAS documentation. There are several extensions available to estimated marginal means, related to when the marginalization occurs and how cells are weighted, but these are not currently supported. The documentation for the R package emmeans explains the background in more depth.

`Effects.empairs`

— Method```
empairs(model::RegressionModel; eff_col=nothing, err_col=:err,
invlink=identity, levels=Dict(), dof=nothing, padjust=identity)
empairs(df::AbstractDataFrame; eff_col, err_col=:err, padjust=identity)
```

Compute pairwise differences of estimated marginal means.

The method for `AbstractDataFrame`

acts on the results of `emmeans`

, while the method for `RegressionModel`

is a convenience wrapper that calls `emmeans`

internally.

The keyword arguments are generally the same as `emmeans`

.

By default, pairs are computed for all combinations of categorical variables and the means/centers of continuous variables. The contrast for a pair "a" vs "b" is represented in the column for the contrasted variable with `a > b`

. For variables that are the same within a pair (e.g., a continuous variable), the sole value is displayed as is.

If `dof`

is not `nothing`

, then p-values are also computed using a t-distribution with the resulting degrees of freedom. The results for `dof=Inf`

correspond to using a z-distribution, but the column names in the returned dataframe remain `t`

and `Pr(>|t|)`

.

If `padjust`

is provided, then it is used to compute adjust the p-values for multiple comparisons. `MultipleTesting.jl`

provides a number of useful possibilities for this.

`padjust`

is silently ignored if `dof`

is not provided.

This feature is experimental and the precise column names and presentation of contrasts/differences may change without being considered breaking.

The use of `invlink`

is subject to a number of interpretation subtleties. The EM means are computed on the scale of the linear predictor, then transformed to the scale of `invlink`

. The associated errors on the transformed scale are computed via the difference method. These estimates and errors are then used to compute the pairs. Test statistics are computed on the scale of these pairs. In general, these will not be the same as test statistics computed on the original scale. These subleties are discussed in the documentation for the R package `emmeans`

.

`Effects.expand_grid`

— Method`expand_grid(design)`

Compute a fully crossed reference grid.

`design`

can be either a `NamedTuple`

or a `Dict`

, where the keys represent the variables, i.e., column names in the resulting grid and the values are vectors of possible values that each variable can take on.

```
julia> expand_grid(Dict(:x => ["a", "b"], :y => 1:3))
6×2 DataFrame
Row │ y x
│ Int64 String
─────┼───────────────
1 │ 1 a
2 │ 2 a
3 │ 3 a
4 │ 1 b
5 │ 2 b
6 │ 3 b
```

`Effects.pooled_sem`

— Method`pooled_sem(sems...)`

Compute the pooled standard error of the mean.

This corresponds to the square root of the sum of the the squared SEMs, which in turn corresponds to the weighted root-mean-square of the underlying standard deviations.