# API

The exported symbols from this package define its interface. Some symbols from other packages are re-exported for convenience. Fields of objects with composite types should not be accessed directly; the internals of any given structure may change at any time and this would not be considered a breaking change.

## Fitting a model

`BetaRegression.BetaRegressionModel`

— Type`BetaRegressionModel{T,L1,L2,V,M} <: RegressionModel`

Type representing a regression model for beta-distributed response values in the open interval (0, 1), as described by Ferrari and Cribari-Neto (2004).

The mean response is linked to the linear predictor by a link function with type `L1 <: Link01`, i.e. the link must map $(0, 1) \to \mathbb{R}$ and use the GLM package's interface for link functions. While there is no canonical link function for the beta regression model as there is for GLMs, logit is the most common choice.

The precision is transformed by a link function with type `L2 <: Link` which should map $\mathbb{R} \to \mathbb{R}$ or, ideally, $(0, \infty) \to \mathbb{R}$ because the precision must be positive. The most common choices are the identity, log, and square root links.
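For reference, the parameterization behind this type can be written out explicitly. The display below summarizes the mean-precision form of Ferrari and Cribari-Neto (2004), with $g$ the mean link; the symbols $x_i$ (the $i$th row of the model matrix) and $\beta$ (the coefficient vector) are notation introduced here, not names from the package.

\[y_i \sim \mathrm{Beta}\bigl(\mu_i \phi,\ (1 - \mu_i)\phi\bigr), \qquad \mathrm{E}[y_i] = \mu_i, \qquad \mathrm{Var}[y_i] = \frac{\mu_i (1 - \mu_i)}{1 + \phi}, \qquad g(\mu_i) = x_i^\top \beta\]

A larger precision $\phi$ therefore means less dispersion around the mean.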

`BetaRegression.BetaRegressionModel`

— Method

```
BetaRegressionModel(X, y, link=LogitLink(), precisionlink=IdentityLink();
                    weights=nothing, offset=nothing)
```

Construct a `BetaRegressionModel` object with the given model matrix `X`, response `y`, mean link function `link`, precision link function `precisionlink`, and optionally `weights` and `offset`. Note that the returned object is not fit until `fit!` is called on it.

Support for user-provided weights is currently incomplete; passing a value other than `nothing` or an empty array for `weights` will result in an error for now.
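The two-step workflow described here (construct, then fit in place) might look like the sketch below. The data are made up for illustration, and the sketch assumes that `fit!` is among the symbols exported by the package; whether the fit converges on a given data set is not guaranteed.

```julia
using BetaRegression

# Illustrative data: an intercept column plus one covariate, and a response
# that lies strictly inside (0, 1).
x = collect(range(-1.0, 1.0; length=20))
X = hcat(ones(length(x)), x)
noise = [0.03 * sin(3i) for i in eachindex(x)]
y = clamp.(1 ./ (1 .+ exp.(-(0.3 .+ 0.8 .* x))) .+ noise, 0.01, 0.99)

# Default links: logit for the mean, identity for the precision.
model = BetaRegressionModel(X, y)

# The model is not fit until `fit!` is called; a ConvergenceException is
# thrown if convergence is not achieved.
fit!(model)
```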

`StatsAPI.fit`

— Method

```
fit(BetaRegressionModel, formula, data, link=LogitLink(), precisionlink=IdentityLink();
    kwargs...)
```

Fit a `BetaRegressionModel` to the given table `data`, which may be any Tables.jl-compatible table (e.g. a `DataFrame`), using the given `formula`, which can be constructed using `@formula`. In this method, the response and model matrix are determined from the formula and table. It is also possible to provide them explicitly.

```
fit(BetaRegressionModel, X::AbstractMatrix, y::AbstractVector, link=LogitLink(),
precisionlink=IdentityLink(); kwargs...)
```

Fit a beta regression model using the provided model matrix `X` and response vector `y`. In both of these methods, a link function may be provided, otherwise the default logit link is used. Similarly, a link for the precision may be provided, otherwise the default identity link is used.

**Keyword Arguments**

- `weights`: A vector of weights or `nothing` (default). Currently only `nothing` is accepted.
- `offset`: An offset vector to be added to the linear predictor or `nothing` (default).
- `maxiter`: Maximum number of Fisher scoring iterations to use when fitting. Default is 100.
- `atol`: Absolute tolerance to use when checking for model convergence. Default is `sqrt(eps(T))` where `T` is the type of the estimates.
- `rtol`: Relative tolerance to use when checking for convergence. Default is the Base default relative tolerance for `T`.

If you experience convergence issues, you may consider trying a different link for the precision; `LogLink()` is a common choice. Increasing the maximum number of iterations may also be beneficial, especially when working with `Float32`.
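As a rough illustration of the formula method, the sketch below fits a model to a small made-up column table with a log link for the precision and a larger iteration cap. It assumes that `fit`, `LogitLink`, and `LogLink` are among the re-exported symbols (otherwise they can be loaded from GLM), and it imports `@formula` from StatsModels in case the macro is not re-exported.

```julia
using BetaRegression
using StatsModels: @formula   # in case `@formula` is not re-exported

# A NamedTuple of equal-length vectors is a Tables.jl-compatible column table.
x = collect(range(-1.0, 1.0; length=20))
noise = [0.03 * sin(3i) for i in eachindex(x)]
data = (x = x, y = clamp.(1 ./ (1 .+ exp.(-(0.3 .+ 0.8 .* x))) .+ noise, 0.01, 0.99))

# Logit link for the mean, log link for the precision, and a larger iteration cap.
model = fit(BetaRegressionModel, @formula(y ~ 1 + x), data, LogitLink(), LogLink();
            maxiter=200)

coefnames(model)     # coefficient names, with "(Precision)" as the last element
responsename(model)  # the left hand side of the formula as a string
```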

`StatsAPI.fit!`

— Method`fit!(b::BetaRegressionModel{T}; maxiter=100, atol=sqrt(eps(T)), rtol=Base.rtoldefault(T))`

Fit the given `BetaRegressionModel`, updating its values in-place. If model convergence is achieved, `b` is returned, otherwise a `ConvergenceException` is thrown.

Fitting the model consists of computing the maximum likelihood estimates for the coefficients and precision parameter via Fisher scoring with analytic derivatives. The model is determined to have converged when the score vector, i.e. the vector of first partial derivatives of the log likelihood with respect to the parameters, is approximately zero. This is determined by `isapprox` using the specified `atol` and `rtol`. `maxiter` dictates the maximum number of Fisher scoring iterations.
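The convergence criterion can be sketched in isolation. The helper below is hypothetical (`score_is_zero` is not a function from the package) and only illustrates what comparing a score vector to zero with `isapprox` and the documented default tolerances could look like; the package's actual check may differ in detail.

```julia
using LinearAlgebra

# Hypothetical helper: compare a score vector `U` to the zero vector using the
# same default tolerances as documented for `fit!`.
function score_is_zero(U::AbstractVector{T};
                       atol=sqrt(eps(T)), rtol=Base.rtoldefault(T)) where {T<:AbstractFloat}
    return isapprox(U, zero(U); atol=atol, rtol=rtol)
end

score_is_zero([1e-10, -3e-11])  # true: the norm is well below sqrt(eps(Float64))
score_is_zero([1e-3, 0.0])      # false: the score is still far from zero
```

Because the comparison target is zero, the relative tolerance contributes little and the absolute tolerance effectively sets the threshold on the norm of the score.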

## Properties of a model

`StatsAPI.aic`

— Function`aic(model::StatisticalModel)`

Akaike's Information Criterion, defined as $-2 \log L + 2k$, with $L$ the likelihood of the model, and $k$ its number of consumed degrees of freedom (as returned by `dof`).

`StatsAPI.aicc`

— Function`aicc(model::StatisticalModel)`

Corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as $-2 \log L + 2k + 2k(k+1)/(n-k-1)$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by `dof`), and $n$ the number of observations (as returned by `nobs`).
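A small worked example may help make the two criteria concrete. The helpers below are hypothetical stand-ins written for illustration (they are not part of StatsAPI) and use the standard small-sample correction term $2k(k+1)/(n-k-1)$; the input values are made up.

```julia
# Hypothetical helpers: AIC and AICc computed directly from a log likelihood,
# the number of estimated parameters k, and the number of observations n.
aic_from(loglik, k) = -2 * loglik + 2 * k
aicc_from(loglik, k, n) = aic_from(loglik, k) + 2 * k * (k + 1) / (n - k - 1)

loglik, k, n = -10.0, 3, 20
aic_from(loglik, k)      # 26.0
aicc_from(loglik, k, n)  # 27.5
```

The correction term shrinks toward zero as $n$ grows relative to $k$, so the two criteria agree for large samples.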

`StatsAPI.bic`

— Function

`StatsAPI.coef`

— Method

`StatsAPI.coefnames`

— Method`coefnames(model::TableRegressionModel{<:BetaRegressionModel})`

For a `BetaRegressionModel` fit using a table and `@formula`, return the names of the coefficients as a vector of strings. The precision term is included as the last element in the array and has name `"(Precision)"`.

`StatsAPI.coeftable`

— Method`coeftable(model::BetaRegressionModel; level=0.95)`

Return a table of the point estimates of the model parameters, their respective standard errors, $z$-statistics, Wald $p$-values, and confidence intervals at the given `level`. The precision parameter is included as the last row in the table.

The object returned by this function implements the Tables.jl interface for tabular data.

`StatsAPI.confint`

— Method`confint(model::BetaRegressionModel; level=0.95)`

For a model with $p$ regression coefficients, return a $(p + 1) \times 2$ matrix of confidence intervals for the estimated coefficients and precision at the given `level`.

`StatsAPI.deviance`

— Method`deviance(model::BetaRegressionModel)`

Compute the deviance of the model, defined as the sum of the squared deviance residuals.

See also: `devresid`

`GLM.devresid`

— Method`devresid(model::BetaRegressionModel)`

Compute the signed deviance residuals of the model,

\[\mathrm{sgn}(y_i - \hat{y}_i) \sqrt{2 \lvert \ell(y_i, \hat{\phi}) - \ell(\hat{y}_i, \hat{\phi}) \rvert}\]

where $\ell$ denotes the log likelihood, $y_i$ is the $i$th observed value of the response, $\hat{y}_i$ is the $i$th fitted value, and $\hat{\phi}$ is the estimated common precision parameter.

See also: `deviance`
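The per-observation log likelihood entering this formula can be written out for reference. The expression below is the beta log density in the mean-precision parameterization of Ferrari and Cribari-Neto (2004), written as a function of a mean $\mu$ and precision $\phi$ for the observation $y_i$; the residual evaluates it once at $\mu = y_i$ and once at $\mu = \hat{y}_i$. Reading the first argument of $\ell$ as the mean is an interpretation of the notation, not something the docstring states.

\[\ell(\mu, \phi) = \log\Gamma(\phi) - \log\Gamma(\mu\phi) - \log\Gamma\bigl((1 - \mu)\phi\bigr) + (\mu\phi - 1)\log y_i + \bigl((1 - \mu)\phi - 1\bigr)\log(1 - y_i)\]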

`StatsAPI.dof`

— Method`dof(model::BetaRegressionModel)`

Return the number of estimated parameters in the model. For a model with $p$ independent variables, this is $p + 1$, since the precision must also be estimated.

`StatsAPI.dof_residual`

— Method

`StatsAPI.fitted`

— Function`fitted(model::RegressionModel)`

Return the fitted values of the model.

`StatsAPI.informationmatrix`

— Method`informationmatrix(model::BetaRegressionModel; expected=true)`

Compute the information matrix of the model. By default, this is the Fisher information, i.e. the negative of the expected value of the matrix of second partial derivatives of `loglikelihood` with respect to each element of `params`. Set `expected` to `false` to obtain the observed information.

`StatsAPI.linearpredictor`

— Function`linearpredictor(model::RegressionModel)`

Return the model's linear predictor, `Xβ` where `X` is the model matrix and `β` is the vector of coefficients, or `Xβ + offset` if the model was fit with an offset.

`GLM.Link`

— Method`Link(model::BetaRegressionModel)`

Return the link function $g$ that links the mean $\mu$ to the linear predictor $\eta$ by $\mu = g^{-1}(\eta)$.

`StatsAPI.loglikelihood`

— Function

```
loglikelihood(model::StatisticalModel)
loglikelihood(model::StatisticalModel, observation)
```

Return the log-likelihood of the model.

With an `observation` argument, return the contribution of `observation` to the log-likelihood of `model`.

If `observation` is a `Colon`, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.

In general, `sum(loglikelihood(model, :)) == loglikelihood(model)`.

`StatsAPI.modelmatrix`

— Function`modelmatrix(model::RegressionModel)`

Return the model matrix (a.k.a. the design matrix).

`StatsAPI.nobs`

— Method`nobs(model::BetaRegressionModel)`

Return the effective number of observations used to fit the model. For weighted models, this is the number of nonzero weights, otherwise it's the number of elements of the response (or equivalently, the number of rows in the model matrix).

`StatsAPI.offset`

— Function`offset(model::RegressionModel)`

Return the offset used in the model, i.e. the term added to the linear predictor with known coefficient 1, or `nothing` if the model was not fit with an offset.

`StatsAPI.params`

— Method

`Base.precision`

— Method`precision(model::BetaRegressionModel)`

Return the estimated precision parameter, $\phi$, for the model. This function returns $\phi$ on the natural scale, *not* on the precision link scale. This parameter is estimated alongside the regression coefficients and is included in coefficient tables, where it *is* displayed on the precision link scale.

`BetaRegression.precisionlink`

— Function`precisionlink(model::BetaRegressionModel)`

Return the link function $h$ that links the precision $\phi$ to the estimated constant parameter $\theta_{p+1}$ such that $\phi = h^{-1}(\theta_{p+1})$.

`StatsAPI.predict`

— Function`predict(model::RegressionModel, [newX])`

Form the predicted response of `model`. An object with new covariate values `newX` can be supplied, which should have the same type and structure as that used to fit `model`; e.g. for a GLM it would generally be a `DataFrame` with the same variable names as the original predictors.

`StatsAPI.r2`

— Method

```
r2(model::BetaRegressionModel)
r²(model::BetaRegressionModel)
```

Return the Pearson correlation between the linear predictor $\eta$ and the link-transformed response $g(y)$.

`StatsAPI.residuals`

— Function`residuals(model::RegressionModel)`

Return the residuals of the model.

`StatsAPI.response`

— Function`response(model::RegressionModel)`

Return the model response (a.k.a. the dependent variable).

`StatsAPI.responsename`

— Method`responsename(model::TableRegressionModel{<:BetaRegressionModel})`

For a `BetaRegressionModel` fit using a table and `@formula`, return a string containing the left hand side of the formula, i.e. the model's response.

`StatsAPI.score`

— Method`score(model::BetaRegressionModel)`

Compute the score vector of the model, i.e. the vector of first partial derivatives of `loglikelihood` with respect to each element of `params`.

See also: `informationmatrix`

`StatsAPI.stderror`

— Method`stderror(model::BetaRegressionModel)`

Return the standard errors of the estimated model parameters, including both the regression coefficients and the precision.

See also: `vcov`

`StatsAPI.vcov`

— Method`vcov(model::BetaRegressionModel)`

Compute the variance-covariance matrix of the model, i.e. the inverse of the Fisher information matrix.

See also: `stderror`, `informationmatrix`

`StatsAPI.weights`

— Function`weights(model::StatisticalModel)`

Return the weights used in the model.

There is a subtlety here that bears repeating. The function `coef` does *not* include the precision term, only the regression coefficients, so for a model with $p$ independent variables, `coef` will return a vector of length $p$. A number of other functions, such as `informationmatrix`, `vcov`, `stderror`, etc., *do* include the precision term, and thus will return an array with (non-singleton) dimension $p + 1$. While this difference may seem strange at first blush, the design was chosen intentionally to ensure that the model matrix and regression coefficient vector are conformable for multiplication. Use `params` to retrieve the full parameter vector with length $p + 1$.
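The difference in lengths can be checked on a fitted model. The following is a minimal sketch on made-up data using the matrix method of `fit`; it assumes that `fit`, `coef`, `params`, `vcov`, and `stderror` are all exported or re-exported by the package, and that the fit converges for these data.

```julia
using BetaRegression

x = collect(range(-1.0, 1.0; length=20))
X = hcat(ones(length(x)), x)            # p = 2 columns
noise = [0.03 * sin(3i) for i in eachindex(x)]
y = clamp.(1 ./ (1 .+ exp.(-(0.3 .+ 0.8 .* x))) .+ noise, 0.01, 0.99)

model = fit(BetaRegressionModel, X, y)

length(coef(model))      # p: regression coefficients only
length(params(model))    # p + 1: the precision term is included
size(vcov(model))        # (p + 1, p + 1)
length(stderror(model))  # p + 1
X * coef(model)          # conformable, since `coef` omits the precision
```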

## Link functions

This package employs the system for link functions defined by the GLM.jl package. In short, each link function has its own concrete type which subtypes `Link`. Some may actually subtype `Link01`, which is itself a subtype of `Link`; this denotes that the function's domain is the open unit interval, $(0, 1)$. Link functions are applied with `linkfun` and their inverse is applied with `linkinv`. Relevant docstrings from GLM.jl are reproduced below.

Any mention of "the" link function for a `BetaRegressionModel` refers to that applied to the mean (at least in this document). However, despite only having one linear predictor, `BetaRegressionModel`s actually have two link functions: one for the mean and one for the precision.

### Mean

`GLM.Link01`

— Type`Link01`

An abstract subtype of `Link` for links defined on (0, 1).

`GLM.LogitLink`

— Type`LogitLink`

The canonical `Link01` for `Distributions.Bernoulli` and `Distributions.Binomial`. The inverse link, `linkinv`, is the c.d.f. of the standard logistic distribution, `Distributions.Logistic`.

`GLM.CauchitLink`

— Type`CauchitLink`

A `Link01` corresponding to the standard Cauchy distribution, `Distributions.Cauchy`.

`GLM.CloglogLink`

— Type`CloglogLink`

A `Link01` corresponding to the extreme value (or log-Weibull) distribution. The link is the complementary log-log transformation, `log(-log(1 - μ))`.
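To make the transformation concrete, here is a stand-alone sketch of the complementary log-log link, $\eta = \log(-\log(1 - \mu))$, and its inverse, $\mu = 1 - \exp(-\exp(\eta))$. The helper names are hypothetical; in practice GLM.jl provides the same operations through `linkfun` and `linkinv` applied to a `CloglogLink()`.

```julia
# Hypothetical stand-alone helpers written for illustration.
cloglog(μ) = log(-log1p(-μ))       # log(-log(1 - μ)), maps (0, 1) to the real line
cloglog_inv(η) = -expm1(-exp(η))   # 1 - exp(-exp(η)), maps the real line back to (0, 1)

μ = 0.3
η = cloglog(μ)
cloglog_inv(η) ≈ μ   # true: the two functions are inverses of each other
```

Using `log1p` and `expm1` rather than `log(1 - μ)` and `exp(x) - 1` is a numerical-accuracy choice for values near the ends of the interval.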

`GLM.ProbitLink`

— Type`ProbitLink`

A `Link01` whose `linkinv` is the c.d.f. of the standard normal distribution, `Distributions.Normal()`.

### Precision

`GLM.IdentityLink`

— Type`IdentityLink`

The canonical `Link` for the `Normal` distribution, defined as `η = μ`.

`GLM.InverseLink`

— Type`InverseLink`

The canonical `Link` for the `Distributions.Gamma` distribution, defined as `η = inv(μ)`.

`GLM.InverseSquareLink`

— Type`InverseSquareLink`

The canonical `Link` for the `Distributions.InverseGaussian` distribution, defined as `η = inv(abs2(μ))`.

`GLM.LogLink`

— Type`LogLink`

The canonical `Link` for `Distributions.Poisson`, defined as `η = log(μ)`.

`GLM.PowerLink`

— Type`PowerLink`

A `Link` defined as `η = μ^λ` when `λ ≠ 0`, and as `η = log(μ)` when `λ = 0`, i.e. the class of transforms that use a power function or logarithmic function.

Many other links are special cases of `PowerLink`:

- `IdentityLink` when λ = 1.
- `SqrtLink` when λ = 0.5.
- `LogLink` when λ = 0.
- `InverseLink` when λ = -1.
- `InverseSquareLink` when λ = -2.
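The special cases listed above can be verified numerically with a small stand-alone function. `powerlink` below is a hypothetical helper that follows the stated definition; it is not GLM.jl's implementation of `PowerLink`.

```julia
# Hypothetical helper: η = μ^λ for λ ≠ 0 and η = log(μ) for λ = 0.
powerlink(μ, λ) = λ == 0 ? log(μ) : μ^λ

μ = 4.0
powerlink(μ, 1)   == μ             # IdentityLink
powerlink(μ, 0.5) == sqrt(μ)       # SqrtLink
powerlink(μ, 0)   == log(μ)        # LogLink
powerlink(μ, -1)  == inv(μ)        # InverseLink
powerlink(μ, -2)  == inv(abs2(μ))  # InverseSquareLink
```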

`GLM.SqrtLink`

— Type`SqrtLink`

A `Link` defined as `η = √μ`.

## Developer documentation

This section documents some functions that are *not* user facing (and are thus not exported) and may be removed at any time. They're included here for the benefit of anyone looking to contribute to the package and wondering how certain internals work. Other internal functions may be documented with comments in the source code rather than with docstrings; read the source directly for more information on those.

`BetaRegression.dmueta`

— Function`dmueta(link::Link, η)`

Return the second derivative of `linkinv`, $\frac{\partial^2 \mu}{\partial \eta^2}$, of the link function `link` evaluated at the linear predictor value `η`. A method of this function must be defined for a particular link function in order to compute the observed information matrix.
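As an illustration of what such a method has to compute, consider the logit link, for which $\mu = 1 / (1 + e^{-\eta})$, $\partial\mu/\partial\eta = \mu(1 - \mu)$, and $\partial^2\mu/\partial\eta^2 = \mu(1 - \mu)(1 - 2\mu)$. The sketch below uses hypothetical stand-alone helpers (not the package's own methods) and checks the closed form against a finite difference.

```julia
# Hypothetical helpers for the logit link, written for illustration only.
logistic(η) = inv(1 + exp(-η))   # μ = linkinv(η)

function dmu(η)                  # first derivative, ∂μ/∂η
    μ = logistic(η)
    return μ * (1 - μ)
end

function d2mu(η)                 # second derivative, ∂²μ/∂η²
    μ = logistic(η)
    return μ * (1 - μ) * (1 - 2μ)
end

# Check the closed form against a central finite difference of the first derivative.
η, h = 0.7, 1e-5
fd = (dmu(η + h) - dmu(η - h)) / (2h)
isapprox(d2mu(η), fd; atol=1e-8)   # true
```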

`BetaRegression.initialize!`

— Function`initialize!(b::BetaRegressionModel)`

Initialize the given `BetaRegressionModel` by computing starting points for the parameter estimates and return the updated model object. The initial estimates are based on those from a linear regression model with the same model matrix as `b` but with `linkfun.(Link(b), response(b))` as the response.

If the initial estimate of the precision is invalid (not strictly positive) then it is taken instead to be 1 prior to applying the precision link function.
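The idea behind the starting values for the regression coefficients can be sketched with ordinary least squares on the link scale. The code below is an illustration under stated assumptions, not the package's implementation: it hard-codes a logit link through hypothetical helpers, uses noise-free toy data, and leaves out the starting value for the precision.

```julia
using LinearAlgebra

# Hypothetical helpers standing in for `linkfun` and `linkinv` of a logit link.
logit(μ) = log(μ / (1 - μ))
logistic(η) = inv(1 + exp(-η))

# Noise-free toy data, so least squares on the link scale recovers β exactly.
X = hcat(ones(5), [-1.0, -0.5, 0.0, 0.5, 1.0])
β = [0.3, 0.8]
y = logistic.(X * β)

β0 = X \ logit.(y)   # least-squares fit of g(y) on X
β0 ≈ β               # true for this noise-free example
```

With real data the link-transformed response is noisy, so `β0` only serves as a starting point for the Fisher scoring iterations.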