The exported symbols from this package define its interface. Some symbols from other packages are re-exported for convenience. Fields of objects with composite types should not be accessed directly; the internals of any given structure may change at any time and this would not be considered a breaking change.

Fitting a model

BetaRegressionModel{T,L1,L2,V,M} <: RegressionModel

Type representing a regression model for beta-distributed response values in the open interval (0, 1), as described by Ferrari and Cribari-Neto (2004).

The mean response is linked to the linear predictor by a link function with type L1 <: Link01, i.e. the link must map $(0, 1) \mapsto \mathbb{R}$ and use the GLM package's interface for link functions. While there is no canonical link function for the beta regression model as there is for GLMs, logit is the most common choice.

The precision is transformed by a link function with type L2 <: Link which should map $\mathbb{R} \mapsto \mathbb{R}$ or, ideally, $(0, \infty) \mapsto \mathbb{R}$ because the precision must be positive. The most common choices are the identity, log, and square root links.
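As a rough illustration of the two kinds of link, the sketch below writes a logit link and a log link, with their inverses, by hand in plain Julia. The function names are hypothetical stand-ins chosen for this example. In practice the link types from GLM.jl would be used with linkfun and linkinv.

```julia
# Hand-written stand-ins for illustration only; not the GLM.jl implementations.
logit(μ) = log(μ / (1 - μ))          # mean link: (0, 1) -> ℝ
logistic(η) = 1 / (1 + exp(-η))      # its inverse: ℝ -> (0, 1)

loglink(ϕ) = log(ϕ)                  # precision link: (0, ∞) -> ℝ
loglink_inv(θ) = exp(θ)              # its inverse always yields ϕ > 0

μ = logistic(0.75)                   # any real η gives a valid mean
ϕ = loglink_inv(-3.2)                # any real θ gives a valid precision
```

With an identity precision link, by contrast, the precision equals the unconstrained parameter itself, so nothing in the link prevents a nonpositive value.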

BetaRegressionModel(X, y, link=LogitLink(), precisionlink=IdentityLink();
                    weights=nothing, offset=nothing)

Construct a BetaRegressionModel object with the given model matrix X, response y, mean link function link, precision link function precisionlink, and optionally weights and offset. Note that the returned object is not fit until fit! is called on it.


Support for user-provided weights is currently incomplete; passing a value other than nothing or an empty array for weights will result in an error for now.

fit(BetaRegressionModel, formula, data, link=LogitLink(), precisionlink=IdentityLink();
    kwargs...)

Fit a BetaRegressionModel to the given table data, which may be any Tables.jl-compatible table (e.g. a DataFrame), using the given formula, which can be constructed using @formula. In this method, the response and model matrix are determined from the formula and table. It is also possible to provide them explicitly.

fit(BetaRegressionModel, X::AbstractMatrix, y::AbstractVector, link=LogitLink(),
    precisionlink=IdentityLink(); kwargs...)

Fit a beta regression model using the provided model matrix X and response vector y. In both of these methods, a link function may be provided, otherwise the default logit link is used. Similarly, a link for the precision may be provided, otherwise the default identity link is used.

Keyword Arguments

  • weights: A vector of weights or nothing (default). Currently only nothing is accepted.
  • offset: An offset vector to be added to the linear predictor or nothing (default).
  • maxiter: Maximum number of Fisher scoring iterations to use when fitting. Default is 100.
  • atol: Absolute tolerance to use when checking for model convergence. Default is sqrt(eps(T)) where T is the type of the estimates.
  • rtol: Relative tolerance to use when checking for convergence. Default is the Base default relative tolerance for T.

If you experience convergence issues, you may consider trying a different link for the precision; LogLink() is a common choice. Increasing the maximum number of iterations may also be beneficial, especially when working with Float32.

fit!(b::BetaRegressionModel{T}; maxiter=100, atol=sqrt(eps(T)), rtol=Base.rtoldefault(T))

Fit the given BetaRegressionModel, updating its values in-place. If model convergence is achieved, b is returned, otherwise a ConvergenceException is thrown.

Fitting the model consists of computing the maximum likelihood estimates for the coefficients and precision parameter via Fisher scoring with analytic derivatives. The model is determined to have converged when the score vector, i.e. the vector of first partial derivatives of the log likelihood with respect to the parameters, is approximately zero. This is determined by isapprox using the specified atol and rtol. maxiter dictates the maximum number of Fisher scoring iterations.
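The convergence criterion described above can be sketched as a comparison of the score vector against a zero vector. The helper name has_converged is hypothetical and is not part of the package. The default tolerances mirror those in the fit! signature.

```julia
using LinearAlgebra  # provides the array method of isapprox

# Hypothetical helper mirroring the convergence criterion described above.
function has_converged(score::AbstractVector{T};
                       atol=sqrt(eps(T)),
                       rtol=Base.rtoldefault(T)) where {T<:AbstractFloat}
    return isapprox(score, zero(score); atol=atol, rtol=rtol)
end

has_converged([1e-10, -3e-9, 2e-12])   # tiny score: treated as converged
has_converged([1e-3, 0.0, 0.0])        # score still far from zero
```

Because the comparison is against a zero vector, the absolute tolerance effectively decides the outcome for any rtol below 1. That is one reason Float32 fits, with their coarser default atol, may behave differently from Float64 fits.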

Properties of a model


Akaike's Information Criterion, defined as $-2 \log L + 2k$, with $L$ the likelihood of the model, and $k$ its number of consumed degrees of freedom (as returned by dof).


Corrected Akaike's Information Criterion for small sample sizes (Hurvich and Tsai 1989), defined as $-2 \log L + 2k + 2k(k+1)/(n-k-1)$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).


Bayesian Information Criterion, defined as $-2 \log L + k \log n$, with $L$ the likelihood of the model, $k$ its number of consumed degrees of freedom (as returned by dof), and $n$ the number of observations (as returned by nobs).
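As a small worked example, the AIC and BIC definitions can be evaluated directly from a log-likelihood, a parameter count, and a sample size. The numbers below are made up for illustration and do not come from a fitted model.

```julia
# Hypothetical inputs: log-likelihood, consumed degrees of freedom, observations.
loglik, k, n = -10.0, 3, 20

aic_value = -2 * loglik + 2 * k          # -2 log L + 2k
bic_value = -2 * loglik + k * log(n)     # -2 log L + k log n
```

Since log(20) exceeds 2, the BIC penalizes each parameter more heavily than the AIC in this example.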


Return a copy of the vector of regression coefficients $\mathbf{\beta}$.

See also: precision, params


For a BetaRegressionModel fit using a table and @formula, return the names of the coefficients as a vector of strings. The precision term is included as the last element in the array and has name "(Precision)".

coeftable(model::BetaRegressionModel; level=0.95)

Return a table of the point estimates of the model parameters, their respective standard errors, $z$-statistics, Wald $p$-values, and confidence intervals at the given level. The precision parameter is included as the last row in the table.
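To make the columns concrete, the following sketch computes a z-statistic and a 95% interval for a single parameter, assuming the usual Wald construction. The estimate, standard error, and hard-coded normal quantile are illustrative values, not output from the package.

```julia
# Hypothetical estimate and standard error for a single parameter.
estimate, se = 2.0, 0.5

z = estimate / se                              # Wald z-statistic
q = 1.959964                                   # approx. 0.975 standard normal quantile
ci = (estimate - q * se, estimate + q * se)    # interval at level 0.95
```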

The object returned by this function implements the Tables.jl interface for tabular data.

confint(model::BetaRegressionModel; level=0.95)

For a model with $p$ regression coefficients, return a $(p + 1) \times 2$ matrix of confidence intervals for the estimated coefficients and precision at the given level.


Compute the deviance of the model, defined as the sum of the squared deviance residuals.

See also: devresid


Compute the signed deviance residuals of the model,

\[\mathrm{sgn}(y_i - \hat{y}_i) \sqrt{2 \lvert \ell(y_i, \hat{\phi}) - \ell(\hat{y}_i, \hat{\phi}) \rvert}\]

where $\ell$ denotes the log likelihood, $y_i$ is the $i$th observed value of the response, $\hat{y}_i$ is the $i$th fitted value, and $\hat{\phi}$ is the estimated common precision parameter.
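For reference, under the mean-precision parameterization of Ferrari and Cribari-Neto (2004), in which $y_i$ follows a beta distribution with shape parameters $\mu\phi$ and $(1 - \mu)\phi$, the log likelihood contribution of one observation can be written as below. This is the standard form for that parameterization. It is stated here as an assumption about what $\ell$ denotes, not as something taken from the package source.

\[\ell(\mu, \phi) = \log\Gamma(\phi) - \log\Gamma(\mu\phi) - \log\Gamma((1 - \mu)\phi) + (\mu\phi - 1) \log y_i + ((1 - \mu)\phi - 1) \log(1 - y_i)\]

Under this reading, $\ell(y_i, \hat{\phi})$ evaluates the expression at $\mu = y_i$ and $\ell(\hat{y}_i, \hat{\phi})$ evaluates it at $\mu = \hat{y}_i$.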

See also: deviance


Return the number of estimated parameters in the model. For a model with $p$ independent variables, this is $p + 1$, since the precision must also be estimated.


Return the residual degrees of freedom for the model, defined as nobs minus dof.


Return the fitted values of the model.

informationmatrix(model::BetaRegressionModel; expected=true)

Compute the information matrix of the model. By default, this is the Fisher information, i.e. the negative of the expected value of the matrix of second partial derivatives of loglikelihood with respect to each element of params. Set expected to false to obtain the observed information.

See also: vcov, score


Return the model's linear predictor, Xβ, where X is the model matrix and β is the vector of coefficients, or Xβ + offset if the model was fit with an offset.
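A minimal sketch of the two cases, using a made-up model matrix, coefficient vector, and offset in place of values from a fitted model:

```julia
# Hypothetical 3×2 model matrix, coefficients, and offset.
X = [1.0 0.5;
     1.0 1.5;
     1.0 2.5]
β = [0.2, -0.4]
offset = [0.1, 0.0, -0.1]

η = X * β                      # linear predictor without an offset
η_offset = X * β .+ offset     # linear predictor with an offset
```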


Return the link function $g$ that links the mean $\mu$ to the linear predictor $\eta$ by $\mu = g^{-1}(\eta)$.

loglikelihood(model::StatisticalModel, observation)

Return the log-likelihood of the model.

With an observation argument, return the contribution of observation to the log-likelihood of model.

If observation is a Colon, return a vector of each observation's contribution to the log-likelihood of the model. In other words, this is the vector of the pointwise log-likelihood contributions.

In general, sum(loglikelihood(model, :)) == loglikelihood(model).


Return the model matrix (a.k.a. the design matrix).


Return the effective number of observations used to fit the model. For weighted models, this is the number of nonzero weights, otherwise it's the number of elements of the response (or equivalently, the number of rows in the model matrix).


Return the offset used in the model, i.e. the term added to the linear predictor with known coefficient 1, or nothing if the model was not fit with an offset.


Return the vector of estimated model parameters $\theta = [\beta_1, \ldots, \beta_p, \phi]$, i.e. the regression coefficients and precision.


Mutating this array may invalidate the model object.

See also: coef, precision


Return the estimated precision parameter, $\phi$, for the model. This function returns $\phi$ on the natural scale, not on the precision link scale. This parameter is estimated alongside the regression coefficients and is included in coefficient tables, where it is displayed on the precision link scale.

See also: coef, params


Return the link function $h$ that links the precision $\phi$ to the estimated constant parameter $\theta_{p+1}$ such that $\phi = h^{-1}(\theta_{p+1})$.

predict(model::RegressionModel, [newX])

Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.


Return the Pearson correlation between the linear predictor $\eta$ and the link-transformed response $g(y)$.


Return the residuals of the model.


Return the model response (a.k.a. the dependent variable).


For a BetaRegressionModel fit using a table and @formula, return a string containing the left hand side of the formula, i.e. the model's response.


Return the standard errors of the estimated model parameters, including both the regression coefficients and the precision.

See also: vcov


Compute the variance-covariance matrix of the model, i.e. the inverse of the Fisher information matrix.

See also: stderror, informationmatrix
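The relationship between the information matrix, the variance-covariance matrix, and the standard errors can be sketched with a small made-up matrix. This assumes the usual convention that standard errors are the square roots of the diagonal of the variance-covariance matrix.

```julia
using LinearAlgebra

# Hypothetical 2×2 information matrix (symmetric positive definite).
info = [4.0 0.0;
        0.0 25.0]

V = inv(info)            # variance-covariance matrix
se = sqrt.(diag(V))      # standard errors from the diagonal
```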


Return the weights used in the model.

There is a subtlety here that bears repeating. The function coef does not include the precision term, only the regression coefficients, so for a model with $p$ independent variables, coef will return a vector of length $p$. A number of other functions, such as informationmatrix, vcov, and stderror, do include the precision term, and thus will return an array with (non-singleton) dimension $p + 1$. While this difference may seem strange at first blush, the design was chosen intentionally to ensure that the model matrix and regression coefficient vector are conformable for multiplication. Use params to retrieve the full parameter vector with length $p + 1$.
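The conformability argument can be seen with plain arrays. The values below are hypothetical stand-ins for what coef and params would return for a model with two coefficients.

```julia
# Hypothetical model with p = 2 coefficients and 3 observations.
X = [1.0 0.5;
     1.0 1.5;
     1.0 2.5]
β = [0.2, -0.4]          # stand-in for coef: length p
θ = [β; 12.0]            # stand-in for params: length p + 1

η = X * β                # works: a 3×2 matrix times a length-2 vector

# X * θ would fail, since a 3×2 matrix cannot multiply a length-3 vector.
conformable = size(X, 2) == length(θ)
```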

This package employs the system for link functions defined by the GLM.jl package. In short, each link function has its own concrete type which subtypes Link. Some may actually subtype Link01, which is itself a subtype of Link; this denotes that the function's domain is the open unit interval, $(0, 1)$. Link functions are applied with linkfun and their inverse is applied with linkinv. Relevant docstrings from GLM.jl are reproduced below.

Any mention of "the" link function for a BetaRegressionModel refers to that applied to the mean (at least in this document). However, despite only having one linear predictor, BetaRegressionModels actually have two link functions: one for the mean and one for the precision.



An abstract subtype of Link for links defined on (0, 1).


A Link01 corresponding to the extreme value (or log-Weibull) distribution. The link is the complementary log-log transformation, log(-log(1 - μ)).
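For illustration, the complementary log-log transformation in its standard form, log(-log(1 - μ)), and its inverse can be written by hand as follows. These functions are stand-ins for this example and are not the GLM.jl implementations.

```julia
# Hand-written stand-ins for illustration; not the GLM.jl implementations.
cloglog(μ) = log(-log1p(-μ))          # log(-log(1 - μ)), defined for μ in (0, 1)
cloglog_inv(η) = -expm1(-exp(η))      # 1 - exp(-exp(η)), maps ℝ back to (0, 1)

η = cloglog(0.3)
μ = cloglog_inv(η)                    # recovers 0.3 up to rounding
```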



The canonical Link for the Normal distribution, defined as η = μ.

Developer documentation

This section documents some functions that are not user facing (and are thus not exported) and may be removed at any time. They're included here for the benefit of anyone looking to contribute to the package and wondering how certain internals work. Other internal functions may be documented with comments in the source code rather than with docstrings; read the source directly for more information on those.

dmueta(link::Link, η)

Return the second derivative of linkinv, $\frac{\partial^2 \mu}{\partial \eta^2}$, of the link function link evaluated at the linear predictor value η. A method of this function must be defined for a particular link function in order to compute the observed information matrix.
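As a sketch of what such a method computes, the logit link has inverse $\mu = 1 / (1 + e^{-\eta})$, with first derivative $\mu(1 - \mu)$ and second derivative $\mu(1 - \mu)(1 - 2\mu)$. The function names below are hypothetical and chosen for this example. They are not the package's methods.

```julia
logistic(η) = 1 / (1 + exp(-η))        # inverse of the logit link

# First derivative of μ = logistic(η) with respect to η.
function mueta_logit(η)
    μ = logistic(η)
    return μ * (1 - μ)
end

# Second derivative of μ = logistic(η) with respect to η.
function dmueta_logit(η)
    μ = logistic(η)
    return μ * (1 - μ) * (1 - 2μ)
end

# Central finite difference of the first derivative as a sanity check.
h = 1e-5
fd = (mueta_logit(0.8 + h) - mueta_logit(0.8 - h)) / (2h)
```

The finite difference fd should agree closely with dmueta_logit(0.8), which is a cheap way to check a hand-derived second derivative for a new link.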


Initialize the given BetaRegressionModel by computing starting points for the parameter estimates and return the updated model object. The initial estimates are based on those from a linear regression model with the same model matrix as b but with linkfun.(Link(b), response(b)) as the response.

If the initial estimate of the precision is invalid (not strictly positive) then it is taken instead to be 1 prior to applying the precision link function.
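The idea behind the starting values can be sketched as a least squares fit of the link-transformed response on the model matrix. The example assumes a logit link and made-up data. It does not show how the initial precision estimate is computed, only the fallback to 1. None of the names below are the package's internals.

```julia
using LinearAlgebra

logit(μ) = log(μ / (1 - μ))
logistic(η) = 1 / (1 + exp(-η))

# Hypothetical data: responses generated exactly from known coefficients.
X = [1.0 -1.0;
     1.0  0.0;
     1.0  1.0;
     1.0  2.0]
β_true = [0.3, -0.7]
y = logistic.(X * β_true)

# Starting values: least squares of the link-transformed response on X.
β_init = X \ logit.(y)

# Fallback described above: a nonpositive precision estimate becomes 1.
ϕ_candidate = -0.4                       # made-up invalid value
ϕ_init = ϕ_candidate > 0 ? ϕ_candidate : 1.0
```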