General Interface

Understanding the interface

CRRao exports the fit function, which is used to train all types of models supported by the package. As of now, the function supports the following signatures.

fit(formula, data, modelClass)
fit(formula, data, modelClass, link)
fit(formula, data, modelClass, prior)
fit(formula, data, modelClass, link, prior)

Not every model class supports every signature. The parameters have the following meanings.

  1. The parameter formula must be a formula of type StatsModels.FormulaTerm. Any formula has an LHS and an RHS. The LHS represents the response variable, and the RHS represents the independent variables.

  2. The parameter data must be a DataFrame. This variable represents the dataset on which the model must be trained.

  3. modelClass represents the type of the statistical model to be used. Currently, CRRao supports four regression models, and the type of modelClass must be one of the following: LinearRegression, LogisticRegression, NegBinomRegression, or PoissonRegression.

  4. Certain model classes (like Logistic Regression) support link functions; this is represented by the link parameter. Currently four link functions are supported: Logit, Probit, Cloglog and Cauchit, and the type of link must be one of these.

  5. CRRao also supports Bayesian models, and the priors can be specified while calling fit. Currently CRRao supports six different kinds of priors, and the type of the prior parameter must be one of the following: Prior_Gauss, Prior_Ridge, Prior_Laplace, Prior_Cauchy, Prior_TDist, or Prior_HorseShoe.
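
The signatures above can be sketched as follows. This is a minimal illustration rather than a canonical recipe: the mtcars dataset (loaded through RDatasets) and the particular formulas are assumptions chosen for the example, and the Bayesian call runs a sampler, so it may take noticeably longer than the others.

```julia
using CRRao, RDatasets, StatsModels

# Assumed example data: the mtcars dataset, loaded as a DataFrame.
mtcars = dataset("datasets", "mtcars")

# Assumed formula: MPG is the response (LHS); HP, WT and Gear are predictors (RHS).
f = @formula(MPG ~ HP + WT + Gear)

# fit(formula, data, modelClass)
model_lm = fit(f, mtcars, LinearRegression())

# fit(formula, data, modelClass, link): AM is a 0/1 column, used here as a binary response.
model_logit = fit(@formula(AM ~ HP + WT), mtcars, LogisticRegression(), Logit())

# fit(formula, data, modelClass, prior): Bayesian linear regression with a ridge prior.
model_ridge = fit(f, mtcars, LinearRegression(), Prior_Ridge())
```

Which combinations of model class, link and prior are accepted depends on the methods CRRao defines, so an unsupported combination should be expected to raise a method error.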

Model Classes and Data Models

CRRao.LinearRegression (Type)
LinearRegression

Type representing the Linear Regression model class.

\[y =\alpha + X \beta+ \varepsilon,\]

where

\[\varepsilon \sim N(0,\sigma^2),\]

  • $y$ is the response vector of size $n$,
  • $X$ is the matrix of predictor variables of size $n \times p$,
  • $n$ is the sample size, and $p$ is the number of predictors,
  • $\alpha$ is the intercept of the model,
  • $\beta$ is the vector of regression coefficients of the model, and
  • $\sigma$ is the standard deviation of the noise $\varepsilon$.
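
To make the roles of the symbols concrete, the data model can be simulated with the standard library alone. This is only a sketch: the values of $n$, $p$, $\alpha$, $\beta$ and $\sigma$ below are hypothetical, and the code does not call CRRao.

```julia
using Random

rng = MersenneTwister(42)      # fixed seed so the simulated data are reproducible
n, p = 100, 3                  # hypothetical sample size and number of predictors
α = 1.5                        # hypothetical intercept
β = [2.0, -1.0, 0.5]           # hypothetical coefficient vector of length p
σ = 0.3                        # hypothetical noise standard deviation

X = randn(rng, n, p)           # n × p matrix of predictors
ε = σ .* randn(rng, n)         # ε ~ N(0, σ²)
y = α .+ X * β .+ ε            # response vector of size n
```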

CRRao.LogisticRegression (Type)
LogisticRegression

Type representing the Logistic Regression model class.

\[y_i \sim Bernoulli(p_i), \]

where $i=1,2,\cdots,n$ and $0 < p_i < 1$,

  • $\mathbb{E}(y_i)=p_i$,
  • $\mathbb{P}(y_i=1) = p_i$ and $\mathbb{P}(y_i=0) = 1-p_i$, such that

\[\mathbb{E}(y_i)= p_i =g(\alpha +\mathbf{x}_i^T\beta),\]

  • $g(.)$ is the link-function,
  • $y_i$ is the $i^{th}$ element of the response vector $y$,
  • $\mathbf{x}_i=(x_{i1},x_{i2},\cdots,x_{ip})$ is the $i^{th}$ row of the design matrix of size $n \times p$,
  • $\alpha$ is the intercept of the model, and
  • $\beta$ is the vector of regression coefficients of the model.
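
As a worked instance, taking $g$ to be the Logit map $z\mapsto 1/(1+\exp(-z))$ gives the explicit form below. The log-odds expression on the right is the standard algebraic rearrangement of the left-hand side, included here for orientation.

\[p_i = \dfrac{1}{1+\exp\{-(\alpha +\mathbf{x}_i^T\beta)\}} \quad\Longleftrightarrow\quad \log\dfrac{p_i}{1-p_i} = \alpha +\mathbf{x}_i^T\beta.\]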

CRRao.NegBinomRegression (Type)
NegBinomRegression

Type representing the Negative Binomial Regression model class.

\[y_i \sim NegativeBinomial(\mu_i,\phi), i=1,2,\cdots,n\]

where

\[\mu_i = \exp(\alpha +\mathbf{x}_i^T\beta),\]

  • $y_i$ is the $i^{th}$ element of the response vector $y$,
  • $\mathbf{x}_i=(x_{i1},x_{i2},\cdots,x_{ip})$ is the $i^{th}$ row of the design matrix of size $n \times p$,
  • $\alpha$ is the intercept of the model, and
  • $\beta$ is the vector of regression coefficients of the model.

CRRao.PoissonRegression (Type)
PoissonRegression

Type representing the Poisson Regression model class.

\[y_i \sim Poisson(\lambda_i), i=1,2,\cdots,n\]

where

\[\lambda_i = \exp(\alpha +\mathbf{x}_i^T\beta),\]

  • $y_i$ is the $i^{th}$ element of the response vector $y$,
  • $\mathbf{x}_i=(x_{i1},x_{i2},\cdots,x_{ip})$ is the $i^{th}$ row of the design matrix of size $n \times p$,
  • $\alpha$ is the intercept of the model, and
  • $\beta$ is the vector of regression coefficients of the model.
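
The log-linear mean can be computed directly, as in the sketch below. All numbers are hypothetical and the code does not call CRRao; the same expression gives $\mu_i$ in the Negative Binomial model.

```julia
α = 0.2                        # hypothetical intercept
β = [0.5, -0.3]                # hypothetical coefficients (p = 2)
X = [1.0 2.0;                  # hypothetical design matrix, one row per observation
     0.0 1.0;
     3.0 0.5]

η = α .+ X * β                 # linear predictor α + xᵢᵀβ for every row
λ = exp.(η)                    # Poisson means; exp keeps every λᵢ strictly positive
```

Because of the exponential, a one-unit increase in a predictor multiplies the mean by the exponential of its coefficient instead of adding to it.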

CRRao.CRRaoLink (Type)
CRRaoLink

Abstract type representing link functions which are used to dispatch to appropriate calls.

CRRao.Logit (Type)
Logit <: CRRaoLink

A type representing the Logit link function, which is defined by the formula

\[z\mapsto \dfrac{1}{1 + \exp(-z)}\]

CRRao.Probit (Type)
Probit <: CRRaoLink

A type representing the Probit link function, which is defined by the formula

\[z\mapsto \mathbb{P}[Z\le z]\]

where $Z\sim \text{Normal}(0, 1)$.

CRRao.Cloglog (Type)
Cloglog <: CRRaoLink

A type representing the Cloglog link function, which is defined by the formula

\[z\mapsto 1 - \exp(-\exp(z))\]

CRRao.Cauchit (Type)
Cauchit <: CRRaoLink

A type representing the Cauchit link function, which is defined by the formula

\[z\mapsto \dfrac{1}{2} + \dfrac{\text{atan}(z)}{\pi}\]
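
The closed-form maps above can be written as plain Julia functions to see how they behave. This is a sketch for illustration only: the function names are hypothetical and are not part of CRRao, and Probit is omitted because the normal CDF is not available in Julia's Base (a package such as Distributions.jl would be needed).

```julia
# Hypothetical helper names; each maps a real number z into the interval (0, 1).
logit_map(z)   = 1 / (1 + exp(-z))
cloglog_map(z) = 1 - exp(-exp(z))
cauchit_map(z) = 1/2 + atan(z) / π

zs = -3:0.5:3                  # grid of linear-predictor values
probs = (logit   = logit_map.(zs),
         cloglog = cloglog_map.(zs),
         cauchit = cauchit_map.(zs))
```

Logit and Cauchit are symmetric around $z = 0$, where both return $1/2$, while Cloglog is asymmetric and returns $1 - e^{-1}$ at $z = 0$.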

Prior Distributions

CRRao.Prior_Gauss (Type)
Prior_Gauss

Type representing the Gaussian Prior. Users can specify the prior mean and standard deviation of $\alpha$ and $\beta$ for the linear regression model.

Prior model

\[\sigma \sim InverseGamma(a_0,b_0),\]

\[\alpha | \sigma \sim Normal(\alpha_0,\sigma_{\alpha_0}),\]

\[\beta | \sigma \sim Normal_p(\beta_0,\sigma_{\beta_0}),\]

Likelihood or data model

\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]

\[y_i \sim N(\mu_i,\sigma),\]

Note: $N()$ is the Gaussian distribution of $y_i$, where

  • $\mathbb{E}(y_i)=g(\mu_i)$, and
  • $Var(y_i)=\sigma^2$.

CRRao.Prior_Ridge (Type)
Prior_Ridge

Type representing the Ridge Prior.

Prior model

\[v \sim InverseGamma(h,h),\]

\[\sigma \sim InverseGamma(a_0,b_0),\]

\[\alpha | \sigma,v \sim Normal(0,v*\sigma),\]

\[\beta | \sigma,v \sim Normal_p(0,v*\sigma),\]

Likelihood or data model

\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]

\[y_i \sim D(\mu_i,\sigma),\]

Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where

  • $\mathbb{E}(y_i)=g(\mu_i)$, and
  • $Var(y_i)=\sigma^2$.

CRRao.Prior_Laplace (Type)
Prior_Laplace

Type representing the Laplace Prior.

Prior model

\[v \sim InverseGamma(h,h),\]

\[\sigma \sim InverseGamma(a_0,b_0),\]

\[\alpha | \sigma,v \sim Laplace(0,v*\sigma),\]

\[\beta | \sigma,v \sim Laplace(0,v*\sigma),\]

Likelihood or data model

\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]

\[y_i \sim D(\mu_i,\sigma),\]

Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where

  • $\mathbb{E}(y_i)=g(\mu_i)$, and
  • $Var(y_i)=\sigma^2$.

CRRao.Prior_Cauchy (Type)
Prior_Cauchy

Type representing the Cauchy Prior.

Prior model

\[\sigma \sim HalfCauchy(0,1),\]

\[\alpha | \sigma \sim Cauchy(0,\sigma),\]

\[\beta | \sigma \sim Cauchy(0,\sigma),\]

Likelihood or data model

\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]

\[y_i \sim D(\mu_i,\sigma),\]

Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where

  • $\mathbb{E}(y_i)=g(\mu_i)$, and
  • $Var(y_i)=\sigma^2$.

CRRao.Prior_TDist (Type)
Prior_TDist

Type representing the T-Distributed Prior.

Prior model

\[v \sim InverseGamma(h,h),\]

\[\sigma \sim InverseGamma(a_0,b_0),\]

\[\alpha | \sigma,v \sim \sigma t(v),\]

\[\beta | \sigma,v \sim \sigma t(v),\]

Likelihood or data model

\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]

\[y_i \sim D(\mu_i,\sigma),\]

Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where

  • $\mathbb{E}(y_i)=g(\mu_i)$,
  • $Var(y_i)=\sigma^2$, and
  • $t(v)$ is the $t$ distribution with $v$ degrees of freedom.

CRRao.Prior_HorseShoe (Type)
Prior_HorseShoe

Type representing the HorseShoe Prior.

Prior model

\[\tau \sim HalfCauchy(0,1),\]

\[\lambda_j \sim HalfCauchy(0,1), j=1,2,\cdots,p\]

\[\sigma \sim HalfCauchy(0,1),\]

\[\alpha | \sigma,\tau \sim N(0,\tau *\sigma),\]

\[\beta_j | \sigma,\lambda_j ,\tau \sim Normal(0,\lambda_j *\tau *\sigma),\]

Likelihood or data model

\[\mu_i= \alpha + \mathbf{x}_i^T\beta\]

\[y_i \sim D(\mu_i,\sigma), i=1,2,\cdots,n\]

Note: $D()$ is the appropriate distribution of $y_i$ based on the modelClass, where

  • $\mathbb{E}(y_i)=g(\mu_i)$,
  • $Var(y_i)=\sigma^2$, and
  • $\beta=(\beta_1,\beta_2,\cdots,\beta_p)$.
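
The hierarchy of the HorseShoe prior can be illustrated by drawing once from it with the standard library. This is a sketch under stated assumptions, not CRRao's implementation: the helper names are hypothetical, the second argument of $Normal(\cdot,\cdot)$ is read as a standard deviation, and half-Cauchy draws use the inverse-CDF identity $|\tan(\pi(U - 1/2))|$ with $U \sim Uniform(0,1)$.

```julia
using Random

# Hypothetical helper: one standard half-Cauchy draw via the inverse CDF.
half_cauchy(rng) = abs(tan(π * (rand(rng) - 0.5)))

# Hypothetical helper: one joint draw of (τ, λ, σ, α, β) for p predictors.
function draw_horseshoe(rng, p)
    τ = half_cauchy(rng)                   # global shrinkage scale
    λ = [half_cauchy(rng) for _ in 1:p]    # local shrinkage scales, one per coefficient
    σ = half_cauchy(rng)                   # noise scale
    α = τ * σ * randn(rng)                 # α | σ, τ ~ Normal(0, τσ)
    β = λ .* τ .* σ .* randn(rng, p)       # βⱼ | σ, λⱼ, τ ~ Normal(0, λⱼτσ)
    return (; τ, λ, σ, α, β)
end

draw = draw_horseshoe(MersenneTwister(2024), 5)
```

The global scale $\tau$ pulls all coefficients toward zero together, while a large local scale $\lambda_j$ lets an individual $\beta_j$ escape that shrinkage.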

Setting Random Number Generators

CRRao.set_rng (Function)
set_rng(rng)

Set the random number generator. This is useful if you want to work with reproducible results. rng must be a random number generator.

Example

using CRRao, StableRNGs
CRRao.set_rng(StableRNG(1234))
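
Why a fixed generator gives reproducible results can be seen with the standard library alone. The sketch below does not call CRRao; MersenneTwister and the seed 1234 are used only for illustration.

```julia
using Random

# Two generators created with the same seed produce the same stream of numbers.
draws_a = rand(MersenneTwister(1234), 3)
draws_b = rand(MersenneTwister(1234), 3)

same_stream = draws_a == draws_b   # true: identical seeds give identical draws
```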