Null distributions of the log-likelihood ratios

The null distribution of an LLR, $p(\mathrm{LLR}\mid{\mathcal H}_{\mathrm{null}})$, may be obtained either by simulation or analytically. Simulation, such as random permutations from real data or the generation of random data from statistics of real data, can deal with a much broader range of scenarios in which analytical expressions are unattainable. However, the drawbacks are obvious: simulation can take hundreds of times longer than analytical methods to reach a satisfiable precision. Here we obtained analytical expressions of $p(\mathrm{LLR}\mid{\mathcal H}_{\mathrm{null}})$ for all the Likelihood ratio tests.

Correlation test

${\mathcal H}_{\mathrm{null}}^{\mathrm{(0)}}=A\quad B$ indicates no correlation between $A$ and $B$. Therefore, we can start from

\[\tilde{B}_i\sim\mathrm{i.i.d\ }N(0,1).\]

In order to simulate the supernormalization step, we normalize $\tilde{B}_i$ into $B_i$ with zero mean and unit variance as:

\[B_i\equiv\frac{\tilde{B}_i-\bar{\tilde{B}}_i}{\sigma_{\tilde{B}}},\hspace{3em}\bar{\tilde{B}}\equiv \frac{1}{n}\sum_{i=1}^n\tilde{B}_i,\hspace{3em}\sigma_{\tilde{B}}^2\equiv\frac{1}{n}\sum_{i=1}^n\left(\tilde{B}_i-\bar{\tilde{B}}\right)^2.\]

Transform the random variables $\{\tilde{B}_i\}$ by defining

\[\begin{aligned} X_1 &\equiv \frac{1}{\sqrt{n}}\sum_{i=1}^nA_i\tilde{B}_i,\\ X_2 &\equiv \frac{1}{\sqrt{n}}\sum_{i=1}^n\tilde{B}_i,\\ X_3 &\equiv \left(\sum_{i=1}^n\tilde{B}_i^2\right)-X_1^2-X_2^2. \end{aligned}\]

Since $\tilde{B}_i\sim\mathrm{i.i.d\ }N(0,1)$, we can easily verify that $X_1,X_2,X_3$ are independent, and

\[X_1\sim N(0,1),\hspace{3em}X_2\sim N(0,1),\hspace{3em}X_3\sim\chi^2(n-2).\]

Expressing the LLR for the correlation test in terms of $X_1,X_2,X_3$ gives

\[\mathrm{LLR}^{\mathrm{(0)}}=-\frac{n}{2}\ln(1-Y),\]

in which

\[Y\equiv\frac{X_1^2}{X_1^2+X_3}\sim\mathrm{Beta}\Bigl(\frac 12,\frac{n-2}2\Bigr)\]

follows the Beta distribution.

We define the distribution ${\mathcal D}(\alpha,\beta)$ as the distribution of a random variable $Z=-\frac 12\ln(1-Y)$ for $Y\sim\mathrm{Beta}(\alpha/2,\beta/2)$, i.e.

\[Z=-\frac{1}{2}\ln(1-Y)\sim{\mathcal D}(\alpha,\beta).\]

The probability density function (PDF) for $Z\sim{\mathcal D}(\alpha,\beta)$ can be derived as: for $z>0$,

\[p(z\mid \alpha,\beta)=\frac{2}{B(\alpha/2,\beta/2)}\left(1-e^{-2z}\right)^{(\alpha/2-1)}e^{-\beta z},\]

and for $z\le0$, $p(z\mid \alpha,\beta)=0$. Here $B(a,b)$ is the Beta function. Therefore the null distribution for the correlation test is simply

\[\mathrm{LLR}^{\mathrm{(0)}}/n\sim{\mathcal D}(1,n-2).\]

Primary linkage test

${\mathcal H}_{\mathrm{null}}^{\mathrm{(1)}}=E\hspace{1.6em}A$ indicates no regulation from $E$ to $A$. Therefore, similarly with the correlation test, we start from $\tilde{A}_i\sim\mathrm{i.i.d\ }N(0,1)$ and normalize them to $A_i$ with zero mean and unit variance.

The expression of $\mathrm{LLR}^{\mathrm{(1)}}$ then becomes:

\[\mathrm{LLR}^{\mathrm{(1)}} = -\frac{n}{2}\ln\left(1-\sum_{j=0}^{n_a}\frac{n_j}{n}\frac{\left(\hat{\tilde{\mu}}_j-\bar{\tilde{A}}\right)^2}{\sigma_{\tilde{A}^2}}\right),\]

where

\[\hat{\tilde{\mu}}_j\equiv\frac{1}{n_j}\sum_{i=1}^n\tilde{A}_i\delta_{E_ij}.\]

For now, assume all possible genotypes are present, i.e. $n_j>0$ for $j=0,\dots,n_a$. Transform $\{\tilde{A}_i\}$ by defining

\[\begin{aligned} X_j &\equiv \sqrt{n_j}\,\hat{\tilde{\mu}}_j,\hspace{9em}\mathrm{for\ }j=0,\dots,n_a,\\ X_{n_a+1} &\equiv \Biggl(\sum_{i=1}^n\tilde{A}_i^2\Biggr)-\Biggl(\sum_{j=0}^{n_a}X_j^2\Biggr). \end{aligned}\]

Then we can similarly verify that $\{X_i\}$ are pairwise independent, and

\[\begin{aligned} X_i &\sim N(0,1),\ \mathrm{for\ }i=0,\dots,n_a,\\ X_{n_a + 1} &\sim \chi^2(n-n_a-1). \end{aligned}\]

Again transform $\{X_i\}$ by defining independent random variables

\[\begin{aligned} Y_1 &\equiv \sum_{j=0}^{n_a}\sqrt{\frac{n_j}{n}}\,X_j\sim N(0,1),\\ Y_2 &\equiv \left(\sum_{j=0}^{n_a}X_j^2\right)-Y_1^2\sim\chi^2(n_a),\\ Y_3 &\equiv X_{n_a+1}\sim\chi^2(n-n_a-1). \end{aligned}\]

Some calculation would reveal

\[\mathrm{LLR}^{\mathrm{(1)}}=-\frac{n}{2}\ln\left(1-\frac{Y_2}{Y_2+Y_3}\right),\]

i.e.

\[\mathrm{LLR}^{\mathrm{(1)}}/n\sim{\mathcal D}(n_a,n-n_a-1).\]

To account for genotypes that do not show up in the samples, define $n_v\equiv\sum_{j\in\{j\mid n_j>0\}}1$ as the number of different genotype values across all samples. Then

\[\mathrm{LLR}^{\mathrm{(1)}}/n\sim{\mathcal D}(n_v-1,n-n_v).\]

Secondary linkage test

Since the null hypotheses and LLRs of primary and secondary tests are identical, $\mathrm{LLR}^{\mathrm{(2)}}$ follows the same null distribution as $\mathrm{LLR}^{\mathrm{(1)}}$.

Conditional independence test

The conditional independence test verifies if $E$ and $B$ are uncorrelated when conditioning on $A$, with ${\mathcal H}_{\mathrm{null}}^{\mathrm{(3)}}=E\rightarrow A\rightarrow B$. For this purpose, we keep $E$ and $A$ intact while randomizing $\tilde{B}_i$ according to $B$'s correlation with $A$:

\[\tilde{B}_i\equiv\hat\rho A_i+\sqrt{1-\hat\rho^2}\,X_i,\hspace{2em}X_i\sim\mathrm{i.i.d\ }N(0,1).\]

Then $\tilde{B}_i$ is normalized to $B_i$ as before. The null distribution of $\mathrm{LLR}^{\mathrm{(3)}}$ can be obtained with similar but more complex computations as before, as

\[\mathrm{LLR}^{\mathrm{(3)}}/n\sim{\mathcal D}(n_v-1,n-n_v-1).\]

Relevance test

The null distribution of $\mathrm{LLR}^{\mathrm{(4)}}$ can be obtained similarly by randomizing $B_i$, as

\[\mathrm{LLR}^{\mathrm{(4)}}/n\sim{\mathcal D}(n_v,n-n_v-1).\]

Pleiotropy test

To compute the null distribution for the controlled test, we start from

\[\tilde{B}_i=\hat{\nu}_{E_i}+\hat{\sigma}_B X_i,\hspace{2em}X_i\sim N(0,1),\]

and then normalize $\tilde{B}_i$ into $B_i$. Some calculation reveals the null distribution as

\[\mathrm{LLR}^{\mathrm{(5)}}/n\sim{\mathcal D}(1,n-n_v-1).\]

Implementation

Null distribution family

The family of distributions $\mathcal{D}$, termed LBeta in the code, is implemented as a continuous, univariate distribution type using the Distributions package:

BioFindr.LBetaType
LBeta(α,β)

The LBeta distribution with parameters $\alpha$ and $\beta$ is defined as the distribution of a random variable

$X=-\frac{1}{2}(\ln(1-Y))$

where $Y\sim\operatorname{Beta}(\alpha/2, \beta/2)$

Its PDF is implemented explicitly from the definition above:

Distributions.pdfFunction
pdf(d,x)

Evaluate the probability density function of an LBeta distribution with support on $x\geq 0$.

Distributions.logpdfFunction
logpdf(d,x)

Evaluate the probability density function of an LBeta distribution.

The CDF is implemented by calling the corresponding functions for a Beta distribution:

Distributions.cdfFunction
cdf(d,x)

Evaluate the cumulative distribution function of an LBeta distribution using its relation to the Beta distribution.

Distributions.ccdfFunction
ccdf(d,x)

Evaluate the complementary cumulative distribution function of an LBeta distribution using its relation to the Beta distribution.

Distributions.logccdfFunction
logccdf(d,x)

Evaluate the log complementary cumulative distribution function of an LBeta distribution using its relation to the Beta distribution.

Test-specific null distributions

To construct a null distribution object for a specific test, use:

BioFindr.nulldistFunction
nulldist(ns,[ng,test])

Return an LBeta distribution that is the null distribution of the log-likelihood ratio for a given BioFindr test with sample size ns and number of genotype groups ng. The input variable test can take the values:

  • :corr - correlation test (test 0)
  • :link - linkage test (test 1/2)
  • :med - mediation test (test 3)
  • :relev - relevance test (test 4)
  • :pleio - pleiotropy test (test 5)

With only one input argument, the null distribution for the correlation test with ns samples is returned. With two input arguments, or with three arguments and test equal to :corr, the null distribution for the correlation test with ns samples is returned and the second argument is ignored.

Convenience functions are provided that wrap around the pdf, ccdf, or logccdf functions:

BioFindr.nullpdfFunction
nullpdf(llr,ns,[ng,test])

Return probability distribution function evaluations for a vector of log-likelihood ratio values llr under the null distribution of the log-likelihood ratio for a given BioFindr test with sample size ns and number of genotype groups ng. The input variable test can take the values:

  • :corr - correlation test (test 0)
  • :link - linkage test (test 1/2)
  • :med - mediation test (test 3)
  • :relev - relevance test (test 4)
  • :pleio - pleiotropy test (test 5)

With two input arguments, the correlation test with ns samples is used. With three input arguments, or with four arguments and test equal to :corr, the correlation test with ns samples is used and the third argument is ignored.

BioFindr.nullpvalFunction
nullpval(llr,ns,[ng,test])

Return p-values for a vector of log-likelihood ratio values llr under the null distribution of the log-likelihood ratio for a given BioFindr test with sample size ns and number of genotype groups ng. The input variable test can take the values:

  • :corr - correlation test (test 0)
  • :link - linkage test (test 1/2)
  • :med - mediation test (test 3)
  • :relev - relevance test (test 4)
  • :pleio - pleiotropy test (test 5)

With two input arguments, the correlation test with ns samples is used. With three input arguments, or with four arguments and test equal to :corr, the correlation test with ns samples is used and the third argument is ignored.

BioFindr.nulllog10pvalFunction
nulllog10pval(llr,ns,[ng,test])

Return negative log10 p-values for a vector of log-likelihood ratio values llr under the null distribution of the log-likelihood ratio for a given BioFindr test with sample size ns and number of genotype groups ng. The input variable test can take the values:

  • :corr - correlation test (test 0)
  • :link - linkage test (test 1/2)
  • :med - mediation test (test 3)
  • :relev - relevance test (test 4)
  • :pleio - pleiotropy test (test 5)

With two input arguments, the correlation test with ns samples is used. With three input arguments, or with four arguments and test equal to :corr, the correlation test with ns samples is used and the third argument is ignored.

Some helper functions to work with LBeta distribution objects:

BioFindr.compute_momentsFunction
compute_moments(d::LBeta)

Compute the first and second moments m1 and m2 of the Beta distribution corresponding to the given LBeta distribution.

BioFindr.test_momentsFunction
test_moments(m1,m2)

Test if the given first and second moments m1 and m2 satisfy the conditions for a valid Beta distribution. (see also the Beta distribution wiki):

``m_1>m_2 \wedge m_2>m_1^2``
BioFindr.fitFunction
fit(LBeta, x)

Fit an LBeta distribution to data x using the method of moments by exploiting its relationship to the Beta distribution.

See https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/beta.jl

Distributions.fit_mleFunction
fit_mle(LBeta, x)

Fit an LBeta distribution to data x using maximum-likelihood estimation by exploiting its relationship to the Beta distribution.

See https://github.com/JuliaStats/Distributions.jl/blob/master/src/univariate/continuous/beta.jl