General inference algorithm

Consider a set of observations sampled from a mixture of a null and an alternative distribution. For instance, in gene regulation, each observation can correspond to the expression levels of a pair of genes, sampled from a bivariate normal distribution with either zero (null hypothesis) or non-zero (alternative hypothesis) correlation coefficient. In BioFindr, we predict the probability that each observation follows the alternative hypothesis with the following algorithm (based on and modified from [Chen2007]):

1. For robustness against outliers, we convert every continuous variable into standard normally distributed $N(0,1)$ values using a rank-based inverse normal transformation across all samples. We call this step supernormalization.

BioFindr.supernormalize — Function

supernormalize(X[, c])

Convert each column of a matrix or dataframe X of reals into standard normally distributed values using a rank-based inverse normal transformation, then scale each column to unit variance.

Note that after the inverse normal transformation, each column has mean zero and the same variance (if we use ordinal ranking, every column is a permutation of the same set of transformed values). Hence the rescaling factor can be computed once and applied to the whole matrix.

The formula and default value for the parameter c come from this paper.

2. For each likelihood ratio test of interest, we define a null and an alternative hypothesis such that, by definition, the null hypothesis space is a subset of the alternative hypothesis space. Replacing the model parameters with their maximum likelihood estimators (MLEs) gives the log-likelihood ratio (LLR) between the alternative and null hypotheses.

3. We derive the analytical expression for the probability density function (PDF) of the null distribution of each LLR, that is, its distribution when the samples follow the null hypothesis.

4. We convert the LLRs into posterior probabilities of the hypothesis of interest by Bayesian inference, using an empirical estimate of the local false discovery rate.

5. We consider multiple tests, each a combination of the basic likelihood ratio tests, for common tasks in genome-wide studies.
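The supernormalization of step 1 can be sketched in a few lines of Python (standard library only). This is an illustrative sketch, not BioFindr's Julia implementation of `supernormalize`: the function names are hypothetical, columns are plain lists, ties are broken by position (ordinal ranking), and the offset formula $\Phi^{-1}\left((r-c)/(n-2c+1)\right)$ with $c = 3/8$ (Blom's offset), where $\Phi^{-1}$ is the standard normal quantile function, is an assumption rather than BioFindr's documented default. Variance is computed with a $1/n$ normalization, also an assumption.

```python
from statistics import NormalDist

def supernormalize_column(x, c=3/8):
    """Rank-based inverse normal transform of one column (illustrative sketch).

    Uses ordinal ranks r = 1..n (ties broken by position) and the assumed
    offset formula Phi^{-1}((r - c) / (n - 2c + 1)); requires 0 <= c < 1.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])  # indices in increasing x
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    std_normal = NormalDist()  # standard normal N(0, 1)
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

def supernormalize_columns(columns, c=3/8):
    """Transform every column, then rescale all of them to unit variance.

    With ordinal ranking, every column is a permutation of the same n values,
    so one scale factor (computed from the first column) fits all columns.
    The mean is zero by symmetry of the formula; the variance uses a 1/n
    normalization (an assumption of this sketch). Requires n >= 2.
    """
    z = [supernormalize_column(col, c) for col in columns]
    n = len(z[0])
    scale = (sum(v * v for v in z[0]) / n) ** -0.5
    return [[v * scale for v in col] for col in z]
```

Because only ranks enter the transform, replacing an extreme value such as 10.0 by 1e6 in a column leaves its output unchanged, which is the robustness to outliers that motivates this step.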
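For the bivariate normal example of the introduction, step 2 has a closed form. Under both hypotheses the MLEs of the means and variances are the sample means and the $1/n$ sample variances; under the alternative the MLE of the correlation is Pearson's sample correlation $r$, while the null fixes it to zero. The LLR then reduces to $-\tfrac{n}{2}\ln(1-r^2)$. The Python sketch below (hypothetical function names, standard library only) computes this closed form and, for comparison, the log-likelihoods at the MLEs directly. It illustrates the general recipe on one test and is not BioFindr's code.

```python
import math

def pearson_r(x, y):
    """Pearson sample correlation, the MLE of rho for a bivariate normal."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bivariate_normal_loglik(x, y, rho):
    """Log-likelihood of (x, y) with means and variances at their MLEs
    (1/n normalization) and the correlation coefficient fixed to rho."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    total = 0.0
    for a, b in zip(x, y):
        u, v = (a - mx) / math.sqrt(vx), (b - my) / math.sqrt(vy)
        q = (u * u - 2 * rho * u * v + v * v) / (1 - rho * rho)
        total += -math.log(2 * math.pi) - 0.5 * math.log(vx * vy * (1 - rho * rho)) - 0.5 * q
    return total

def llr_correlation(x, y):
    """Closed-form LLR of H1 (rho free) against H0 (rho = 0)."""
    r = pearson_r(x, y)
    return -0.5 * len(x) * math.log(1 - r * r)
```

The closed form and the direct difference `bivariate_normal_loglik(x, y, pearson_r(x, y)) - bivariate_normal_loglik(x, y, 0.0)` agree up to rounding error, and the LLR does not change when either variable is rescaled, since only $r$ enters it.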
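For the bivariate normal example of the introduction, whose LLR is $L = -\tfrac{n}{2}\ln(1-r^2)$ with $r$ the sample correlation, step 3 can be carried out by a change of variables. The derivation below is a sketch under the standard assumption that, for $n$ independent bivariate normal samples with zero correlation, $r^2 \sim \mathrm{Beta}\left(\tfrac12, \tfrac{n-2}{2}\right)$; the null distributions BioFindr actually uses may differ in their parameters (for example, after supernormalization).

$$
\begin{aligned}
L &= -\frac{n}{2}\ln\left(1-r^2\right)
\;\Longleftrightarrow\;
r^2 = 1 - e^{-2L/n},
\qquad
\frac{\mathrm{d}\,r^2}{\mathrm{d}L} = \frac{2}{n}\,e^{-2L/n},\\[4pt]
f_{r^2}(u) &= \frac{u^{-1/2}\,(1-u)^{\frac{n-4}{2}}}{B\left(\tfrac12,\tfrac{n-2}{2}\right)},
\qquad 0 < u < 1,\\[4pt]
f_0(\ell) &= f_{r^2}\left(1-e^{-2\ell/n}\right)\frac{2}{n}\,e^{-2\ell/n}
= \frac{2}{n\,B\left(\tfrac12,\tfrac{n-2}{2}\right)}
\left(1-e^{-2\ell/n}\right)^{-1/2} e^{-(n-2)\ell/n},
\qquad \ell > 0.
\end{aligned}
$$

As a sanity check, for large $n$ we have $2L \approx n r^2$, which tends to a $\chi^2_1$ distribution, as Wilks' theorem predicts for one constrained parameter.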
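Step 4 can be illustrated with a simplified local false discovery rate (lfdr) calculation in the style of Efron's empirical Bayes approach; this is our own stand-in for illustration, not BioFindr's exact procedure. Given the observed LLRs, the null density $f_0$ from step 3 and a prior null proportion $\pi_0$, the lfdr is $\pi_0 f_0(\ell)/f(\ell)$, where $f$ is the density of all observed LLRs, and the posterior probability of the alternative is $1 - \mathrm{lfdr}$. In this Python sketch, $f$ is estimated with an equal-width histogram and $\pi_0$ is supplied by the caller; both are assumptions, and the function name is hypothetical.

```python
import math

def posterior_alternative(llrs, null_pdf, pi0, n_bins=20):
    """Return P(H1 | LLR) = 1 - lfdr for each observed LLR (illustrative sketch).

    lfdr(l) = pi0 * f0(l) / f(l), where f0 is the null density (null_pdf) and
    f is the density of all observed LLRs, estimated here by an equal-width
    histogram. pi0, the prior proportion of null cases, is supplied by the
    caller instead of being estimated (an assumption of this sketch).
    Requires at least two distinct LLR values.
    """
    lo, hi = min(llrs), max(llrs)
    width = (hi - lo) / n_bins

    def bin_of(l):
        # Bin index of l; the maximum value is placed in the last bin.
        return min(math.floor((l - lo) / width), n_bins - 1)

    counts = [0] * n_bins
    for l in llrs:
        counts[bin_of(l)] += 1

    m = len(llrs)
    posteriors = []
    for l in llrs:
        f = counts[bin_of(l)] / (m * width)  # histogram density estimate, > 0
        lfdr = min(1.0, pi0 * null_pdf(l) / f)  # cap the lfdr at 1
        posteriors.append(1.0 - lfdr)
    return posteriors
```

Setting $\pi_0 = 0$ makes every posterior equal to one. In practice $\pi_0$ must itself be estimated from the data, a step this sketch leaves out.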

[Chen2007] Chen L, Emmert-Streib F, Storey J. Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol 8, R219 (2007).