# Tests to evaluate

Based on the six likelihood ratio tests, we use the following tests and test combinations to infer gene regulatory relations:

## Coexpression analysis

The correlation test is introduced as a benchmark, against which we can compare other methods involving genotype information. Pairwise correlation is a simple measure for the probability of two genes being functionally related either through direct or indirect regulation, or through coregulation by a third factor. Bayesian inference additionally considers different gene roles. Its predicted posterior probability for regulation is $P_0$.

Correlation analysis can be performed by calling `findr` with one argument, a matrix or dataframe of gene expression values:

`BioFindr.findr` — Method

`findr(X::Matrix{T}; cols=[], method="moments", combination="none") where T<:AbstractFloat`

Compute posterior probabilities for nonzero pairwise correlations between columns of input matrix `X`. The probabilities are directed (asymmetric) in the sense that they are estimated from a column-specific background distribution.

The optional parameter `cols` (vector of integers) determines whether we consider all columns of `X` as source nodes (`cols=[]`, default), or only a subset of columns determined by the indices in the vector `cols`.

The optional parameter `method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

The optional parameter `combination` determines whether the output must be symmetrized. Possible values are `none` (default), `prod`, `mean`, or `anti`. If the optional parameter `cols` is non-empty, symmetrization makes no sense and an error will be thrown unless `combination="none"`.

See also `findr(::DataFrame)`, `symprobs`, `supernormalize`, `pprob_col`.
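To make the idea of symmetrizing directed probabilities concrete, the following is a minimal sketch and not BioFindr's `symprobs`. The function name `symmetrize_sketch` is hypothetical, and the definitions used for `prod` (elementwise product with the transpose) and `mean` (elementwise average with the transpose) are assumptions about what these options plausibly mean. The `anti` option is left out because this page does not define it.

```julia
# Hypothetical helper, not part of BioFindr: symmetrize a square matrix of
# directed posterior probabilities P, where P[i, j] and P[j, i] may differ.
function symmetrize_sketch(P::AbstractMatrix{<:Real}; combination::String="prod")
    size(P, 1) == size(P, 2) || throw(ArgumentError("P must be square"))
    if combination == "none"
        return copy(P)
    elseif combination == "prod"
        # Assumed definition: product of the two directed probabilities.
        return P .* transpose(P)
    elseif combination == "mean"
        # Assumed definition: average of the two directed probabilities.
        return (P .+ transpose(P)) ./ 2
    else
        throw(ArgumentError("unsupported combination: $combination"))
    end
end

# Made-up directed probabilities for two genes.
P = [1.0 0.8; 0.2 1.0]
Pprod = symmetrize_sketch(P; combination="prod")
Pmean = symmetrize_sketch(P; combination="mean")
```

Under these assumptions, both outputs are symmetric matrices, which is also why symmetrization only makes sense when all columns are used as source nodes.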

`BioFindr.findr` — Method

`findr(dX::T; colnames=[], method="moments", FDR=1.0, sorted=true, combination="none") where T<:AbstractDataFrame`

Wrapper for `findr(Matrix(dX))` when the input `dX` is in the form of a DataFrame. The output is then also wrapped in a DataFrame with `Source`, `Target`, (Posterior) `Probability`, and `qvalue` columns.

The optional parameter `colnames` (vector of strings) determines whether we consider all columns of `dX` as source nodes (`colnames=[]`, default), or only a subset of columns determined by the variable names in the vector `colnames`.

The optional parameter `method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

The optional parameter `FDR` can be used to return only a subset of interactions with a desired expected FDR value (q-value threshold) (default 1.0, no filtering).

The optional parameter `sorted` determines if the output must be sorted by increasing q-value / decreasing posterior probability (`sorted=true`, the default) or by causal factor (column names of `dX`) (`sorted=false`).

The optional parameter `combination` determines whether the output must be symmetrized. Possible values are `none` (default), `prod`, `mean`, or `anti`. If the optional parameter `colnames` is non-empty, symmetrization makes no sense and an error will be thrown unless `combination="none"`.

See also `findr(::Matrix)`, `symprobs`, `stackprobs`, `globalfdr!`.
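The relation between posterior probabilities, q-values, and the `FDR` threshold can be sketched as follows. This is an illustration under a stated assumption, not BioFindr's `globalfdr!`: it assumes that the q-value of an interaction is the running mean of the local FDR values $1-P$ over all interactions with an equal or higher posterior probability. The function name `qvalues_sketch` and the input numbers are hypothetical.

```julia
# Hypothetical helper, not BioFindr's `globalfdr!`: q-values as the running
# mean of local FDR values (1 - P) after sorting by decreasing probability.
function qvalues_sketch(P::AbstractVector{<:Real})
    order = sortperm(P; rev=true)
    running_fdr = cumsum(1 .- P[order]) ./ (1:length(P))
    q = similar(running_fdr)
    q[order] = running_fdr   # put the q-values back in the input order
    return q
end

# Made-up posterior probabilities for four interactions.
P = [0.99, 0.50, 0.90, 0.95]
q = qvalues_sketch(P)

# Keep only interactions whose q-value is at most the chosen threshold.
keep = q .<= 0.1
```

With this definition, sorting by increasing q-value and sorting by decreasing posterior probability give the same order, which matches the description of `sorted=true`.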

## Association analysis

The secondary linkage test is introduced to test association between genetic variants and gene expression levels, and can be used more generally to analyze differential expression of genes across groups defined by any kind of categorical variable. Its predicted posterior probability for differential expression is $P_2$.

Association analysis can be performed by calling `findr` with two arguments, matrices or dataframes of continuous gene expression values and categorical genotype or more general grouping values, respectively:

`BioFindr.findr` — Method

`findr(X::Matrix{T},G::Array{S}; method="moments") where {T<:AbstractFloat, S<:Integer}`

Compute posterior probabilities for nonzero differential expression of columns of input matrix `X` across groups defined by one or more categorical variables (columns of `G`).

Return a matrix of size ncols(X) x ncols(G).

The optional parameter `method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

See also `findr(::DataFrame,::DataFrame)`, `supernormalize`, `pprob_col`.

`G` is currently assumed to be an array (vector or matrix) of integers. CategoricalArrays will be supported in the future.

`BioFindr.findr` — Method

`findr(dX::T, dG::T; method="moments", FDR=1.0, sorted=true) where T<:AbstractDataFrame`

Wrapper for `findr(Matrix(dX), Matrix(dG))` when the inputs `dX` and `dG` are in the form of a DataFrame. The output is then also wrapped in a DataFrame with `Source`, `Target`, (Posterior) `Probability`, and `qvalue` columns.

`method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

The optional parameter `FDR` can be used to return only a subset of interactions with a desired expected FDR value (q-value threshold) (default 1.0, no filtering).

The optional parameter `sorted` determines if the output must be sorted by increasing q-value / decreasing posterior probability (`sorted=true`, the default) or by causal factor (column names of `dX`) (`sorted=false`).

Note that depending on the element type of `Matrix(dG)`, different matrix-based methods are called. If `Matrix(dG)` consists of floats, posterior probabilities for nonzero pairwise correlations between the variables in `dG` and the variables in `dX` are computed. If `Matrix(dG)` consists of integers, posterior probabilities for nonzero differential expression of variables in `dX` across groups defined by the variables in `dG` are computed.

See also `findr(::Matrix,::Array)`, `stackprobs`, `globalfdr!`.
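The type-dependent behaviour described in the note above follows from Julia's multiple dispatch on the element type of the second argument. The sketch below only illustrates that mechanism. The function `analysis_kind` is a hypothetical stand-in that reports which analysis would run and computes no probabilities.

```julia
# Hypothetical stand-ins for the two matrix-based code paths; they only report
# which analysis would be selected and do not compute any probabilities.
analysis_kind(G::AbstractArray{<:AbstractFloat}) = "correlation"
analysis_kind(G::AbstractArray{<:Integer}) = "differential expression"

expression_like = [0.1 1.2; -0.3 0.7]   # Float64 entries
genotype_like = [0 1; 2 1]              # Int entries

kind_float = analysis_kind(expression_like)
kind_int = analysis_kind(genotype_like)
```

A practical consequence is that genotype data stored as floating-point numbers (for example `0.0`, `1.0`, `2.0`) would be treated as continuous data, so it may be worth checking the element type of `Matrix(dG)` before calling `findr`.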

## Causal inference

### Mediation

The traditional causal inference test, as explained in [Chen2007], suggests that the regulatory relation $E\to A\to B$ can be confirmed with the combination of three separate tests: $E$ regulates $A$, $E$ regulates $B$, and $E$ only regulates $B$ through $A$ (i.e. $E$ and $B$ become independent when conditioning on $A$). They correspond to the primary, secondary, and conditional independence tests, respectively. The regulatory relation $E\to A\to B$ is regarded as positive only when all three tests return positive. The three tests filter the initial hypothesis space of all possible relations between $E$, $A$, and $B$ sequentially to $E\to A$ (primary test), $E\to A \wedge E\to B$ (secondary test), and $E\to A\to B \wedge$ (no confounder for $A$ and $B$) (conditional independence test). The resulting test is stronger than $E\to A\to B$ because it disallows confounders for $A$ and $B$. Its probability can be broken down as

\[P_{\text{med}} \equiv P_1P_2P_3\]

BioFindr expects a set of significant eQTLs and their associated genes as input, and therefore $P_1=1$ is assured and not calculated separately in BioFindr. Note that $P_{\text{med}}$ is the estimated local precision, i.e. the probability that tests 2 and 3 are both true. Correspondingly, its local FDR (the probability that one of them is false) is $1-P_{\text{med}}$.
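As a small worked example of the formula for $P_{\text{med}}$ and its local FDR, the snippet below uses made-up posterior probabilities for a single $(E, A, B)$ triple. The numbers are purely illustrative and do not come from any dataset.

```julia
# Illustrative posterior probabilities for a single (E, A, B) triple;
# the numbers are made up for this example.
P1 = 1.0   # primary test, fixed to 1 because E is a known eQTL of A
P2 = 0.9   # secondary linkage test
P3 = 0.8   # conditional independence test

P_med = P1 * P2 * P3     # estimated local precision of the mediation test
local_fdr = 1 - P_med    # probability that test 2 or test 3 is false
```

Here $P_{\text{med}}$ is about 0.72 and the local FDR about 0.28, even though each individual test is fairly confident.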

### Instrumental variables

The pleiotropy test is introduced to test if an $E\to B$ association is not independent of the $E\to A$ association, that is, if an independent pleiotropic effect of $E$ on both genes can be excluded. If $E$ regulates $A$ (is a cis-eQTL for $A$), and $E$ regulates $B$, and $B$ and $A$ are not independent given $E$, then we can regard $E$ as a proxy or instrumental variable for $A$ and infer a regulatory relation $E\to A\to B$ from the $E\to B$ association. The three tests verify the hypothesis that $B \leftarrow E \to A \wedge \lnot(A ⫫ B | E)$, a superset of $E\to A\to B$. Its probability can be broken down as

\[P_{\text{IV}} \equiv P_1P_2P_5\]

As before, $P_1=1$ is assured and not calculated separately in BioFindr. $P_{\text{IV}}$ is again the estimated local precision, i.e. the probability that tests 2 and 5 are both true, and its local FDR (the probability that one of them is false) is $1-P_{\text{IV}}$.

### Relevance

The relevance test is introduced to address weak interactions that are undetectable by the secondary test from existing data ($P_2$ close to 0). This term still grants higher-than-null significance to weak interactions, and verifies that $E\to A \wedge (E\to B \vee A - B)$, also a superset of $E\to A\to B$. Its probability can be broken down as

\[P_{\text{relev}} \equiv P_1P_4\]

The original Findr paper proposed to combine the instrumental variable and relevance tests in a novel test whose probability can be broken down as

\[P_{\text{orig}} \equiv \frac{1}{2} P_1 \bigl( P_4 + P_2P_5 \bigr) = \frac{1}{2}\bigl( P_{\text{relev}} + P_{\text{IV}} \bigr)\]

In the extreme undetectable limit where $P_2 = 0$ but $P_4 \neq 0$, the novel test automatically reduces to one half of the relevance test, which assumes equal probability of either direction and assigns half of the relevance test probability to $A \to B$.

The composite design of the novel test aims not to miss any genuine regulation whilst distinguishing the full spectrum of possible interactions. When the signal level is too weak for tests 2 and 5, we expect $P_4$ to still provide distinguishing power better than random predictions. When the interaction is strong, $P_2 P_5$ is then able to pick up true targets regardless of the existence of hidden confounders.
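The behaviour of the combined test in the strong and weak signal regimes can be checked numerically. The sketch below encodes the formulas for $P_{\text{IV}}$, $P_{\text{relev}}$, and $P_{\text{orig}}$ with $P_1 = 1$, as assumed throughout this page. The function names and the input values are hypothetical and chosen only for illustration.

```julia
# Combination formulas with P1 = 1; the function names are illustrative only.
p_iv(P2, P5) = P2 * P5
p_relev(P4) = P4
p_orig(P2, P4, P5) = (p_relev(P4) + p_iv(P2, P5)) / 2

# Strong interaction: all tests are confident.
strong = p_orig(0.95, 0.99, 0.9)

# Undetectable limit: P2 = 0 while P4 is nonzero, so the combined test
# reduces to one half of the relevance test.
weak = p_orig(0.0, 0.6, 0.9)
```

In the weak case the result equals `p_relev(0.6) / 2`, which reflects the equal split of the relevance probability between the two possible directions.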

### Implementation

Causal inference can be performed by calling `findr` with three arguments, matrices or dataframes of gene expression and genotype values, and a mapping of matching $(E,A)$ pairs; the preferred test can be set through the `combination` parameter:

`BioFindr.findr` — Method

`findr(X::Matrix{T},G::Matrix{S},pairGX::Matrix{S}; method="moments", combination="none") where {T<:AbstractFloat, S<:Integer}`

Compute posterior probabilities for nonzero causal relations between columns of input matrix `X`. The probabilities are estimated for relations going from a subset of columns of `X` that have a (discrete) instrumental variable in input matrix `G` to all columns of `X`, while excluding self-interactions (given default value 1). The matching between columns of `X` and columns of `G` is given by `pairGX`, a two-column array where the first column corresponds to a column index in `G` and the second to a column index in `X`.

Posterior probabilities are computed for the following tests

- Test 2 (**Linkage test**)
- Test 3 (**Mediation test**)
- Test 4 (**Relevance test**)
- Test 5 (**Pleiotropy test**)

which can be combined into the mediation test ($P_2 P_3$; `combination="mediation"`), the instrumental variable or non-independence test ($P_2 P_5$; `combination="IV"`), or BioFindr's original combination ($\frac{1}{2}(P_2 P_5 + P_4)$; `combination="orig"`). By default, individual probability matrices for all tests are returned (`combination="none"`).

`method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

If `combination="none"`, then the output has size ncols(X) x 4 x ncols(G), where the middle index indexes the tests, and otherwise the output has size ncols(X) x ncols(G).

See also `findr(::DataFrame,::DataFrame,::DataFrame)`, `supernormalize`, `pprob_col`, `combineprobs`.

`G` is currently assumed to be an array (vector or matrix) of integers. I intend to use CategoricalArrays in the future.
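The relation between the two output shapes can be illustrated with a stand-in array. The sketch below does not call BioFindr: it fills an array of the documented size with random numbers and combines its slices by hand. It assumes that the middle index lists the tests in the order 2, 3, 4, 5, which this page does not state explicitly, and all variable names are hypothetical.

```julia
# Stand-in for the array returned with combination="none": random values in
# place of real posterior probabilities, for 5 genes and 2 instruments.
nX, nG = 5, 2
P = rand(nX, 4, nG)

# Assumed order of the middle index: tests 2, 3, 4, 5.
P2, P3, P4, P5 = P[:, 1, :], P[:, 2, :], P[:, 3, :], P[:, 4, :]

# Elementwise combinations, each of size ncols(X) x ncols(G).
P_med = P2 .* P3
P_iv = P2 .* P5
P_orig = (P4 .+ P2 .* P5) ./ 2
```

Each combined matrix has one row per gene and one column per instrument, which matches the output size documented for the combined tests.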

`BioFindr.findr` — Method

`findr(dX::T, dG::T, dE::T; colX=2, colG=1, method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame`

Wrapper for `findr(Matrix(dX), Matrix(dG), pairGX)` when the inputs are in the form of a DataFrame. The output is then also wrapped in a DataFrame with `Source`, `Target`, (Posterior) `Probability`, and `qvalue` columns. When DataFrames are used, only combined posterior probabilities can be returned (`combination="IV"` (default), `"mediation"`, or `"orig"`).

The input dataframes are:

- `dX` - DataFrame with expression data, columns are genes
- `dG` - DataFrame with genotype data, columns are variants (SNPs)
- `dE` - DataFrame with eQTL results, must contain columns with gene and SNP IDs that can be mapped to column names in `dX` and `dG`, respectively

The numeric mapping between column indices in `Matrix(dG)` and `Matrix(dX)` is obtained from these inputs using the `getpairs` function and the optional parameters:

- `colG` - name or number of variant ID column in `dE`, default 1
- `colX` - name or number of gene ID column in `dE`, default 2
- `namesX` - names of a possible subset of columns in `dX` to be considered as potential causal regulators (default `names(dX)`)

`method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

The optional parameter `FDR` can be used to return only a subset of interactions with a desired expected FDR value (q-value threshold) (default 1.0, no filtering).

The optional parameter `sorted` determines if the output must be sorted by increasing q-value / decreasing posterior probability (`sorted=true`, the default) or by causal factor (column names of `dX`) (`sorted=false`).

See also `findr(::Matrix,::Array,::Matrix)`, `getpairs`, `combineprobs`, `stackprobs`, `globalfdr!`.
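The numeric mapping from eQTL identifiers to column indices can be sketched in plain Julia. The helper below is hypothetical and is not BioFindr's `getpairs`. It assumes that the variant and gene IDs are available as vectors of strings and that eQTL rows whose IDs are missing from the data are simply skipped.

```julia
# Hypothetical helper, not BioFindr's `getpairs`: build a two-column integer
# matrix whose rows are (column index in G, column index in X).
function pairs_sketch(snp_ids, gene_ids, namesG, namesX)
    rows = Vector{Vector{Int}}()
    for (snp, gene) in zip(snp_ids, gene_ids)
        g = findfirst(==(snp), namesG)
        x = findfirst(==(gene), namesX)
        # Assumption: skip eQTL rows whose SNP or gene is absent from the data.
        (g === nothing || x === nothing) && continue
        push!(rows, [g, x])
    end
    return isempty(rows) ? zeros(Int, 0, 2) : permutedims(reduce(hcat, rows))
end

namesX = ["geneA", "geneB", "geneC"]      # column names of the expression data
namesG = ["snp1", "snp2"]                 # column names of the genotype data
snp_ids = ["snp2", "snp1", "snp9"]        # variant ID column of the eQTL table
gene_ids = ["geneC", "geneB", "geneA"]    # gene ID column of the eQTL table

pairGX = pairs_sketch(snp_ids, gene_ids, namesG, namesX)
```

The resulting matrix follows the documented layout of `pairGX`: the first column indexes `G` and the second column indexes `X`.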

## Bipartite causal inference

In the general case, we assume that there is one set of genes, of which the set of $A$-genes (genes with matching instrument $E$) is a subset, and that all possible directed regulations are tested. In some situations we are instead searching for a bipartite network from one set of potential causal factors (e.g. micro-RNAs) to another set of potential targets (e.g. protein-coding genes). In this case, causal inference can be performed by calling `findr` with four arguments that include separate matrices or dataframes of expression values for the potential causes and targets:

`BioFindr.findr` — Method

`findr(X1::Matrix{T},X2::Array{T},G::Array{S},pairGX::Matrix{R}; method="moments", combination="none") where {T<:AbstractFloat, S<:Integer}`

Compute posterior probabilities for nonzero causal relations from columns of input matrix `X2` to columns of input matrix `X1`. The probabilities are estimated for a subset of columns of `X2` that have a (discrete) instrumental variable in input matrix `G`. The matching between columns of `X2` and columns of `G` is given by `pairGX`, a two-column array where the first column corresponds to a column index in `G` and the second to a column index in `X2`.

Posterior probabilities are computed for the following tests

- Test 2 (**Linkage test**)
- Test 3 (**Mediation test**)
- Test 4 (**Relevance test**)
- Test 5 (**Pleiotropy test**)

which can be combined into the mediation test ($P_2 P_3$; `combination="mediation"`), the instrumental variable or non-independence test ($P_2 P_5$; `combination="IV"`), or BioFindr's original combination ($\frac{1}{2}(P_2 P_5 + P_4)$; `combination="orig"`). By default, individual probability matrices for all tests are returned (`combination="none"`).

`method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

If `combination="none"`, then the output has size ncols(X1) x 4 x ncols(X2), where the middle index indexes the tests, and otherwise the output has size ncols(X1) x ncols(X2).

See also `findr(::DataFrame,::DataFrame,::DataFrame,::DataFrame)`, `combineprobs`, `supernormalize`, `pprob_col`.

`G` is currently assumed to be an array (vector or matrix) of integers. I intend to use CategoricalArrays in the future.

`BioFindr.findr` — Method

`findr(dX1::T, dX2::T, dG::T, dE::T; colG=1, colX=2, method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame`

Wrapper for `findr(Matrix(dX1), Matrix(dX2), Matrix(dG), pairGX2)` when the inputs `dX1`, `dX2`, and `dG` are in the form of a DataFrame. The output is then also wrapped in a DataFrame with `Source`, `Target`, (Posterior) `Probability`, and `qvalue` columns. When DataFrames are used, only combined posterior probabilities can be returned (`combination="IV"` (default), `"mediation"`, or `"orig"`).

The numeric mapping between column indices in `Matrix(dG)` and `Matrix(dX2)` is obtained from these inputs using the `getpairs` function and the optional parameters:

- `colG` - name or number of variant ID column in `dE`, default 1
- `colX` - name or number of gene ID column in `dE`, default 2
- `namesX` - names of a possible subset of columns in `dX` to be considered as potential causal regulators (default `names(dX)`)

`method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

`FDR` can be used to return only a subset of interactions with a desired expected FDR value (q-value threshold) (default 1.0, no filtering).

The optional parameter `sorted` determines if the output must be sorted by increasing q-value / decreasing posterior probability (`sorted=true`, the default) or by causal factor (column names of `dX2`) (`sorted=false`).

See also `findr(::Matrix,::Array,::Array,::Matrix)`, `combineprobs`, `stackprobs`, `globalfdr!`.
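The long-format output with `Source`, `Target`, and `Probability` columns can be pictured as a flattened version of the targets x sources probability matrix. The sketch below is a hypothetical illustration in plain Julia, not BioFindr's `stackprobs`. It uses named tuples instead of a DataFrame and assumes, in line with the documented output size ncols(X1) x ncols(X2), that rows of the matrix are targets and columns are sources.

```julia
# Hypothetical helper, not BioFindr's `stackprobs`: flatten a targets x sources
# probability matrix into (Source, Target, Probability) rows.
function stack_sketch(P::AbstractMatrix{<:Real}, targets, sources; sorted::Bool=true)
    rows = [(Source=sources[j], Target=targets[i], Probability=P[i, j])
            for j in axes(P, 2) for i in axes(P, 1)]
    # sorted=true: decreasing probability; sorted=false: grouped by source.
    return sorted ? sort(rows; by=r -> r.Probability, rev=true) : rows
end

targets = ["gene1", "gene2", "gene3"]   # stand-ins for columns of X1
sources = ["miR-a", "miR-b"]            # stand-ins for columns of X2
P = [0.9 0.1; 0.2 0.7; 0.4 0.3]         # made-up combined probabilities

table = stack_sketch(P, targets, sources)
by_source = stack_sketch(P, targets, sources; sorted=false)
```

In the bipartite setting every source comes from one set and every target from the other, so no self-interactions need to be excluded when flattening.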

## Summary

A summary of all possible calls to the `findr` function:

`BioFindr.findr` — Function

`findr(X::Matrix{T}; cols=[], method="moments", combination="none") where T<:AbstractFloat`

`findr(dX::T; colnames=[], method="moments", FDR=1.0, sorted=true, combination="none") where T<:AbstractDataFrame`

These two methods are documented in full under "Coexpression analysis" above.

`findr(X1::Matrix{T}, X2::Matrix{T}; method="moments") where T<:AbstractFloat`

Compute posterior probabilities for nonzero pairwise correlations between columns of input matrices `X1` and `X2`. The probabilities are directed (asymmetric) from the columns of `X2` to the columns of `X1` in the sense that they are estimated from a column-specific background distribution for each column of `X2`.

`method` determines the LLR mixture distribution fitting method and can be either `moments` (default) for the method of moments, or `kde` for kernel-based density estimation.

Only use this method if `X1` and `X2` are distinct (no overlapping columns). For `X2` consisting of a subset of columns with indices `idx`, use `findr(X1; cols=idx)` instead.

See also `findr(::DataFrame)`, `symprobs`, `supernormalize`, `pprob_col`.

`findr(X::Matrix{T},G::Array{S}; method="moments") where {T<:AbstractFloat, S<:Integer}`

`findr(dX::T, dG::T; method="moments", FDR=1.0, sorted=true) where T<:AbstractDataFrame`

These two methods are documented in full under "Association analysis" above.

`findr(X::Matrix{T},G::Matrix{S},pairGX::Matrix{S}; method="moments", combination="none") where {T<:AbstractFloat, S<:Integer}`

`findr(dX::T, dG::T, dE::T; colX=2, colG=1, method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame`

These two methods are documented in full under "Causal inference" above.

`findr(X1::Matrix{T},X2::Array{T},G::Array{S},pairGX::Matrix{R}; method="moments", combination="none") where {T<:AbstractFloat, S<:Integer}`

`findr(dX1::T, dX2::T, dG::T, dE::T; colG=1, colX=2, method="moments", combination="IV", FDR=1.0, sorted=true) where T<:AbstractDataFrame`

These two methods are documented in full under "Bipartite causal inference" above.

- [Chen2007] Chen L, Emmert-Streib F, Storey J. Harnessing naturally randomized transcription to infer regulatory relationships among genes. Genome Biol 8, R219 (2007).