BioFindr.jl Documentation

This is the documentation for BioFindr.jl, an implementation of the Findr software in Julia.

The methods implemented in BioFindr were developed by Lingfei Wang and Tom Michoel, and were first described in the paper "Efficient and accurate causal inference with hidden confounders from genome-transcriptome variation data".[Wang2017]

This documentation copies both the structure and (most of) the contents of the Materials and methods section of the original paper, directly linking the mathematical description of the method to the documentation and source code of its implementation. Maybe this is how methods papers in computational biology should be written in the first place?

If you haven't used BioFindr before, the BioFindrTutorials website may be useful.

If you are familiar with the original Findr software, you should be aware that BioFindr.jl is not a literal translation. In particular, pay attention to the following differences:

  • The main findr interface function for Causal inference takes as input expression and genotype matrices or DataFrames, and a list or DataFrame of pairs to match eQTLs with a subset of genes. This avoids the need to reshape the gene expression data every time findr is called with a different set of eQTLs.

  • Input and output are organized by column. In the gene expression and genotype data, columns are genes or SNPs and rows are samples. In the posterior probability matrices, each column contains the probabilities of a causal relation from the gene corresponding to that column to all other genes. The original software used the opposite convention, with variables corresponding to rows. The column layout improves performance because Julia stores arrays in column-major order.

  • If you call findr with DataFrame inputs (where columns naturally correspond to variables, that is, genes or SNPs), you need not worry about remembering the role of rows and columns in the output, as the output is returned in the form of a DataFrame with Source, Target, Probability, and q-value columns.

  • You can pass a desired global FDR value for filtering inferred associations as a parameter when calling findr, so the output no longer needs manual post-processing.

  • The observed distribution of log-likelihood ratios is estimated using either a new parametric method of moments or kernel density estimation, replacing the histogram-based method of the original software.
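
The column convention and the FDR filtering described in the list above can be illustrated with plain Julia arrays. This is only a minimal sketch and does not call BioFindr: the gene names and probabilities are made up, the helper qvalues_from_posteriors is hypothetical, and it assumes the common relationship in which the local FDR of a pair is one minus its posterior probability and the q-value is the running mean of local FDRs over pairs ranked by decreasing probability. The package's own computation may differ in detail.

```julia
# Hypothetical helper (not a BioFindr function): q-values from posterior
# probabilities, assuming local FDR = 1 - posterior probability.
function qvalues_from_posteriors(p::AbstractVector{<:Real})
    order = sortperm(p; rev=true)                  # most confident pairs first
    localfdr = 1 .- p[order]                       # local FDR of each ranked pair
    q_sorted = cumsum(localfdr) ./ (1:length(p))   # running mean of local FDRs
    q = similar(q_sorted)
    q[order] .= q_sorted                           # restore the input order
    return q
end

genes = ["A", "B", "C"]

# Made-up posterior probability matrix in the column convention:
# P[i, j] is read as the probability of a causal relation
# from gene j (column) to gene i (row). The diagonal is ignored here.
P = [0.00 0.60 0.10;
     0.99 0.00 0.20;
     0.95 0.30 0.00]

# Flatten the off-diagonal entries into (source, target, probability) records,
# walking the matrix column by column.
edges = [(source = genes[j], target = genes[i], probability = P[i, j])
         for j in eachindex(genes) for i in eachindex(genes) if i != j]

probs = [e.probability for e in edges]
q = qvalues_from_posteriors(probs)

# Keep only the pairs whose q-value is at or below the desired global FDR.
fdr = 0.05
selected = [merge(e, (qvalue = qv,)) for (e, qv) in zip(edges, q) if qv <= fdr]

for s in selected
    println(s.source, " -> ", s.target, "  probability = ", s.probability,
            "  q-value = ", round(s.qvalue; digits = 3))
end
```

With these toy numbers, only the two pairs with gene A as the source pass the 5% threshold. The records mirror the Source, Target, Probability, and q-value columns mentioned above, but they are ordinary named tuples rather than the DataFrame that findr returns.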

Table of contents