Bayesian network learning

BioFindr implements the method described in the paper "High-dimensional Bayesian network inference from systems genetics data using genetic node ordering" [Wang2017] to learn a Bayesian network, using a DataFrame of posterior probabilities as prior edge weights.

DAG reconstruction

The first step in the algorithm is to convert a DataFrame of posterior probabilities to a directed acyclic graph (DAG). The original greedy algorithm, in which edges are added one by one in descending order of posterior probability and any edge that would introduce a cycle is skipped, is implemented in the function dagfindr_greedy_edges!. Two additional methods, dagfindr_greedy_insertion! and dagfindr_heuristic_sort!, developed by Kenneth Stoop and Pieter Audenaert (Stoop et al., 2023), are also implemented. The function dagfindr! is the main user interface.

BioFindr.dagfindr_greedy_edges! (Function)
dagfindr_greedy_edges!(dP::T) where T<:AbstractDataFrame

Convert a DataFrame dP of findr results (a list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the method of Wang et al. (2019): edges are added one by one in decreasing order of probability, and an edge is added only if it does not create a cycle in the graph, which is checked with the incremental cycle detection algorithm from the Graphs package.
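The greedy procedure can be illustrated with a small, dependency-free sketch. It is not BioFindr's code: it assumes the findr results are given as a vector of named tuples with hypothetical fields source, target and prob instead of a DataFrame, and it replaces the incremental cycle detection from Graphs with a plain depth-first reachability check (adding u -> v closes a cycle exactly when v can already reach u). The function names reachable and greedy_edges_sketch are hypothetical.

```julia
# Return true if `dst` can be reached from `src` by following edges in `adj`
# (iterative depth-first search).
function reachable(adj::Dict{String, Vector{String}}, src::String, dst::String)
    stack = [src]
    seen = Set{String}()
    while !isempty(stack)
        v = pop!(stack)
        v == dst && return true
        v in seen && continue
        push!(seen, v)
        append!(stack, get(adj, v, String[]))
    end
    return false
end

# Greedy DAG construction (sketch): visit edges by decreasing probability and keep
# u -> v only if v cannot already reach u, i.e. only if it does not close a cycle.
# A self-loop u -> u is rejected automatically because u reaches itself.
function greedy_edges_sketch(edges)
    adj = Dict{String, Vector{String}}()   # adjacency list of the edges kept so far
    kept = eltype(edges)[]
    for e in sort(edges; by = x -> x.prob, rev = true)
        if !reachable(adj, e.target, e.source)
            push!(get!(adj, e.source, String[]), e.target)
            push!(kept, e)
        end
    end
    return kept
end
```

On the four edges A->B (0.9), B->C (0.8), C->A (0.7) and A->C (0.6), this sketch keeps A->B, B->C and A->C and rejects C->A, because A already reaches C through B when C->A is considered.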

BioFindr.dagfindr_heuristic_sort! (Function)
dagfindr_heuristic_sort!(dP::T) where T<:AbstractDataFrame

Convert a DataFrame dP of findr results (a list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the heuristic sort method of Stoop et al. (2023): vertices are sorted by their ratio of out-degree to in-degree, and an edge is added only if its source vertex precedes its target vertex in the sorted list. The output is a directed graph and a dictionary that maps vertex names to numbers.
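A minimal sketch of the heuristic sort, again assuming edges given as named tuples with hypothetical fields source, target and prob. It uses unweighted edge counts for the degrees, as the description above states; whether BioFindr weights the degrees by probability, and how it breaks ties, is not specified here. The function name heuristic_sort_sketch is hypothetical, and it returns the kept edges and a name-to-position dictionary rather than a SimpleDiGraph.

```julia
# Heuristic sort (sketch): rank vertices by out-degree / in-degree, highest first,
# then keep only the edges whose source comes before their target in that ranking.
function heuristic_sort_sketch(edges)
    outdeg = Dict{String, Int}()
    indeg = Dict{String, Int}()
    for e in edges
        outdeg[e.source] = get(outdeg, e.source, 0) + 1
        indeg[e.target] = get(indeg, e.target, 0) + 1
    end
    vertices = collect(union(keys(outdeg), keys(indeg)))
    # Every vertex touches at least one edge, so the ratio is a finite number,
    # or Inf when the in-degree is 0; it is never 0/0 = NaN.
    ratio(v) = get(outdeg, v, 0) / get(indeg, v, 0)
    order = sort(vertices; by = ratio, rev = true)
    pos = Dict(v => i for (i, v) in enumerate(order))
    # Self-loops are dropped because pos[u] < pos[u] is false.
    kept = filter(e -> pos[e.source] < pos[e.target], edges)
    return kept, pos
end
```

Because every kept edge points forward in a single vertex ordering, the result is acyclic by construction; no cycle check is needed, which is what makes this approach cheaper than the edge-by-edge greedy method.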

BioFindr.dagfindr_greedy_insertion! (Function)
dagfindr_greedy_insertion!(dP::T) where T<:AbstractDataFrame

Convert a DataFrame dP of findr results (a list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the greedy insertion method of Stoop et al. (2023), in which vertices are ordered iteratively: each vertex is inserted at the position in the current ordering that yields the maximum possible gain in edge weight. The gain is the difference between the sum of the weights of newly included edges and the sum of the weights of edges lost, where an edge is counted only if its source vertex precedes its target vertex in the ordering. The output is a directed graph and a dictionary that maps vertex names to numbers.
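The greedy insertion idea can be sketched as a local search over vertex orderings. The following details are assumptions of this sketch, not taken from BioFindr: the initial ordering is alphabetical, each pass removes every vertex in turn and reinserts it at its best position, and passes repeat until no move increases the total forward edge weight (or a hypothetical maxpasses limit is reached). Only edges incident to the moved vertex change status, so the gain of a move is that vertex's forward weight at the new position minus its forward weight at the old one. Edges are again hypothetical named tuples with fields source, target and prob.

```julia
# Total weight of v's edges that point forward when v is inserted at position k of
# `order` (a vector that does not contain v): u -> v needs u before v, v -> u needs u after v.
function insertion_score(w, order, v, k)
    s = 0.0
    for (i, u) in enumerate(order)
        s += i < k ? get(w, (u, v), 0.0) : get(w, (v, u), 0.0)
    end
    return s
end

function greedy_insertion_sketch(edges; maxpasses = 100)
    w = Dict{Tuple{String, String}, Float64}()   # edge weights, self-loops ignored
    for e in edges
        e.source == e.target || (w[(e.source, e.target)] = e.prob)
    end
    order = sort(unique(vcat([e.source for e in edges], [e.target for e in edges])))
    for _ in 1:maxpasses
        improved = false
        for v in copy(order)
            cur = findfirst(==(v), order)
            deleteat!(order, cur)            # reinserting at `cur` restores the old order
            scores = [insertion_score(w, order, v, k) for k in 1:length(order)+1]
            best = argmax(scores)
            if scores[best] > scores[cur] + 1e-12   # move only on a strictly positive gain
                insert!(order, best, v)
                improved = true
            else
                insert!(order, cur, v)
            end
        end
        improved || break
    end
    pos = Dict(v => i for (i, v) in enumerate(order))
    kept = filter(e -> pos[e.source] < pos[e.target], edges)
    return kept, pos
end
```

Since every accepted move strictly increases the total weight of forward edges, the search cannot revisit an ordering and terminates. For the two edges Z->A (0.9) and A->Z (0.1), the alphabetical start [A, Z] is corrected to [Z, A] in the first pass, and only Z->A is kept.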

BioFindr.names_to_index! (Function)
names_to_index!(dP::T) where T<:AbstractDataFrame

Add columns with vertex numbers to a DataFrame dP of edges. The columns Source_idx and Target_idx are added to dP with the vertex numbers corresponding to the names in the Source and Target columns, respectively. The function returns a dictionary name2num to map vertex names to numbers.
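names_to_index! operates on DataFrame columns. A stdlib-only sketch of the same name-to-number mapping on plain vectors, standing in for the Source and Target columns, might look as follows. The function name is hypothetical, and numbering vertices in order of first appearance (all sources first, then targets) is an assumption; the description above does not state which numbering BioFindr uses.

```julia
# Map vertex names to consecutive integers and translate both edge columns.
function names_to_index_sketch(source::AbstractVector{String}, target::AbstractVector{String})
    name2num = Dict{String, Int}()
    for name in Iterators.flatten((source, target))
        # Assign the next free number the first time a name is seen.
        haskey(name2num, name) || (name2num[name] = length(name2num) + 1)
    end
    source_idx = [name2num[s] for s in source]   # plays the role of Source_idx
    target_idx = [name2num[t] for t in target]   # plays the role of Target_idx
    return source_idx, target_idx, name2num
end
```

Integer vertex numbers are what a SimpleDiGraph from Graphs expects, which is why the DAG functions above return a name2num-style dictionary alongside the graph.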