Bayesian network learning
BioFindr implements the method described in the paper "High-dimensional Bayesian network inference from systems genetics data using genetic node ordering" [Wang2019] to learn a Bayesian network, using a dataframe of posterior probabilities as prior edge weights.
DAG reconstruction
The first step in the algorithm is to convert a dataframe of posterior probabilities to a directed acyclic graph (DAG). BioFindr implements the original greedy algorithm, in which edges are added one by one in descending order of posterior probability and edges that would introduce a cycle are skipped, in the dagfindr_greedy_edges! function. Two additional methods, dagfindr_greedy_insertion! and dagfindr_heuristic_sort!, developed by Kenneth Stoop and Pieter Audenaert (Stoop et al., 2023), are also implemented. The dagfindr! function is the main user interface.
BioFindr.dagfindr! - Function
dagfindr!(dP::T; method="greedy edges") where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) using the specified method. The output is a directed graph represented as a SimpleDiGraph from the Graphs package. The method can be any of
- "greedy edges" (default), see dagfindr_greedy_edges!.
- "heuristic sort", see dagfindr_heuristic_sort!.
- "greedy insertion", see dagfindr_greedy_insertion!.
BioFindr.dagfindr_greedy_edges! - Function
dagfindr_greedy_edges!(dP::T) where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the method of Wang et al. (2019), where edges are added one by one in decreasing order of probability, and only if they do not create a cycle in the graph, using the incremental cycle detection algorithm from the Graphs package.
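The greedy edge rule described above can be illustrated with a minimal, dependency-free sketch. It is not BioFindr's implementation: the names reachable and greedy_edges_sketch are hypothetical, edges are plain (source, target, probability) tuples over integer vertices, and a simple depth-first reachability check stands in for the incremental cycle detection from the Graphs package.

```julia
# Hypothetical helper: is `target` reachable from `source` in the adjacency list `adj`?
function reachable(adj::Vector{Vector{Int}}, source::Int, target::Int)
    visited = falses(length(adj))
    stack = [source]
    while !isempty(stack)
        v = pop!(stack)
        v == target && return true
        visited[v] && continue
        visited[v] = true
        append!(stack, adj[v])
    end
    return false
end

# Illustrative sketch (assumption, not the package code): add edges in
# decreasing order of probability and skip any edge that would close a cycle.
function greedy_edges_sketch(n::Int, edges::Vector{Tuple{Int,Int,Float64}})
    adj = [Int[] for _ in 1:n]
    kept = Tuple{Int,Int}[]
    for (s, t, _) in sort(edges; by = e -> e[3], rev = true)
        # The edge s -> t creates a cycle exactly when s is already reachable from t.
        if s != t && !reachable(adj, t, s)
            push!(adj[s], t)
            push!(kept, (s, t))
        end
    end
    return kept
end

# Made-up example: the edge 3 -> 1 is dropped because 1 -> 2 -> 3 already exists.
edges = [(1, 2, 0.9), (2, 3, 0.8), (3, 1, 0.7), (1, 3, 0.6)]
dag_edges = greedy_edges_sketch(3, edges)
```

A full reachability search per edge is slower than incremental cycle detection, but it makes the acceptance rule explicit.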
BioFindr.dagfindr_heuristic_sort! - Function
dagfindr_heuristic_sort!(dP::T) where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the heuristic sort method of Stoop et al. (2023), where vertices are sorted by their ratio of out-degree to in-degree, and edges are added only if their source vertex precedes their target vertex in the sorted list. The output is a directed graph and a dictionary to map vertex names to numbers.
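As a rough illustration of the heuristic sort idea, the sketch below ranks vertices by the ratio of out-degree to in-degree and keeps only edges that point forward in that ranking. Several details are assumptions rather than BioFindr behaviour: the function name heuristic_sort_sketch is hypothetical, degrees are taken as sums of edge weights, vertices are sorted in decreasing order of the ratio, and a vertex with zero in-degree is given an infinite ratio so that it sorts first.

```julia
# Illustrative sketch (assumption, not the package code).
function heuristic_sort_sketch(n::Int, edges::Vector{Tuple{Int,Int,Float64}})
    outdeg = zeros(n)
    indeg = zeros(n)
    for (s, t, w) in edges
        outdeg[s] += w   # weighted out-degree of the source
        indeg[t] += w    # weighted in-degree of the target
    end
    # Vertices without incoming weight get ratio Inf and therefore come first.
    ratio = [indeg[v] == 0 ? Inf : outdeg[v] / indeg[v] for v in 1:n]
    order = sortperm(ratio; rev = true)
    pos = invperm(order)  # pos[v] is the rank of vertex v in the ordering
    kept = [(s, t) for (s, t, _) in edges if pos[s] < pos[t]]
    return order, kept
end

# Made-up example on three vertices.
edges = [(1, 2, 0.9), (2, 3, 0.8), (3, 1, 0.7), (1, 3, 0.6)]
hs_order, hs_edges = heuristic_sort_sketch(3, edges)
```

Because every kept edge runs from an earlier to a later vertex in a single ordering, the result is acyclic by construction and no cycle check is needed.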
BioFindr.dagfindr_greedy_insertion! - Function
dagfindr_greedy_insertion!(dP::T) where T<:AbstractDataFrame
Convert a DataFrame dP of findr results (list of edges) to a directed acyclic graph (DAG) represented as a SimpleDiGraph from the Graphs package. This function implements the greedy insertion method of Stoop et al. (2023), where vertices are sorted iteratively by inserting each vertex at the position in the current ordering that yields the maximum possible gain in edge weight. The gain is the difference between the sum of the new edge weights included and the sum of the old edge weights lost, where edges are counted only if their source vertex precedes their target vertex in the ordering. The output is a directed graph and a dictionary to map vertex names to numbers.
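The insertion step can be sketched as a small local search over vertex orderings. This is one reading of the description above, under stated assumptions, and not the BioFindr code: forward_weight and greedy_insertion_sketch are hypothetical names, weights are stored in a dense matrix W with W[i, j] the weight of edge i -> j, and passes are repeated until no single reinsertion improves the ordering.

```julia
# Weight of the edges incident to v that point forward when v is placed
# at slot p of `rest` (the ordering without v). Hypothetical helper.
function forward_weight(W::Matrix{Float64}, rest::Vector{Int}, v::Int, p::Int)
    incoming = sum(W[rest[1:p-1], v])  # edges u -> v with u before v
    outgoing = sum(W[v, rest[p:end]])  # edges v -> u with u after v
    return incoming + outgoing
end

# Illustrative sketch (assumption): move each vertex to its best slot and
# repeat until no move strictly increases the total forward edge weight.
function greedy_insertion_sketch(W::Matrix{Float64}, order::Vector{Int})
    order = copy(order)
    improved = true
    while improved
        improved = false
        for v in copy(order)
            current = findfirst(==(v), order)
            rest = deleteat!(copy(order), current)
            scores = [forward_weight(W, rest, v, p) for p in 1:length(rest)+1]
            best = argmax(scores)
            # scores[best] - scores[current] is the gain: new weight included
            # minus old weight lost. The tolerance avoids floating-point cycling.
            if scores[best] > scores[current] + 1e-12
                order = insert!(rest, best, v)
                improved = true
            end
        end
    end
    return order
end

# Made-up example: start from a poor ordering of three vertices.
W = zeros(3, 3)
W[1, 2] = 0.9; W[2, 3] = 0.8; W[3, 1] = 0.7; W[1, 3] = 0.6
gi_order = greedy_insertion_sketch(W, [3, 2, 1])
```

Moving one vertex leaves the relative order of all other vertices unchanged, so only the edges touching that vertex need to be rescored. The DAG then consists of the edges whose source precedes their target in the returned ordering.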
BioFindr.greedy_insertions! - Function
greedy_insertions!(sorted_vertices, weights)
TBW
BioFindr.edge_weights - Function
edge_weights(dP::T) where T<:AbstractDataFrame
TBW
BioFindr.names_to_index! - Function
names_to_index!(dP::T) where T<:AbstractDataFrame
Add columns with vertex numbers to a DataFrame dP of edges. The columns Source_idx and Target_idx are added to dP with the vertex numbers corresponding to the names in the Source and Target columns, respectively. The function returns a dictionary name2num to map vertex names to numbers.
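The name-to-number mapping can be pictured with a short sketch that works on plain vectors instead of a DataFrame. The function name names_to_index_sketch and the numbering order (first appearance, sources before targets) are assumptions made for illustration; BioFindr may number vertices differently.

```julia
# Illustrative sketch (assumption, not the package code): build a dictionary
# from vertex names to consecutive integers and translate both edge columns.
function names_to_index_sketch(sources::Vector{String}, targets::Vector{String})
    name2num = Dict{String,Int}()
    for name in vcat(sources, targets)
        # get! only inserts when the name has not been seen before.
        get!(name2num, name, length(name2num) + 1)
    end
    source_idx = [name2num[s] for s in sources]
    target_idx = [name2num[t] for t in targets]
    return name2num, source_idx, target_idx
end

# Made-up edge list with three distinct vertex names.
sources = ["A", "B", "C", "A"]
targets = ["B", "C", "A", "C"]
name2num, source_idx, target_idx = names_to_index_sketch(sources, targets)
```

The two index vectors play the role of the Source_idx and Target_idx columns, and can be used directly as vertex numbers when building a graph.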
- [Wang2019] Wang L, Audenaert P, Michoel T (2019) High-dimensional Bayesian network inference from systems genetics data using genetic node ordering. Frontiers in Genetics, Special Topic Machine Learning and Network-Driven Integrative Genomics, 10, 1196.