# Bayesian network learning

BioFindr implements the method described in the paper *High-dimensional Bayesian network inference from systems genetics data using genetic node ordering* [Wang2019] to learn a Bayesian network, using a dataframe of posterior probabilities as prior edge weights.

## DAG reconstruction

The first step in the algorithm is to convert a dataframe of posterior probabilities to a directed acyclic graph (DAG). BioFindr implements the original greedy algorithm, in which edges are added one by one in descending order of posterior probability and any edge that would introduce a cycle is skipped, in the `dagfindr_greedy_edges!` function. Two additional methods, `dagfindr_greedy_insertion!` and `dagfindr_heuristic_sort!`, developed by Kenneth Stoop and Pieter Audenaert (Stoop et al., 2023), are also implemented. The `dagfindr!` function is the main user interface.

### `BioFindr.dagfindr!` (function)

`dagfindr!(dP::T; method="greedy edges") where T<:AbstractDataFrame`

Convert a DataFrame `dP` of `findr` results (list of edges) to a directed acyclic graph (DAG) using the specified `method`. The output is a directed graph represented as a `SimpleDiGraph` from the `Graphs` package. The `method` can be any of:

- `"greedy edges"` (default), see `dagfindr_greedy_edges!`;
- `"heuristic sort"`, see `dagfindr_heuristic_sort!`;
- `"greedy insertion"`, see `dagfindr_greedy_insertion!`.

### `BioFindr.dagfindr_greedy_edges!` (function)

`dagfindr_greedy_edges!(dP::T) where T<:AbstractDataFrame`

Convert a DataFrame `dP` of `findr` results (list of edges) to a directed acyclic graph (DAG) represented as a `SimpleDiGraph` from the `Graphs` package. This function implements the method of Wang et al. (2019), where edges are added one by one in decreasing order of probability, and only if they do not create a cycle in the graph, using the incremental cycle detection algorithm from the `Graphs` package.
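The greedy edges idea can be illustrated with a minimal sketch in plain Julia. This is not BioFindr's implementation: edges are given as hypothetical `(source, target, probability)` tuples over vertices `1:n` instead of a DataFrame, and each cycle check is a simple depth-first reachability search rather than the incremental cycle detection from `Graphs`. The names `reachable` and `greedy_edges_sketch` are illustrative only.

```julia
# Sketch only: greedy edge insertion with a DFS cycle check (not BioFindr's code).

# Return true if `dst` can be reached from `src` following edges in `adj`.
function reachable(adj::Vector{Vector{Int}}, src::Int, dst::Int)
    src == dst && return true
    seen = falses(length(adj))
    seen[src] = true
    stack = [src]
    while !isempty(stack)
        v = pop!(stack)
        for w in adj[v]
            w == dst && return true
            if !seen[w]
                seen[w] = true
                push!(stack, w)
            end
        end
    end
    return false
end

# Add edges in decreasing order of probability, skipping any that would close a cycle.
function greedy_edges_sketch(n::Int, edges::Vector{Tuple{Int,Int,Float64}})
    adj = [Int[] for _ in 1:n]      # adjacency list of the growing DAG
    kept = Tuple{Int,Int}[]
    for (s, t, _) in sort(edges; by = e -> e[3], rev = true)
        # s -> t closes a cycle exactly when s is already reachable from t.
        if !reachable(adj, t, s)
            push!(adj[s], t)
            push!(kept, (s, t))
        end
    end
    return kept
end

edges = [(1, 2, 0.9), (2, 3, 0.8), (3, 1, 0.7), (1, 3, 0.6)]
greedy_edges_sketch(3, edges)   # returns [(1, 2), (2, 3), (1, 3)]
```

In the example, the edge `3 -> 1` is rejected because vertex 3 is already reachable from vertex 1 through `1 -> 2 -> 3`. Each depth-first check here costs time proportional to the current graph size, which is the motivation for an incremental cycle detection structure on large networks.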

### `BioFindr.dagfindr_heuristic_sort!` (function)

`dagfindr_heuristic_sort!(dP::T) where T<:AbstractDataFrame`

Convert a DataFrame `dP` of `findr` results (list of edges) to a directed acyclic graph (DAG) represented as a `SimpleDiGraph` from the `Graphs` package. This function implements the heuristic sort method of Stoop et al. (2023), where vertices are sorted by their ratio of out-degree to in-degree, and edges are added only if their source vertex precedes their target vertex in the sorted list. The output is a directed graph and a dictionary to map vertex names to numbers.
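A minimal sketch of the heuristic sort idea in plain Julia follows, again using hypothetical `(source, target, probability)` tuples instead of a DataFrame. The text above does not say whether the degrees are weighted, so this sketch assumes weighted degrees (sums of edge probabilities) and assigns an infinite ratio to vertices with zero in-degree. It returns the vertex order and the kept edges rather than a graph and a name dictionary; `heuristic_sort_sketch` is an illustrative name, not BioFindr's API.

```julia
# Sketch only: heuristic sort by weighted out/in-degree ratio (not BioFindr's code).
function heuristic_sort_sketch(n::Int, edges::Vector{Tuple{Int,Int,Float64}})
    outdeg = zeros(n)   # assumed: sum of probabilities of outgoing edges
    indeg = zeros(n)    # assumed: sum of probabilities of incoming edges
    for (s, t, w) in edges
        outdeg[s] += w
        indeg[t] += w
    end
    # Vertices with no incoming weight get an infinite ratio and go first.
    ratio = [indeg[v] == 0 ? Inf : outdeg[v] / indeg[v] for v in 1:n]
    order = sortperm(ratio; rev = true)
    pos = invperm(order)            # pos[v] is the rank of vertex v in `order`
    # Keeping only forward edges (source before target) cannot create a cycle.
    kept = [(s, t) for (s, t, _) in edges if pos[s] < pos[t]]
    return order, kept
end

order, kept = heuristic_sort_sketch(3, [(1, 2, 0.9), (2, 3, 0.8), (3, 1, 0.7), (1, 3, 0.6)])
# order == [1, 2, 3]; kept == [(1, 2), (2, 3), (1, 3)]
```

Unlike the greedy edges approach, no cycle check is needed: fixing a vertex order first and keeping only edges that point forward in it guarantees acyclicity by construction.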

### `BioFindr.dagfindr_greedy_insertion!` (function)

`dagfindr_greedy_insertion!(dP::T) where T<:AbstractDataFrame`

Convert a DataFrame `dP` of `findr` results (list of edges) to a directed acyclic graph (DAG) represented as a `SimpleDiGraph` from the `Graphs` package. This function implements the greedy insertion method of Stoop et al. (2023), where vertices are sorted iteratively by inserting each vertex at the position in the current ordering that yields the maximum possible gain in edge weight. The gain is the difference between the total weight of the edges newly included and the total weight of the edges lost, where an edge is counted only if its source vertex precedes its target vertex in the ordering. The output is a directed graph and a dictionary to map vertex names to numbers.
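One plausible reading of the greedy insertion step is sketched below; it is not BioFindr's implementation. Assumptions: edge weights are held in a dense matrix `W` with `W[s, t]` the weight of `s -> t`, a starting order is supplied by the caller, and passes over all vertices repeat until no move gives a strictly positive gain. Only the edges incident to the moved vertex can change status, so the gain is computed from those alone. The names `insertion_score`, `greedy_insertion_sketch` and `forward_edges` are hypothetical.

```julia
# Sketch only: greedy insertion local search over a vertex ordering (not BioFindr's code).
# W[s, t] holds the weight (e.g. posterior probability) of edge s -> t.

# Weight of v's incident edges that point forward if v is inserted at index i of `rest`.
function insertion_score(W::AbstractMatrix, rest::Vector{Int}, v::Int, i::Int)
    score = 0.0
    for (j, u) in enumerate(rest)
        # u precedes v when j < i, so u -> v counts; otherwise v -> u counts.
        score += j < i ? W[u, v] : W[v, u]
    end
    return score
end

function greedy_insertion_sketch(W::AbstractMatrix, order::Vector{Int}; maxpasses::Int = 100)
    order = copy(order)
    for _ in 1:maxpasses
        improved = false
        for v in copy(order)
            k = findfirst(==(v), order)
            rest = deleteat!(copy(order), k)
            old = insertion_score(W, rest, v, k)          # score at v's current position
            scores = [insertion_score(W, rest, v, i) for i in 1:length(rest)+1]
            best = argmax(scores)
            # Gain = weight of edges newly kept minus weight of edges lost by the move.
            if scores[best] - old > 1e-12
                insert!(rest, best, v)
                order = rest
                improved = true
            end
        end
        improved || break                                 # stop when no move helps
    end
    return order
end

# Edges consistent with the ordering (source before target) form a DAG.
function forward_edges(W::AbstractMatrix, order::Vector{Int})
    pos = invperm(order)
    n = size(W, 1)
    return [(s, t) for s in 1:n for t in 1:n if W[s, t] > 0 && pos[s] < pos[t]]
end

W = zeros(3, 3)
W[1, 2] = 0.9; W[2, 3] = 0.8; W[3, 1] = 0.7; W[1, 3] = 0.6
order = greedy_insertion_sketch(W, [3, 1, 2])   # returns [1, 2, 3]
forward_edges(W, order)                          # returns [(1, 2), (1, 3), (2, 3)]
```

Starting from `[3, 1, 2]` (forward weight 1.6), moving vertex 3 to the end gains 0.7 and reaches `[1, 2, 3]` (forward weight 2.3), after which no single move improves the ordering.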

### `BioFindr.greedy_insertions!` (function)

`greedy_insertions!(sorted_vertices, weights)`

TBW

### `BioFindr.edge_weights` (function)

`edge_weights(dP::T) where T<:AbstractDataFrame`

TBW

### `BioFindr.names_to_index!` (function)

`names_to_index!(dP::T) where T<:AbstractDataFrame`

Add columns with vertex numbers to a DataFrame `dP` of edges. The columns `Source_idx` and `Target_idx` are added to `dP` with the vertex numbers corresponding to the names in the `Source` and `Target` columns, respectively. The function returns a dictionary `name2num` to map vertex names to numbers.
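The name-to-number mapping can be sketched in plain Julia with two string vectors standing in for the `Source` and `Target` columns. The numbering order used by BioFindr is not stated here; this sketch assumes numbers are assigned in order of first appearance, scanning the sources before the targets. `names_to_index_sketch` is a hypothetical name, and it returns the index vectors instead of adding `Source_idx` and `Target_idx` columns to a DataFrame.

```julia
# Sketch only: map vertex names to consecutive integers (not BioFindr's code).
# Numbering follows first appearance, Source column before Target column (an assumption).
function names_to_index_sketch(source::Vector{String}, target::Vector{String})
    name2num = Dict{String,Int}()
    for name in Iterators.flatten((source, target))
        # get! inserts the next free number only if `name` is not yet in the dictionary.
        get!(name2num, name, length(name2num) + 1)
    end
    source_idx = [name2num[s] for s in source]
    target_idx = [name2num[t] for t in target]
    return source_idx, target_idx, name2num
end

source_idx, target_idx, name2num = names_to_index_sketch(["A", "B", "A"], ["B", "C", "C"])
# source_idx == [1, 2, 1]; target_idx == [2, 3, 3]
```

Integer vertex labels like these are what a `SimpleDiGraph` expects, which is why the conversion precedes DAG construction.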

- [Wang2019] Wang L, Audenaert P, Michoel T (2019). High-dimensional Bayesian network inference from systems genetics data using genetic node ordering. *Frontiers in Genetics* 10:1196. Special Topic: Machine Learning and Network-Driven Integrative Genomics.