# Fitting the Model

## Single-thread run

The `BayesianNetworkRegression.jl`

package implements the statistical inference method in Ozminkowski & Solís-Lemus (2022) in its main function `Fit!`

which takes three parameters:

`X`

: data matrix from microbial networks (networked covariates)`y`

: response vector`R`

: dimension of latent variables

Quality of inference will increase with increasing `R`

up to a point, after which it will approximately plateau. The time it takes to fit the model will increase (worse than linearly) with increasing `R`

. For our simulations, `R=7`

worked best.

We will use the data read in the Input Data section:

`X_a`

: vector of adjacency matrices`X_v`

: matrix of vectorized adjacency matrices

The type of input data matrix `X`

will inform the argument `x_transform`

so that:

`x_transform=true`

means that the input matrix`X`

needs to be vectorized (as for`X_a`

),`x_transform=false`

means that the input matrix`X`

has already been vectorized (as for`X_v`

).

To fit the model, we type this in julia:

```
using BayesianNetworkRegression
result = Fit!(X_a, y_a, 5, # we set R=5
nburn=200, nsamples=100, x_transform=true,
num_chains=1,
seed=1234)
```

Note that we are running a very small chain (300 generations: 200 burnin and 100 post burnin). For a real analysis, these number should be much larger (see the Simulation in the manuscript for more details).

The `result`

variable is a `Result`

type which contains five attributes:

`state::Table`

: Table with all the sampled parameters for all generations.`rhatξ::Table`

: Table with convergence information ($\hat{R}$ values) for the parameters $\xi$ representing whether specific nodes are influential or not.`rhatγ::Table`

: Table with convergence information ($\hat{R}$ values) for the parameters $\gamma$ representing the regression coefficients.`burn_in::Int`

: number of burnin samples.`sampled::Int`

: number of post burnin samples.

Each of these objects can be accessed with a `.`

, for example, `results.state`

will produce the table with all the samples.

We use the convergence criteria proposed in Vehtari et al (2019). Values close to 1 indicate convergence. Vehtari et al. suggest using a cutoff of $\hat{R} < 1.01$ to indicate convergence. Values are provided for all $\xi$ (in `rhatξ`

) and $\gamma$ (in `rhatγ`

) variables.

These attributes are not readily interpretable, and thus, they can be summarized with the `Summary`

function:

`out = Summary(result)`

Note that the `show`

function for a `Results`

object is already calling `Summary(result)`

internally, and thus, if you simply call:

`result`

you observe the same output as when calling

`out = Summary(result)`

The `out`

object is now a `BNRSummary`

object with two main data frames:

- DataFrame with edge coefficient point estimates and endpoints of credible intervals (default 95%):

```
julia> out.edge_coef
435×5 DataFrame
Row │ node1 node2 estimate lower_bound upper_bound
│ Int64 Int64 Float64 Float64 Float64
─────┼──────────────────────────────────────────────────
1 │ 1 2 2.385 -1.976 6.157
2 │ 1 3 1.823 -5.257 7.595
3 │ 1 4 1.763 -4.514 6.009
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮
433 │ 28 29 3.51 -1.251 7.376
434 │ 28 30 1.949 -3.425 6.135
435 │ 29 30 2.785 -0.772 6.042
429 rows omitted
```

- DataFrame with probabilities of being influencial for each node

```
julia> out.prob_nodes
30×1 DataFrame
Row │ probability
│ Float64
─────┼─────────────
1 │ 0.86
2 │ 0.92
3 │ 0.78
⋮ │ ⋮
28 │ 0.93
29 │ 0.89
30 │ 0.87
24 rows omitted
```

The `BNRSummary`

object also keeps the level for the credible interval, set at 95% by default:

```
julia> out.ci_level
95
```

The level can be changed when running the `Summary`

function:

`out2=Summary(result,interval=97)`

We will use these summary data frames in the Interpretation section.

Note that we ran the case when we need to transform the data matrix. If we already have the adjacency matrices vectorized, we simply need to set `x_transform=false`

:

```
result2 = Fit!(X_v, y_v, 5, # we set R=5
nburn=200, nsamples=100, x_transform=false,
num_chains=1,
seed=1234)
```

## Multi-thread run

You can run multiple chains in parallel by setting up multiple processors as shown next.

```
using Distributed
addprocs(2)
@everywhere begin
using BayesianNetworkRegression,CSV,DataFrames,StaticArrays
matrix_networks = joinpath(dirname(pathof(BayesianNetworkRegression)), "..","examples","matrix_networks.csv")
data_in = DataFrame(CSV.File(matrix_networks))
X = Matrix(data_in[:,names(data_in,Not("y"))])
y = Vector{Float64}(data_in[:,:y])
end
```

Compared to the single-thread run, we only need to change `num_chains=2`

:

```
result3 = Fit!(X, y, 5,
nburn=200, nsamples=100, x_transform=false,
num_chains=2,
seed=1234
)
```

## Error reporting

Please report any bugs and errors by opening an issue.