Docstrings · BayesNets.jl

BayesNets.BDeuPrior — Type

Assigns equal scores to Markov equivalent structures

α_ijk = x/{q_i * r_i} for each j, k and some given x

see DMU section 2.4.3

BayesNets.BayesNetSampler — Type

Abstract type for sampling with:

Random.rand(BayesNet, BayesNetSampler)
Random.rand(BayesNet, BayesNetSampler, nsamples)
Random.rand!(Assignment, BayesNet, BayesNetSampler)

BayesNets.DirectSampler — Type

Straightforward sampling from a BayesNet. The default sampler.

BayesNets.DirichletPrior — Type

Baysian Structure learning seeks to maximize P(G|D) In the Bayesian fashion, we can provide a prior over the parameters in our learning network. This is described using a Dirichlet Prior.

BayesNets.DiscreteBayesNet — Type

DiscreteBayesNets are Bayesian Networks where every variable is an integer within 1:Nᵢ and every distribution is Categorical.

This representation is very common, and allows for the use of factors, for example in Probabilistic Graphical Models by Koller and Friedman

BayesNets.ExactInference — Type

Exact inference using factors and variable eliminations

BayesNets.Factor — Type

Factor(bn, name, evidence=Assignment())

Create a factor for a node, given some evidence.

BayesNets.Factor — Type

Factor(dims, potential)

Create a Factor corresponding to the potential.

BayesNets.Factor — Method

Factor(dims, lengths, fill_value=0)

Create a factor with dimensions dims, each with lengths corresponding to lengths. fill_value will fill the potential array with that value. To keep uninitialized, use fill_value=nothing.

BayesNets.GibbsSampler — Type

The GibbsSampler type houses the parameters of the Gibbs sampling algorithm. The parameters are defined below:

burnin: The first burnin samples will be discarded. They will not be returned. The thinning parameter does not affect the burn in period. This is used to ensure that the Gibbs sampler converges to the target stationary distribution before actual samples are drawn.

thinning: For every thinning + 1 number of samples drawn, only the last is kept. Thinning is used to reduce autocorrelation between samples. Thinning is not used during the burn in period. e.g. If thinning is 1, samples will be drawn in groups of two and only the second sample will be in the output.

timelimit: The number of milliseconds to run the algorithm. The algorithm will return the samples it has collected when either nsamples samples have been collected or timelimit milliseconds have passed. If time_limit is null then the algorithm will run until nsamples have been collected. This means it is possible that zero samples are returned.

erroriftimeout: If erroriftimeout is true and the timelimit expires, an error will be raised. If erroriftimeout is false and the time limit expires, the samples that have been collected so far will be returned. This means it is possible that zero samples are returned. Burn in samples will not be returned. If time_limit is null, this parameter does nothing.

consistent_with: the assignment that all samples must be consistent with (ie, Assignment(:A=>1) means all samples must have :A=1). Use to sample conditional distributions.

maxcachesize: If null, cache as much as possible, otherwise cache at most "maxcachesize" distributions

variableorder: variableorder determines the order of variables changed when generating a new sample. If null use a random order for every sample (this is different from updating the variables at random). Otherwise should be a list containing all the variables in the order they should be updated.

initial_sample: The inital assignment to variables to use. If null, the initial sample is chosen by briefly using a LikelihoodWeightedSampler.

BayesNets.GibbsSamplerState — Type

Used to cache various things the Gibbs sampler needs

BayesNets.GibbsSamplingFull — Type

infer(im, inf)

Run Gibbs sampling for N iterations. Each iteration changes all nodes. Discareds first burn_in samples and keeps only the thin-th sample. Ex, if thin=3, will discard the first two samples and keep the third.

BayesNets.GibbsSamplingNodewise — Type

infer(GibbsSampling, state::Assignment, InferenceState)

Run Gibbs sampling for N iterations. Each iteration changes one node.

Discareds first burn_in samples and keeps only the thin-th sample. Ex, if thin=3, will discard the first two samples and keep the third.

BayesNets.K2GraphSearch — Type

K2GraphSearch

A GraphSearchStrategy following the K2 algorithm. Takes polynomial time to find the optimal structure assuming a topological variable ordering.

BayesNets.LikelihoodWeightedSampler — Type

Likelihood Weighted Sampling

BayesNets.LikelihoodWeightingInference — Type

Approximates p(query|evidence) with N weighted samples using likelihood weighted sampling

BayesNets.LoopyBelief — Type

Loopy belief propogation for a network.

Early stopping if change is messages < tol for `itersforconvergence' iterations. For no stopping, use tol < 0.

BayesNets.NegativeBayesianInformationCriterion — Type

NegativeBayesianInformationCriterion

A ScoringFunction for the negative Bayesian information criterion.

BIC = -2⋅L + k⋅ln(n)

   L - the log likelihood of the data under the cpd
   k - the number of free parameters to be estimated
   n - the sample size

BayesNets.RejectionSampler — Type

Rejection Sampling in which the assignments are forced to be consistent with the provided values. Each sampler is attempted at most max_nsamples times before returning an empty assignment.

BayesNets.ScoreComponentCache — Type

ScoreComponentCache

Used to store scores in a priority queue such that graph search algorithms know when a particular construction has already been made. cacheⱼ for the ith variable with parents parents

BayesNets.ScoreComponentCache — Method

ScoreComponentCache(data::DataFrame)

Construct an empty ScoreComponentCache the size of ncol(data)

BayesNets.ScoringFunction — Type

ScoringFunction

An abstract type for which subtypes allow extracting CPD score components, which are to be maximized: score_component(::ScoringFunction, cpd::CPD, data::DataFrame)

BayesNets.UniformPrior — Type

A uniform Dirichlet prior such that all α are the same

Defaults to the popular K2 prior, α = 1, which is similar to Laplace Smoothing

https://en.wikipedia.org/wiki/Additive_smoothing

Base.:* — Method

Table multiplication

Base.Broadcast.broadcast! — Method

broadcast!(f, ϕ, dims, values)

Broadcast a vector (or array of vectors) across the dimension(s) dims Each vector in values will be broadcast acroos its respective dimension in dims

See Base.broadcast for more info.

Base.Broadcast.broadcast — Method

broadcast(f, ϕ, dims, values)

Broadcast a vector (or array of vectors) across the dimension(s) dims Each vector in values will be broadcast acroos its respective dimension in dims

See Base.broadcast for more info.

Base.Sort.partialsort — Method

Given a Table, extract the rows which match the given assignment

Base.convert — Method

Convert a Factor to a DataFrame

Base.convert — Method

convert(DiscreteCPD, cpd)

Construct a Factor from a DiscreteCPD.

Base.count — Method

Base.count(bn::BayesNet, name::NodeName, data::DataFrame)

returns a table containing all observed assignments and their corresponding counts

Base.delete! — Method

delete!(bn::BayesNets, target::NodeName)

Removing cpds will alter the vertex indeces. In particular, removing the ith cpd will swap i and n and then remove n.

Base.eltype — Method

Returns Float64

Base.getindex — Method

getindex(ϕ, a)

Get values with dimensions consistent with an assignment. Colons select entire dimension.

Base.in — Method

in(dim, ϕ) -> Bool

Return true if dim is in the Factor ϕ

Base.indexin — Method

indexin(dims, ϕ)

Return the index of dimension dim in ϕ, or 0 if not in ϕ.

Base.join — Function

join(op, ϕ1, ϕ2, :outer, [v0])
join(op, ϕ1, ϕ2, :inner, [reducehow], [v0])

Performs either an inner or outer join,

An outer join returns a Factor with the union of the two dimensions The two factors are combined with Base.broadcast(op, ...)

An inner join keeps the dimensions in common between the two Factors. The extra dimensions are reduced with reducedim(reducehow, ...) and then the two factors are combined with: op(ϕ1[commondims].potential, ϕ2[commondims].potential)

Base.length — Method

Total number of elements in Factor (potential)

Base.names — Method

Returns the ordered list of NodeNames

Base.names — Method

Names of each dimension

Base.push! — Method

Appends a new dimension to a Factor

Base.rand — Method

Generates a DataFrame containing a dataset of variable assignments. Always return a DataFrame with nsamples rows.

Base.rand — Method

Returns an assignment sampled from the bn using the provided sampler

Base.rand — Method

Implements Gibbs sampling. (https://en.wikipedia.org/wiki/Gibbs_sampling) For finite variables, the posterior distribution is sampled by building the exact distribution. For continuous variables, the posterior distribution is sampled using Metropolis Hastings MCMC. Discrete variables with infinite support are currently not supported. The Gibbs Sampler only supports CPDs that return Univariate Distributions. (CPD{D<:UnivariateDistribution})

Sampling requires a GibbsSampler object which contains the parameters for Gibbs sampling. See the GibbsSampler documentation for parameter details.

Base.similar — Method

similar(ϕ)

Return a factor similar to ϕ with unitialized values

Base.size — Method

size(ϕ, [dims...])

Returns a tuple of the dimensions of ϕ

Base.write — Method

write(io, text/plain, bn)

Writes a text file containing the sufficient statistics for a discrete Bayesian network. This was inspired by the format listed in Appendix A of "Correlated Encounter Model for Cooperative Aircraft in the National Airspace System Version 1.0" by Mykel Kochenderfer.

The text file contains the following parameters:

variable labels: A space-delimited list specifies the variable labels, which are symbols. The ordering of the variables in this list determines the ordering of the variables in the other tables. Note that the ordering of the variable labels is not necessarily topological.
graphical structure: A binary matrix is used to represent the graphical structure of the Bayesian network. A 1 in the ith row and jth column means that there is a directed edge from the ith varible to the jth variable in the Bayesian network. The ordering of the variables are as defined in the variable labels section of the file. The entries are 0 or 1 and are not delimited.
variable instantiations: A list of integers specifying the number of instantiations for each variable. The list is space-delimited.
sufficient statistics: A list of space-delimited integers Pₐⱼₖ which specifies the sufficient statistics. The array is ordered first by increasing k, then increasing j, then increasing i. The variable ordering is defined in the variable labels section of the file. The list is a flattened matrices, where each matrix is rₐ × qₐ where rₐ is the number of instantiations of variable a and qₐ is the number of instantiations of the parents of variable a. The ordering is the same as the ordering of the distributions vector in the CategoricalCPD type. The entires in Pₐⱼₖ are floating point probability values.

For example, the network Success -> Forecast with Success ∈ [1, 2] and P(1) = 0.2, P(2) = 0.8 and Forecast ∈ [1, 2, 3] with P(1 | 1) = 0.4, P(2 | 1) = 0.4, P(3 | 1) = 0.2 P(1 | 2) = 0.1, P(2 | 2) = 0.3, P(3 | 2) = 0.6

Is output as:

Success Forecast 01 00 2 3 2 4 4 1 3

BayesNets.CPDs.ProbabilisticGraphicalModels.infer — Method

Approximates p(query|evidence) with nsamples likelihood weighted samples.

Since this uses a Factor, it is only efficient if the number of samples is (signifcantly) greater than the number of possible instantiations for the query variables

BayesNets.CPDs.ProbabilisticGraphicalModels.is_independent — Method

Returns whether the set of node names x is d-separated from the set y given the set given

BayesNets.CPDs.ProbabilisticGraphicalModels.markov_blanket — Method

Return the children, parents, and parents of children (excluding target) as a Set of NodeNames

BayesNets.CPDs.parents — Method

Returns the parents as a list of NodeNames

BayesNets._evidence_lambda — Method

Get the lambda-message to itself for an evidence node. If it isn't an evidence node, this will break

BayesNets._get_parent_indeces — Method

score_component(a::ScoringFunction, cpd::CPD, data::DataFrame, cache::ScoreComponentCache)

As score_component(ScoringFunction, cpd, data), but returns pre-computed values from the cache if they exist, and populates the cache if they don't

BayesNets._init_gibbs_sample — Function

_init_gibbs_sample(bn, evidence)

A random sample of non-evidence nodes uniformly over their domain

BayesNets.bayesian_score — Function

bayesian_score(G::DAG, names::Vector{Symbol}, data::DataFrame[, ncategories::Vector{Int}[, prior::DirichletPrior]])

Compute the bayesian score for graph structure g, with the data in data. names containes a symbol corresponding to each vertex in g that is the name of a column in data. ncategories is a vector of the number of values that each variable in the Bayesian network can take.

Note that every entry in data must be an integer greater than 0

BayesNets.bayesian_score_component — Method

Computes the Bayesian score component for the given target variable index and Dirichlet prior counts given in alpha

INPUT: i - index of the target variable parents - list of indeces of parent variables (should not contain self) r - list of instantiation counts accessed by variable index r[1] gives number of discrete states variable 1 can take on data - matrix of sufficient statistics / counts d[j,k] gives the number of times the target variable took on its kth instantiation given the jth parental instantiation

OUTPUT: the Bayesian score, Float64

BayesNets.children — Method

Returns the children as a list of NodeNames

BayesNets.duplicate — Method

duplicate(A, dims)

Repeates an array only through higer dimensions dims.

Custom version of repeate, but only outer repetition, and only duplicates the array for the number of times specified in dims for dimensions greater than ndims(A). If dims is empty, returns a copy of A.

julia> duplicate(collect(1:3), (2,))
3×2 Array{Int64,2}:
 1  1
 2  2
 3  3

julia> duplicate([1 3; 2 4], (3,))
2×2×3 Array{Int64,3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 1  3
 2  4

[:, :, 3] =
 1  3
 2  4

BayesNets.eval_mb_cpd — Method

eval_mb_cpd(node, ncategories, assignment, mb_cpds)

Return the potential of all instances of a node given its markove blanket as a WeightVec: P(node | panode) * Prod (c in children) P(c | pac)

Trys out all possible values of node (assumes categorical) Assignment should have values for all in the Markov blanket, including the variable itself.

BayesNets.get_asia_bn — Method

An ergodic version of the asia network, with the E variable removed

Orignal network: Lauritzen, Steffen L. and David J. Spiegelhalter, 1988

BayesNets.get_finite_distribution! — Method

Helper to sampleposteriorfinite

Modifies a and gss

BayesNets.get_mb_cpds — Method

Get the cpd's of a node and its children

BayesNets.get_sat_fail_bn — Method

Satellite failure network from DMU, pg 17

BayesNets.get_sprinkler_bn — Method

The usual sprinkler problem

BayesNets.get_weighted_dataframe — Method

A dataset of variable assignments is obtained with an additional column of weights in accordance with the likelihood of each assignment.

BayesNets.get_weighted_sample! — Method

Draw an assignment from the Bayesian network but set any variables in the evidence accordingly. Returns the assignment and the probability weighting associated with the evidence.

BayesNets.gibbs_sample — Method

bn:: A Bayesian Network to sample from. bn should only contain CPDs that return UnivariateDistributions.

nsamples: The number of samples to return.

consistent_with: the assignment that all samples must be consistent with (ie, Assignment(:A=>1) means all samples must have :A=1). Use to sample conditional distributions.

maxcachesize: If null, cache as much as possible, otherwise cache at most "maxcachesize" distributions

initialsample: The inital assignment to variables to use. If null, the initial sample is chosen by briefly running rand(bn, getweighted_dataframe).

BayesNets.gibbs_sample_main_loop — Method

The main loop associated with Gibbs sampling Returns a data frame with nsamples samples

Supports the various parameters supported by gibbssample Refer to gibbssample for parameter meanings

BayesNets.ndgrid_fill! — Method

???

BayesNets.pattern — Method

pattern(ϕ, [dims])

Return an array with the pattern of each dimension's state for all possible instances

BayesNets.rand_bn_inference — Function

rand_bn_inference(bn, num_query=2, num_evidence=3)

Generate a random inference state for a Bayesian Network with an evidence assignment sample uniformly over the chosen nodes' domain.

BayesNets.rand_cpd — Function

rand_cpd(bn::DiscreteBayesNet, ncategories::Int, target::NodeName, parents::NodeNames=NodeName[])

Return a CategoricalCPD with the given number of categories with random categorical distributions

BayesNets.rand_discrete_bn — Function

rand_discrete_bn(num_nodes16, max_num_parents=3,
        max_num_states=5, connected=true)

Generate a random DiscreteBayesNet.

Creates DiscreteBayesNet with num_nodes nodes, with each node having a random number of states and parents, up to max_num_parents and max_num_parents, respectively. If connected, each node (except the first) will be guaranteed at least one parent, making the graph connected.

BayesNets.readxdsl — Method

readxdsl( filename::AbstractString )

Return a DiscreteBayesNet read from the xdsl file

BayesNets.reducedim — Function

reducedim(op, ϕ, dims, [v0])

Reduce dimensions dims in ϕ using function op.

BayesNets.sample_posterior! — Method

set a[varname] ~ P(varname | not varname)

Modifies a and caches in gss

BayesNets.sample_posterior_continuous! — Method

Implements Metropolis-Hastings with a normal distribution proposal with mean equal to the previous value of the variable "varname" and stddev equal to 10 times the standard deviation of the distribution of the target variable given its parents ( var_distribution should be get(bn, varname)(a) )

MH will go through nsamples iterations. If no proposal is accepted, the original value will remain

This function expects that a[varname] is within the support of the distribution, it will not check to make sure this is true

Helper to sample_posterior Should only be used to sampling continuous distributions

set a[varname] ~ P(varname | not varname)

Modifies a and caches in gss

BayesNets.sample_posterior_finite! — Method

Helper to sample_posterior Should only be called if the variable associated with varname is discrete

set a[varname] ~ P(varname | not varname)

Modifies both a and gss

BayesNets.sample_weighted_dataframe! — Method

Chooses a sample at random from a weighted dataframe

BayesNets.score_component — Method

score_component(a::ScoringFunction, cpd::CPD, data::DataFrame)

Extract a Float64 score for a cpd given the data. One seeks to maximize the score.

BayesNets.score_components — Method

score_components(a::ScoringFunction, cpd::CPD, data::DataFrame)
score_components(a::ScoringFunction, cpds::Vector{CPD}, data::DataFrame, cache::ScoreComponentCache)

Get a list of score components for all cpds

BayesNets.statistics — Method

statistics(
    targetind::Int,
    parents::AbstractVector{Int},
    ncategories::AbstractVector{Int},
    data::AbstractMatrix{Int}
    )

outputs a sufficient statistics table for the target variable that is r × q where r = ncategories[i] is the number of variable instantiations and q is the number of parental instantiations of variable i

The r-values are ordered from 1 → ncategories[i] The q-values are ordered in the same ordering as ind2sub() in Julia Base Thus the instantiation of the first parent (by order given in parents[i]) is varied the fastest.

ex: Variable 1 has parents 2 and 3, with r₁ = 2, r₂ = 2, r₃ = 3 q for variable 1 is q = r₂×r₃ = 6 N will be a 6×2 matrix where: N[1,1] is the number of time v₁ = 1, v₂ = 1, v₃ = 1 N[2,1] is the number of time v₁ = 1, v₂ = 2, v₃ = 1 N[3,1] is the number of time v₁ = 1, v₂ = 1, v₃ = 2 N[4,1] is the number of time v₁ = 1, v₂ = 2, v₃ = 2 N[5,1] is the number of time v₁ = 1, v₂ = 1, v₃ = 3 N[6,1] is the number of time v₁ = 1, v₂ = 2, v₃ = 3 N[6,2] is the number of time v₁ = 2, v₂ = 1, v₃ = 1 ...

BayesNets.statistics — Method

statistics(
    parent_list::Vector{Vector{Int}},
    ncategories::AbstractVector{Int},
    data::AbstractMatrix{Int},
    )

Computes sufficient statistics from a discrete dataset for a Discrete Bayesian Net structure

INPUT: parents: list of lists of parent indices A variable with index i has ncategories[i] and row in data[i,:] No acyclicity checking is done ncategories: list of variable bin counts, or number of discrete values the variable can take on, v ∈ {1 : ncategories[i]} data: table of discrete values [n×m] where n is the number of nodes and m is the number of samples

OUTPUT: N :: Vector{Matrix{Int}} a sufficient statistics table for each variable Variable with index i has statistics table N[i], which is r × q where r = ncategories[i] is the number of variable instantiations and q is the number of parental instantiations of variable i

    The r-values are ordered from 1 → ncategories[i]
    The q-values are ordered in the same ordering as ind2sub() in Julia Base
        Thus the instantiation of the first parent (by order given in parents[i])
        is varied the fastest.

    ex:
        Variable 1 has parents 2 and 3, with r₁ = 2, r₂ = 2, r₃ = 3
        q for variable 1 is q = r₂×r₃ = 6
        N[1] will be a 6×2 matrix where:
            N[1][1,1] is the number of time v₁ = 1, v₂ = 1, v₃ = 1
            N[1][2,1] is the number of time v₁ = 1, v₂ = 2, v₃ = 1
            N[1][3,1] is the number of time v₁ = 1, v₂ = 1, v₃ = 2
            N[1][4,1] is the number of time v₁ = 1, v₂ = 2, v₃ = 2
            N[1][5,1] is the number of time v₁ = 1, v₂ = 1, v₃ = 3
            N[1][6,1] is the number of time v₁ = 1, v₂ = 2, v₃ = 3
            N[1][6,2] is the number of time v₁ = 2, v₂ = 1, v₃ = 1
            ...

This function uses sparse matrix black magic and was mercilessly stolen from Ed Schmerling.

BayesNets.sumout — Method

sumout(t, v)

Table marginalization

BayesNets.table — Method

table(bn::DiscreteBayesNet, name::NodeName)

Constructs the CPD factor associated with the given node in the BayesNet

Distributions.logpdf — Method

The logpdf of a given assignment after conditioning on the values

Distributions.ncategories — Method

Distributions.ncategories(bn::DiscreteBayesNet, node::Symbol)

Return the number of categories for a node in the network.

Distributions.pdf — Method

The pdf of a given assignment after conditioning on the values

Graphs.dst — Method

Returns all descendants as a list of NodeNames.

Graphs.has_edge — Method

Whether the BayesNet contains the given edge

Graphs.neighbors — Method

Returns all neighbors as a list of NodeNames.

LinearAlgebra.normalize! — Method

normalize!(ϕ, dims; p=1)
normalize!(ϕ; p=1)

Normalize the factor so all instances of dims have (or the entire factors has) p-norm of 1

LinearAlgebra.normalize! — Method

Table normalization Ensures that the :potential column sums to one

LinearAlgebra.normalize — Method

normalize!(ϕ, dims; p=1)
normalize!(ϕ; p=1)

Return a normalized copy of the factor so all instances of dims have (or the entire factors has) p-norm of 1

Random.rand! — Method

Overwrites assignment with a sample from bn using the sampler

Random.rand! — Method

NOTE: this is inefficient. Use rand(bn, GibbsSampler, nsamples) whenever you can

Random.rand! — Method

rand!(ϕ)

Fill with random values

StatsAPI.fit — Method

takes a list of observations of assignments represented as a DataFrame or a set of data samples (without :potential), takes the unique assignments, and estimates the associated probability of each assignment based on its frequency of occurrence.

StatsAPI.fit — Method

fit{C<:CPD}(::Type{BayesNet{C}}, ::DataFrame, ::GraphSearchStrategy)

Run the graph search algorithm defined by GraphSearchStrategy

StatsAPI.fit — Method

fit(::Type{BayesNet}, data, edges)

Fit a Bayesian Net whose variables are the columns in data and whose edges are given in edges

ex: fit(DiscreteBayesNet, data, (:A=>:B, :C=>B))