# Mitosis

Incorporate discrete and continuous time Markov processes as building blocks into probabilistic graphical models.

## Based on MeasureTheory.jl

Mitosis defines its probability distributions and densities in terms of MeasureTheory.jl.
```julia
using Mitosis, LinearAlgebra, Statistics

m = [1.0, 0.5]
K = Matrix(1.0I, 2, 2)
p = Gaussian(μ=m, Σ=K)
mean(p) == m

# output

true
```
## Key concepts

### Kernels or distribution-valued maps

The core concept of Mitosis is the Markov kernel.

A kernel `κ = kernel(Gaussian, μ=f, Σ=g)` returns a callable which returns a measure with parameters determined by the functions `f` and `g`:
```julia
f(x) = x*m
g(_) = K

k = kernel(Gaussian; μ=f, Σ=g)
mean(k(3.0)) == 3*m && cov(k(3.0)) == K

# output

true
```
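Since `k(3.0)` is itself a Gaussian measure, it can be sampled with `rand`, just as the prior and model measures are sampled further below (a minimal check):

```julia
x3 = rand(k(3.0))  # draw one sample from the measure k(3.0)
length(x3) == 2

# output

true
```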
### Linear and affine Gaussian kernels

Gaussian kernels become especially powerful when combined with linear and affine mean functions, `AffineMap`, `LinearMap`, and `ConstantMap`:
```julia
B = [0.8 0.5; -0.1 0.8]
β = [0.1, 0.2]
Q = [0.2 0.0; 0.0 1.0]

x = [0.112, -1.22]

b = AffineMap(B, β)
b(x) == B*x + β

# output

true
```
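`LinearMap` and `ConstantMap` are used the same way (a short sketch, assuming both are callable like `AffineMap`; `ConstantMap` ignores its argument, playing the role of `g(_) = K` above):

```julia
LinearMap(B)(x) == B*x && ConstantMap(Q)(x) == Q

# output

true
```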
Kernels with affine mean and constant covariance propagate Gaussian uncertainty:
```julia
k = kernel(Gaussian, μ=AffineMap(B, β), Σ=ConstantMap(Q))

m = [1.0, 0.5]
K = Matrix(1.0I, 2, 2)
p = Gaussian(μ=m, Σ=K)
k(p) isa Gaussian

# output

true
```
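Concretely, the propagated measure has the moments of the standard affine Gaussian push-forward: mean $Bm + \beta$ and covariance $BKB^\top + Q$. A quick numerical check (assuming `k(p)` implements this exact moment propagation; `≈` allows for floating-point rounding):

```julia
mean(k(p)) ≈ B*m + β && cov(k(p)) ≈ B*K*B' + Q

# output

true
```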
## Backward and forward passes

Backward and forward functions with signatures

```julia
message, marginal = backward(BF(), kernel, argument)
marginal = forward(BF(), kernel, message)(argument)
```

define a generic interface to a 2-pass backward filtering, forward smoothing algorithm. For each transition, the backward pass produces a message for the forward pass.
### Example: Bayesian regression with `BF()`

`BF()` specifies the exact (conjugate) linear-Gaussian backward filtering, forward smoothing algorithm without importance weights. `BFFG()` defines a more general approach which also works for non-linear transitions. Let's first consider the simpler case, in a Bayesian regression example:

$\beta \sim N(\mu_0, \sigma^2 V_0),$

$Y \mid \beta \sim N(X\beta, \sigma^2 I).$
#### Data

A small data set:

```julia
x = [18.25 19.75 16.5 18.25 19.50 16.25 17.25 19.00 16.25 17.50][:]
y = [36 42 33 39 43 34 37 41 27 30][:]
n = length(x)

# output

10
```
#### Prior

The conjugate prior on the parameter $\beta$ is Gaussian,

$\beta \sim N(\mu_0, \sigma^2 V_0).$

We write it as a kernel (without arguments) as well:

```julia
σ2 = 8.0              # noise level σ²
μ0 = zeros(2)         # prior mean
V0 = 10*I(2)
Σ0 = σ2*V0            # prior covariance
prior = kernel(Gaussian; μ=ConstantMap(μ0), Σ=ConstantMap(Σ0))
mean(prior()) == μ0

# output

true
```
#### Model

Conditional on the parameter vector $\beta$, the regression model is

$Y \mid \beta \sim N(X\beta, \sigma^2 I),$

where $X$ is the design matrix. Thus we can express it as a linear Gaussian kernel:

```julia
X = [x ones(n)]              # design matrix
Σ = Diagonal(σ2*ones(n))     # noise covariance σ²I
model = kernel(Gaussian; μ=LinearMap(X), Σ=ConstantMap(Σ))
nothing

# output

```
#### Combined forward model

Summarizing, prior and model together define the generative model; think of it as the composition of the two kernels. (We sample into fresh names here so as not to overwrite the observed data `y`.)

```julia
βsim = rand(prior())        # draw parameters from the prior
ysim = rand(model(βsim))    # draw synthetic observations given βsim
```
#### Backward pass

The backward pass takes the observations $y$ into account and propagates uncertainty backward through the model:

```julia
m2, p2 = backward(BF(), model, y)
m1, p1 = backward(BF(), prior, p2)
nothing

# output

```

At each step it produces a filtered distribution (`p1`, `p2`) and a message (`m1`, `m2`) for the forward pass.
#### Forward pass

The `BF()` forward pass computes marginal distributions of the latents. Because the parameters $\beta$ are the latent outcome of the prior, we need at least one step of the forward pass; a second step would just give the observations back.

```julia
posterior = forward(BF(), prior, m1)()
# observations = forward(BF(), model, m2)(posterior)
mean(posterior), cov(posterior)

# output

([2.4874650784715016, -8.120051139323095], [0.1700435105146037 -3.00522441850067; -3.00522441850067 53.904213732907884])
```
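As a sanity check, this should agree with the textbook conjugate posterior for Bayesian linear regression, $\Sigma_{\mathrm{post}} = (\Sigma_0^{-1} + X^\top X/\sigma^2)^{-1}$ and $\mu_{\mathrm{post}} = \Sigma_{\mathrm{post}}(\Sigma_0^{-1}\mu_0 + X^\top y/\sigma^2)$. The check below is standard conjugate algebra, not part of the Mitosis API:

```julia
Σpost = inv(inv(Σ0) + X'X/σ2)          # closed-form posterior covariance
μpost = Σpost*(inv(Σ0)*μ0 + X'y/σ2)    # closed-form posterior mean
μpost ≈ mean(posterior) && Σpost ≈ cov(posterior)

# output

true
```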
## References

- Frank van der Meulen, Moritz Schauer (2020): Automatic Backward Filtering Forward Guiding for Markov processes and graphical models. [arXiv:2010.03509](https://arxiv.org/abs/2010.03509).