# Sampling of diffusion bridges

In this tutorial we will illustrate how to sample diffusion bridges using rejection sampling and importance sampling with Brownian bridge proposals. We will also embed the latter routine in a Metropolis–Hastings algorithm. We will illustrate the techniques on the example of a bivariate double-well potential.

**Warning**

Sampling of diffusion bridges using rejection sampling or importance sampling with Brownian bridge proposals is, in general, not an efficient method, and for a plethora of problems it is not even applicable. It is, however, an excellent starting point for learning about simulation of conditioned diffusions. The tutorial is meant as a didactic explanation of the common mechanisms behind most samplers of conditioned diffusions. Nevertheless, it is NOT meant to be used for production purposes. For that we refer the user to GuidedProposals.jl, which has been designed specifically to address the problem of sampling conditioned diffusions.

## Why is there so much text in this tutorial?

Sampling of diffusion bridges might seem like an innocent problem if you haven't seen it before. It is, however, a considerable obstacle that many mathematicians, statisticians and physicists have been attacking for decades. And for good reason: efficient methods that solve this problem are considered to be extremely valuable tools for a variety of real-world applications. No universal solution has been found; however, there are a few contending methodologies that perform particularly well across a variety of problems of this kind. Many of those techniques are far too complicated to be explained in a single tutorial. Fortunately, simple versions of rejection and importance sampling fit the bill just right. Even these, however, require us to cover some additional details. Hence the length of the tutorial.

## Rejection sampling

Rejection sampling is a simple, but extremely powerful technique for obtaining independent samples from some probability law $\mathbb{P}$ (with density $\dd \mathbb{P}$) using independent samples from another probability law $\mathbb{Q}$ (with density $\dd \mathbb{Q}$). You may, for instance, check out the Wikipedia article to learn more.

The basic algorithm is as follows:

```julia
struct RejectionSampling end

# these need to be appropriately defined
function density(P, at=x) ... end
function some_max_bound_on_dP_over_dQ(P, Q) ... end
P, Q = ...

function Base.rand(P, Q, ::RejectionSampling)
    M = some_max_bound_on_dP_over_dQ(P, Q)
    while true
        X° = rand(Q) # proposal draw
        acceptance_prob = density(P, at=X°) / (density(Q, at=X°) * M)
        # accept/reject; if reject then re-sample and retry; if accept then return X°
        if rand() <= acceptance_prob
            return X°
        end
    end
end
```

A sample returned by the algorithm above:

```julia
X = rand(P, Q, RejectionSampling())
```

is distributed exactly according to $\mathbb{P}$ even though all sampling was done exclusively through the law $\mathbb{Q}$.

**Definition**

The law $\mathbb{P}$ is called the target law, whereas $\mathbb{Q}$ is called the proposal law.
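Before moving to path spaces, it may help to see the algorithm run on a toy target where every quantity is computable. Below is a self-contained sketch (independent of the rest of this tutorial; all names are ours): the target has density $p(x)=2x$ on $[0,1]$, the proposal is the uniform law on $[0,1]$ (so its density is $\equiv 1$), and $M=2$ bounds $\dd\mathbb{P}/\dd\mathbb{Q}$.

```julia
using Random, Statistics

# Toy target: density p(x) = 2x on [0, 1]; proposal Q: Uniform(0, 1), q ≡ 1.
# Then dP/dQ(x) = 2x <= 2 =: M, so rejection sampling applies.
p_density(x) = 2.0x
M = 2.0

function rand_rejection(rng)
    while true
        x° = rand(rng)                       # proposal draw from Q
        if rand(rng) <= p_density(x°) / M    # accept w.p. p(x°) / (M q(x°))
            return x°
        end
    end
end

rng = MersenneTwister(42)
xs = [rand_rejection(rng) for _ in 1:100_000]
mean(xs)   # close to E[X] = 2/3 under the target law
```

The accepted draws are exact samples from $p$, even though only uniform random numbers were ever generated; this is precisely the mechanism we will re-use on path spaces.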

## Rejection sampling on a path space

Rejection sampling on a path space is a fancy name for the algorithm above as applied to diffusion processes, i.e. when the laws $\mathbb{P}$ and $\mathbb{Q}$ are some diffusion laws.

Computation of each individual term in the acceptance probability

```julia
acceptance_prob = density(P, at=X°) / (density(Q, at=X°) * M)
```

which in mathematical notation becomes:

$$\label{eq:accpprobrs} \frac{1}{M}\frac{\dd \mathbb{P}}{\dd \mathbb{Q}}(X)$$

is not possible when working with diffusion processes. Nevertheless, the overall term can sometimes be found.

**Some mathematical details**

In particular, let's assume that the target diffusion law is of the following form:

$$\label{eq:target_rejection} \dd X_t = α(X_t)\dd t + \dd W_t,\quad X_0=x_0, \quad t\in[0,T],$$

with $α:\RR^d→\RR^d$ at least once continuously differentiable, a volatility coefficient given by the identity matrix, and $W$ denoting a $d$-dimensional Brownian motion. Additionally, let's assume that there exists a function $A:\RR^d→\RR$ such that:

$∇A(x) = α(x).$

(We refer to $A$ as a potential function). Then, if we let $\mathbb{P}$ denote the law of \eqref{eq:target_rejection} conditioned on an end-point, i.e. of:

$\dd X_t = α(X_t)\dd t + \dd W_t,\quad X_0=x_0,\quad X_T=x_T, \quad t\in[0,T],$

and we let $\mathbb{Q}$ denote the law of a $d$–dimensional Brownian bridge joining $x_0$ and $x_T$ on $[0,T]$, then the Radon–Nikodym derivative between the two is proportional to:

$$\label{eq:rnd} \frac{\dd \mathbb{P}}{\dd \mathbb{Q}}(X)\propto \exp\left\{- \int_0^Tϕ(X_t)\dd t \right\}\leq 1,$$

where

$$\label{eq:phi_function} \phi(x):=\frac{1}{2}\left[ \alpha'\alpha + Δ A \right](x) - l,$$

where $l$ is any constant; in particular, we may take

$l\leq \inf_{x\in\RR^d}\left\{ \frac{1}{2}\left[ \alpha'\alpha + Δ A \right](x) \right\},$

which guarantees the inequality $\leq 1$ in \eqref{eq:rnd}. The infimum on the right is assumed to exist (if it does not, then rejection sampling with Brownian bridge proposals cannot be applied).
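Heuristically, \eqref{eq:rnd} can be traced back to Girsanov's theorem; the following sketch (which suppresses all regularity conditions) may help. Let $\tilde{\mathbb{P}}$ denote the law of the unconditioned diffusion \eqref{eq:target_rejection} and let $\mathbb{W}$ denote the Wiener measure started at $x_0$. Girsanov's theorem gives

$$\frac{\dd \tilde{\mathbb{P}}}{\dd \mathbb{W}}(X) = \exp\left\{ \int_0^T α(X_t)'\dd X_t - \frac{1}{2}\int_0^T \left[α'α\right](X_t)\dd t \right\}.$$

Because $α=∇A$, Itô's formula applied to $A(X_t)$ yields

$$A(X_T) - A(x_0) = \int_0^T α(X_t)'\dd X_t + \frac{1}{2}\int_0^T ΔA(X_t)\dd t,$$

so the stochastic integral can be eliminated:

$$\frac{\dd \tilde{\mathbb{P}}}{\dd \mathbb{W}}(X) = \exp\left\{ A(X_T) - A(x_0) - \int_0^T \frac{1}{2}\left[ α'α + ΔA \right](X_t)\dd t \right\}.$$

Conditioning both laws on $X_T=x_T$ freezes the term $A(X_T)-A(x_0)$ to a constant, which is absorbed into the proportionality sign, and subtracting the constant $l$ inside the integral only changes this constant by a factor $e^{lT}$; this is exactly \eqref{eq:rnd} with $ϕ$ as in \eqref{eq:phi_function}.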

The main take-away message from the box above is that if the stochastic differential equation admits a nice enough form \eqref{eq:target_rejection}, then it is possible to find a closed-form formula for the acceptance probability

$\frac{1}{M}\frac{\dd \mathbb{P}}{\dd \mathbb{Q}}(X),$

and rejection sampling on a path space can be performed by simply repeatedly drawing from $\mathbb{Q}$, the law of Brownian bridges (see this how-to-guide for more info), and accepting the samples with probability given in \eqref{eq:rnd}.

**Lamperti transformation**

The restriction to an identity-matrix volatility coefficient may be slightly relaxed thanks to the existence of the Lamperti transformation, which allows one to transform a diffusion with a non-identity volatility coefficient into one with identity volatility. This transform is applicable to all scalar diffusions; in multiple dimensions, however, it is more restrictive. See [Ait Sahalia 2008] for more details.
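For a flavor of the scalar case (stated without regularity conditions): if $\dd X_t = b(X_t)\dd t + σ(X_t)\dd W_t$ with $σ>0$, then the transformed process

$$Y_t := η(X_t),\qquad η(x) := \int^x \frac{1}{σ(u)}\dd u,$$

satisfies, by Itô's formula,

$$\dd Y_t = \left[ \frac{b(X_t)}{σ(X_t)} - \frac{1}{2}σ'(X_t) \right]\dd t + \dd W_t,$$

i.e. it has unit volatility, so the methods of this tutorial apply to $Y$, with $X_t = η^{-1}(Y_t)$ recovered afterwards.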

### Example

Let's look at a simple example of a bivariate double-well potential model solving the following SDE:

$\dd X_t = -\frac{1}{2}∇A(X_t)\dd t + \dd W_t,\quad X_0 = x_0,\quad t\in[0,T],$

where

$A(x) := ρ_1\left[ (x_2)^2 - μ_1 \right]^2 + ρ_2\left[ x_2 - μ_2 x_1 \right]^2,$

and $ρ_1,ρ_2,μ_1,μ_2>0$ are parameters. We can define it very simply:

```julia
using DiffusionDefinition
const DD = DiffusionDefinition

using StaticArrays, LinearAlgebra

@diffusion_process BivariateDoubleWell{T} begin
    :dimensions
    process --> 2
    wiener --> 2

    :parameters
    (ρ1, ρ2, μ1, μ2) --> T

    :additional
    constdiff --> true
end

DD.b(t, x, P::BivariateDoubleWell) = @SVector [
    P.μ2*P.ρ2*(x[2]-P.μ2*x[1]),
    2.0*P.ρ1*P.μ1*x[2] - 2.0*P.ρ1*x[2]^3 - P.ρ2*(x[2]-P.μ2*x[1]),
]

@inline DD.σ(t, x, P::BivariateDoubleWell) = SDiagonal{2,Float64}(I)
```
**Note**

In fact, we will never use the volatility coefficient below. We defined it simply for completeness.

Tedious calculations reveal that the function $ϕ$ attains a global minimum:

$\inf_xϕ(x)= -\frac{p_1+p_2}{54μ_2^2(1+μ_2^2)},$

where

$$\begin{align*} p_1&:=2μ_2\sqrt{2ρ_1}\left[ 9 + μ_2^2\left(9+2ρ_1μ_1^2\right) \right]^{3/2}\\ p_2&:=μ_2^2\left[ 54ρ_1μ_1(1+μ_2^2) - 8ρ_1^2μ_1^3μ_2^2 + 27ρ_2(1+μ_2^2)^2 \right]. \end{align*}$$

In Julia this becomes:

```julia
function inf_ϕ(P::BivariateDoubleWell)
    -( compute_p1(P) + compute_p2(P) )/( 54.0*P.μ2^2*(1.0 + P.μ2^2) )
end

function compute_p1(P::BivariateDoubleWell)
    2.0*P.μ2*sqrt(2.0*P.ρ1)*(
        9.0 + P.μ2^2*(9.0 + 2.0*P.ρ1*P.μ1^2)
    )^(3/2)
end

function compute_p2(P::BivariateDoubleWell)
    P.μ2^2*(
        54.0*P.ρ1*P.μ1*(1.0+P.μ2^2)
        - 8.0*P.ρ1^2*P.μ1^3*P.μ2^2
        + 27.0*P.ρ2*(1.0+P.μ2^2)^2
    )
end
```
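Closed-form expressions like these are easy to mistype, so a brute-force sanity check may be reassuring. The snippet below is deliberately self-contained (plain Julia, no packages; all names are ours): it re-implements $ϕ$ directly from the drift for the parameterization $ρ_1=ρ_2=0.5$, $μ_1=2$, $μ_2=1$ (the same one used later in this tutorial) and compares a grid minimum against the formula.

```julia
# Self-contained sanity check of inf ϕ for ρ1 = ρ2 = 0.5, μ1 = 2.0, μ2 = 1.0.
ρ1, ρ2, μ1, μ2 = 0.5, 0.5, 2.0, 1.0

# drift of the bivariate double-well model and its divergence
b(x) = (
    μ2*ρ2*(x[2] - μ2*x[1]),
    2ρ1*μ1*x[2] - 2ρ1*x[2]^3 - ρ2*(x[2] - μ2*x[1]),
)
div_b(x) = -μ2^2*ρ2 + 2ρ1*μ1 - 6ρ1*x[2]^2 - ρ2
ϕ(x) = 0.5*(sum(abs2, b(x)) + div_b(x))   # ϕ from \eqref{eq:phi_function} with l = 0

# closed-form infimum
p1 = 2μ2*sqrt(2ρ1)*(9 + μ2^2*(9 + 2ρ1*μ1^2))^(3/2)
p2 = μ2^2*(54ρ1*μ1*(1 + μ2^2) - 8ρ1^2*μ1^3*μ2^2 + 27ρ2*(1 + μ2^2)^2)
inf_ϕ_formula = -(p1 + p2)/(54μ2^2*(1 + μ2^2))

# brute-force minimum over a grid
grid = range(-4.0, 4.0; length=801)
inf_ϕ_grid = minimum(ϕ((x1, x2)) for x1 in grid, x2 in grid)
abs(inf_ϕ_grid - inf_ϕ_formula)   # small (limited only by the grid resolution)
```

For this parameterization both values come out near $-3.26$, which agrees with the displayed formula.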

We should also have a routine for evaluating the $ϕ$ function:

```julia
# ϕ from \eqref{eq:phi_function} without the constant l (subtracted later);
# here b = α and sum(b_prime(x, P)) computes the divergence of the drift.
function ϕ(x, P::DD.DiffusionProcess)
    b = DD.b(nothing, x, P)
    0.5*(dot(b,b) + sum(b_prime(x, P)))
end

# diagonal of the Jacobian of the drift, ∂bᵢ/∂xᵢ
b_prime(x, P::BivariateDoubleWell) = @SVector [
    -P.μ2^2*P.ρ2,
    2.0*P.ρ1*P.μ1 - 6.0*P.ρ1*x[2]^2 - P.ρ2,
]
```

As well as for computing the acceptance probability for the simulated proposal path:

```julia
function acc_prob(X, P::DD.DiffusionProcess, l=inf_ϕ(P))
    logprob = 0.0
    # Riemann-sum approximation of -∫₀ᵀ (ϕ(X_t) - l) dt
    for i in 1:length(X)-1
        dt = X.t[i+1]-X.t[i]
        logprob -= (ϕ(X.x[i], P) - l)*dt
    end
    exp(logprob)
end
```

Finally, we want to have a routine for sampling from the proposal law, which in our case is the law of Brownian bridges (see the relevant how-to-guide for more details).

```julia
using Random
struct BrownianBridge end

function Random.rand!(::BrownianBridge, X, x0=zero(X.x[1]), xT=zero(X.x[1]))
    rand!(Wiener(), X)   # sample a Wiener path into X
    T, B_T = X.t[end], X.x[end]
    # pin the path down at x0 and xT
    λ(t, x) = ( x0 + x + (xT - x0 - B_T) * t/T )
    X.x .= λ.(X.t, X.x)
end
```
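To convince yourself that this transformation really pins the path down at both end-points, here is a self-contained version on plain vectors (the helpers `simulate_wiener` and `brownian_bridge` are illustrative names, not part of any package):

```julia
using Random

# sample a standard Wiener path on the time grid tt (plain-vector sketch)
function simulate_wiener(rng, tt, d)
    W = [zeros(d) for _ in tt]
    for i in 2:length(tt)
        W[i] = W[i-1] + sqrt(tt[i] - tt[i-1])*randn(rng, d)
    end
    W
end

# X_t = x0 + W_t + (xT - x0 - W_T) t/T joins x0 and xT
function brownian_bridge(rng, tt, x0, xT)
    W = simulate_wiener(rng, tt, length(x0))
    T, WT = tt[end], W[end]
    [x0 + W[i] + (xT - x0 - WT)*tt[i]/T for i in eachindex(tt)]
end

rng = MersenneTwister(1)
tt = 0.0:0.01:3.0
X = brownian_bridge(rng, tt, [1.0, 2.0], [-1.0, -2.0])
X[1], X[end]   # the bridge starts at x0 and ends at xT
```

Note that the very same linear shift is what `Random.rand!(::BrownianBridge, ...)` above applies in place.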

We are now ready to implement rejection sampling on a path space.

```julia
function Base.rand(P::DD.DiffusionProcess, tt, x0, xT, ::RejectionSampling)
    X° = trajectory(tt, P).process
    # compute constant l only once, it won't change anyway
    l = inf_ϕ(P)
    while true
        # proposal draw
        rand!(BrownianBridge(), X°, x0, xT)

        # accept/reject; if reject then re-sample and retry; if accept then return X°
        (rand() <= acc_prob(X°, P, l)) && return X°
    end
end
```

That's it! As you can see for yourself, it is a beautifully simple algorithm. Let's test it. We will be using the following parameterization:

```julia
θ = (
    ρ1 = 0.5,
    ρ2 = 0.5,
    μ1 = 2.0,
    μ2 = 1.0,
)
```

The bivariate double-well potential is, as the name suggests, a diffusion with a stationary distribution that has two modes. These modes are at:

```julia
mode1 = @SVector [ sqrt(θ.μ1)/θ.μ2, sqrt(θ.μ1)]
mode2 = @SVector [-sqrt(θ.μ1)/θ.μ2,-sqrt(θ.μ1)]
```

Consequently, when we simulate trajectories over long time intervals, we should observe them being attracted to these two poles.

Let's start modestly and try to simulate $30$ bridge trajectories over the interval $[0,3]$ that join the two diffusion modes:

```julia
P = BivariateDoubleWell(θ...)
tt = 0.0:0.01:3.0

N = 30
paths = [rand(P, tt, mode1, mode2, RejectionSampling()) for i in 1:N]

using Plots
p = plot(size=(800, 500), layout=(2,1))
for i in 1:N
    plot!(p, paths[i], Val(:vs_time), alpha=[0.2 0.2], color=["steelblue" "steelblue"], label=["X₁" "X₂"])
end
display(p)
```

When you run the code above you might be slightly surprised, for two reasons.

1. First, where is the bimodal behavior we talked about?
2. Second, why did the code not execute immediately? Seemingly there aren't that many computations to perform! On my computer it takes around 1.5 seconds to simulate the trajectories. Why so long?

To answer the first concern: the bimodal behavior becomes pronounced only over long enough time-spans. The time-span we used was simply too short, and the trajectories ended up being stretched around a straight line that connects the end-points. Had we taken longer time-spans, the behavior would have been more pronounced. But let's not get ahead of ourselves by trying to run the algorithm above with larger $T$. Indeed, the reason for the slow performance of the code above is that even the seemingly short time-span $[0,3]$ is troublesome for the sampler, and most proposals end up being rejected. We can modify the routine a little to examine the magnitude of the problem:

function Base.rand(P::DD.DiffusionProcess, tt, x0, xT, ::RejectionSampling)
X° = trajectory(tt, P).process
# compute constant l only once, it won't change anyway
l = inf_ϕ(P)
num_prop = 0
while true
# proposal draw
rand!(BrownianBridge(), X°, x0, xT)
num_prop += 1

# accept/reject; if reject then re-sample and retry; if accept then return X°
(rand() <= acc_prob(X°, P, l)) && (