Usage
A very simple example of a "bijector"/diffeomorphism, i.e. a differentiable transformation with a differentiable inverse, is the exp function:
- The inverse of exp is log.
- The derivative of exp at an input x is simply exp(x), hence logabsdetjac is simply x.
julia> using Bijectors
julia> transform(exp, 1.0)
2.718281828459045
julia> logabsdetjac(exp, 1.0)
1.0
julia> with_logabsdet_jacobian(exp, 1.0)
(2.718281828459045, 1.0)
Some transformations are well-defined for different types of inputs, e.g. exp can also act elementwise on an N-dimensional Array{<:Real,N}. To specify that a transformation should act elementwise, we use the elementwise method:
julia> x = ones(2, 2)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
julia> transform(elementwise(exp), x)
2×2 Matrix{Float64}:
 2.71828  2.71828
 2.71828  2.71828
julia> logabsdetjac(elementwise(exp), x)
4.0
julia> with_logabsdet_jacobian(elementwise(exp), x)
([2.718281828459045 2.718281828459045; 2.718281828459045 2.718281828459045], 4.0)
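The value 4.0 returned above can be checked by hand. Assuming the input is flattened to a vector with entries x_i, the elementwise exponential has a diagonal Jacobian with entries exp(x_i), so the log-abs-det reduces to a sum over the entries:

```latex
\log\left|\det J(x)\right|
  = \log \prod_{i} \left| \frac{\partial}{\partial x_i} e^{x_i} \right|
  = \log \prod_{i} e^{x_i}
  = \sum_{i} x_i
  = 1 + 1 + 1 + 1 = 4
  \quad \text{for } x = \mathrm{ones}(2, 2).
```

The scalar case shown earlier, logabsdetjac(exp, 1.0) == 1.0, is the same identity with a single entry.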
These methods also work nicely for compositions of transformations:
julia> transform(elementwise(log ∘ exp), x)
2×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
Unlike exp, some transformations have parameters affecting the resulting transformation they represent, e.g. Logit has two parameters a and b representing the lower and upper bound, respectively, of its domain:
julia> using Bijectors: Logit
julia> f = Logit(0.0, 1.0)
Bijectors.Logit{Float64, Float64}(0.0, 1.0)
julia> f(rand()) # takes us from `(0, 1)` to `(-∞, ∞)`
-0.7571049267334344
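Because the example above uses a random input, its output is not reproducible. The following sketch uses a fixed input instead and compares against the standard logit, which is assumed here to be the closed form of Logit(0.0, 1.0); the derivative used in the last check follows from that assumption.

```julia
using Bijectors
using Bijectors: Logit

f = Logit(0.0, 1.0)   # domain (a, b) = (0, 1)
x = 0.25
y = f(x)

# Assumed closed form for a = 0, b = 1: the standard logit.
@assert y ≈ log(x / (1 - x))

# Applying the inverse recovers the input.
@assert inverse(f)(y) ≈ x

# d/dx log(x / (1 - x)) = 1 / (x * (1 - x)), so the log-abs-det-Jacobian is:
@assert logabsdetjac(f, x) ≈ -log(x * (1 - x))
```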
User-facing methods
Without mutation:
Bijectors.transform — Function
transform(b, x)
Transform x using b, treating x as a single input.
Bijectors.logabsdetjac — Function
logabsdetjac(b, x)
Return log(abs(det(J(b, x)))), where J(b, x) is the jacobian of b at x.
with_logabsdet_jacobian
With mutation:
Bijectors.transform! — Function
transform!(b, x[, y])
Transform x using b, storing the result in y.
If y is not provided, x is used as the output.
Bijectors.logabsdetjac! — Function
logabsdetjac!(b, x[, logjac])
Compute log(abs(det(J(b, x)))) and store the result in logjac, where J(b, x) is the jacobian of b at x.
Bijectors.with_logabsdet_jacobian! — Function
with_logabsdet_jacobian!(b, x[, y, logjac])
Compute transform(b, x) and logabsdetjac(b, x), storing the results in y and logjac, respectively.
If y is not provided, then x will be used in its place.
Defaults to calling with_logabsdet_jacobian(b, x) and updating y and logjac with the result.
Implementing a transformation
Any callable can be made into a bijector by providing an implementation of ChangeOfVariables.with_logabsdet_jacobian(b, x).
You can also optionally implement transform and logabsdetjac to avoid redundant computations. This is usually only worth it if you expect transform or logabsdetjac to be used heavily without the other.
The same applies to the mutating versions with_logabsdet_jacobian!, transform!, and logabsdetjac!.
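As a minimal sketch of the first step, the snippet below defines a callable and implements ChangeOfVariables.with_logabsdet_jacobian for it. The type MyScale is hypothetical and exists only for illustration, and the sketch assumes ChangeOfVariables.jl is available in the active environment. It calls only the ChangeOfVariables method directly and does not rely on any Bijectors fallbacks.

```julia
using ChangeOfVariables: ChangeOfVariables, with_logabsdet_jacobian

# Hypothetical transformation x ↦ a * x with a ≠ 0 (illustration only).
struct MyScale{T<:Real}
    a::T
end

# Make the struct callable so it behaves like an ordinary function.
(f::MyScale)(x::Real) = f.a * x

# The derivative of x ↦ a * x is a, so the log-abs-det-Jacobian is log|a|.
function ChangeOfVariables.with_logabsdet_jacobian(f::MyScale, x::Real)
    return f.a * x, log(abs(f.a))
end

f = MyScale(2.0)
y, logjac = with_logabsdet_jacobian(f, 3.0)
```

Returning both values from a single method is what allows shared intermediate results to be reused; here nothing is shared, so the benefit only shows up for more expensive transformations.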
Working with Distributions.jl
Bijectors.bijector — Function
bijector(d::Distribution)
Returns the constrained-to-unconstrained bijector for distribution d.
Bijectors.transformed — Method
transformed(d::Distribution)
transformed(d::Distribution, b::Bijector)
Couples distribution d with the bijector b by returning a TransformedDistribution.
If no bijector is provided, i.e. transformed(d) is called, then transformed(d, bijector(d)) is returned.
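The relationship between bijector and transformed can be illustrated with a short sketch. The choice of Beta(2.0, 2.0) and the evaluation point are arbitrary; the final check is the usual change-of-variables formula for densities, stated here as an expectation rather than as documented behaviour.

```julia
using Bijectors, Distributions

d = Beta(2.0, 2.0)     # support is the interval (0, 1)
b = bijector(d)        # constrained-to-unconstrained map for d
td = transformed(d)    # per the docstring, same as transformed(d, bijector(d))

x = 0.3                # a point in the support of d
y = b(x)               # the corresponding unconstrained point

# Change of variables: log p_td(y) = log p_d(x) - log|det J_b(x)|.
lhs = logpdf(td, y)
rhs = logpdf(d, x) - logabsdetjac(b, x)
@assert lhs ≈ rhs
```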
Utilities
Bijectors.elementwise — Function
elementwise(f)
Alias for Base.Fix1(broadcast, f).
In the case where f::ComposedFunction, the result is Base.Fix1(broadcast, f.outer) ∘ Base.Fix1(broadcast, f.inner) rather than Base.Fix1(broadcast, f).
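A small sketch of what the alias means in practice: applying elementwise(f) should agree with broadcasting f directly. The input matrix is arbitrary.

```julia
using Bijectors

x = [0.0 1.0; 2.0 3.0]

f = elementwise(exp)
g = Base.Fix1(broadcast, exp)   # what the docstring says elementwise(exp) is

# Both should match plain broadcasting.
@assert f(x) == exp.(x)
@assert g(x) == exp.(x)

# For a composition, each part is broadcast separately (per the docstring),
# but the overall result is still the elementwise composition.
h = elementwise(log ∘ exp)
@assert h(x) ≈ x
```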
Bijectors.isinvertible — Function
isinvertible(t)
Return true if t is invertible, and false otherwise.
Bijectors.isclosedform — Method
isclosedform(b::Transform)::Bool
isclosedform(b⁻¹::Inverse{<:Transform})::Bool
Returns true or false depending on whether or not evaluation of b has a closed-form implementation.
Most transformations have closed-form evaluations, but some do not. For example, the inverse evaluation of PlanarLayer requires an iterative procedure.
API
Bijectors.Transform — Type
Abstract type for a transformation.
Implementing
A subtype of Transform should at least implement transform(b, x).
If the Transform is also invertible:
- Required:
  - Either of the following:
    - transform(::Inverse{<:MyTransform}, x): the transform for its inverse.
    - InverseFunctions.inverse(b::MyTransform): returns an existing Transform.
  - logabsdetjac: computes the log-abs-det jacobian factor.
- Optional:
  - with_logabsdet_jacobian: transform and logabsdetjac combined. Useful in cases where we can exploit shared computation in the two.
For the above methods, there are mutating versions which can optionally be implemented: transform!, logabsdetjac!, and with_logabsdet_jacobian!.
Bijectors.Bijector — Type
Abstract type of a bijector, i.e. differentiable bijection with differentiable inverse.
Bijectors.Inverse — Type
inverse(b::Transform)
Inverse(b::Transform)
A Transform representing the inverse transform of b.
Bijectors
Bijectors.CorrBijector — Type
CorrBijector <: Bijector
A bijector implementing Stan's parametrization method for correlation matrices: https://mc-stan.org/docs/2_23/reference-manual/correlation-matrix-transform-section.html
Basically, an unconstrained strictly upper triangular matrix y is transformed to a correlation matrix by the following readable, but not particularly efficient, procedure:
K = size(y, 1)
z = tanh.(y)
w = similar(z)  # output matrix, filled entry by entry below
for j in 1:K, i in 1:K
    if i > j
        w[i, j] = 0
    elseif 1 == i == j
        w[i, j] = 1
    elseif 1 < i == j
        w[i, j] = prod(sqrt.(1 .- z[1:i-1, j] .^ 2))
    elseif 1 == i < j
        w[i, j] = z[i, j]
    elseif 1 < i < j
        w[i, j] = z[i, j] * prod(sqrt.(1 .- z[1:i-1, j] .^ 2))
    end
end
It is easy to see that every column is a unit vector, for example:
w3' w3 ==
w[1,3]^2 + w[2,3]^2 + w[3,3]^2 ==
z[1,3]^2 + (z[2,3] * sqrt(1 - z[1,3]^2))^2 + (sqrt(1-z[1,3]^2) * sqrt(1-z[2,3]^2))^2 ==
z[1,3]^2 + z[2,3]^2 * (1-z[1,3]^2) + (1-z[1,3]^2) * (1-z[2,3]^2) ==
z[1,3]^2 + z[2,3]^2 - z[2,3]^2 * z[1,3]^2 + 1 -z[1,3]^2 - z[2,3]^2 + z[1,3]^2 * z[2,3]^2 ==
1
The diagonal elements are positive, so w is a Cholesky factor of a positive definite matrix:
x = w' * w
Consider the block matrix representation of x:
x = [w1'; w2'; ... wn'] * [w1 w2 ... wn] ==
[w1'w1 w1'w2 ... w1'wn;
w2'w1 w2'w2 ... w2'wn;
...
]
The diagonal elements are given by wk'wk = 1, thus x is a correlation matrix.
Every step is invertible, so this is a bijection (bijector).
Note: The implementation doesn't follow their "manageable expression" directly, because their equation seems wrong (7/30/2020). Instead, it directly follows the definition given above the "manageable expression", which is also described in the linked doc.
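The unit-diagonal claim can also be checked numerically. The sketch below wraps the procedure described above in a function; the name corr_cholesky_factor and the test matrix y are hypothetical, and this is an illustration of the described construction rather than the library's implementation.

```julia
using LinearAlgebra

# Hypothetical helper following the readable procedure described above.
function corr_cholesky_factor(y::AbstractMatrix{<:Real})
    K = size(y, 1)
    z = tanh.(y)
    w = zeros(float(eltype(y)), K, K)   # entries below the diagonal stay 0
    for j in 1:K, i in 1:j
        # Product of sqrt(1 - z[k, j]^2) over k < i; the empty product is 1.
        rest = prod(sqrt.(1 .- z[1:(i - 1), j] .^ 2))
        w[i, j] = i == j ? rest : z[i, j] * rest
    end
    return w
end

# Arbitrary strictly upper triangular input.
y = [0.0 0.5 -1.0;
     0.0 0.0  2.0;
     0.0 0.0  0.0]

w = corr_cholesky_factor(y)
x = w' * w

@assert diag(x) ≈ ones(3)   # every column of w is a unit vector
@assert x ≈ x'              # x is symmetric
```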
Bijectors.LeakyReLU — Type
LeakyReLU{T}(α::T) <: Bijector
Defines the invertible mapping
x ↦ x if x ≥ 0 else αx
where α > 0.
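To make the mapping and its inverse concrete, here is a plain-Julia sketch of the formula above. The function names are hypothetical and do not use the library type; the log-abs-det-Jacobian follows from the slope being 1 for x ≥ 0 and α otherwise.

```julia
# Hypothetical stand-alone functions mirroring the mapping above (α > 0).
leaky_relu(x::Real, α::Real) = x >= 0 ? x : α * x

# Since α > 0, the sign is preserved, so the branch can be chosen from y.
leaky_relu_inverse(y::Real, α::Real) = y >= 0 ? y : y / α

# The slope is 1 on x ≥ 0 and α on x < 0.
leaky_relu_logabsdetjac(x::Real, α::Real) = x >= 0 ? zero(float(α)) : log(α)

α = 0.1
x = -2.0
y = leaky_relu(x, α)
x_back = leaky_relu_inverse(y, α)
logjac = leaky_relu_logabsdetjac(x, α)
```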
Bijectors.Stacked — Type
Stacked(bs)
Stacked(bs, ranges)
stack(bs::Bijector...)
A Bijector which stacks bijectors together so that they can be applied to a vector, where bs[i]::Bijector is applied to x[ranges[i]] with ranges[i]::UnitRange{Int}.
Arguments
- bs can be either a Tuple or an AbstractArray of 0- and/or 1-dimensional bijectors
  - If bs is a Tuple, implementations are type-stable using generated functions
  - If bs is an AbstractArray, implementations are not type-stable and use iterative methods
- ranges needs to be an iterable consisting of UnitRange{Int}
  - length(bs) == length(ranges) needs to be true.
Examples
b1 = Logit(0.0, 1.0)
b2 = identity
b = stack(b1, b2)
b([0.0, 1.0]) == [b1(0.0), 1.0] # => true
Bijectors.RationalQuadraticSpline — Type
RationalQuadraticSpline{T} <: Bijector
Implementation of the Rational Quadratic Spline flow [1].
- Outside of the interval [minimum(widths), maximum(widths)], this mapping is given by the identity map.
- Inside the interval it's given by a monotonic spline (i.e. monotonic polynomials connected at intermediate points) with endpoints fixed so as to continuously transform into the identity map.
For the sake of efficiency, there are separate implementations for 0-dimensional and 1-dimensional inputs.
Notes
There are two constructors for RationalQuadraticSpline:
- RationalQuadraticSpline(widths, heights, derivatives): it is assumed that widths, heights, and derivatives satisfy the constraints that make this a valid bijector, i.e.
  - widths: monotonically increasing and length(widths) == K,
  - heights: monotonically increasing and length(heights) == K,
  - derivatives: non-negative and derivatives[1] == derivatives[end] == 1.
- RationalQuadraticSpline(widths, heights, derivatives, B): other than the lengths, no assumptions are made on parameters. Therefore we will transform the parameters s.t.:
  - widths_new ∈ [-B, B]ᴷ⁺¹, where K == length(widths),
  - heights_new ∈ [-B, B]ᴷ⁺¹, where K == length(heights),
  - derivatives_new ∈ (0, ∞)ᴷ⁺¹ with derivatives_new[1] == derivatives_new[end] == 1, where (K - 1) == length(derivatives).
Examples
Univariate
julia> using StableRNGs: StableRNG; rng = StableRNG(42); # For reproducibility.
julia> using Bijectors: RationalQuadraticSpline
julia> K = 3; B = 2;
julia> # Monotonic spline on '[-B, B]' with `K` intermediate knots/"connection points".
b = RationalQuadraticSpline(randn(rng, K), randn(rng, K), randn(rng, K - 1), B);
julia> b(0.5) # inside of `[-B, B]` → transformed
1.1943325397834206
julia> b(5.) # outside of `[-B, B]` → not transformed
5.0
julia> b = RationalQuadraticSpline(b.widths, b.heights, b.derivatives);
julia> b(0.5) # inside of `[-B, B]` → transformed
1.1943325397834206
Multivariate
julia> d = 2; K = 3; B = 2;
julia> b = RationalQuadraticSpline(randn(rng, d, K), randn(rng, d, K), randn(rng, d, K - 1), B);
julia> b([-1., 1.])
2-element Vector{Float64}:
-1.5660106244288925
0.5384702734738573
julia> b([-5., 5.])
2-element Vector{Float64}:
-5.0
5.0
julia> b([-1., 5.])
2-element Vector{Float64}:
-1.5660106244288925
5.0
References
[1] Durkan, C., Bekasov, A., Murray, I., & Papamakarios, G., Neural Spline Flows, CoRR, arXiv:1906.04032 [stat.ML], (2019).
Bijectors.Coupling — Type
Coupling{F, M}(θ::F, mask::M)
Implements a coupling-layer as defined in [1].
Examples
julia> using Bijectors: Shift, Coupling, PartitionMask, coupling, couple
julia> m = PartitionMask(3, [1], [2]); # <= going to use x[2] to parameterize transform of x[1]
julia> cl = Coupling(Shift, m); # <= will do `y[1:1] = x[1:1] + x[2:2]`;
julia> x = [1., 2., 3.];
julia> cl(x)
3-element Vector{Float64}:
3.0
2.0
3.0
julia> inverse(cl)(cl(x))
3-element Vector{Float64}:
1.0
2.0
3.0
julia> coupling(cl) # get the `Bijector` map `θ -> b(⋅, θ)`
Shift
julia> couple(cl, x) # get the `Bijector` resulting from `x`
Shift([2.0])
julia> with_logabsdet_jacobian(cl, x)
([3.0, 2.0, 3.0], 0.0)
References
[1] Kobyzev, I., Prince, S., & Brubaker, M. A., Normalizing flows: introduction and ideas, CoRR, (2019).
Bijectors.OrderedBijector — Type
OrderedBijector()
A bijector mapping unordered vectors in ℝᵈ to ordered vectors in ℝᵈ.
See also
- Stan's documentation
- Note that this transformation and its inverse are the opposite of those in this reference.
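As a rough illustration of such a mapping, the sketch below follows the construction in the Stan reference, read in the unconstrained-to-ordered direction: the first entry is kept and each later entry adds a strictly positive increment exp(x[k]). The function names are hypothetical, and this is an assumption about the form of the map rather than the library's code.

```julia
# Hypothetical helper: unconstrained vector -> strictly increasing vector.
function to_ordered(x::AbstractVector{<:Real})
    y = float.(x)                       # fresh copy; y[1] == x[1]
    for k in 2:length(x)
        y[k] = y[k - 1] + exp(x[k])     # add a strictly positive increment
    end
    return y
end

# Hypothetical helper: strictly increasing vector -> unconstrained vector.
function from_ordered(y::AbstractVector{<:Real})
    x = float.(y)                       # fresh copy; x[1] == y[1]
    for k in 2:length(y)
        x[k] = log(y[k] - y[k - 1])     # recover the log-increment
    end
    return x
end

x = [0.5, -1.0, 2.0]
y = to_ordered(x)
x_back = from_ordered(y)
```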
Bijectors.NamedTransform — Type
NamedTransform <: AbstractNamedTransform
Wraps a NamedTuple of key -> Bijector pairs, implementing evaluation, inversion, etc.
Examples
julia> using Bijectors: NamedTransform, Scale
julia> b = NamedTransform((a = Scale(2.0), b = exp));
julia> x = (a = 1., b = 0., c = 42.);
julia> b(x)
(a = 2.0, b = 1.0, c = 42.0)
julia> (a = 2 * x.a, b = exp(x.b), c = x.c)
(a = 2.0, b = 1.0, c = 42.0)
Bijectors.NamedCoupling — Type
NamedCoupling{target, deps, F} <: AbstractNamedTransform
Implements a coupling layer for named bijectors.
See also: Coupling
Examples
julia> using Bijectors: NamedCoupling, Scale
julia> b = NamedCoupling(:b, (:a, :c), (a, c) -> Scale(a + c));
julia> x = (a = 1., b = 2., c = 3.);
julia> b(x)
(a = 1.0, b = 8.0, c = 3.0)
julia> (a = x.a, b = (x.a + x.c) * x.b, c = x.c)
(a = 1.0, b = 8.0, c = 3.0)