# DataAssim.jl

Documentation for DataAssim.jl

## Simulation driver

The observations yo are provided as a vector of vectors (possibly of different length). Similarily their error covariance R is a vector of matrices. The elements of the vectors yo and R can be constructed as needed by the type DataAssim.VectorFun from a function:

using DataAssim, LinearAlgebra
fun = n -> (n < 5 : Matrix(I,3,3) : Matrix(I,5,5))
R = DataAssim.VectorFun(Matrix{Bool},10,fun)

where Matrix{Bool} is the return type of the function fun which can be used for n from 1 to 10 in this example.

The model $ℳ$ implementes the following API:

• $ℳ(n,x)$ apply the model to $x$ to forecast the index state vector. $n$ is the time index.
• $tgl(ℳ,n,x,Δx)$ apply the tangent linear model for a perturbation $Δx$ around $x$ (foreward differentiation).
• $adj(ℳ,n,x,Δx)$ apply the adjoin model (reverse differentiation).

For any $x$, $Δx₁$, $Δx₂$ and $n$, tangent linear and adjoint model are related by:

$$$Δx₂ ⋅ tgl(ℳ,n,x,Δx₁) = adj(ℳ,n,x,Δx₂) ⋅ Δx₁$$$

where ⋅ is the inner product.

The same API should also be implemented for the observation 𝓗 where 𝓗$(n,x)$ represents the observed part of the state vector (for 4D-Var). Note that for the Kalman Filter method the adjoint (of the model or the observation operator) is not needed.

For the ensemble analysis methods, only the application of the model and observations operator to every ensemble member is needed.

DataAssim.FreeRunFunction
x,Hx = FreeRun(ℳ,xi,Q,H,nmax,no)

Performs a free-run with the model ℳ and nmax time-steps starting at the initial condition xi. Observations at the time steps given in no are extracted with the observation operator H.

DataAssim.KalmanFilterFunction
x,P = KalmanFilter(xi,Pi,ℳ,Q,yo,R,𝓗,nmax,no)

Kalman Filter with the model ℳ and nmax time-steps starting at the initial condition xi and error covariance Pi. Observations yo (and error covariance R) at the time steps given in no are assimilated with the observation operator H.

DataAssim.fourDVarFunction
x,J = fourDVar(
xi,Pi,ℳ,yo,R,H,nmax,no;
innerloops = 10,
outerloops = 2,
tol = 1e-5)

Incremental 4D-Var with the model ℳ (AbstractModel) and nmax time-steps starting at the initial condition xi and error covariance Pi with the specified numbers of inner and outer loops. Observations yo (vector of vectors) and error covariance R (vector of matrices) at the time steps given in no are assimilated with the observation operator H (AbstractModel).

## Ensemble methods

DataAssim.ETKFFunction
Xa,xa = ETKF(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the ETKF ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.EnKFFunction
Xa,xa = EnKF(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the EnKF ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.EnSRFFunction
Xa,xa = EnSRF(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the EnSRF ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.EAKFFunction
Xa,xa = EAKF(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the EAKF ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.SEIKFunction
Xa,xa = SEIK(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the SEIK ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.ESTKFFunction
Xa,xa = ESTKF(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the ESTKF ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.serialEnSRFFunction
Xa,xa = serialEnSRF(Xf,HXf,y,R,H,...)

Computes analysis ensemble Xa based on forecast ensemble Xf and observations y using the serialEnSRF ensemble scheme.

Input arguments:

• Xf: forecast ensemble (n x N)
• HXf: the observation operator applied on the ensemble (product H*Xf)
• y: observations (m)
• R: observation error covariance (m x m).
• H: operator (m x n). Except for the serialEnSRF it is never used and can be empty

Optional keywords arguments:

• debug: set to true to enable debugging. Default (false) is no debugging.
• tolerance: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

Output arguments:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

DataAssim.local_ETKFFunction
Xa,xa = local_ETKF(Xf,H,y,diagR,part,selectObs,...)

Computes analysis ensemble Xa based on forecast ensemble Xf using the observation y using the local ETKF.

Inputs:

• Xf: forecast ensemble (n x N)
• H: observation operator (m x n)
• y: observation (m x 1)
• diagR: diagonal of the observation error covariance R (m x 1)
• part: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain
• selectObs: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:
     selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );

or

     selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));

where x and y is the horizontal model grid, xobs and yobs are the locations of the observations and L is a correlation length-scale

Optional inputs:

• display: if true, then display progress (false is the default)
• minweight: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)
• HXf: if non empty, then it is the product H*Xf. In this case, H is not used

Output:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n x 1)

DataAssim.local_EnKFFunction
Xa,xa = local_EnKF(Xf,H,y,diagR,part,selectObs,...)

Computes analysis ensemble Xa based on forecast ensemble Xf using the observation y using the local EnKF.

Inputs:

• Xf: forecast ensemble (n x N)
• H: observation operator (m x n)
• y: observation (m x 1)
• diagR: diagonal of the observation error covariance R (m x 1)
• part: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain
• selectObs: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:
     selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );

or

     selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));

where x and y is the horizontal model grid, xobs and yobs are the locations of the observations and L is a correlation length-scale

Optional inputs:

• display: if true, then display progress (false is the default)
• minweight: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)
• HXf: if non empty, then it is the product H*Xf. In this case, H is not used

Output:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n x 1)

DataAssim.local_EnSRFFunction
Xa,xa = local_EnSRF(Xf,H,y,diagR,part,selectObs,...)

Computes analysis ensemble Xa based on forecast ensemble Xf using the observation y using the local EnSRF.

Inputs:

• Xf: forecast ensemble (n x N)
• H: observation operator (m x n)
• y: observation (m x 1)
• diagR: diagonal of the observation error covariance R (m x 1)
• part: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain
• selectObs: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:
     selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );

or

     selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));

where x and y is the horizontal model grid, xobs and yobs are the locations of the observations and L is a correlation length-scale

Optional inputs:

• display: if true, then display progress (false is the default)
• minweight: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)
• HXf: if non empty, then it is the product H*Xf. In this case, H is not used

Output:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n x 1)

DataAssim.local_EAKFFunction
Xa,xa = local_EAKF(Xf,H,y,diagR,part,selectObs,...)

Computes analysis ensemble Xa based on forecast ensemble Xf using the observation y using the local EAKF.

Inputs:

• Xf: forecast ensemble (n x N)
• H: observation operator (m x n)
• y: observation (m x 1)
• diagR: diagonal of the observation error covariance R (m x 1)
• part: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain
• selectObs: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:
     selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );

or

     selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));

where x and y is the horizontal model grid, xobs and yobs are the locations of the observations and L is a correlation length-scale

Optional inputs:

• display: if true, then display progress (false is the default)
• minweight: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)
• HXf: if non empty, then it is the product H*Xf. In this case, H is not used

Output:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n x 1)

DataAssim.local_SEIKFunction
Xa,xa = local_SEIK(Xf,H,y,diagR,part,selectObs,...)

Computes analysis ensemble Xa based on forecast ensemble Xf using the observation y using the local SEIK.

Inputs:

• Xf: forecast ensemble (n x N)
• H: observation operator (m x n)
• y: observation (m x 1)
• diagR: diagonal of the observation error covariance R (m x 1)
• part: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain
• selectObs: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:
     selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );

or

     selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));

where x and y is the horizontal model grid, xobs and yobs are the locations of the observations and L is a correlation length-scale

Optional inputs:

• display: if true, then display progress (false is the default)
• minweight: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)
• HXf: if non empty, then it is the product H*Xf. In this case, H is not used

Output:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n x 1)

DataAssim.local_ESTKFFunction
Xa,xa = local_ESTKF(Xf,H,y,diagR,part,selectObs,...)

Computes analysis ensemble Xa based on forecast ensemble Xf using the observation y using the local ESTKF.

Inputs:

• Xf: forecast ensemble (n x N)
• H: observation operator (m x n)
• y: observation (m x 1)
• diagR: diagonal of the observation error covariance R (m x 1)
• part: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain
• selectObs: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:
     selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );

or

     selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));

where x and y is the horizontal model grid, xobs and yobs are the locations of the observations and L is a correlation length-scale

Optional inputs:

• display: if true, then display progress (false is the default)
• minweight: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)
• HXf: if non empty, then it is the product H*Xf. In this case, H is not used

Output:

• Xa: the analysis ensemble (n x N)
• xa: the analysis ensemble mean (n x 1)

## Models

DataAssim.ModelFunType
ℳ = ModelFun(nonlinear_forecast,tangent_linear_model,adjoint_model)

Model defined by the functions nonlinear_forecast,tangent_linear_model and adjoint_model.

DataAssim.LinShallowWater1DModelType
ℳ = LinShallowWater1DModel(dt,g,h,L,imax)

Linear 1D shallow water model.

Example

dt = 1.
g = 9.81
h = 100
imax = 101
L = 10000
LinShallowWater1DModel(dt,g,h,L,imax)
DataAssim.Lorenz63ModelType
ℳ = Lorenz63Model(dt,σ=10.,β = 8/3.,ρ = 28.)

Lorenz, 1963 model[1] integrated with a 2nd order Runge-Kutta scheme.

[1] https://doi.org/10.1175/1520-0469(1963)020%3C0130:DNF%3E2.0.CO;2

## Utility functions

DataAssim.compact_locfunFunction
 fun = compact_locfun(r)

Smooth compact localization function at the (scaled) distance r. fun is zero if r > 2 and one if r is 0. (Gaspari et al. (1999), equation 4.10, [1])

[1] http://dx.doi.org/10.1002/qj.49712555417

DataAssim.talagrand_diagramFunction
freq = talagrand_diagram(x,y)

Compute the frequency of each bins for a Talagrand Diagram where x is the ensemble (array of the size sample × ensemble size) and y are the obervations (sample).

Example

using PyPlot
Nens = 100
Nsample = 10000
# Ensemble
x = randn(Nsample,Nens)
# Observation
y = randn(Nsample)

freq = talagrand_diagram(x,y)
plot(freq)
plot(ones(size(freq)) / size(freq,1))
ylim(0,ylim()[2])
xlabel("bins")
ylabel("frequency")