# DataAssim.jl

Documentation for DataAssim.jl

## Simulation driver

The observations `yo`

are provided as a vector of vectors (possibly of different length). Similarily their error covariance `R`

is a vector of matrices. The elements of the vectors `yo`

and `R`

can be constructed as needed by the type `DataAssim.VectorFun`

from a function:

```
using DataAssim, LinearAlgebra
fun = n -> (n < 5 : Matrix(I,3,3) : Matrix(I,5,5))
R = DataAssim.VectorFun(Matrix{Bool},10,fun)
```

where `Matrix{Bool}`

is the return type of the function `fun`

which can be used for `n`

from `1`

to `10`

in this example.

The model $ℳ$ implementes the following API:

- $ℳ(n,x)$ apply the model to $x$ to forecast the index state vector. $n$ is the time index.
- $tgl(ℳ,n,x,Δx)$ apply the tangent linear model for a perturbation $Δx$ around $x$ (foreward differentiation).
- $adj(ℳ,n,x,Δx)$ apply the adjoin model (reverse differentiation).

For any $x$, $Δx₁$, $Δx₂$ and $n$, tangent linear and adjoint model are related by:

\[Δx₂ ⋅ tgl(ℳ,n,x,Δx₁) = adj(ℳ,n,x,Δx₂) ⋅ Δx₁\]

where ⋅ is the inner product.

The same API should also be implemented for the observation 𝓗 where 𝓗$(n,x)$ represents the observed part of the state vector (for 4D-Var). Note that for the Kalman Filter method the adjoint (of the model or the observation operator) is not needed.

For the ensemble analysis methods, only the application of the model and observations operator to every ensemble member is needed.

`DataAssim.FreeRun`

— Function`x,Hx = FreeRun(ℳ,xi,Q,H,nmax,no)`

Performs a free-run with the model `ℳ`

and `nmax`

time-steps starting at the initial condition `xi`

. Observations at the time steps given in `no`

are extracted with the observation operator `H`

.

`DataAssim.KalmanFilter`

— Function`x,P = KalmanFilter(xi,Pi,ℳ,Q,yo,R,𝓗,nmax,no)`

Kalman Filter with the model `ℳ`

and `nmax`

time-steps starting at the initial condition `xi`

and error covariance `Pi`

. Observations `yo`

(and error covariance `R`

) at the time steps given in `no`

are assimilated with the observation operator `H`

.

`DataAssim.fourDVar`

— Function```
x,J = fourDVar(
xi,Pi,ℳ,yo,R,H,nmax,no;
innerloops = 10,
outerloops = 2,
tol = 1e-5)
```

Incremental 4D-Var with the model `ℳ`

(`AbstractModel`

) and `nmax`

time-steps starting at the initial condition `xi`

and error covariance `Pi`

with the specified numbers of inner and outer loops. Observations `yo`

(vector of vectors) and error covariance `R`

(vector of matrices) at the time steps given in `no`

are assimilated with the observation operator `H`

(`AbstractModel`

).

## Ensemble methods

`DataAssim.ETKF`

— Function`Xa,xa = ETKF(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the ETKF ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.EnKF`

— Function`Xa,xa = EnKF(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the EnKF ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.EnSRF`

— Function`Xa,xa = EnSRF(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the EnSRF ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.EAKF`

— Function`Xa,xa = EAKF(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the EAKF ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.SEIK`

— Function`Xa,xa = SEIK(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the SEIK ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.ESTKF`

— Function`Xa,xa = ESTKF(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the ESTKF ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.serialEnSRF`

— Function`Xa,xa = serialEnSRF(Xf,HXf,y,R,H,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

and observations `y`

using the serialEnSRF ensemble scheme.

**Input arguments:**

`Xf`

: forecast ensemble (n x N)`HXf`

: the observation operator applied on the ensemble (product H*Xf)`y`

: observations (m)`R`

: observation error covariance (m x m).`H`

: operator (m x n). Except for the serialEnSRF it is never used and can be empty

**Optional keywords arguments:**

`debug`

: set to true to enable debugging. Default (false) is no debugging.`tolerance`

: expected rounding error (default 1e-10) for debugging checks. This is not used if debug is false.

**Output arguments:**

`Xa`

: the analysis ensemble (n x N)`xa`

: the analysis ensemble mean (n)

Notations follows: Sangoma D3.1 http://data-assimilation.net/Documents/sangomaDL3.1.pdf

`DataAssim.local_ETKF`

— Function`Xa,xa = local_ETKF(Xf,H,y,diagR,part,selectObs,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

using the observation `y`

using the local ETKF.

**Inputs:**

`Xf`

: forecast ensemble (n x N)`H`

: observation operator (m x n)`y`

: observation (m x 1)`diagR`

: diagonal of the observation error covariance R (m x 1)`part`

: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain`selectObs`

: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:

` selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );`

or

```
selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));
```

where `x`

and `y`

is the horizontal model grid, `xobs`

and `yobs`

are the locations of the observations and `L`

is a correlation length-scale

**Optional inputs:**

`display`

: if true, then display progress (false is the default)`minweight`

: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)`HXf`

: if non empty, then it is the product`H*Xf`

. In this case,`H`

is not used

**Output:**

`Xa`

: the analysis ensemble (n x N)`xa`

`: the analysis ensemble mean (n x 1)

See also: compact_locfun

`DataAssim.local_EnKF`

— Function`Xa,xa = local_EnKF(Xf,H,y,diagR,part,selectObs,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

using the observation `y`

using the local EnKF.

**Inputs:**

`Xf`

: forecast ensemble (n x N)`H`

: observation operator (m x n)`y`

: observation (m x 1)`diagR`

: diagonal of the observation error covariance R (m x 1)`part`

: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain`selectObs`

: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:

` selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );`

or

```
selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));
```

where `x`

and `y`

is the horizontal model grid, `xobs`

and `yobs`

are the locations of the observations and `L`

is a correlation length-scale

**Optional inputs:**

`display`

: if true, then display progress (false is the default)`minweight`

: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)`HXf`

: if non empty, then it is the product`H*Xf`

. In this case,`H`

is not used

**Output:**

`Xa`

: the analysis ensemble (n x N)`xa`

`: the analysis ensemble mean (n x 1)

See also: compact_locfun

`DataAssim.local_EnSRF`

— Function`Xa,xa = local_EnSRF(Xf,H,y,diagR,part,selectObs,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

using the observation `y`

using the local EnSRF.

**Inputs:**

`Xf`

: forecast ensemble (n x N)`H`

: observation operator (m x n)`y`

: observation (m x 1)`diagR`

: diagonal of the observation error covariance R (m x 1)`part`

: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain`selectObs`

: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:

` selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );`

or

```
selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));
```

where `x`

and `y`

is the horizontal model grid, `xobs`

and `yobs`

are the locations of the observations and `L`

is a correlation length-scale

**Optional inputs:**

`display`

: if true, then display progress (false is the default)`minweight`

: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)`HXf`

: if non empty, then it is the product`H*Xf`

. In this case,`H`

is not used

**Output:**

`Xa`

: the analysis ensemble (n x N)`xa`

`: the analysis ensemble mean (n x 1)

See also: compact_locfun

`DataAssim.local_EAKF`

— Function`Xa,xa = local_EAKF(Xf,H,y,diagR,part,selectObs,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

using the observation `y`

using the local EAKF.

**Inputs:**

`Xf`

: forecast ensemble (n x N)`H`

: observation operator (m x n)`y`

: observation (m x 1)`diagR`

: diagonal of the observation error covariance R (m x 1)`part`

: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain`selectObs`

: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:

` selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );`

or

```
selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));
```

`x`

and `y`

is the horizontal model grid, `xobs`

and `yobs`

are the locations of the observations and `L`

is a correlation length-scale

**Optional inputs:**

`display`

: if true, then display progress (false is the default)`minweight`

: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)`HXf`

: if non empty, then it is the product`H*Xf`

. In this case,`H`

is not used

**Output:**

`Xa`

: the analysis ensemble (n x N)`xa`

`: the analysis ensemble mean (n x 1)

See also: compact_locfun

`DataAssim.local_SEIK`

— Function`Xa,xa = local_SEIK(Xf,H,y,diagR,part,selectObs,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

using the observation `y`

using the local SEIK.

**Inputs:**

`Xf`

: forecast ensemble (n x N)`H`

: observation operator (m x n)`y`

: observation (m x 1)`diagR`

: diagonal of the observation error covariance R (m x 1)`part`

: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain`selectObs`

: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:

` selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );`

or

```
selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));
```

`x`

and `y`

is the horizontal model grid, `xobs`

and `yobs`

are the locations of the observations and `L`

is a correlation length-scale

**Optional inputs:**

`display`

: if true, then display progress (false is the default)`minweight`

: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)`HXf`

: if non empty, then it is the product`H*Xf`

. In this case,`H`

is not used

**Output:**

`Xa`

: the analysis ensemble (n x N)`xa`

`: the analysis ensemble mean (n x 1)

See also: compact_locfun

`DataAssim.local_ESTKF`

— Function`Xa,xa = local_ESTKF(Xf,H,y,diagR,part,selectObs,...)`

Computes analysis ensemble `Xa`

based on forecast ensemble `Xf`

using the observation `y`

using the local ESTKF.

**Inputs:**

`Xf`

: forecast ensemble (n x N)`H`

: observation operator (m x n)`y`

: observation (m x 1)`diagR`

: diagonal of the observation error covariance R (m x 1)`part`

: vector of integer "labels". Every element of the state vector with the same number belong to the same subdomain`selectObs`

: callback routine to select observations with a within a subdomain. As input is takes an integer representing the index of the state vector and returns a vector of weights (m x 1). For example:

` selectObs(i) = exp(- ((x[i] - xobs[:]).^2 + (y(i) - yobs[:]).^2)/L^2 );`

or

```
selectObs(i) = compact_locfun(L,...
sqrt((x[i] - xobs[:]).^2 + (y[i] - yobs[:]).^2));
```

`x`

and `y`

is the horizontal model grid, `xobs`

and `yobs`

are the locations of the observations and `L`

is a correlation length-scale

**Optional inputs:**

`display`

: if true, then display progress (false is the default)`minweight`

: analysis is performed using observations for which weights is larger than minweight. (default 1e-8)`HXf`

: if non empty, then it is the product`H*Xf`

. In this case,`H`

is not used

**Output:**

`Xa`

: the analysis ensemble (n x N)`xa`

`: the analysis ensemble mean (n x 1)

See also: compact_locfun

## Models

`DataAssim.AbstractModel`

— TypeAbstract base-class of models. A model should implement forecast step, tangent-linear and adjoint step

`DataAssim.ModelMatrix`

— Type`ℳ = ModelMatrix(M)`

Linear model defined by the matrix `M`

.

`DataAssim.ModelFun`

— Type`ℳ = ModelFun(nonlinear_forecast,tangent_linear_model,adjoint_model)`

Model defined by the functions `nonlinear_forecast`

,`tangent_linear_model`

and `adjoint_model`

.

`DataAssim.LinShallowWater1DModel`

— Type`ℳ = LinShallowWater1DModel(dt,g,h,L,imax)`

Linear 1D shallow water model.

**Example**

```
dt = 1.
g = 9.81
h = 100
imax = 101
L = 10000
LinShallowWater1DModel(dt,g,h,L,imax)
```

`DataAssim.Lorenz63Model`

— Type`ℳ = Lorenz63Model(dt,σ=10.,β = 8/3.,ρ = 28.)`

Lorenz, 1963 model[1] integrated with a 2nd order Runge-Kutta scheme.

[1] https://doi.org/10.1175/1520-0469(1963)020%3C0130:DNF%3E2.0.CO;2

## Utility functions

`DataAssim.compact_locfun`

— Function` fun = compact_locfun(r)`

Smooth compact localization function at the (scaled) distance `r`

. `fun`

is zero if `r`

> 2 and one if `r`

is 0. (Gaspari et al. (1999), equation 4.10, [1])

[1] http://dx.doi.org/10.1002/qj.49712555417

`DataAssim.talagrand_diagram`

— Function`freq = talagrand_diagram(x,y)`

Compute the frequency of each bins for a Talagrand Diagram where `x`

is the ensemble (array of the size sample × ensemble size) and `y`

are the obervations (sample).

**Example**

```
using PyPlot
Nens = 100
Nsample = 10000
# Ensemble
x = randn(Nsample,Nens)
# Observation
y = randn(Nsample)
freq = talagrand_diagram(x,y)
plot(freq)
plot(ones(size(freq)) / size(freq,1))
ylim(0,ylim()[2])
xlabel("bins")
ylabel("frequency")
```