DIVAnd

# DIVAnd.jl documentation

## API reference

`DIVAnd.diva3d`

— Function`dbinfo = diva3d(xi,x,value,len,epsilon2,filename,varname)`

Create a 3D analysis (or a series of 3D analysis) with DIVAnd using the observations `value`

(vector) at the locations `x`

(tuple of vectors) onto the regular grid defined by the vectors `xi`

using the scaled observational error variance `epsilon2`

and the correlation length `len`

. The result will be saved in the netCDF file `filename`

under the variable `varname`

.

**Inputs**

`xi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated`x`

: tuple with n elements. Every element represents a coordinate of the observations`value`

: value of the observations`len`

: tuple with n elements. Every element represents the correlation length. If`fitcorrlen`

is`false`

(default), the correlation length should be expressed in meters. If`fitcorrlen`

is`true`

, then`len`

can be the empty tuple`()`

or a tuple containing 3 arrays of normalized (or relative) correlation lengths which will be multiplied by the horizontal and vertical correlation lengths. In the case where`fitcorrlen`

is`true`

and`len`

is provided, the elements of the tuple`len`

are adimensional and mostly of the order of 1. Where the elements of`len`

are less than 1, the correlation length (obtained via fitting, DIVAnd.fithorzlen and DIVAnd.fitvertlen) is effectively decreased and where it is larger than 1 it is increased. This is useful for example to decrease the correlation length locally near steep topography. It is advised to check the range of the scaled correlation length which is printed on the screen while running DIVAnd. The fitted values are also returned in the structure`dbinfo`

. The correlation length fitting can produce irrealistic results for inhomogenous data coverage.`epsilon2`

: error variance of the observations (normalized by the error variance of the background field).

`epsilon2`

can be a scalar (all observations have the same error variance and their errors are decorrelated), a vector (all observations can have a different error variance and their errors are decorrelated) or a matrix (all observations can have a different error variance and their errors can be correlated). If `epsilon2`

is a scalar, it is thus the *inverse of the signal-to-noise ratio*.

`filename`

: the output netCDF filename.`varname`

: the name of the variable (used in the netCDF file).

**Optional input arguments:**

`bathname`

: path to the netCDF bathymetry (default`../../DIVAnd-example-data/Global/Bathymetry/gebco_30sec_16.nc`

relative to this source file)`bathisglobal`

: true (default) is the bathymetry is a global data set`plotres`

: call-back routine for plotting ((timeindex,sel,fit,erri) -> nothing)`timeorigin`

: time origin (default`DateTime(1900,1,1,0,0,0)`

)`moddim`

: modulo for cyclic dimension (vector with n elements). Zero is used for non-cyclic dimensions. Halo points should not be included for cyclic dimensions. For example if the first dimension is cyclic, then the grid point corresponding to`mask[1,j]`

should be between`mask[end,1]`

(left neighbor) and`mask[2,j]`

(right neighbor). The default is [0,0,0],`zlevel`

:`:surface`

(default) for surface analysis and`:floor`

for analysis from the bottom floor.`ncvarattrib`

: dictionary of netCDF variable attributes.`ncglobalattrib`

: dictionary of netCDF global attributes.`transform`

: Anamorphosis transformation function (default:`Anam.notransform()`

).`fitcorrlen`

: true if the correlation length is determined from the observation (default`false`

). Note that the parameter`len`

is interpreted differently when`fitcorrlen`

is set to`true`

.`fithorzcorrlen`

: true if the horizontal correlation length is determined from the observation (default: the value of`fitcorrlen`

) Note that the parameter`len`

is interpreted differently when`fithorzcorrlen`

is set to`true`

.`fitvertcorrlen`

: true if the vertical correlation length is determined from the observation (default: the value of`fitcorrlen`

) Note that the parameter`len`

is interpreted differently when`fitvertcorrlen`

is set to`true`

.`fithorz_param`

: dictionary with additional optional parameters for`fithorzlen`

, for example:`Dict(:smoothz => 200., :searchz => 50.)`

.`fitvert_param`

: dictionary with additional optional parameters for`fitvertlen`

.`distfun`

: function to compute the distance (default`(xi,xj) -> DIVAnd.distance(xi[2],xi[1],xj[2],xj[1])`

).`mask`

: if different from`nothing`

, then this mask overrides land-sea mask based on the bathymetry

(default `nothing`

).

`background`

: if different from`nothing`

, then this parameter allows one

to load the background from a call-back function (default `nothing`

). The call-back functions has the parameters `(x,n,trans_value,trans)`

where `x`

represent the position of the observations, `n`

the time index, `trans_value`

, the observations (possibly transformed) and `trans`

the transformation function. The output of this function is the gridded background field and the observations minus the background field. See also `DIVAnd.backgroundfile`

.

`background_epsilon2_factor`

: multiplication for`epsilon2`

when computing a vertical profile as a background estimate (default: computed internally based on the amount of data). This parameter is not used when the parameter`background`

or`background_lenz`

is provided.`background_lenz`

: vertical correlation for background computation (default 20 m). This parameter is not used when the parameter`background`

is provided.`background_len`

: deprecated option replaced by`background_lenz`

.`filterbackground`

: number of iterations to filter the background profile (default 0, no filtering)`memtofit`

: keyword controlling how to cut the domain depending on the memory remaining available for inversion. It is not the total memory (default 3). Use a large value (e.g. 100) to force the usage for the more efficient direct solver if you are not limited by the amount of RAM memory.`minfield`

: if the analysed field is below`minfield`

, its value is replace by`minfield`

(default -Inf, i.e. no substitution is done).`maxfield`

: if the analysed field is above`maxfield`

, its value is replace by`maxfield`

(default +Inf, i.e. no substitution is done).`saveindex`

: controls if just a subset of the analysis should be saved to the netCDF file. Per default,`saveindex`

is`(:,:,:)`

(corresponding to longitude, latitude and depth indices) meaning that everything is saved. If however, for example the first layer should not be saved then`saveindex`

should be`(:,:,2:length(depthr))`

where`depthr`

is the 3rd element of`xi`

.`niter_e`

: Number of iterations to estimate the optimal scale factor of`epsilon2`

using Desroziers et al. 2005 (doi: 10.1256/qj.05.108). The default is 1 (i.e. no optimization is done).`coeff_derivative2`

(vector of 3 floats): for every dimension where this value is non-zero, an additional term is added to the cost function penalizing the second derivative. A typical value of this parameter is`[0.,0.,1e-8]`

.`surfextend`

: create an additional layer on top of the surface layer so that the effective background error variance is more similar to the deep ocean.`false`

is the default value.`stat_per_timeslice`

(default false): if true, then the residual values (and possibly qcvalues) are also returned by time slices which is useful if the time slices overlap (see example below).`error_thresholds`

(default`[("L1", 0.3), ("L2", 0.5)]`

). This is a list of tuples with the applied error thresholds and the variable names suffixes. If the variable is named e. g.`"Salinity"`

, then the variables`"Salinity_L1"`

(resp.`"Salinity_L2"`

) will be created where the analysis is masked if the relative error exceeds 0.3 (resp. 0.5).`divamethod`

:`DIVAndgo`

(default) or`DIVAndrun`

Any additional keywoard arguments understood by `DIVAndgo`

/`DIVAndrun`

can also be used here (e.g. velocity constrain)

The output is a dictionary with the followings keys:

`:residuals`

: the difference between the observations and the analysis (interpolated linearly to the

location of the observations). The residual is `NaN`

if the observations are not within the domain as defined by the mask and the coordinates of the observations `x`

.

`:qcvalues`

: quality control scores (if activated)

**Example**

Example on how to aggredate the residuals per time slice and to retain the maximum residual:

```
selection_per_timeslice = dbinfo[:selection_per_timeslice]
residuals_per_timeslice = dbinfo[:residuals_per_timeslice]
selection_per_timeslice = dbinfo[:selection_per_timeslice]
max_residuals = fill(-Inf,length(value))
for n = 1:length(selection_per_timeslice)
sel = selection_per_timeslice[n]
max_residuals[sel] = max.(max_residuals[sel],residuals_per_timeslice[n])
end
```

At all vertical levels, there should at least one sea point. The function assumes a spherical Earth with a radius equal to the mean Earth radius.

`DIVAnd.DIVAndrun`

— Function`DIVAndrun(mask,pmn,xi,x,f,len,epsilon2; <keyword arguments>)`

Perform an n-dimensional variational analysis of the observations `f`

located at the coordinates `x`

. The array `fi`

represent the interpolated field at the grid defined by the coordinates `xi`

and the scales factors `pmn`

.

**Input:**

`mask`

: binary mask delimiting the domain. true is inside and false outside.

For oceanographic application, this is the land-sea mask where sea is true and land is false.

`pmn`

: scale factor of the grid. pmn is a tuple with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension. For example, in two dimensions,`pmn`

is a tuple`(pm,pn)`

where`pm`

is the inverse of the local resolution in first dimension and`pn`

is the the inverse of the local resolution in second dimension.`xi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated.`x`

: tuple with n elements. Every element represents a coordinate of the observations.`f`

: value of the observations*minus*the background estimate (vector of`m`

elements where`m`

is the number of observations). See also note.`len`

: tuple with n elements. Every element represents the correlation length for a given dimension.`epsilon2`

: error variance of the observations (normalized by the error variance of the background field).`epsilon2`

can be a scalar (all observations have the same error variance and their errors are decorrelated), a vector (all observations can have a different error variance and their errors are decorrelated) or a matrix (all observations can have a different error variance and their errors can be correlated). If`epsilon2`

is a scalar, it is thus the*inverse of the signal-to-noise ratio*.

**Optional input arguments specified as keyword arguments**

`velocity`

: velocity of the advection constraint. It is a tuple of n arrays and each array represents a single velocity component. The individual array should have the same size as the final grid. The first (second,..) element of the velocity is the velocity compomenent along the first (second,...) dimension. The`velocity`

has the units of a length-scale. If this parameter is derived from ocean currents, then the later must be multiplied by a factor (to be determined for example by cross-validation) and this factor has the units of a time-scale. The default is no-advection constraint.`alpha`

: alpha is vector of coefficients multiplying various terms in the cost function. The first element multiplies the norm. The other i-th element of alpha multiplies the (i+1)-th derivative. Per default, the highest derivative is m = ceil(1+neff/2) where neff is the effective dimension of the problem (the number of dimensions with a nonzero correlation length) and`ceil`

is the ceiling function (rounding up).

```
The values of alpha is the (m+1)th row of the Pascal triangle:
m=0 1
m=1 1 1
m=1 1 2 1 (n=1,2)
m=2 1 3 3 1 (n=3,4)
...
```

`constraints`

: a structure with user specified quandratic constraints (see`DIVAnd_addc`

).`ineqconstraints`

: a structure with user specified inequality constraints such that the analysis`x`

satisfies`Hx >= y0`

. There is no check if the inequality constraints make sense are compatible with each other or with the data. Inequalities will not be satisfied exactly everywhere unless they are already satisfied with a normal analysis. You can increase the number of iterations by increasing`ntriesmax`

.`moddim`

: modulo for cyclic dimension (vector with n elements). Zero is used for non-cyclic dimensions. One should not include a boundary zone (sometimes called a ghost zone or halo) for cyclic dimensions. For example if the first dimension is cyclic, then the grid point corresponding to`mask[1,j]`

should be between`mask[end,j]`

(left neighbor) and`mask[2,j]`

(right neighbor).`fracindex`

: fractional indices (n-by-m array). If this array is specified, then x and xi are not used.`inversion`

: direct solver (`:chol`

for Cholesky factorization), an interative solver (`:pcg`

for preconditioned conjugate gradient [1]) can be used or`:cg_amg_sa`

for a multigrid method with preconditioned conjugate gradient. The two last methods are iterative methods who a controlled by the number of iterations`maxit`

and the tolerance`tol`

.`compPC`

: function that returns a preconditioner for the primal formulation if inversion is set to`:pcg`

. The function has the following arguments:`fun = compPC(iB,H,R)`

where iB is the inverse background error covariance, H the observation operator and R the error covariance of the observation. The function

`compPC`

returns the preconditioner`fun(x,fx)`

computing fx =`M \ x`

(the inverse of M times x) where`M`

is a positive defined symmetric matrix [1]. Effectively, the system E⁻¹ A (E⁻¹)ᵀ (E x) = E⁻¹ b is solved for (E x) where E Eᵀ = M. Ideally, M should this be similar to A, so that E⁻¹ A (E⁻¹)ᵀ is close to the identity matrix.`fi0`

: starting field for iterative primal algorithm (same size as`mask`

).`f0`

: starting field for iterative dual algorithm (same size as the observations`f`

).`operatortype`

: Val{:sparse} for using sparse matrices (default) or Val{:MatFun} or using functions to define the constrains.`scale_len`

: true (default) if the correlation length-scale should be scaled such that the analytical kernel reaches 0.6019072301972346 (besselk(1.,1.)) at the same distance than in 2D. The kernel behaves thus similar to the default kernel in two dimensions (alpha = [1,2,1]).`alphabc`

: numerical value defining how the last grid points are stretched outward. If`alphabc`

is 1, the default value mimics an infinite domain. To have previous behaviour of finite domain use alphabc equal to`0`

.`btrunc`

: if provided defines where to truncate the calculation of the covariance matrix B. Only values up and including alpha[btrunc] will be calculated. If the iterative solution is calculated, the missing terms will be calculated on the fly during the conjugate gradient calculations. Default value is none and full covariance calculation.

**Output:**

`fi`

: the analysed field`s`

: a structure with an array`s.P`

representing the analysed error covariance

**Note:**

If zero is not a valid first guess for your variable (as it is the case for e.g. ocean temperature), you have to subtract the first guess from the observations before calling DIVAnd and then add the first guess back in.

**Example:**

see `DIVAnd_simple_example.jl`

**References**

`DIVAnd.DIVAndgo`

— Function`fi, erri, residuals, qcvalues, scalefactore = DIVAndgo(mask,pmn,xi,x,f,len,epsilon2,errormethod; ...);`

**Input:**

- Same arguments as DIVAndrun with in addition
`errormethod`

: you have the choice between`:cpme`

(clever poorman's method, default method if parameter not provided),`:none`

or`:exact`

(only available if windowed analysis are done with DIVAndrun)`MEMTOFIT=`

: keyword controlling how to cut the domain depending on the memory remaining available for inversion (not total memory)`RTIMESONESCALES=`

: if you provide a tuple of length scales, data are weighted differently depending on the numbers of neighbours they have. See`weight_RtimesOne`

for details`QCMETHOD=`

: if you provide a qc method parameter, quality flags are calculated. See`DIVAnd_cv`

for details`solver`

(default`:auto`

:). :direct for the direct solver or :auto for automatic choice between the direct solver or the iterative solver.`overlapfactor`

: describes how many times the length scale is used for the overlapping. default is 3.3. use lower values ONLY for very good data coverage.

**Output:**

`fi`

: the analysed field`erri`

: relative error field on the same grid as fi. () if errormethod is fixed to`:none`

`residuals`

: array of residuals at data points. For points not on the grid or on land:`NaN`

`qcvalues`

: if`QCMETHOD=`

is provided, the output array contains the quality flags otherwise qcvalues is (). For points on land or not on the grid: 0`scalefactore`

: Desroziers et al. 2005 (doi: 10.1256/qj.05.108) scale factor for`epsilon2`

Perform an n-dimensional variational analysis of the observations `f`

located at the coordinates `x`

. The array `fi`

represent the interpolated field at the grid defined by the coordinates `xi`

and the scales factors `pmn`

.

IMPORTANT: DIVAndgo is very similar to DIVAndrun and is only interesting to use if DIVAndrun cannot fit into memory or if you want to parallelize. (In the latter case do not forget to define the number of workers; see `addprocs`

for example)

`DIVAnd.DIVAnd_averaged_bg`

— Function`fma,faanom = DIVAnd_averaged_bg(mask,pmn,xi,x,f,len,epsilon2,toaverage;moddim=[])`

**Input:**

As for DIVAndrun, including all dimensions before averaging

**additional argument:**

- toaverage: Array of ndims of boolean telling if in the corresponding direction averaging must be done

**Presently NO optional arguments from DIVAndrun supported except moddim**

**Output:**

- fma: Analysis where in the directions where toaverage is true, the same value is found
- faanom: Data anomalies when the analysis is subtracted from the input field.

`DIVAnd.SDNMetadata`

— Function`ncglobalattrib,ncvarattrib = SDNMetadata(metadata,fi)`

Based on the information in the dictionary `metadata`

and the analysed 4D field `fi`

produce a list of NetCDF global and variable attributes for `DIVAnd_save2`

. To list all registered projects call `keys(DIVAnd.PROJECTS)`

. For example:

```
julia> using DIVAnd
julia> keys(DIVAnd.PROJECTS)
Base.KeySet for a Dict{String,Dict{String,String}} with 3 entries. Keys:
"EMODNET-chemistry"
"SeaDataNet"
"SeaDataCloud"
```

`DIVAnd.save`

— Function```
save(filename,xyi,fi,varname;
ncvarattrib = Dict(), ncglobalattrib = Dict(), ...)
```

Save the result of the analysis in a netCDF file.

**Input arguments**

`filename`

: the name of the NetCDF file`xyi`

: tuple with n vectors. Every element in this tuple represents a coordinate of the final grid on which the observations are interpolated`fi`

: the analysed field`varname`

: the name of the NetCDF variable

**Optional arguments:**

`ncglobalattrib`

: a dictionary with the global attributes`ncvarattrib`

: a dictionary with the variable attributes`relerr`

: relative error

`DIVAnd.loadbigfile`

— Function`value,lon,lat,depth,time,obsid = loadbigfile(filename)`

Load data from the text file `filename`

and returns vectors with the value, longitude, latitude, depth and time (as DateTime). A list string identifiers is also returned.

`DIVAnd.checkobs`

— Function```
checkobs(x,v,ids)
checkobs(io::IO,x,v,ids)
```

Print some basic information about the coordinates `x`

(tuple of vector) and values `v`

(vector) having the identifier `ids`

(vector of strings) to check erroneous data. It prints wheter NaNs or Infs are found and the minimum and maximum value.

If the argument `io`

is provided, the information is input/output stream `io`

.

`DIVAnd.smoothfilter`

— Function`ff = smoothfilter(x,f,scale)`

Smooth the function `f`

defined on `x`

by solving the diffusion equation

∂ₜ ϕ = ν ∂²ₓ ϕ

`scale`

is the spatial scales of the removed length-scales. It is defined as 2Tν where T is the integration time.

It uses the Greens functions for 1D diffusion: 1/sqrt(4 π ν t) * exp(-x^2 / (4νt))

`DIVAnd.Anam.loglin`

— Function`trans,invtrans = loglin(t; epsilon = 0.)`

Provide the following transform `log(x + epsilon)`

(for x < t) and its inverse. Beyond the threshold `t`

(x ≥ t), the function is extended linearly in a continous way.

`trans`

,`invtrans`

are scalar functions such that for any `x`

(x > epsilon), `x == invtrans(trans(x))`

.

For any array `X`

, we have: `X == invtrans.(trans.(X))`

.

`DIVAnd.Anam.logit`

— Function`trans,invtrans = logit(; min = 0., max = 1.)`

Provide the logit transform and its inverse. Per default the logit transform maps values within the interval from 0 and 1. This can be changed with the `min`

and `max`

parameters. Note that trans(min) = -∞ and trans(max) = +∞. The use safety-margin might be necessary.

`DIVAnd.divadoxml`

— Function```
DIVAnd.divadoxml(filepath,varname,project,cdilist,xmlfilename;
ignore_errors = false,
WMSlayername = [],
previewindex = 1,
additionalcontacts = [],
additionalvars = Dict{String,Any}(),
default_error_level = "L2",
```

)

Generate the XML metadata file `xmlfilename`

from the NetCDF file `filepath`

(or list of files) with the NetCDF variable `varname`

. Project is either "SeaDataNet", "EMODNET-chemistry" or "SeaDataCloud". `cdilist`

is the file from https://emodnet-chemistry.maris.nl/download/export.zip.

The XML file contains a list of the data the originators. divadoxml will abort with an error if some combinations of EDMO code, local CDI ID are not present in the `cdilist`

. Such errors can be ignored if `ignore_errors`

is set to true. To understand why some EDMO code/local CDI ID could not be found, one can the decompress file from https://emodnet-chemistry.maris.nl/download/export.zip which contains a file called `export.csv`

. This file has the columns `author_edmo`

and `cdi_identifier`

which this function uses to find the data originators (column `originator_edmo`

).

Information can be overridden with the dictionary `additionalvars`

. The keys should corresponds to the template tags found the in `template`

directory. Template tags are the strings inside {{ and }}.

NetCDF_URL should be suppplied since it's a URL of a ZIP file which is usually not from OceanBrowser.

If `filepath`

is a vector of file names, the argument `WMSlayername`

can be provided to give additional information to distinguish between the NetCDF files. The elements of the vector of string will be appended to the description of the WMS layer.

The resulting XML file includes the file names (provided by `filepath`

). Do not change the file names after running this function, otherwise the XML will still contain a reference to the old file names. If you must change the file names please do so before running this script.

If the data is present in a subfolder (e.g. "Winter") later on the OceanBrowser webserver, the `filepath`

should also contain this subfolder (e.g. "Winter/somefile.nc"). The local directories should mirror the directory structure on OceanBrowser. Relative paths should be used, and if the Julia code isn't right above the NetCDF files, use cd("<path>") before each setting the files parameter which use paths relative to this path.

If the products will go into a sub-directory (which is different from the domain name), the part of the path should with provided with the parameter `url_path`

.

The link for the NetCDF download for EMODNET-chemistry for example is the following:

`PROJECTS["EMODNET-chemistry"]["baseurl_http"] * "/" * url_path * "/" * filepath`

where `url_path`

defaults to the the domain name (from the C19 vocabulary) if it is not provided. The script will print the URLs for verification.

`additionalcontacts`

is a list of dictionaries with additional condact information to be added in the XML file. Elements are typically create by the function `DIVAnd.getedmoinfo`

.

`WMSexclude`

is a list of string with NetCDF variables not be included the XML under the WMS layer section.

`default_error_level`

is either L1 or L2 corresponding to the masked variables using relative error threshold 0.3 or 0.5. The default is L2.

**Example**

```
download("https://emodnet-chemistry.maris.nl/download/export.zip","export.zip")
files = [
"Winter (January-March) - 6-year running averages/Water_body_chlorophyll-a.4Danl.nc",
"Spring (April-June) - 6-year running averages/Water_body_chlorophyll-a.4Danl.nc",
"Summer (July-September) - 6-year running averages/Water_body_chlorophyll-a.4Danl.nc",
"Autumn (October-December) - 6-year running averages/Water_body_chlorophyll-a.4Danl.nc"
];
additionalcontacts = [
DIVAnd.getedmoinfo(1977,"originator"), # US NODC for World Ocean Database
DIVAnd.getedmoinfo(4630,"originator"), # CORIOLIS for CORA
]
DIVAnd.divadoxml(files,"Water_body_chlorophyll-a","EMODNET-chemistry","export.zip","test.xml";
ignore_errors = true,
additionalvars = Dict("abstract" => "Here goes the abstract"),
additionalcontacts = additionalcontacts,
WMSlayername = ["winter","spring","summer","autumn"]
)
```

For this function the following global NetCDF attributes are mandatory:

`product_id`

: UUID identifier`parameter_keyword_urn`

or`parameter_keyword`

(or`parameter_keywords`

): P35 code (e.g. "SDN:P35::EPC00007") or preferred label (e.g. "Water body phosphate") respectively. Note the singular form (`parameter_keyword`

) is preferred because there should be only a single P35 parameter per NetCDF file, but singular and plural forms are valid.`search_keywords_urn`

or`search_keywords`

: P02 code or preferred label`area_keywords_urn`

or`area_keywords`

: C19 code or preferred label`institution_edmo_code`

: EDMO code number`product_version`

: version of the product

Adding a global attribute, can be done using the following:

```
ds = NCDatafile("DIVA_file.nc","a")
ds.attrib["attribute_name"] = "attribute value"
close(ds)
```

`DIVAnd.getedmoinfo`

— Function`contact = DIVAnd.getedmoinfo(edmo_code,role)`

Returns a dictionary with the contact information from the EDMO registry based on the prodivided `emdo_code`

. `role`

is the Sextant contact information role, i.e. either "originator" or "author".

`DIVAnd.random`

— Function`field = DIVAnd.random(mask,pmn,len,Nens)`

Create `Nens`

random fields with the correlation length `len`

in a domain with the mask `mask`

and the metric `pmn`

.

See `DIVAnd.DIVAndrun`

for more information about these parameters.

`DIVAnd.distance`

— Function`d = distance(lat1,lon1,lat2,lon2)`

Compute the great-circle distance between the points (`lat1,`

lon1`) and (`

lat2,`lon2`

). The units of all input and output parameters are degrees.

`d = distance([lon1,lat1],[lon2,lat2])`

The same as `distance(lat1,lon1,lat2,lon2)`

but there the arguments are vectors and the order is longitude then latitude.

The units of all input and output parameters are degrees.

`DIVAnd.interp`

— Function`f = interp(xi,fi,x)`

Interpolate field `fi`

(n-dimensional array) defined at `xi`

(tuble of n-dimensional arrays or vectors) onto grid `x`

(tuble of n-dimensional arrays). The grid in `xi`

must be align with the axis (e.g. produced by DIVAnd.ndgrid).

`DIVAnd.backgroundfile`

— Function`fun = backgroundfile(fname,varname)`

Return a function `fun`

which is used in DIVAnd to make anomalies out of observations based relative to the field defined in the NetCDF variable `varname`

in the NetCDF file `fname`

. It is assumed that the NetCDF file has the variable `lon`

, `lat`

and `depth`

. And that the NetCDF variable is defined on the same grid as the analysis.

`fun = backgroundfile(fname,varname,TS)`

Return a function `fun`

which is used in DIVAnd to make anomalies out of observations based relative to the field defined in the NetCDF variable `varname`

in the NetCDF file `fname`

. It is assumed that the NetCDF file has the variable `lon`

, `lat`

and `depth`

. And that the NetCDF variable is defined on the same grid as the analysis and was generated according to the provided time selector `TS`

(TimeSelectorYearListMonthList or TimeSelectorRunningAverage).

At all vertical levels, there should be at least one sea point.

`DIVAnd.Quadtrees.checkduplicates`

— Function`dupl = checkduplicates(x,value,delta,deltavalue)`

Based on the coordinates `x`

(a tuple of longitudes `lons`

, latitudes `lats`

, depths (`zs`

) and times (`times`

vector of `DateTime`

)), search for points which are in the same spatio-temporal bounding box of length `delta`

. `delta`

is a vector with 4 elements corresponding to longitude, latitude, depth and time (in days). `dupl`

a vector of vectors containing the indices of the duplicates.

Observations and coordinates should not be NaN or Inf.

`dupl = checkduplicates(x1,value1,x2,v2,value2,delta,deltavalue)`

Report duplicates of observations in data set (x2,v2) which are also in data set (x1,v1). `x1`

and `x2`

are tuples of vectors with the coordinates, `v1`

and `v2`

are the corresponding values.

Observations and coordinates should not be NaN or Inf.

`DIVAnd.DIVAnd_heatmap`

— Function```
dens,Ltuple,LCV,LSCV = DIVAnd_heatmap(mask,pmn,xi,x,inflation,Labs;Ladaptiveiterations=0,myheatmapmethod="DataKernel",
optimizeheat=true,nmax=1000,otherargs...)
```

Computes a heatmap based on locations of observations using kernel density estimation (probability density field whose integral over the domain is one)

**Input:**

`mask`

: mask as usual`pmn`

: tuple of metrics as usual`xi`

: tuple of coordinates of the grid for the heatmap`x`

: tuple of coordinates of observations`inflation`

: array generally of ones. For some applications an observation can carry a different weight which is then encoded in the array`Labs`

: the length scales for DIVAnd. Here their meaning is the spread (bandwidth) of the observations for the Kernel calculation. If zero is provided, the routine applies an empirical estimate, returned in the Ltuple output.`Ladaptiveiterations`

: adaptive scaling where the length scales are adapted on the data density already estimated. You can iterate. Default "0"`optimizeheat`

: boolean which can turn on or off an algorithmic optimisation. Results should be identical. Default is to optimize`myheatmapmethod`

: can be "Automatic", "GridKernel" or "DataKernel" (Results should be very similar except near boundaries)`nmax`

: maximum number of data points. If actual data size is larger, approximatively nmax superobservations are calculated and a warning issued.`otherargs...`

: all other optional arguments DIVAndrun can take (advection etc)

**Output:**

`dens`

: data density field (integral is one)`Ltuple`

: The bandwidthth used (either the input value or the calculated ones)`LCV`

: Likelihood Cross Validation estimator value (the higher the better) leave one out approach`LSCV`

: Least Square Cross Validation estimator (the lower the better) leave-one-out approach

`DIVAnd.DIVAndfun`

— Function`myfunny=DIVAndfun(x,f;mask=nothing,pmn=nothing,xi=nothing,len=nothing,epsilon2=nothing,kwargs...)`

Provides a simplified interface to DIVAndrun and in return provides an interpolation FUNCTION rather than the gridded field of DIVAndrun

You can use ALL paramters you can use with DIVAndrun, but some of them are made optional here by using keywords.

The necessary input is the tuple of coordinates at which you have data points and the corresponding vector of data.

The output is an interpolation function you can call at any coordinate in a hypercube defined by the bounding box of your input data or the domain explicitely defined by keyword xi

If the user want more control on the grid he needs at least to provide xi, here with option to just provide vectors of coordinates in each direction (only works anyway for this case) so xi can be a tuple of vectors

You can use all keyword parameters of divand

**Input:**

`x`

: tuple of arrays of coordinates`f`

: the value of the function to interpolate at the coordinates x`

**Output:**

`myfunny`

: the interpolation function. If you had two dimensional input (i.e. x was a tuple of two coordinates), you can evaluate the interpolation as myfunny(0.1,0.2) for example

### Bathymetry and spatial-temporal domain

`DIVAnd.load_bath`

— Function`xi,yi,bath = DIVAnd.load_bath(bath_name,isglobal,xi,yi)`

Load the bathymetry from the netCDF file `bathname`

. The parameter `isglobal`

is true if the NetCDF file covers the whole globe and thus the last longitude point can be considered to be right next to the first longitude point. `xi`

and `yi`

are vectors containing the longitude and latitude grid onto which the bathymetry should be interpolated.

`DIVAnd.extract_bath`

— Function`bx,by,b = DIVAnd.extract_bath(bath_name,isglobal,xi,yi)`

Extract the bathymetry from the NetCDF file `bathname`

. The parameter `isglobal`

is true if the NetCDF file covers the whole globe and thus the last longitude point can be considered to be right next to the first longitude point. `xi`

and `yi`

are vectors defining the bounding box of the data. No interpolation is performed.

**Convention:** b is positive in the water and negative in the air.

The NetCDF file is expected to have the one dimensional variables `lon`

and `lat`

with the longitude (degrees East) and latitude (degrees North) and the two dimentional array `bat`

with the digital terrain model (negative in water and positive above water). The order of the dimension should follow be: longitude and then latitude in Column-major ordering (or latitude and then longitude if the tool `ncdump`

is used, which is based on Row-major ordering).

Example of the output of `ncdump -h`

:

```
netcdf gebco_30sec_8 {
dimensions:
lat = 2702 ;
lon = 5400 ;
variables:
double lat(lat) ;
lat:long_name = "Latitude" ;
lat:standard_name = "latitude" ;
lat:units = "degrees_north" ;
double lon(lon) ;
lon:long_name = "Longitude" ;
lon:standard_name = "longitude" ;
lon:units = "degrees_east" ;
float bat(lat, lon) ;
bat:long_name = "elevation above sea level" ;
bat:standard_name = "height" ;
bat:units = "meters" ;
// global attributes:
:title = "GEBCO" ;
}
```

`DIVAnd.load_mask`

— Function`xi,yi,mask = load_mask(bath_name,isglobal,xi,yi,level::Number)`

Generate a land-sea mask based on the topography from the NetCDF file `bathname`

. The parameter `isglobal`

is true if the NetCDF file covers the whole globe and thus the last longitude point can be considered to be right next to the first longitude point. `xi`

and `yi`

are vectors containing the longitude and latitude grid onto which the bathymetry should be interpolated.

**Convention:** in the water, `level`

is positive and in the air `level`

is negative.

`DIVAnd.DIVAnd_metric`

— Function`pm,pn = DIVAnd_metric(lon,lat)`

Compute metric scale factors `pm`

and `pn`

based on the arrays longitude `lon`

and latitude `lat`

. The variables pm and pn represent the inverse of the local resolution in meters using the mean Earth radius.

`DIVAnd.domain`

— Function`mask,(pm,pn),(xi,yi) = DIVAnd.domain(bathname,bathisglobal,lonr,latr)`

Generate a 2D geospatial domain based on the topography from the netCDF file `bathname`

. `lonr`

and `latr`

are expressed in degrees East and North respectively (as are `xi`

and `yi`

). `pm`

and `pn`

are in m⁻¹ (using the the mean Earth radius).

`mask,(pm,pn,po),(xi,yi,zi) = DIVAnd.domain(bathname,bathisglobal,lonr,latr,depthr; zlevel = :surface)`

Generate a 3D geospatial domain based on the topography from the netCDF file `bathname`

. If `zlevel`

is `:surface`

, then `depthr`

is zero for the sea surface and positive in water (positive is down). If `zlevel`

is `:floor`

, then `depthr`

is zero for the sea floor and positive in water (positive is up).

`lonr`

and `latr`

are expressed in degrees East and North respectively (as are `xi`

and `yi`

). `depthr`

and `zi`

are in meters and `pm`

, `pn`

and `po`

are in m⁻¹ (using the the mean Earth radius).

`mask,(pm,pn,po,pp),(xi,yi,zi,ti) = domain(bathname,bathisglobal,lonr,latr,depthr,timer)`

Generate a geospatial domain based on the topography from the netCDF file `bathname`

.

`DIVAnd.DIVAnd_rectdom`

— Function`mask,pmn,xyi = DIVAnd_rectdom(coord1,coord2,...)`

Create a "rectangular" domain in `n`

dimensions with the coordinates `coord1`

`coord2`

... assuming a Catersian metric. This functions returns the mask `mask`

, the coordinates `(xi,yi,...)`

and the metric `(pm,pn...)`

.

For example:

`mask,(pm,pn),(xi,yi) = DIVAnd_rectdom(range(0,stop=1,length=50),linspace(0,stop=1,length=50))`

`DIVAnd.DIVAnd_squaredom`

— Function`mask,pmn,xyi = DIVAnd_squaredom(n,coord)`

Create a "square" domain in `n`

dimensions with the coordinates `coord`

assuming a Cartesian metric. This functions returns the mask `mask`

, the coordinates `(xi,yi,...)`

and the metrics `(pm,pn...)`

.

**Example**

`mask,(pm,pn),(xi,yi) = DIVAnd_squaredom(2,range(0,stop=1,length=50))`

`DIVAnd.TimeSelectorYW`

— Function`TS = TimeSelectorYW(years,yearwindow,monthlists)`

The structure `TS`

handles the time aggregation based on `years`

and `monthlists`

. It is similar to `TimeSelectorYearListMonthList`

except that the elements of `yearlists`

are centred around `years`

and span `yearwindow`

years. `yearlists`

is in fact constructed by adding and subtracting `yearwindow/2`

to every element of years.

`DIVAnd.TimeSelectorYearListMonthList`

— Type`TS = TimeSelectorYearListMonthList(yearlists,monthlists)`

The structure `TS`

handles the time aggregation based on `yearlists`

and `monthlists`

. `yearlists`

is a vector of ranges (containing start and end years), for example `[1980:1989,1990:1999,2000:2009,2010:2019]`

.

`monthlists`

is a vector of two-element vector (containing start and end months), for example `[1:3,4:6,7:9,10:12]`

.

The upper bound of a `yearlist`

and `monthlist`

element is considered inclusive. The range of years of 2010:2019 consideres all years upto and including the year 2009.

If a month range spans beyond December, then all Months must be specified, e.g. example `[2:4,5:6,7:9,[10,11,12,1]]`

or `[2:4,5:6,7:9,[10:12;1]]`

. However using `[2:4,5:6,7:9,10:1]`

(bug!) will result in an empty month range.

**Example**

```
# seasonal climatology using all data from 1900 to 2020
# for winter (December-February), spring, summer, autumn
TS = DIVAnd.TimeSelectorYearListMonthList([1900:2020],[[12,1,2],[3,4,5],[6,7,8],[9,10,11]])
```

Note that for seasonal analyses, DIVAnd will only select observations within the provided year range (and not pick year-1 for December), for example

```
using DIVAnd
TS = DIVAnd.TimeSelectorYearListMonthList([1900:2020],[[12,1,2],[3,4,5],[6,7,8],[9,10,11]])
DIVAnd.select(TS,1,[DateTime(1899,12,31), DateTime(1900,1,1)])
```

This returns `[0,1]`

i.e. the 1st observation is not used, while the second is used. There is no special case for the month 12.

If the data from e.g. December 1899 should be considered for a seasonal analysis for the year 1900-2020, one should shift the observations as follows:

```
obstime_shifted = copy(obstime)
obstime_shifted[Dates.month.(obstime) .== 12] .+= Dates.Year(1)
```

The analysis function should then use `obstime_shifted`

while for the function `saveobs`

is it recommended to use the original `obstime`

vector.

### Load observations

`DIVAnd.saveobs`

— Function```
DIVAnd.saveobs(filename,xy,ids;
type_save = Float32,
timeorigin = DateTime(1900,1,1,0,0,0),
used = trues(size(ids)),
)
```

Save the location and time of the observations in the netCDF file `filename`

and their identifier `ids`

. `xy`

is a tuple with the vectors longitude, latitude, depth and time (as a vector of `DateTime`

).

**Optional arguments:**

`type_save`

: the type to save the data (default Float32). However, the time is always saved as`Float64`

.`timeorigin`

: time origin for the time units attribute (default is

1900-01-01 00:00:00)

`used`

: allows one to subset the data to save only used variables in the netCDF file

```
DIVAnd.saveobs(filename,varname,value,xy,ids;
type_save = Float32,
timeorigin = DateTime(1900,1,1,0,0,0),
used = trues(size(ids)),
chunksize = 10_000,
)
```

Save `value`

and the location and time of the observation in the NetCDF file `filename`

and their identifier `ids`

. `xy`

is a tuple with the vectors longitude, latitude, depth and time (as a vector of `DateTime`

). The values will be saved in the variable called `varname`

.

**Optional arguments:**

`type_save`

: the type to save the data (default Float32). However, the time is always saved as`Float64`

.`timeorigin`

: time origin for the time units attribute (default is

1900-01-01 00:00:00)

`used`

: allows to subset the data to save only used variables in the netCDF file

**Note:** in order to save only the observations used in the interpolation, one can use the `dbinfo`

object as follows:

```
obsused = dbinfo[:used] # returned by diva3d
DIVAnd.saveobs(filename,(obslon,obslat,obsdepth,obstime),obsids,used=obsused)
```

`DIVAnd.loadobs`

— Function```
obsvalue,obslon,obslat,obsdepth,obstime,obsids = loadobs(T,filename,varname)
obsvalue,obslon,obslat,obsdepth,obstime,obsids = loadobs(T,filenames,varname)
```

Load the variable `varname`

from the netCDF file `filename`

(or list of filenames). Coordinates (the netCDF variables "obslon", "obslat", "obsdepth"), time ("obstime") and identifiers ("obsids") will also be loaded. Numeric output arguments will have the type `T`

.

`DIVAnd.NCSDN.load`

— Function```
obsvalue,obslon,obslat,obsdepth,obstime,obsids = load(T,
fname,param; qualityflags = [GOOD_VALUE, PROBABLY_GOOD_VALUE])
```

`data,lon,lat,z,time,ids = SDN.load(T,fnames,param; qualityflags = ...)`

Load all data in the vector of file names `fnames`

corresponding to the parameter `param`

as the data type `T`

. Only the data with the quality flags `SDN.good_data`

and `SDN.probably_good_data`

are loaded per default. The output parameters correspond to the data, longitude, latitude, depth, time (as `DateTime`

) and an identifier (as `String`

).

`DIVAnd.NCSDN.loadvar`

— Function```
data = loadvar(ds,param;
fillvalue::T = NaN,
qualityflags = [GOOD_VALUE, PROBABLY_GOOD_VALUE],
qfname = param * QC_SUFFIX,
)
```

Load the netCDF variable `param`

from the NCDataset `ds`

. Data points not having the provide quality flags will be masked by `fillvalue`

. `qfname`

is the netCDF variable name for the quality flags.

`DIVAnd.NCODV.load`

— Function```
obsvalue,obslon,obslat,obsdepth,obstime,obsids = NCODV.load(T,fname,long_name;
qv_flags = ["good_value","probably_good_value"],
nchunk = 10
```

)

Load all profiles in the file `fname`

corresponding to netCDF variable with the `long_name`

attribute equal to the parameter `long_name`

. `qv_flags`

is a list of strings with the quality flags to be kept. The filtering of the quality flags is applied to the data variables, time and depth coordinates. `obsids`

is a vector of strings with the EDMO code and local CDI id concatenated by a hyphen.

`nchunk`

is the number of profiles read at a time. Large values of `nchunk`

can increase performance but requirer also more memory.

The variable with the following standard_name should exits:

- longitude
- latitude
- time

As well as the variable with the following long_name:

- LOCAL_CDI_ID
- EDMO_code or EDMO_CODE
- Depth

A guide how to export NetCDF files from ODV is available here

`DIVAnd.ODVspreadsheet.loaddata`

— Function`data = loaddata(sheet,profile,locname,fillvalue; fillmode = :repeat)`

Load a single column referred by the local name `locname`

in the profile `profile`

from the ODV spreadsheet `sheet`

. Empty values are either replaced by `fillvalue`

(if fillmode is :fill) or the previous value if repeated (if fillmode is :repeat)

`DIVAnd.ODVspreadsheet.parsejd`

— Function`dt = parsejd(t)`

Convert a Chronological Julian Day Number to a DateTime object. The reference value is taken from Chronological Julian Date

From the SDN standard: "A real number representing the Chronological Julian Date, which is defined as the time elapsed in days from 00:00 on January 1 st 4713 BC. ... "

The time origin is *not* noon (12:00) on Monday, January 1, 4713 BC as for the Julia Date Number.

`DIVAnd.ODVspreadsheet.myparse`

— Function`v = myparse(T,s, i)`

Parse the string `s`

as a type `T`

. Unlike Julia's parse function an error message contains the string `s`

(which could not be parsed) for debugging.

### Parameter optimization

`DIVAnd.DIVAnd_cv`

— Function`bestfactorl,bestfactore, cvval,cvvalues, x2Ddata,y2Ddata,cvinter,xi2D,yi2D = DIVAnd_cv(mask,pmn,xi,x,f,len,epsilon2,nl,ne,method;...);`

Performs a cross validation to estimate the analysis parameters (correlation length and signal-to-noise ratio).

**Input**

Same as for `DIVAndrun`

with three more parameters `nl`

,`ne`

and `method`

`mask`

: binary mask delimiting the domain. true is inside and false outside.

For oceanographic application, this is the land-sea mask.

`pmn`

: scale factor of the grid. pmn is a tuple with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.`xi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated`x`

: tuple with n elements. Every element represents a coordinate of the observations`f`

: value of the observations*minus*the background estimate (m-by-1 array). (see note)`len`

: correlation length`epsilon2`

: error variance of the observations (normalized by the error variance of the background field).`epsilon2`

can be a scalar (all observations have the same error variance and their errors are decorrelated), a vector (all observations can have a difference error variance and their errors are decorrelated) or a matrix (all observations can have a difference error variance and their errors can be correlated). If`epsilon2`

is a scalar, it is thus the*inverse of the signal-to-noise ratio*.`nl`

: number of testing points around the current value of L.`1`

means one additional point on both sides of the current L.`0`

is allowed and means the parameter is not optimised.`ne`

: number of testing points around the current value of epsilon2.`0`

is allowed as for`nl`

`method`

: cross validation estimator method 1: full CV 2: sampled CV 3: GCV 0: automatic choice between the three possible ones, default valueOptional input arguments specified via keyword arguments are the same as for

`DIVAnd`

**Output:**

`bestfactorl`

: best estimate of the multiplication factor to apply to len`bestfactore`

: best estimate of the multiplication factor to apply to epsilon2`cvvales`

: the cross validation values calculated`factors`

: the tested multiplication factors`cvinter`

: interpolated cv values for final optimisation`X2Data, Y2Data`

: coordinates of sampled cross validation in`L,epsilon2`

space . Normally only used for debugging or plotting`Xi2D, Yi2D`

: coordinates of interpolated estimator . Normally only used for debugging or plotting

The output `bestfactorl`

and `bestfactore`

represent multiplication factors which should be applied to `L`

and `epsilon2`

.

The `len`

and `epsilon2`

provided should be close the real one as the tests will be performed around.

`DIVAnd.empiriccovar`

— Function```
distx,covar,corr,varx,count = empiriccovar(x,v,distbin,mincount;
maxpoints = 10000,
distfun = (xi,xj) -> sqrt(sum(abs2,xi-xj)))
```

Compute the covariance, correlation and variance of a cloud of data points with the value `v`

(a vector) and the location `x`

(a tuple of vectors) grouped by distance. Random pairs are choosen and grouped by their distance (computed by `distfun`

) in bins defined by `distbin`

. The function try to fill at least `mincount`

of data points in each bin but always stop after considering `maxpoints`

pairs.

`DIVAnd.fithorzlen`

— Function`lenxy,dbinfo = DIVAnd.fithorzlen(x,value,z)`

Determines the horizontal correlation length `lenxy`

based on the measurements `value`

at the location `x`

(tuple of 3 vectors corresponding to longitude, latitude and depth) at the depth levels defined in `z`

.

Optional arguments:

`smoothz`

(default 100): spatial filter for the correlation scale`searchz`

(default 50): vertical search distance (can also be a function of the depth)`maxnsamp`

(default 5000): maximum number of samples`limitlen`

(default false): limit correlation length by mean distance between observations`limitfun`

(default no function): a function with two arguments (depth and

estimated correlation length) which returns an adjusted correlation length. For example to force the correlation length to be between 300 km and 50 km one would use the following: `limitfun = (z,len) -> max(min(len,300_000),10_000))`

. If provided `limitfun`

is used before and after the smoothing.

`epsilon2`

(default is a vector of the same size as`value`

with all elements equal to 1): the relative error variance of the observations. Less reliable observations would have a larger corresponding value.`distfun`

: function computing the distance between the points`xi`

and`xj`

.

Per default it represents the Euclidian distance.

`DIVAnd.fitvertlen`

— Function`lenz,dbinfo = DIVAnd.fitvertlen(x,value,z,...)`

See also DIVAnd.fithorzlen

`DIVAnd.lengraddepth`

— Function```
RL = lengraddepth(pmn,h, L;
h2 = h,
hmin = 0.001
)
```

Create the relative correlation length-scale field `RL`

based on the bathymetry `h`

and the metric `pmn`

(tuple of arrays). Effectively the correlation-length scale is close to zero if the relative bathymetry gradients (|∇h|/h) are smaller than the length-scale `L`

(in consistent units as `pmn`

).

R_L = 1 / (1 + L |∇h| / max(h2,hmin))

Per default `h2`

is equal to `h`

. The depth `h`

must be positive. `hmin`

must have the same units as h (usually meters).

`DIVAnd.DIVAnd_cvestimator`

— Function`theta = DIVAnd_cvestimator(s,residual)`

Computes the cross validation estimator $(d-\hat{d})^T \mathbf R^{-1} (d-\hat{d}) / ( \mathbf 1^T \mathbf R^{-1} \mathbf 1)$ where the $\hat{d}$ is the analysis not using a data point.

`DIVAnd.weight_RtimesOne`

— Function` weights = weight_RtimesOne(x,len)`

Compute the weight of the observations at location `x`

to reduce the influence of locally clustered data. `x`

is a tuple with n elements: every element represents a coordinate of the observations. `len`

is a tuple of arrays representing the correlation length. `len[i]`

is the correlation length in the i-th dimension.

`DIVAnd.Rtimesx!`

— Function`Rtimesx!(coord,LS,x,Rx)`

Gaussian type `R`

matirx in `ndim`

dimensions applied to vector `x`

of length `ndata`

. The Gaussian scale differs in each direction `k`

: `LS[k]`

Coordinates of point i are `coord[i,1],coord[i,2],...,coord[i,ndim]`

To avoid an ndata² complexity a grid is set up first so as to allow only the calculation of covariances when distances are smaller than `3*LS`

Adapted from `DIVA3D/src/Fortran/Util/Rtimesx_weighting.f90`

### Vocabulary

`DIVAnd.Vocab.@urn_str`

— Macro`urn"SDN:x:y:z'`

Resolve a SeaDataNet URN (Uniform Resource Name) using https://www.seadatanet.org/urnurl/

`DIVAnd.Vocab.CFVocab`

— Type```
collection = Vocab.CFVocab()
collection = Vocab.CFVocab(url = url)
```

Create a Dict-like object represeting the NetCDF CF Standard Name vocabulary. If the `url`

is not provided then current CF Standard Name list http://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml is used. Individual standard names are retirved by indexing which return an object of the type `CFEntry`

:

```
collection = Vocab.CFVocab()
entry = collection["sea_water_temperature"]
```

`Base.haskey`

— Method`bool = haskey(collection::CFVocab,stdname)`

Return true if `stdname`

is part of the NetCDF CF Standard Name vocabulary `collection`

.

`DIVAnd.Vocab.SDNCollection`

— Function`collection = SDNCollection(name)`

Open the SeaDataNet collection with the name `name`

at the URL http://www.seadatanet.org/urnurl/collection/ The collection can be indexed with brackets using the identifier.

```
using DIVAnd
collection = Vocab.SDNCollection("P01")
concept = collection["PSALPR01"]
@show Vocab.prefLabel(concept)
```

`DIVAnd.Vocab.prefLabel`

— Function`s = Vocab.prefLabel(c::Vocab.Concept)`

Return the preferred label of a concept `c`

`s = Vocab.prefLabel(urn::AbstractString)`

Return the preferred label of a concept usings it URN (Uniform Resource Name)

`DIVAnd.Vocab.altLabel`

— Function`s = Vocab.altLabel(c::Vocab.Concept)`

Return the alternative label of a concept `c`

`s = Vocab.altLabel(urn::AbstractString)`

Return the alternative label of a concept usings it URN (Uniform Resource Name)

`DIVAnd.Vocab.notation`

— Function`s = Vocab.notation(c::Vocab.Concept)`

Return the identifier of a concept `c`

`s = Vocab.notation(urn::AbstractString)`

Return the identifier of a concept usings it URN (Uniform Resource Name)

`DIVAnd.Vocab.definition`

— Function`s = Vocab.definition(c::Vocab.Concept)`

Return the definition of a concept `c`

`s = Vocab.definition(urn::AbstractString)`

Return the definition of a concept usings it URN (Uniform Resource Name)

`DIVAnd.Vocab.resolve`

— Function`entry = Vocab.resolve(urn)`

Resolve a SeaDataNet URN (Uniform Resource Name) and returns the corresponding EDMO entry or Vocabulary concept. For example:

`concept = Vocab.resolve("SDN:P021:current:TEMP")`

`DIVAnd.Vocab.find`

— Function`concepts = Vocab.find(c::Concept,name,collection)`

Return a list of related concepts in the collection `collection`

. `name`

can be the string "related", "narrower", "broader".

`DIVAnd.Vocab.description`

— Function```
str = description(entry::CFEntry)
str = canonical_units(entry::CFEntry)
Return the description or the canonical units of the `entry`.
```

`DIVAnd.Vocab.canonical_units`

— Function```
str = description(entry::CFEntry)
str = canonical_units(entry::CFEntry)
Return the description or the canonical units of the `entry`.
```

`DIVAnd.Vocab.splitURL`

— Function`collection,tag,key = Vocab.splitURL(url)`

Split a concept URL into collection, tag and key. url must finishe with a slash.

### Post-processing

`DIVAnd.derived`

— Function```
DIVAnd.derived(filename,varname,new_filename;
error_thresholds = [("L1", 0.3), ("L2", 0.5)],
)
```

Compute derived quantities from a DIVAnd analyse (in particular the deepest value of an analysis) using the NetCDF file `filename`

with the variable `varname`

. The result will be written in `new_filename`

.

See `DIVAnd.diva3d`

for the optional parameter `error_thresholds`

.

**Example**

```
using DIVAnd
filename = "Water_body_dissolved_oxygen_concentration_monthly.nc"
new_filename = "Water_body_dissolved_oxygen_concentration_monthly2.nc"
varname = "Water body dissolved oxygen concentration"
DIVAnd.derived(filename,varname,new_filename)
```

`DIVAnd.cut`

— Function`DIVAnd.cut(filename,varname,filename_cut,polygon_lon,polygon_lat)`

Exclude all grid cells points of the netcdf file `filename`

with the variable `varname`

to the grid cell included in the polygon whose vertices are defined by the vector of longitude values `polygon_lon`

and the vector of latitude values `polygon_lat`

. Values outside of this polygone will be croped (if possible) or masked (if croping is not possible).

Only the NetCDF variable with the dimensions `lon`

and `lat`

will be modified. All other variarables (in particular `obslon`

, `obslat`

, ...) will not be changed.

Example

```
filename2 = "Water_body_dissolved_oxygen_concentration_monthly.nc"
# polygon with the area to retain
polygon_lon = [-20., 20, 20, -19]
polygon_lat = [21, 21, 53., 52.]
varname = "Water body dissolved oxygen concentration"
filename_cut = "cut.nc"
DIVAnd.cut(filename2,varname,filename_cut,polygon_lon,polygon_lat)
```

## Internal API or advanced usage

### State vector

`DIVAnd.statevector`

— Type`sv = statevector((mask1, mask2, ...))`

Initialize structure for packing and unpacking multiple variables given their corresponding land-sea mask.

Input: mask1, mask2,...: land-sea mask for variable 1,2,... Sea grid points correspond to one and land grid points to zero. Every mask can have a different shape.

Output: sv: structure to be used with `pack`

and `unpack`

.

Note: see also `pack`

, `unpack`

`DIVAnd.pack`

— Function`x = pack(sv,(var1, var2, ...))`

Pack the different variables var1, var2, ... into the vector x where `sv`

is a `statevector`

under the control of a mask. Only sea grid points are retained.

Input: sv: structure generated by statevector_init. var1, var2,...: variables to pack (with the same shape as the corresponding masks).

Output: x: vector of the packed elements. The size of this vector is the number of elements of all masks equal to 1.

Notes: If var1, var2, ... have an additional trailing dimension, then this dimension is assumed to represent the different ensemble members. In this case x is a matrix and its last dimension is the number of ensemble members.

`DIVAnd.unpack`

— Function```
var1, var2, ... = unpack(sv,x)
var1, var2, ... = unpack(sv,x,fillvalue)
```

Unpack the vector x into the different variables var1, var2, ... where `sv`

is a `statevector`

.

Input: sv: structure generated by statevector_init. x: vector of the packed elements. The size of this vector is the number of elements equal to 1 in all masks.

Optional input parameter: fillvalue: The value to fill in var1, var2,... where the masks correspond to a land grid point. The default is zero.

Output: var1, var2,...: unpacked variables.

Notes: If x is a matrix, then the second dimension is assumed to represent the different ensemble members. In this case, var1, var2, ... have also an additional trailing dimension.

`DIVAnd.sub2ind`

— Function`ind = statevector_sub2ind(sv,subscripts)`

Compute from a tuple of subscripts the linear index in the packed state vector. The first element of the subscript indicates the variable index and the remaining the spatial subscripts.

`DIVAnd.ind2sub`

— Function`subscripts = ind2sub(sv,index)`

Compute from linear index in the packed state vector a tuple of subscripts. The first element of the subscript indicates the variable index and the remaining the spatial subscripts.

`Base.length`

— Functionnumber of points per node it is always zero for non-leaf nodes

### Constraints

`DIVAnd.DIVAnd_constr_fluxes`

— Function`c = DIVAnd_constr_fluxes(s,topographyforfluxes,fluxes,epsfluxes,pmnin)`

Creates integral constraints for each latitude so that a barotropic correction step leads to an additional flux prescribed.

Input: s: structure topographyforfluxes: tuple of two 2D arrays with the bottom topography used for the flux calculations DO NOT USE NaN in it. If an array is replaced by a scalar zero, the constraint is not used. for fluxes calculated with geostrophy apply g/f to h fluxes: tuple of two arrays of fluxes. The barotropic correction on elevation should be such that Sum over longitude at each latidute of Sum h δ(eta)/δx δx = - fluxes[1] Sum over latitude at each longitude of Sum h δ(eta)/δy δ y = -fluxes[2] WARNING: This has been coded to directly use geostrophy.jl output and flux directions epsfluxes: error variance on constraint. Scaling to be verified pmnin: metrics from the calling routine

Output: c: structure to be used by DIVAnd_addc with the following fields: R (a covariance matrix), H (extraction operator) and yo (specified value for the constrain).

`DIVAnd.DIVAnd_constr_constcoast`

— Function`c = DIVAnd_constr_constcoast(mask,eps2)`

Constrain imposing that the gradients along the coastline defined by `mask`

are close to zero controlled by the parameter `eps2`

which represents the scaled error variance on the gradients.

This constrain is useful to indirectly impose that a stream function does not have a current component perpendicular to the coastline.

### ODV files

`DIVAnd.ODVspreadsheet.listSDNparams`

— Function`p = listSDNparam(ODVData)`

Return a list of SeaDataNet P01 parameters in a ODV spreadsheet `ODVData`

.

`DIVAnd.ODVspreadsheet.load`

— Function```
obsvalue,obslon,obslat,obsdepth,obstime,obsids = load(T,fnames,datanames;
qv_flags = [DIVAnd.ODVspreadsheet.GOOD_VALUE,
DIVAnd.ODVspreadsheet.PROBABLY_GOOD_VALUE],
nametype = :P01,
qvlocalname = "QV:SEADATANET")
```

Load all the profiles from every files listed in the array `fnames`

corresponding to one of the parameter names `datanames`

. If `nametype`

is `:P01`

(default), the datanames are P01 vocabulary names with the SDN prefix. If nametype is `:localname`

, then they are the ODV column header without units.

For example if the column header is `Water body salinity [per mille]`

, then `datenames`

should be `["Water body salinity"]`

. The resulting vectors have the data type `T`

(expect `times`

and `ids`

which are vectors of `DateTime`

and `String`

respectively). Only values matching the quality flag `qv_flags`

are retained. `qv_flags`

is a vector of Strings (based on http://vocab.nerc.ac.uk/collection/L20/current/, e.g. "1" means "good value"). One can also use the constants these constants (prefixed with `DIVAnd.ODVspreadsheet.`

):

`qvlocalname`

is the column name to denote quality flags. It is assumed that the quality flags follow immediately the data column.

constant | value |
---|---|

`NO_QUALITY_CONTROL` | "0" |

`GOOD_VALUE` | "1" |

`PROBABLY_GOOD_VALUE` | "2" |

`PROBABLY_BAD_VALUE` | "3" |

`BAD_VALUE` | "4" |

`CHANGED_VALUE` | "5" |

`VALUE_BELOW_DETECTION` | "6" |

`VALUE_IN_EXCESS` | "7" |

`INTERPOLATED_VALUE` | "8" |

`MISSING_VALUE` | "9" |

`VALUE_PHENOMENON_UNCERTAIN` | "A" |

If the ODV does not contain a semantic header (e.g. for the aggregated ODV files), then local names must be used.

```
julia> data,obslon,obslat,obsdepth,obstime,obsids = DIVAnd.ODVspreadsheet.load(Float64,["data_from_med_profiles_non-restricted_v2.txt"],
["Water body salinity"]; nametype = :localname );
```

In order to read ODV spreasheet containing World Ocean Database file `odvfile`

, one can use a command like:

```
julia> obsval,obslon,obslat,obsdepth,obstime,obsid = ODVspreadsheet.load(Float64,[odvfile],
["Temperature"]; qv_flags=["0", "1"], nametype = :localname, qvlocalname = "QV:WOD");
```

i.e.,

- explicitely specifying the accepted flags
`qv_flags`

- set
`qvlocalname`

as "QV:WOD".

*Note:* no checks are performed to ensure the units are consistent.

` profiles,lons,lats,depths,times,ids = load(T,dir,P01names)`

Load all the ODV files under the directory `dir`

corresponding the one of the parameter names `P01names`

. The resulting vectors have the data type `T`

(expect `times`

and `ids`

which are vectors of `DateTime`

and `String`

, respectively).

No checks are done to ensure the units are consistent.

`DIVAnd.ODVspreadsheet.localnames`

— Function`list = localnames(sheet,P01name)`

Return a list `list`

of all local names mapping to the specified `P01name`

in the ODV spreadsheet `sheet`

without the prefix "SDN:LOCAL:".

`list = localnames(sheet)`

Return a list `list`

of all local names in the ODV spreadsheet `sheet`

without the prefix "SDN:LOCAL:" in the order as they appear in the ODV file.

`DIVAnd.ODVspreadsheet.Spreadsheet`

— TypeDefine composite type that will contain:

- the metadata (dictionary),
- SDN parameter mapping (dictionary)
- the column labels (array) and
- the profiles (array of arrays).

`DIVAnd.ODVspreadsheet.loadprofile`

— Function```
data,data_qv,obslon,obslat,obsdepth,obsdepth_qv,obstime,obstime_qv,EDMO,LOCAL_CDI_ID =
loadprofile(T,sheet,iprofile,dataname; nametype = :P01)
```

Load a `iprofile`

-th profile from the ODV spreadsheet `sheet`

of the parameter `dataname`

. If `nametype`

is `:P01`

(default), the dataname is the P01 vocabulary name with the SDN prefix. If nametype is `:localname`

, then it is the ODV column header. The resulting vectors have the data type `T`

(expect the quality flag and `obstime`

) .

`DIVAnd.ODVspreadsheet.loaddataqv`

— Function`data,data_qv = loaddataqv(sheet,profile,locname,fillvalue; fillmode = :repeat)`

The same as `loaddata`

, but now the quality flag are also loaded.

profile[i][j] is the j-th column of the i-th row of a profile. profile[i,j] is the i-th column of the j-th row of a profile.

`DIVAnd.ODVspreadsheet.SDNparse!`

— Function`SDNparse!(col,fillmode,fillvalue,data)`

Parse the list of String `col`

into the corresponding data type of the vector `data`

. Empty values are either replaced by `fillvalue`

(if fillmode is :fill) or the previous value if repeated (if fillmode is :repeat)

`DIVAnd.ODVspreadsheet.colnumber`

— Function`cn = colnumber(sheet,localname)`

Return the column number `cn`

of the first column with the local name `localname`

(without the prefix "SDN:LOCAL:") in the ODV spreadsheet `sheet`

.

`DIVAnd.ODVspreadsheet.nprofiles`

— Function`n = nprofiles(ODVData)`

Return the number of profiles in a ODV Spreadsheet `ODVData`

loaded by `readODVspreadsheet`

.

### Operators

`DIVAnd.sparse_interp`

— Function`H,out = sparse_interp(mask,I)`

Create interpolation matrix from `mask`

and fractional indexes `I`

.

Input: mask: 0 invalid and 1 valid points (n-dimensional array) I: fractional indexes (2-dim array n by mi, where mi is the number of points to interpolate) Ouput: H: sparse matrix with interpolation coefficients out: true if value outside of grid outbbox: 1 if outise bouding box onland: 1 if point touches land (where mask == 0)

`DIVAnd.sparse_interp_g`

— Functionsparse_interp(x,mask,xi) Interpolate from x onto xi

`DIVAnd.sparse_diff`

— Function`diffx = sparse_diff(sz1,m,cyclic)`

Sparse operator for differentiation along dimension `m`

for "collapsed" matrix of the size `sz1`

. `cyclic`

is true if domain is cyclic along dimension m. `false`

is the default value

`DIVAnd.matfun_trim`

— Function`T = matfun_trim(sz1,m)`

Create an operator which trim first and last row (or column) in The field is a "collapsed" matrix of the size `sz1`

. `m`

is the dimension to trim.

`DIVAnd.matfun_stagger`

— Function`S = matfun_stagger(sz1,m,cyclic)`

Create an operator for staggering a field in dimension m. The field is a "collapsed" matrix of the size sz1.

Input: sz1: size of rhs m: dimension to stagger cyclic: true if domain is cyclic along dimension m. False is the default value

`DIVAnd.matfun_diff`

— FunctionOperator for differentiation.

`diffx = matfun_diff(sz1,m,cyclic)`

Operator for differentiation along dimension m for "collapsed" matrix of the size sz1.

Input: sz1: size of rhs m: dimension to differentiate cyclic: true if domain is cyclic along dimension m. False is the default value

`DIVAnd.matfun_shift`

— FunctionOperator shifting a field in a given dimension.

`function S = matfun_shift(sz1,m,cyclic)`

Operator shifting a field in the dimension m. The field is a "collapsed" matrix of the size sz1.

Input: sz1: size of rhs m: dimension to shift cyclic: true if domain is cyclic along dimension m. False is the default value

### Quadtree

`DIVAnd.Quadtrees.QT`

— Typequadtree (of the higher-dimensional equivalent) T the type of the coordinates TA the type of the attributes N number of dimensions

`DIVAnd.Quadtrees.rsplit!`

— Functionrecursive split

`DIVAnd.Quadtrees.add!`

— Functionsucess = add!(qt,x,attrib,max_cap = 10) Add point `x`

with the attribute `attrib`

to the quadtree `qt`

. `sucess`

is true if `x`

is within the bounds of the quadtree node `qt`

(otherwise false and the point has not been added)

`DIVAnd.Quadtrees.within`

— Function`attribs = within(qt,min,max)`

Search all the points within a bounding box defined by the vectors `min`

and `max`

.

`DIVAnd.Quadtrees.bitget`

— FunctionTest if the n-th bit in a is set. The least significant bit is n = 1.

`DIVAnd.Quadtrees.inside`

— Function`inside(x0,x1,y)`

Returns true of the point `y`

is inside the rectange defined by `x0`

and `x1`

.

```
x1
+----------+
| |
| + |
| y |
+----------+
x0
```

`DIVAnd.Quadtrees.intersect`

— FunctionTest if the rectanges defined by x0,x1 and y0,y1 intersects/overlap

```
x1
+----------+
| |
| +----------+ y1
| | | |
+----------+ |
x0 | |
| |
+----------+
y0
```

`DIVAnd.Quadtrees.split!`

— Functionsplit a single node

### Conjugate gradient

`DIVAnd.conjugategradient`

— Function`x,cgsuccess,niter = conjugategradient(fun!,b)`

Solve a linear system with the preconditioned conjugated-gradient method: A x = b where `A`

is a symmetric positive defined matrix and `b`

is a vector. Equivalently the solution `x`

minimizes the cost function J(x) = ½ xᵀ A x - bᵀ x.

The function `fun!(x,fx)`

computes fx which is equal to `A*x`

. For example:

```
function fun!(x,fx)
fx[:] = A*x
end
```

Note that the following code will NOT work, because a new array `fx`

would be created and it would not be passed back to the caller.

```
function fun!(x,fx)
fx = A*x # bug!
end
```

The function `fun!`

works in-place to reduce the amount of memory allocations.

**Optional input arguments**

`x0`

: starting vector for the interations`tol`

: tolerance on |Ax-b| / |b|`maxit`

: maximum of interations`pc!`

: the preconditioner. The functions`pc(x,fx)`

computes fx = M⁻¹ x (the inverse of M times x) where`M`

is a symmetric positive defined matrix. Effectively, the system E⁻¹ A (E⁻¹)ᵀ (E x) = E⁻¹ b is solved for (E x) where E Eᵀ = M. Ideally, M should this be similar to A, so that E⁻¹ A (E⁻¹)ᵀ is close to the identity matrix. The function`pc!`

should be implemented in a similar way than`fun!`

(see above).

**Output**

`x`

: the solution`cgsuccess`

: true if the interation converged (otherwise false)`niter`

: the number of iterations

`DIVAnd.pc_none!`

— Function`pc_none!(x,fx)`

Dummy call-back function when no preconditioner is used. `fx`

will be equal to `x`

.

`DIVAnd.checksym`

— Function`xAy, yATx = checksym(n,fun!)`

Check if the the function `fun!`

represents a symmetric matrix when applied on random vectors of size `n`

.

### Utility functions

`DIVAnd.DIVAnd_laplacian`

— FunctionCreate the laplacian operator.

`Lap = DIVAnd_laplacian(mask,pmn,nu,iscyclic)`

Form a Laplacian using finite differences assumes that gradient is zero at "coastline"

Input: mask: binary mask delimiting the domain. 1 is inside and 0 outside. For oceanographic application, this is the land-sea mask. pmn: scale factor of the grid. nu: diffusion coefficient of the Laplacian field of the size mask or cell arrays of fields

Output: Lap: sparce matrix represeting a Laplacian

`DIVAnd.DIVAnd_gradient`

— Function`Dx1,Dx2,...,Dxn = DIVAnd.DIVAnd_gradient(operatortype,mask,pmn,iscyclic)`

Form the gradient using finite differences in all n-dimensions. `mask`

is a binary mask delimiting the domain. 1 is inside and 0 outside. For oceanographic application, this is the land-sea mask. `pmn`

is a tuple of arrays with the scale factor of the grid. The output `Dx1,Dx2,...,Dxn`

are sparse matrices represeting a gradient along different dimensions.

`Dx1,Dx2,...,Dxn = DIVAnd.DIVAnd_gradient(operatortype,mask,pmn,iscyclic)`

Form the gradient using finite differences in all n-dimensions. `mask`

is a binary mask delimiting the domain. 1 is inside and 0 outside. For oceanographic application, this is the land-sea mask. `pmn`

is a tuple of arrays with the scale factor of the grid. The output `Dx1,Dx2,...,Dxn`

are sparse matrices represeting a gradient along different dimensions.

`DIVAnd.DIVAnd_obscovar`

— Function`R = DIVAnd_obscovar(epsilon2,m)`

Create a matrix representing the observation error covariance R of size m x m.

If epsilon2 is a scalar, then R = epsilon2 * I If epsilon2 is a vector, then R = diag(epsilon2) If epsilon2 is a matrix, then R = epsilon2

`DIVAnd.DIVAnd_adaptedeps2`

— Function`factor = DIVAnd_adaptedeps2(s,fi);`

**Input:**

`s`

: structure returned by`DIVAndrun`

`fi`

: analysis returned by`DIVAndrun`

**Output:**

`factor`

: multiplicative factor to apply to epsilon2

Using Deroziers adaptive approach provides a multiplicative factor for the current epsilon2 value so that factor*epsilon2 is a better estimate of the R matrix. If you cannot use `DIVAndrun`

but use `DIVAndgo`

, the latter provides automatically this pamater as result.

`DIVAnd_adaptedeps2(yo, residual, diagR, ignoreobs)`

Using Deroziers adaptive approach provides a multiplicative factor for the current epsilon2 value so that factor*epsilon2 is a better estimate of the R matrix.

`yo`

the observations (minus the background), `residual`

the obserations minus the analysis, `diagR`

, the diagonal of the rel. obs. error covariance matrix and `ignoreobs`

is true if an observation is out of the grid or should be ignored for other reasons.

For unscaled R and assuming that the background is zero, Deroziers showed that:

mean((yᵒ - Hxᵃ) ⋅ yᵒ) = ϵ² mean(yᵒ ⋅ yᵒ) = σ² + ϵ²

mean(yᵒ ⋅ yᵒ) / mean((yᵒ - Hxᵃ) ⋅ yᵒ) = σ²/ϵ² + 1 λ = σ²/ϵ² = 1 - mean(yᵒ ⋅ yᵒ) / mean((yᵒ - Hxᵃ) ⋅ yᵒ)

ϵ² / σ² = 1 / λ

`DIVAnd.DIVAnd_diagHKobs`

— Function`diagonalterms = DIVAnd_diagHKobs(s)`

Computes the diagonal terms of the so called hat-matrix HK, using the already solved analysis and it structure s. Warning: might take some time

This version only uses the real data (not those related to additional constraints)

`DIVAnd.DIVAnd_residual`

— Function`dataresidual = DIVAnd_residual(s,fi)`

Computes the generalized residual yo - H xa using the analysis on the grid `fi`

and the solution structure `s`

.

`DIVAnd.DIVAnd_addc`

— Function`s = DIVAnd_addc(s,c)`

Add a constraint `c`

to the cost function defined by `s`

. The structure `s`

is typically created by DIVAnd_background and the contrain `c`

has the following fields: `R`

(a covariance matrix), `H`

(extraction operator) and `yo`

(specified value for the constrain). The added contrain Jc(x) is quadratic and has the following structure.

Jc(x) = (H x - yo)ᵀ R⁻¹ (H x - yo)

`DIVAnd.DIVAnd_erroratdatapoints`

— Function`errorvariance = DIVAnd_erroratdatapoints(s;restrictedlist=[])`

Computes the error at the real data locations using the analysis structure s

If a restricedlist is provided erros are only calculated at the indexes where restricedlist==true

`DIVAnd.DIVAnd_GCVKii`

— FunctionComputes an estimate of the mean value of the diagonal of HK using GCV and the already solved analysisand it structure s

Kii = DIVAnd_GCVKii(s);

`DIVAnd.DIVAnd_fittocpu`

— Function` stepsize,overlapping,isdirect = DIVAnd_fittocpu(Lpmnrange,gridsize,latercsteps,moddim,MEMTOFIT;forcedirect=false,overlapfactor=3.3);`

**Creates a list of windows for subsequent domain decomposition**

**Also calculates already the subsampling steps csteps for the preconditionners**

**Input:**

`Lpmnrange`

: for every dimension the minimum and maximum correlation length scaled by the local resolution (i.e. the product between L and pm (pn,...))`gridsize`

: number of points in each direction (size(mask))`latercsteps`

: coarsening steps used later if a lower resolution model is used for preconditioning.`moddim`

: modulo for cyclic dimension (vector with n elements). Zero is used for non-cyclic dimensions.`memtofit`

: parameter describing how much memory is expected to be available in Gb`forcedirect`

: if true forces direct solver even if iterative solver might allow for larger tiles`overlapfactor`

: describes how many times the length scale is used for the overlapping. default is 3.3. use lower values ONLY for very good data coverage.

**Output:**

`stepsize`

: spatial (and temporal) shift in grid points between subdomains for every dimension (?)`overlapping`

: number of overlapping grid points for every dimension`isdirect`

: true is the direct solver is activated

`DIVAnd.DIVAnd_background`

— Function`s = DIVAnd_background(mask,pmn,Labs,alpha,moddim)`

Form the inverse of the background error covariance matrix with finite-difference operators on a curvilinear grid

**Input:**

- mask: binary mask delimiting the domain. 1 is inside and 0 outside. For oceanographic applications, this is the land-sea mask.
- pmn: scale factor of the grid.
- Labs: correlation length
- alpha: dimensional coefficients for norm, gradient, laplacian,... alpha is usually [1,2,1] in 2 dimensions.

**Output:**

- s: structure containing
- s.iB: inverse of the background error covariance
- s.L: spatially averaged correlation length
- s.n: number of dimensions
- s.coeff: scaling coefficient such that the background variance diag(inv(iB)) is one far away from the boundary.

`DIVAnd.DIVAnd_obs`

— Function`s = DIVAnd_obs(s,xi,x,R,I)`

Include the constrain from the observations. It is assumed that each coordinate depends only on one index. If this is not the case, then matrix I must be provided.

Input: s: structure created by DIVAnd_background xi: coordinates of observations (tuple of vectors) x: coordinates of grid (tuple of arrays) R: obs. error covariance matrix (normalized) I (optional): fractional indexes of location of observation within the grid

Output: s: structure to be used by DIVAnd_factorize

Note: make sure not to mix Float32 and Float64 for DIVAnd_constrain.

`DIVAnd.DIVAnd_bc_stretch`

— Function`DIVAnd.DIVAnd_diagHK`

— FunctionComputes the diagonal terms of the so called hat-matrix HK, using the already solved analysis and it structure s. Warning: might take some time

diagonalterms = DIVAnd_diagHK(s);

`DIVAnd.DIVAnd_kernel`

— Function`mu,K,len_scale = DIVAnd_kernel(n,alpha)`

Return the analytical kernel and normalization factor.

Analytical (normalized) kernels `K`

for infinite domain in dimension `n`

and for coefficients `alpha`

and normalization factor `mu`

.

`K(r)`

is the kernel function (function of the normalized distance `r`

), `len_scale`

is the distance at which `K(len_scale)`

= 0.6019072301972346 (which is besselk(1,1))

`DIVAnd.DIVAnd_residualobs`

— Function`dataresidual = DIVAnd_residualobs(s,fi);`

Computes the residual yo - H xa only at real data points using the analysis. on the grid `fi`

and the solution structure `s`

.

`DIVAnd.DIVAnd_aexerr`

— Function`aexerr,Bref,fa,sa = DIVAnd_aexerr(mask,pmn,xi,x,f,len,epsilon2;...);`

**Input: same as for DIVAndrun**

`mask`

: binary mask delimiting the domain. true is inside and false outside. For oceanographic application, this is the land-sea mask.`pmn`

: scale factor of the grid. pmn is a tuple with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.`xi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated`x`

: tuple with n elements. Every element represents a coordinate of the observations`f`

: value of the observations*minus*the background estimate (m-by-1 array). (see note)`len`

: correlation length`epsilon2`

: error variance of the observations (normalized by the error variance of the background field).`epsilon2`

can be a scalar (all observations have the same error variance and their errors are decorrelated), a vector (all observations can have a difference error variance and their errors are decorrelated) or a matrix (all observations can have a difference error variance and their errors can be correlated). If`epsilon2`

is a scalar, it is thus the*inverse of the signal-to-noise ratio*.

**Optional input arguments specified as keyword arguments also as for DIVAnd**

**Output:**

`aexerr`

: the almost exact error`Bref`

: the background error for error scaling by background`aexerr./Bref`

`fa`

: the analysis (with low impact fake data): DO NOT USE UNLESS YOU KNOW WHAT YOU ARE DOING`sa`

: the associated structure

Compute a variational analysis of arbitrarily located observations to calculate the almost exact error

`DIVAnd.DIVAnd_cpme`

— Function`cpme = DIVAnd_cpme(mask,pmn,xi,x,f,len,epsilon2;...);`

**Input: Same as for DIVAndrun**

`mask`

: binary mask delimiting the domain. true is inside and false outside.

For oceanographic application, this is the land-sea mask.

`pmn`

: scale factor of the grid. pmn is a tuple with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.`xi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated`x`

: tuple with n elements. Every element represents a coordinate of the observations`f`

: value of the observations*minus*the background estimate (m-by-1 array). (see note)`len`

: correlation length`epsilon2`

: error variance of the observations (normalized by the error variance of the background field).`epsilon2`

can be a scalar (all observations have the same error variance and their errors are decorrelated), a vector (all observations can have a difference error variance and their errors are decorrelated) or a matrix (all observations can have a difference error variance and their errors can be correlated). If`epsilon2`

is a scalar, it is thus the*inverse of the signal-to-noise ratio*.`keywords`

: undocumented for the moment how to use iterative solver with coarser grid as preconditionner. see`DIVAndjog`

for`csteps`

,`lmask`

and`alphapc`

parameters

**Optional input arguments specified as keyword arguments also as for DIVAnd**

**Output:**

`cpme`

: the clever poor mans error

Perform an n-dimensional variational analysis of the observations `f`

located at the coordinates `x`

. The array `cpme`

represent the error field at the grid defined by the coordinates `xi`

and the scales factors `pmn`

. If you cannot run `DIVAndrun`

you can use `DIVAndgo`

with error field calculation `:cpme`

`DIVAnd.DIVAnd_cpme_go`

— Function`erri = DIVAnd_cpme_go(mask,pmn,xi,x,f,len,epsilon2; ...);`

**Input:**

- Same arguments as DIVAndrun with in addition
`MEMTOFIT=`

: keyword controlling how to cut the domain depending on the memory remaining available for inversion (not total memory)`RTIMESONESCALES=`

: if you provide a tuple of length scales, data are weighted differently depending on the numbers of neighbours they have. See`weight_RtimesOne`

for details

**Output:**

`erri`

: relative error field using the clever poor man's error approach. Result on the same grid as fi.

ONLY USE THIS VERSION IF YOU CANNOT RUN `DIVAndgo`

with `:cmpe`

activated (or directly `DIVAnd_cpme`

if you can run `DIVAndrun`

)

`DIVAnd.DIVAnd_datainboundingbox`

— Function`xn,fn,indexes,Rn = DIVAnd_datainboundingbox(xi,x,f;Rmatrix=())`

**Input:**

`xi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated

`x`

: tuple with n elements. Every element represents a coordinate of the observations`f`

: value of the observations`Rmatrix`

: error variance of the observations (normalized by the error variance of the background field).`epsilon2`

can be a scalar (all observations have the same error variance and their errors are decorrelated), a vector (all observations can have a difference error variance and their errors are decorrelated) or a matrix (all observations can have a difference error variance and their errors can be correlated). If`epsilon2`

is a scalar, it is thus the*inverse of the signal-to-noise ratio*.

**Output:**

`xn`

: tuple with n elements. Every element represents a coordinate of the observations which falls in the bounding box defined by xi `fn`

: the corresponding data `indexes:`

the indexes in the original array retained `Rn`

: the new error variance

`DIVAnd.DIVAnd_Lpmnrange`

— Function`Lpmnrange = DIVAnd_Lpmnrange(pmn,len);`

**In each direction, searches for the minimum and maximum value of the length scale times the metric in this direction**

**So it basically looks at the worst and the best resolution found in the grid**

**Input:**

`pmn`

: scale factor of the grid. pmn is a tuple with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.`len`

: correlation length

**Output:**

`Lpmnrange`

: Array of range tuples (minimum and maximum of L times metric)

`DIVAnd.DIVAnd_pc_sqrtiB`

— FunctionCompute a preconditioner using the Cholesky decomposition.

`[M1,M2] = DIVAnd_pc_michol(iB,H,R)`

Compute preconditioner matrices M1 and M2 based on the Cholesky decomposition of iB. The matrices H and R are not used. M2 is the transpose of M1 for this preconditioner.

`DIVAnd.DIVAnd_pc_none`

— Function`fun = DIVAnd_pc_none(iB,H,R)`

Dummy function for requiring that no preconditioner is used in DIVAnd.

See also: `diavnd_pc_sqrtiB`

`DIVAnd.DIVAnd_GCVKiiobs`

— Function`Kii = DIVAnd_GCVKiiobs(s)`

Computes an estimate of the mean value of the diagonal of HK using GCV and the already solved analysis and it structure s Only using real data locations.

`DIVAnd.DIVAnd_cutter`

— Function`windowlist,csteps,lmask,alphapc = DIVAnd_cutter(Lpmnrange,gridsize,moddim,MEMTOFIT);`

Creates a list of windows for subsequent domain decomposition. Also calculates already the subsampling steps csteps for the preconditionners as well as the mask lmask to apply to the length scales in the preconditionner, allowing to reduce the problem size

**Input:**

`Lpmnrange`

:`gridsize`

: number of points in each direction (size(mask))`moddim`

:`MEMTOFIT`

`overlapfactor`

: describes how many times the length scale is used for the overlapping. default is 3.3. use lower values ONLY for very good data coverage.

**Output:**

`windowlist`

: vector of tuples (iw1,iw2,isol1,isol2,istore1,istore2,) where`(iw1,iw2)`

correspond to the start and end indices in the (global) grid`(isol1,isol2)`

correspond to the start and end indices solution to be retained in the window (not all is retained due to overlapping) and`(istore1,istore2)`

correspond to the start and end indices of the solution relative to the global grid. They define thus where the local solution has to be stored in the combined global solution.`csteps`

: Array of steps for the coarse grid preconditionner.`csteps`

is zero for the direct solver.`lmask`

: Array of multiplication factors for length scale of preconditionner`alphapc`

: Norm defining coefficients for preconditionner

`DIVAnd.DIVAnd_qc`

— Function`qcvalues = DIVAnd_qc(fi,s,method);`

Perform a quality control of the observations using the interpolated field.

**Input:**

`fi`

: interpolated field from a`DIVAndrun`

execution`s`

: corresponding structure returned by`DIVAnd`

`method`

: optional argument, which describes the method to be used:

1 as for standard cross validation, 3 as for GCV, 4 with CV estimator to be used outside the routine, 5 Poor man's GCV using data instead of random vector, 0 automatic selection of method.

**Output**

`qcvalues`

: quality check values, one for each data point.

The higher the value, the more suspect a data point is. Absolute values of `qcvalues`

might be not robust when analysis parameters are uncertain. The ranking is however quite robust.

If you cannot run `DIVAndrun`

but use `DIVAndgo`

(which does not provide a structure s at the output), the latter provides `qcvalues`

if you call `DIVAndgo`

with a keyword parameter `QCMETHOD=`

`DIVAnd.DIVAnd_solve!`

— FunctionSolve the variational problem.

` fi = DIVAnd_solve(s)`

Derive the analysis based on all contraints included in s and using the observations yo

Input: s: structure created by DIVAnd_factorize fi0: starting point for iterative primal methods f0: starting point for the iterative dual method

btrunc: the value at which the stored value of s.iB was truncated and needs to be completed on the fly using jmBix

Output: fi: analyzed field

`DIVAnd.DIVAnd_sampler`

— Function`samplesteps = DIVAnd_sampler(pmn,len);`

**Defines steps for sub-sampling in the discrete grid which would still allow**

one to resolve the provided lengthscales

**Input:**

`pmn`

: scale factor of the grid. pmn is a tuple with n elements. Every element represents the scale factor of the corresponding dimension. Its inverse is the local resolution of the grid in a particular dimension.`len`

: correlation length

**Output:**

`samplesteps`

: vector of integers with steps in subsampling [1 2 4 1] means every grid point in x direction, every fifth in y etc

`DIVAnd.DIVAndjog`

— FunctionCompute a variational analysis of arbitrarily located observations.

` fi,s = DIVAndjog(mask,pmn,xi,x,f,len,epsilon2,csteps,lmask; alphapc=[1,2,1], otherargs...);`

Perform an n-dimensional variational analysis of the observations `f`

located at the coordinates `x`

. The array `fi`

represent the interpolated field at the grid defined by the coordinates `xi`

and the scales factors `pmn`

.

**Input:**

- Same parameters as for divarun. * Two additional parameters: * csteps: array of ndims values defining the sampling steps for the preconditionner * lmask: array of ndims mutilplications factors for length scales * One additional optional parameter * alphapc: The coefficients for the norm used in the preconditionner

**Output:**

`fi`

: the analysed field`s`

: structure with an array`s.P`

representing the analysed error covariance

`DIVAnd.DIVAnd_background_components`

— Function`iB = DIVAnd_background_components(s,D,alpha; kwargs...)`

Form the different components of the background error covariance matrix. Compute the components of the background error covariance matrix `s.iB_`

and their sum based on alpha (the adimensional coefficients for norm, gradient, laplacian,...).

If the optional arguments contains btrunc, the calculation of iB is limited to the term up and including alpha[btrunc]

`DIVAnd.stats`

— Function`meanx,stdx = stats(sumx,sumx2,N)`

Computes the mean `meanx`

and the standard deviation `stdx`

from the sum (`sumx`

) and the sum of squares (`sumx2`

) from `N`

numbers.

`meanx,meany,stdx,stdy,covar,corr = stats(sumx,sumx2,sumy,sumy2,sumxy,N)`

Computes the mean `meanx`

and the standard deviation `stdx`

from the sum (`sumx`

) and the sum of squares (`sumx2`

) from `N`

numbers and similarly for the variable `y`

. The function computes also the Pearson correlation `corr`

and covariance `covar`

between `x`

and `y`

.

`DIVAnd.statpos`

— Function```
ulon,ulat = statpos(lon,lat)
ulon,ulat = statpos((lon,lat,...))
```

Return unique positions (`ulon`

, `ulat`

) as well their mean, standard deviation and count of the vector of observations `val`

located at the positions `lon`

and `lat`

.

```
(ulon,ulat),meanval,stdval,count = statpos(val,(lon,lat))
(ulon,ulat,...),meanval,stdval,count = statpos(val,(lon,lat,...))
```

Return unique positions (`ulon`

, `ulat`

) as well as their mean, standard deviation and count of the vector of observations `val`

located at the positions `lon`

and `lat`

.

`DIVAnd.blkdiag`

— Functionconcatenate diagonal matrices

`Base.findfirst`

— Function`findfirst(c::Concept,name,collection)`

Return the first related concepts in the collection `collection`

. `name`

can be the string "related", "narrower", "broader".

`DIVAnd.formatsize`

— Functiondisplay size as a string

`DIVAnd.interp!`

— Function`interp!(xi,fi,x,f)`

Interpolate field `fi`

(n-dimensional array) defined at `xi`

(tuble of n-dimensional arrays or vectors) onto grid `x`

(tuble of n-dimensional arrays). The interpolated field is stored in `f`

. The grid in `xi`

must be align with the axis (e.g. produced by DIVAnd.ndgrid).

`DIVAnd.ufill`

— Function`cfilled = ufill(c,valex)`

Replace values in `c`

equal to `valex`

by averages of surrounding points. `valex`

should not be NaN; use `ufill(c,isfinite.(c))`

or `ufill(c,.!isnan.(c))`

instead.

`ufill(c::Array{T,2},mask::AbstractArray{Bool}) where T`

`mask`

is true where `c`

is valid.

`DIVAnd.cgradient`

— Function`hx,hy = cgradient(pmn,h)`

`DIVAnd.fzero`

— Functionfzero(f,x0,x1,eps; maxiter = Inf) find the zero of the function f between x0 and x1 assuming x0 < x1 at a precision eps.

`DIVAnd.localize_separable_grid`

— FunctionDerive fractional indices on a separable grid.

`I = localize_separable_grid(xi,mask,x)`

xi is a tuple of vectors and x and tuple of n-dimensional arrays, e.g.

x1,x2 = ndgrid(2 * collect(1:5),collect(1:6)) x = (x1,x2)

Derive fractional indices where xi are the points (typical discrete observations) to localize in the separable grid `x`

(every dimension in independent on other dimension). The output `I`

is an n-by-m array where n number of dimensions and m number of observations. The corresponding element of I is negative if `xi`

is outside of the grid defined by `x`

.

`DIVAnd.decompB!`

— Functionwork1, work2: size of mask

Symmetric matrix

SB = √(β) (1 + α L)^(nmax / 2) W^{-1}

where W is the volumne of the corresponding grid cell. The background error covariance matrix B is SB W SB

`DIVAnd.varanalysis`

— FunctionVariational analysis similar to 3D-var

Input:

x0: start vector for iteration, at output it is the last state of the iteration. Note that x0 is related to the analysis xa by xa = SB^½ * W^½ * xa

| x + W^½ * SB^½ * H' * (R \ (H * SB^½ * W^½ * x )) - W^½ SB^{½} * H' * (R \ yo) | < tol * s.sv.n / length(yo) * | W^½ SB^{½} * H' * (R \ yo) |

Kernel is the solution of the n-dimensional diffusion equation

∂c/∂t = ∇ ⋅ (D ∇ c)

n-dimensional Green’s function

G(x,x',t) = (4πDt)^(-n/2) exp( - |x -x'|² / (4Dt))

G(x,x',t) = det(D)^(-½) (4π t)^(-n/2) exp( - (x -x')ᵀ D⁻¹ (x -x') / (4t))

http://www.rpgroup.caltech.edu/~natsirt/aph162/diffusion_old.pdf

`DIVAnd.len_harmonize`

— Function`Len = len_harmonise(len,mask)`

Produce a tuple of arrays of the correlation length `len`

which can be either a scalar (homogeneous and isotropic case), a tuple of scalar (homogeneous case) or already a tuple of arrays (general case). The the later case the size of the arrays are veryfied.

`DIVAnd.alpha_default`

— Function`neff, alpha = alpha_default(Labs,alpha)`

Return a default value of alpha.

`DIVAnd.ncfile`

— Function```
DIVAnd_save(ds,filename,xyi,fi,varname;
ncvarattrib = Dict(), ncglobalattrib = Dict(), ...)
```

Save the result of the analysis in a netCDF file .

**Input arguments**

`ds`

: the NetCDF dataset`filename`

: the name of the NetCDF file`mask`

: binary mask delimiting the domain. true is inside and false outside. For oceanographic application, this is the land-sea mask where sea is true and land is false.`xyi`

: tuple with n elements. Every element represents a coordinate of the final grid on which the observations are interpolated`fi`

: the analysed field`varname`

: the name of the NetCDF variable

**Optional arguments:**

`ncglobalattrib`

: a dictionary with the global attributes`ncvarattrib`

: a dictionary with the variable attributes`relerr`

: relative error`timeorigin`

: time origin for the time units attribute (default is 1900-01-01 00:00:00)

`DIVAnd.writeslice`

— Function`writeslice(ncvar, ncvar_relerr, ncvar_Lx, fi, relerr, index)`

Write a slice of data in a netCDF file given by the index `index`

. The variable `relerr`

can be nothing.

`DIVAnd.encodeWMSStyle`

— Functionencode parameters as key-value separated by : and +

`DIVAnd.loadoriginators`

— Function`db = loadoriginators(fname)`

Load the CDI list from the file `fname`

(zip with a csv file, or csv file directly).

`DIVAnd.DIVAnd_integral`

— FunctionComputes an N-dimensional volume integral

DIVAnd_integral(mask,pmn,fieldin)

**Input:**

`mask`

: mask as usual`pmn`

: tuple of metrics as usual`fieldin`

: field of the same dimensions as mask and which is integrated over the domain

**Output:**

`integratedfield`

: The integral

`DIVAnd.DIVAnd_scaleL`

— FunctionComputes a relative length based on the mask, metrics and a density field, typically measuring the observation density calculated with DIVAnd_heatmap

lambda = DIVAnd_scaleL(mask,pmn,dens)

**Input:**

`mask`

: mask as usual`pmn`

: tuple of metrics as usual`dens`

: field of the same dimensions as mask. Higher values of dens will result in lower values of lambda.

**Output:**

`lambda`

: field to be applied to a reference length field. Values are around 1 so some regions will have smaller L and some higher L

## Examples

To run the example, you need to install `PyPlot`

. In the folder `examples`

of DIVAnd, you can run e.g. the example `DIVAnd_simple_example_1D.jl`

by issuing:

```
# cd("/path/to/DIVAnd/examples")
include("test/DIVAnd_simple_example_1D.jl")
```

Replace `/path/to/DIVAnd/`

by the installation directory of DIVAnd which is the output of the following code:

```
using DIVAnd;
joinpath(dirname(pathof(DIVAnd)), "..")
```

### Advection constraint

The functions `DIVAndrun`

, `DIVAndgo`

and `diva3d`

can also use an advection constraint forcing the analysis to align with a vector field (e.g. a velocity field). The velocity field should be a tuple of n-elements. Every element of the tuple is a gridded array (defined at the same location than the target array) representing a single velocity component. For 3D analysis, the order of the dimensions is typically: longitude, latitude and depth. Like-wise the velocity components are zonal, meridional and vertical velocity. The three velocity components has to be scaled by a constant factor to enhance or decrease this constraint. It is recommended that this parameter is tuned by cross-validation. There are no tools currently in `DIVAnd.jl`

to automate this process.

For the two dimensional case, the velocity has just two components as shown in the example below.

```
using DIVAnd, PyPlot
# square domain in 2 dimensions from -1 to 1
mask, (pm, pn), (xi, yi) = DIVAnd_squaredom(2, range(-1, stop = 1, length = 30))
# location of the observations
x = [.4]
y = [.4]
# observed value
f = [1.]
# velocity field and its strength for the advection constrain
strength = 0.5
u = strength * yi
v = -strength * xi
# normalized obs. error variance and correlation length
epsilon2 = 1 / 200
len = 0.2
# call DIVAnd
fi, s = DIVAndrun(mask,(pm,pn),(xi,yi),(x,y),f,len,epsilon2; velocity = (u,v))
# plot the results
subplot(1,2,1)
plot(x,y,"rx")
quiver(xi,yi,u,v)
gca().set_aspect(1)
title("velocity field")
subplot(1,2,2)
plot(x,y,"rx")
pcolor(xi,yi,fi)
gca().set_aspect(1)
title("analysis")
```

## Performance considerations

### Tuning the domain decomposition

The functions `diva3d`

and `DIVAndgo`

split the domain into overlapping sub-domains to reduce the required amount of memory. In some circumstances (in particular few vertical levels), this can unnecessarily degrade the performance. The CPU time of the analysis can be improved by increasing the `diva3d`

option `memtofit`

from 3 (default) to higher values (as long as one does not run out of memory). If this parameter is set to a very high value then the domain decomposition is effectively disabled.

### Multiple CPU system

Per default Julia tries to use all CPUs on your system when doing matrix operations. The number of CPUs is controlled by the call to `BLAS.set_num_threads`

. Using multiple CPUs can result in overhead and it can be beneficial to reduce the number of CPUs:

`BLAS.set_num_threads(2)`

## Debugging message

From Julia 1.0 on, debugging messages can be activated using the following Julia command:

`ENV["JULIA_DEBUG"] = "DIVAnd"`

See also https://docs.julialang.org/en/v1/stdlib/Logging/index.html#Environment-variables-1 .

## Correlation length

The estimation of the correlation length in the function `diva3d`

can be activated with the option `fitcorrlen`

for the horizontal and vertical correlation. The parameter `len`

should then an empty tuple (`()`

) or a tuple of arrays equal to one. The actually used correlation length is a product between the provided values of the array `len`

and the estimated correlation length by fitting. Setting `fitcorrlen`

to true means thus that the interpretation of the parameters changes from absolution correlation length to relative correlation length.

The estimation of the horizontal and vertical correlation can also be activated selectively by just setting `fithorzcorrlen`

and `fitvertcorrlen`

(respectively) to true.

If one wants to not use the vertical correlation length, the one can put the corresponding value in `len`

to zero. Consequently the value of `fitvertcorrlen`

and `fitcorrlen`

should be keep to `false`

(i.e. its default values). Optimizing the horizontal correlation length is still possible by setting `fithorzcorrlen`

to `true`

.

## Integrating different datasets

To facilitated the integrating of different datasets, the function `WorldOceanDatabase.load`

from the module `PhysOcean`

now supports an option `prefixid`

which can be set to `"1977-"`

so that the obsids have automatically the right format for DIVAnd, e.g. `"1977-wod123456789O"`

:

```
using PhysOcean
# assuming the data in the directory "somedir": e.g. "somedir/CTD/file.nc", "somedir/XBT/file.nc"...
basedir = "somedir"
varname = "Temperature"
prefixid = "1977-"
obsvalue,obslon,obslat,obsdepth,obstime,obsid = WorldOceanDatabase.load(Float64,
basedir,varname; prefixid = prefixid);
```

In the module `PhysOcean`

, we implemented the function ARGO.load which can load data following the ARGO format and in particular the CORA dataset. In fact, even if CORA is distributed through CMEMS, the netCDF files in CORA do not follow the same format than the other in situ netCDF files from CMEMS. Therefore the function `CMEMS.load`

can not be used for the CORA dataset. `ARGO.load`

also supports the option prefixid.

```
using Glob, PhysOcean
# assuming the data in the directory "somedir": e.g. "somedir/someyear/file.nc"
filenames = glob("*/*nc","somedir")
obsvalue,obslon,obslat,obsdepth,obstime,obsids = ARGO.load(Float64,
filenames,varname; prefixid = "4630-")
```

In divadoxml we added the new argument additionalcontacts which allows one to acknowledge other datasets which are not in the MARIS database:

```
using DIVAnd
additionalcontacts = [
DIVAnd.getedmoinfo(1977,"originator"), # US NODC for World Ocean Database
DIVAnd.getedmoinfo(4630,"originator"), # CORIOLIS for CORA
]
ignore_errors = true
DIVAnd.divadoxml(
filename,varname,project,cdilist,xmlfilename,
ignore_errors = ignore_errors,
additionalcontacts = additionalcontacts
)
```

You will see a warning that not all observation identifiers could be found, but this is normal and expected.

## Frequently asked questions

### Which data points are used for the analysis?

An individual data point is used if all following conditions are met:

- longitude/latitude is inside the domain and not adjacent to a land point
- the depth is within the depth range of the domain
- the time is within the temporal range
- if an
*anamorphosis*transform is used, it should correspond to a finite transformed value - during the loading, the corresponding quality flag is among the accepted quality flags

Note that for points 1.-3. the finite precision of floating point numbers can affect the results.

### How to resolve a bias of the surface layer (or the deepest layer)?

In DIVAnd, the vertical levels must resolve the vertical correlation length. If the vertical correlation length is smaller than the surface resolution, this can result in a bias of the surface value. A similar problem can also be present at the deepest layer. The solution is to either refine the vertical resolution or to increase the vertical correlation length.

### How do I limit the estimated horizontal and vertical correlation length in DIVAnd?

It can be necessary to limit the estimated correlation length to an acceptable range. The function (called `limitfun`

) can be applied to the estimated correlation to make such adjustment. This function takes as arguments the estimated correlation length and the depth and returns the adjusted correlation length. For example the following function forces the horizontal correlation length to be between 50 km and 200 km (independently of the depth).

```
# len and z are expressed in meters
function mylimitfun(z,len)
if len > 200_000
return 200_000
end
if len < 50_000
return 50_000
end
return len
end
```

(`200_000`

is just a more readable way to write `200000`

). This function is used in `diva3d`

as follow:

```
... = diva3d(...
fithorz_param = Dict(:limitfun => mylimitfun)
```

The same can be achieved more compactly as follows:

```
... = diva3d(...
fithorz_param = Dict(:limitfun => (z,len) -> min(max(len,50_000),200_000)),
fitvert_param = Dict(:limitfun => (z,len) -> min(max(len,20),200)))
```

A similar option has also be added for the vertical correlation length.

### How do I reduce the estimated correlation length near the coast when it is estimated internally?

The actual used correlation lengths is the product between the estimated one (by fitting) and the arrays in the parameter `len`

(if provided). The function `lengraddepth`

can be used to create a reduced correlation length near the bathymetry. (https://github.com/gher-ulg/Diva-Workshops/blob/master/notebooks/5-AdvancedTopics/17-relative-correlation-length.ipynb)

### How can I handle data set of very different resolution?

If data from a high-resolution dataset (e.g. profiling float, dense time series) is combined with data with a low spatial resolution (e.g. profiles from a research vessel), then the analysis can be biased toward the high-resolution data. The function `weight_RtimesOne(x,len)`

can be used to reduce the weight of the high-resolution data (https://github.com/gher-ulg/Diva-Workshops/blob/master/notebooks/13-processing-parameter-optimization.ipynb). Alternative methods are averaging data in bins ("binning") or simply sub-sampling the data.

### My parameter represent a concentration and I get unrealistic negative values

- You can use an anamorphosis transform, in particular
`DIVAnd.Anam.loglin`

. The idea is that the transformed variable is closer to a Gaussian distribution that the original variable. - Use the option
`fieldmin = 0.0`

of`diva3d`

If the parameter `epsilon`

of `DIVAnd.Anam.loglin`

is larger than zero (which is necessary if some measurements are exactly zero), then the smallest value that analysis can have is `-epsilon`

. Therefore the option `fieldmin`

is still required to avoid negative values.

### How can I speed up analysis using observations which always have the same coordinates?

In this situation, the measurements are always done at the same locations, but the measurements are repeated over time or different variables are measured at those positions. It is possible to take into the already computed matrices so that the subsequent analysis can be executed much faster.

Let's assume the observations are available on `np`

locations and repeated `nt`

times, so that `obsval`

is an array of size `np X nt`

.

The first analysis is performed using:

```
@time fi1,s = DIVAndrun((mask),(pm,pn),
(xi,yi),(obslon,obslat),Float64.(obsval[:,1]),len,epsilon2);
```

then, for the `nt`

other analysis, we use the structure `s`

, computed in the previous step, as follows:

```
fpi = s.P * (s.H' * (s.R \ obsval[:,i]))
f_with_mask = unpack(s.sv, fpi, NaN)
```

where i=1, ..., nt.

The `unpack`

function *unpacks* the vector `fpi`

into the different variables var1, var2, ... `s.sv`

is the statevector and `NaN`

is the fill value.

Check the example.

## API changes

We do our best to avoid changing the API, but sometimes it is unfortunately necessary.

- 2021-04-21: When using domain splitting, the average correlation length is computed over all domain and not per subdomain. The API remained the same.
- 2019-06-24:
`DIVAnd.fit_isotropic`

and`DIVAnd.fit`

are removed and replaced by`DIVAnd.fithorzlen`

and`DIVAnd.fitvertlen`

. - 2019-06-24: If the parameters
`background_lenz`

and`background_lenz_factor`

of`diva3d`

are both specified, then preference will now be given for`background_lenz`

. - 2018-07-02: The module
`divand`

has been renamed`DIVAnd`

and likewise functions containing`divand`

- 2018-06-18: The options
`nmean`

and`distbin`

of`fithorzlen`

and`fitvertlen`

have been removed. The functions now choose appropriate values for these parameters automatically.

## Information for developers

To update the documentation locally, install the package `Documenter`

and run the script `include("docs/make.jl")`

.

```
using Pkg
Pkg.add("Documenter")
```

## Troubleshooting

If the installation of a package fails, it is recommended to update the local copy of the package list by issuing `Pkg.update()`

to make sure that Julia knows about the latest version of these packages and then to re-try the installation of the problematic package. For example to retry the installation of `EzXML`

issue the following command:

```
using Pkg
Pkg.update()
Pkg.add("EzXML")
```

### Installation problem of PyPlot on Linux (Debian/Ubuntu)

Make sure that the following Debian/Ubuntu packages are installed:

`sudo apt-get install python3 libpython3 python3-tk`

Then start Julia and run:

```
using Pkg
Pkg.build("PyCall")
Pkg.build("PyPlot")
```

Test PyPlot with:

```
using PyPlot
plot(1:10)
```

### No plotting window appears

If the following command doesn't produce any figure

```
using PyPlot
plot(1:10)
```

A possible solution is to modify the *backend*: this is done by editing the python configuration file matplotlibrc. The location of this file is obtained in python with:

```
import matplotlib
matplotlib.matplotlib_fname
```

Under Linux, this returns `'~/.config/matplotlib/matplotlibrc'`

. To use the `TkAgg`

backend, add the following to the file:

`backend : TkAgg`

The `matplotlibrc`

need to be created if it does not exists.

### C runtime library when calling PyPlot

`R6034 an application has made an attempt to load the C runtime library incorrectly`

on Windows 10 with julia 0.6.1, matplotlib 2.1.0, PyPlot 2.3.2:

`ENV["MPLBACKEND"]="qt4agg"`

You can put this line in a file `.juliarc.jl`

placed in your home directory (the output of `homedir()`

in Julia).

### Julia cannot connect to GitHub on Windows 7 and Windows Server 2012

Cloning the package registery or downloading a Julia packages fails with:

`GitError(Code:ECERTIFICATE, Class:OS, , user cancelled certificate checks: )`

The problem is that Windows 7 and Windows Server 2012 uses outdated encryption protocols. The solution is to run the "Easy fix" tool from the Microsoft support page

### MbedTLS.jl does not install on Windows 7

The installation of `MbedTLS.jl`

fails with the error message:

```
INFO: Building MbedTLS
Info: Downloading https://github.com/quinnj/MbedTLSBuilder/releases/download/v0.6/MbedTLS.x86_64-w64-mingw32.tar.gz to C:\Users\Jeremy\.julia\v0.6\MbedTLS
\deps\usr\downloads\MbedTLS.x86_64-w64-mingw32.tar.gz...
Exception setting "SecurityProtocol": "Cannot convert null to type "System.Net.SecurityProtocolType" due to invalid enumeration values. Specify one of th
e following enumeration values and try again. The possible enumeration values are "Ssl3, Tls"."
At line:1 char:35
+ [System.Net.ServicePointManager]:: <<<< SecurityProtocol =
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : PropertyAssignmentException
[...]
```

See also the issue https://github.com/JuliaWeb/MbedTLS.jl/issues/133.

The solution is to install the Windows Management Framework 4.0.

### EzXML.jl cannot be installed on RedHat 6

The `zlib`

library of RedHat 6, is slightly older than the library which `EzXML.jl`

and `libxml2`

requires.

To verify this issue, you can type in Julia

```
using Libdl
using Pkg
Libdl.dlopen(joinpath(Pkg.dir("EzXML"),"deps/usr/lib/libxml2.so"))
```

It should not return an error message. On Redhat 6.6, the following error message is returned:

```
ERROR: could not load library "/home/username/.../EzXML/deps/usr/lib/libxml2.so"
/lib64/libz.so.1: version `ZLIB_1.2.3.3' not found (required by /home/.../EzXML/deps/usr/lib/libxml2.so)
Stacktrace:
[1] dlopen(::String, ::UInt32) at ./libdl.jl:97 (repeats 2 times)
```

A newer version `zlib`

can be installed by the following command:

```
using Pkg
Pkg.add("CodecZlib")
```

However, the following command should work:

` LD_LIBRARY_PATH="$HOME/.julia/full/path/to/CodecZlib/.../deps/usr/lib/:$LD_LIBRARY_PATH" julia --eval 'print(Libdl.dlopen(joinpath(Pkg.dir("EzXML"),"deps/usr/lib/libxml2.so"))'`

by replacing the file path appropriately. (see also https://github.com/JuliaLang/julia/issues/7004, https://github.com/JuliaIO/HDF5.jl/issues/97, and https://github.com/bicycle1885/EzXML.jl/issues/102)

To make Julia use this library, a user on RedHat 6 should always start Julia with:

`LD_LIBRARY_PATH="$HOME/.julia/full/path/to/CodecZlib/.../deps/usr/lib/:$LD_LIBRARY_PATH" julia`

One can also create script with the following content:

```
#!/bin/bash
export LD_LIBRARY_PATH="$HOME/.julia/full/path/to/CodecZlib/.../deps/usr/lib/:$LD_LIBRARY_PATH"
exec /path/to/bin/julia "$@"
```

by replacing `/path/to/bin/julia`

to the full path of your installation directory. The script should be marked executable and it can be included in your Linux search `PATH`

environment variable. Julia can then be started by calling directly this script.

### The DIVAnd test suite fails with `automatic download failed`

Running `using Pkg; Pkg.test("DIVAnd")`

fails with the error:

`automatic download failed (error: 2147500036)`

The test suite will download some sample data. You need to have internet access and run the test function from a directory with write access.

You can change the directory to your home directory with the Julia command `cd(homedir())`

.

You can check the current working directory with:

`pwd()`

### Convert error in `DIVAnd_obs`

The full error message:

```
MethodError: Cannot `convert` an object of type DIVAnd.DIVAnd_constrain{Float32,Diagonal{Float64},SparseMatrixCSC{Float64,Int64}} to an object of type DIVAnd.DIVAnd_constrain{Float64,TR,TH} where TH<:(AbstractArray{#s370,2} where #s370<:Number) where TR<:(AbstractArray{#s371,2} where #s371<:Number)
This may have arisen from a call to the constructor DIVAnd.DIVAnd_constrain{Float64,TR,TH} where TH<:(AbstractArray{#s370,2} where #s370<:Number) where TR<:(AbstractArray{#s371,2} where #s371<:Number)(...),
since type constructors fall back to convert methods.
```

The solution is to use the same type of all input parameters: all Float32 or all Float64.

### Monthlist issue

Using comments inside list can lead to unexpected results.

This

```
monthlist = [
[1,2,3]
#[4,5,6]
]
```

should be written as

```
monthlist = [
[1,2,3]
]
```

### Error in the factorisation

The error message `Base.LinAlg.PosDefException(95650)`

followed by the stack-trace below might be due to a wrong choice in the analysis parameters, for example a too long correlation length.

```
Stacktrace:
[1] #cholfact!#8(::Float64, ::Function, ::Base.SparseArrays.CHOLMOD.Factor{Float64}, ::Base.SparseArrays.CHOLMOD.Sparse{Float64}) at ./sparse/cholmod.jl:1360
.................
[9] DIVAndrun(::BitArray{3}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Float64) at /home/ctroupin/.julia/v0.6/DIVAnd/src/DIVAndrun.jl:147
```

### DimensionMismatch when running `diva3d`

You get an error like

`DimensionMismatch("tried to assign 201×201 array to 202×201 destination")`

This type of error might be due to the reading of the bathymetry: if you work with a regional bathymetry (for instance not with GEBCO), you should set the option `bathisglobal`

to false.

When `bathisglobal = true`

, the longitude is supposed to *wrap around* (the last element of the lon should be right before the first element of lon), thus the dimension mismatch.

### Installing additional packages when using a git clone

If `DIVAnd`

is installed without the package manager, it can be necessary to install additional packages. This will be explicitly shown, for example:

```
LoadError: ArgumentError: Module Roots not found in current path.
Run `Pkg.add("Roots")` to install the Roots package.
```

### Kernel not working with IJulia/Jupyter under Julia 0.7 Windows

Try these commands

```
using Pkg
Pkg.add("ZMQ")
Pkg.add("IJulia")
Pkg.update()
```