`DINCAE.DINCAE`

— Module

DINCAE (Data-Interpolating Convolutional Auto-Encoder) is a neural network for reconstructing missing data in satellite observations.

For most applications, it is sufficient to call the function `DINCAE.reconstruct` directly.

The code is available at: https://github.com/gher-uliege/DINCAE.jl

`DINCAE.NCData`

— Method

```
dd = NCData(lon,lat,time,data_full,missingmask,ndims;
            train = false,
            obs_err_std = 1.,
            jitter_std = 0.05)
```

Return a structure holding the data for training (`train = true`) or testing (`train = false`) the neural network. `obs_err_std` is the error standard deviation of the observations. The variable `lon` is the longitude in degrees east, `lat` is the latitude in degrees north, `time` is a `DateTime` vector, `data_full` is a 3-d array with the data and `missingmask` is a boolean mask where `true` means the data is missing. `jitter_std` is the standard deviation of the noise added to the data during training.
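The sketch below illustrates these conventions in plain Julia, without calling DINCAE. It builds a `missingmask` and applies the kind of perturbation that `jitter_std` describes. For illustration only, it assumes that missing values are encoded as `NaN` in `data_full` and that the noise is Gaussian and added only to the present values. The toy array and the names `present`, `jittered` and `rng` are hypothetical, and `NCData` may implement the jitter differently.

```julia
using Random

# Hypothetical toy data: 4×3 grid, 2 time instances (lon × lat × time)
data_full = rand(Float32, 4, 3, 2)
data_full[1, 1, 1] = NaN32   # assumption: NaN marks a missing pixel
data_full[2, 3, 2] = NaN32

# Same convention as NCData: true means the data is missing
missingmask = isnan.(data_full)

# Sketch of the training perturbation: Gaussian noise with standard deviation
# jitter_std added to the present values only (assumed, not DINCAE's code)
jitter_std = 0.05f0
rng = MersenneTwister(1)
jittered = copy(data_full)
present = .!missingmask
jittered[present] .+= jitter_std .* randn(rng, Float32, count(present))
```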

`DINCAE.interp_adjn!`

— Method

All positions should be within the domain, with an exclusive upper bound: `1 <= pos[i][n] < sz[n]` for all `i` and `n`.
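This constraint can be checked before calling the interpolation routines. The helper `inside_domain` below is hypothetical and not part of DINCAE. The exclusive upper bound presumably comes from linear interpolation, which reads the grid points `floor(p)` and `floor(p)+1` along each dimension. For integer `sz[n]`, the condition `p < sz[n]` is equivalent to `floor(p)+1 <= sz[n]`.

```julia
# Hypothetical helper (not part of DINCAE): true if 1 <= p[n] < sz[n]
# holds for every position p in pos and every dimension n
inside_domain(pos, sz) = all(1 <= p[n] < sz[n] for p in pos for n in eachindex(sz))

sz = (10, 8)                          # grid size along each dimension
good = [(1.0, 1.0), (9.5, 7.99)]
bad  = [(1.0, 1.0), (10.0, 4.0)]      # 10.0 violates the exclusive bound 10.0 < 10
```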

`DINCAE.interpnd!`

— Method

`interpnd!` and `interp_adjn!` are adjoints of each other. `vec` must be zero initially.
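The following 1-D sketch shows what "adjoint" means here. The hypothetical functions `interp1!` and `interp1_adj!` stand in for `interpnd!` and `interp_adjn!`, whose exact argument layout is not documented in this section. Linear interpolation is a linear map, and its adjoint scatters values back onto the grid with the same weights, so the dot-product identity ⟨interp(A), y⟩ = ⟨A, adjoint(y)⟩ holds. Both functions accumulate with `+=`, which is presumably why the output array has to start at zero.

```julia
# Hypothetical 1-D analogue of interpnd!: linear interpolation of grid values A
# at fractional positions pos (requires 1 <= p < length(A))
function interp1!(pos, A, vec)
    for (k, p) in enumerate(pos)
        i = floor(Int, p)
        α = p - i
        vec[k] += (1 - α) * A[i] + α * A[i+1]   # accumulates: vec must start at zero
    end
    return vec
end

# Hypothetical 1-D analogue of interp_adjn!: scatter values back onto the grid
function interp1_adj!(pos, values, A2)
    for (k, p) in enumerate(pos)
        i = floor(Int, p)
        α = p - i
        A2[i]   += (1 - α) * values[k]
        A2[i+1] += α * values[k]
    end
    return A2
end

A   = [1.0, 2.0, 4.0, 8.0]
pos = [1.5, 2.25, 3.9]
y   = [0.3, -1.0, 2.0]

Ax  = interp1!(pos, A, zeros(length(pos)))
Aty = interp1_adj!(pos, y, zeros(length(A)))

# Dot-product test of the adjoint: <interp(A), y> == <A, adjoint(y)>
lhs = sum(Ax .* y)
rhs = sum(A .* Aty)
```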

`DINCAE.load_gridded_nc`

— Method

`lon,lat,time,data,missingmask,mask = load_gridded_nc(fname,varname; minfrac = 0.05)`

Load the variable `varname` from the NetCDF file `fname`. The variable `lon` is the longitude in degrees east, `lat` is the latitude in degrees north, `time` is a `DateTime` vector, `data` is a 3-d array with the data, `missingmask` is a boolean mask where `true` means the data is missing and `mask` is a boolean mask where `true` means the data location is valid, e.g. sea points for sea surface temperature.

At a bare minimum, a NetCDF file should have the following variables and attributes:

```
netcdf file.nc {
dimensions:
time = UNLIMITED ; // (5266 currently)
lat = 112 ;
lon = 112 ;
variables:
double lon(lon) ;
double lat(lat) ;
double time(time) ;
time:units = "days since 1900-01-01 00:00:00" ;
int mask(lat, lon) ;
float SST(time, lat, lon) ;
SST:_FillValue = -9999.f ;
}
```
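The `time` variable stores offsets in the units given by its `units` attribute. A NetCDF library such as NCDatasets.jl normally decodes these offsets to `DateTime` automatically. The sketch below only shows the underlying arithmetic, using the `Dates` standard library and the units from the example above (`days since 1900-01-01 00:00:00`). The offset values are hypothetical.

```julia
using Dates

# Reference date from the units attribute "days since 1900-01-01 00:00:00"
t0 = DateTime(1900, 1, 1)

# Hypothetical raw values of the `time` variable (in days, possibly fractional)
days = [0.0, 1.5, 365.0]

# Convert to milliseconds (the resolution of DateTime), then add to t0
times = t0 .+ Millisecond.(round.(Int, days .* 86_400_000))
```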

`DINCAE.load_gridded_nc`

— Method

If none is specified, the first variable is used as the output variable (`isoutput`).

`DINCAE.reconstruct`

— Method

`reconstruct(Atype,data_all,fnames_rec;...)`

Train a neural network to reconstruct missing data using the training dataset and periodically run the neural network on the test dataset. The data is assumed to be available on a regular longitude/latitude grid (which is the case for L3 satellite data).

**Mandatory parameters**

- `Atype`: array type to use
- `data_all`: list of named tuples. Every tuple should have `filename` and `varname`. `data_all[1]` will be used for training (and perturbed to prevent overfitting). All other entries `data_all[2:end]` will be reconstructed using the trained network at the epochs defined by `save_epochs`.
- `fnames_rec`: vector of filenames corresponding to the entries `data_all[2:end]`
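`data_all` and `fnames_rec` are ordinary Julia vectors. In the sketch below, the file names are hypothetical. Only the structure follows the description above: named tuples with the fields `filename` and `varname`, and one output file per entry of `data_all[2:end]`. The commented-out call assumes `Array{Float32}` as `Atype`.

```julia
# Hypothetical input files: entry 1 trains the network, entry 2 is reconstructed
data_all = [
    (filename = "sst-train.nc", varname = "SST"),
    (filename = "sst-test.nc",  varname = "SST"),
]

# One output file for each entry of data_all[2:end]
fnames_rec = ["sst-test-reconstructed.nc"]

# With DINCAE loaded, the call would then look like (not executed here):
# DINCAE.reconstruct(Array{Float32}, data_all, fnames_rec; epochs = 1000)
```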

**Optional parameters:**

- `epochs`: the number of epochs (default `1000`)
- `batch_size`: the size of a mini-batch (default `50`)
- `enc_nfilter_internal`: number of filters of the internal encoding layers (default `[16,24,36,54]`)
- `skipconnections`: list of layers with skip connections (default `2:(length(enc_nfilter_internal)+1)`)
- `clip_grad`: maximum allowed gradient. Elements of the gradients larger than this value will be clipped (default `5.0`).
- `regularization_L2_beta`: parameter for L2 regularization (default `0`, i.e. no regularization)
- `save_epochs`: list of epochs where the results should be saved (default `200:10:epochs`)
- `is3D`: switch to apply 2D (`is3D == false`) or 3D (`is3D == true`) convolutions (default `false`)
- `upsampling_method`: interpolation method during upsampling, which can be either `:nearest` or `:bilinear` (default `:nearest`)
- `ntime_win`: number of time instances within the time window. This number should be odd (default `3`).
- `learning_rate`: initial learning rate of the ADAM optimizer (default `0.001`)
- `learning_rate_decay_epoch`: controls the exponential decay of the learning rate. After `learning_rate_decay_epoch` epochs, the learning rate is halved. The learning rate is computed as `learning_rate * 0.5^(epoch / learning_rate_decay_epoch)`. `learning_rate_decay_epoch` can be `Inf` for a constant learning rate (default `Inf`).
- `min_std_err`: minimum error standard deviation, preventing a division by a number close to zero (default `exp(-5) = 0.006737946999085467`)
- `loss_weights_refine`: the weights of the individual refinement layers used in the cost function. If `loss_weights_refine` has a single element, then there is no refinement (default `(1.,)`).

Note that the optional parameters should also be tuned for a particular application.
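Several of the optional parameters reduce to simple formulas. The sketch below restates three of them in standalone Julia:

- the documented learning-rate schedule;
- element-wise gradient clipping, assumed here to clamp each element to `[-clip_grad, clip_grad]`;
- a time window of `ntime_win` instances, assumed here to be centered on the current time index, which would explain why `ntime_win` should be odd.

The function names are hypothetical and are not DINCAE's API.

```julia
# Documented schedule: learning_rate * 0.5^(epoch / learning_rate_decay_epoch)
learning_rate_at(epoch; learning_rate = 0.001, learning_rate_decay_epoch = Inf) =
    learning_rate * 0.5^(epoch / learning_rate_decay_epoch)

# Assumed clipping rule: clamp every gradient element to [-clip_grad, clip_grad]
clip_gradient(g; clip_grad = 5.0) = clamp.(g, -clip_grad, clip_grad)

# Assumed window: ntime_win time instances centered on time index n
function time_window(n, ntime_win = 3)
    isodd(ntime_win) || throw(ArgumentError("ntime_win should be odd"))
    half = ntime_win ÷ 2
    return (n - half):(n + half)
end
```

With the default `learning_rate_decay_epoch = Inf`, the exponent `epoch / Inf` is `0.0`, so the learning rate stays constant.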

`DINCAE.reconstruct_points`

— Method

`DINCAE.reconstruct_points(T,Atype,filename,varname,grid,fnames_rec)`

Mandatory parameters:

- `T`: float type used by the neural network (`Float32` or `Float64`)
- `Atype`: array type used by the neural network (`Array{T}` or `KnetArray{T}`)
- `filename`: NetCDF file in the format described below
- `varname`: name of the primary variable in the NetCDF file
- `grid`: tuple of ranges with the grid in the longitude and latitude direction, e.g. `(-180:1:180,-90:1:90)`
- `fnames_rec`: NetCDF file names of the reconstruction
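Because `grid` is a tuple of ranges, an observation's coordinates can be mapped to fractional, 1-based grid indices. These indices can be compared with the bound `1 <= pos[i][n] < sz[n]` documented for `DINCAE.interp_adjn!`. The helper `fracindex` is hypothetical and not part of DINCAE. It assumes uniformly spaced ranges, as in the example `(-180:1:180,-90:1:90)`.

```julia
# Grid from the documentation example: global, 1° resolution
grid = (-180:1:180, -90:1:90)

# Hypothetical helper: fractional 1-based index of coordinate x in range r,
# so that x == first(r) maps to 1.0 and x == last(r) maps to length(r)
fracindex(r::AbstractRange, x) = (x - first(r)) / step(r) + 1

lon, lat = 3.5, -89.25                      # hypothetical observation
pos = (fracindex(grid[1], lon), fracindex(grid[2], lat))
```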

Optional parameters:

- `jitter_std_pos`: standard deviation of the noise to be added to the position of the observations (default `(5,5)`)
- `auxdata_files`: gridded auxiliary data files for a multivariate reconstruction. `auxdata_files` is an array of named tuples with the fields `filename` (the file name of the NetCDF file), `varname` (the NetCDF name of the primary variable) and `errvarname` (the NetCDF name of the expected error standard deviation).
- `probability_skip_for_training`: for a given time step n, every track from the same time step n will be skipped with this probability during training (default `0.2`). This does not affect the tracks from the previous (n-1, n-2, ...) and following (n+1, n+2, ...) time steps. The goal of this parameter is to force the neural network to learn to interpolate the data in time.
- `paramfile`: the path of the NetCDF file where the parameter values are stored (default `nothing`)

For example, `auxdata_files` with a single entry could be:

```
auxdata_files = [
  (filename = "big-sst-file.nc",
   varname = "SST",
   errvarname = "SST_error")]
```

The data in the file should already be interpolated on the target grid. The file structure of the NetCDF file is described in `DINCAE.load_gridded_nc`. The fields defined in this file should not have any missing values (see `DIVAnd.ufill`).

See `DINCAE.reconstruct` for other optional parameters.
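The effect of `probability_skip_for_training` can be pictured as an independent coin flip for each track of the current time step n. The sketch below is hypothetical: the track identifiers, the seed and the variable names are invented. It only illustrates the sampling, not how DINCAE organizes its tracks.

```julia
using Random

rng = MersenneTwister(42)
probability_skip_for_training = 0.2

# Hypothetical identifiers of the tracks belonging to time step n
tracks_n = collect(1:1000)

# Each track of time step n is dropped independently with probability 0.2;
# tracks of the neighbouring time steps (n-1, n+1, ...) would be left untouched
kept = [id for id in tracks_n if rand(rng) >= probability_skip_for_training]
```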

A minimal example of the NetCDF file is:

```
netcdf all-sla.train {
dimensions:
time_instances = 9628 ;
obs = 7445528 ;
variables:
int64 size(time_instances) ;
size:sample_dimension = "obs" ;
double dates(time_instances) ;
dates:units = "days since 1900-01-01 00:00:00" ;
float sla(obs) ;
float lon(obs) ;
float lat(obs) ;
int64 id(obs) ;
double dtime(obs) ;
dtime:long_name = "time of measurement" ;
dtime:units = "days since 1900-01-01 00:00:00" ;
}
```

The file should contain the variables `lon` (longitude), `lat` (latitude), `dtime` (time of measurement), `id` (numeric identifier, only used by post-processing scripts) and `dates` (time instance of the gridded field). The file should use the contiguous ragged array representation as specified by the CF conventions, which allows data points to be grouped into "features" (e.g. tracks for altimetry). A feature can also contain a single data point.
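In the contiguous ragged array representation, observations are stored one after another along `obs`. The count variable (here `size`, whose `sample_dimension` attribute is `obs`) gives how many observations belong to each time instance. The observations of instance k therefore start at one plus the sum of the counts before it. The counts, the values and the function name `obs_range` below are hypothetical. The name `sizes` is used instead of `size` to avoid shadowing Julia's `Base.size`.

```julia
# Hypothetical content of the `size` variable: number of observations per time instance
sizes = [3, 0, 2, 4]

# Hypothetical `sla` values, stored contiguously along the `obs` dimension
sla = collect(1.0:sum(sizes))

# Index range along `obs` of the observations belonging to time instance k
function obs_range(sizes, k)
    start = sum(sizes[1:k-1]) + 1      # sum of an empty slice is 0 for k == 1
    return start:(start + sizes[k] - 1)
end
```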

`DINCAE.transform_mσ2_single`

— Method

Transform `x[:,:,1,:]` and `x[:,:,2,:]` to the mean and the error variance.

`DINCAE.vector2_covariance`

— Method

```math
L = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}, \qquad
P = L L^\top = \begin{pmatrix} P_{11} & P_{12} \\ P_{12} & P_{22} \end{pmatrix}
```
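In the 2×2 case, expanding the product gives P11 = L11², P12 = L11·L21 and P22 = L21² + L22². Any real entries of L therefore yield a symmetric positive semi-definite P, which is positive definite when L11 and L22 are nonzero. The sketch below checks this with the `LinearAlgebra` standard library. The function `covariance_from_factor` is hypothetical, and this section does not document how `DINCAE.vector2_covariance` stores the entries of L in its input.

```julia
using LinearAlgebra

# Hypothetical helper: covariance P = L * L' from the entries of the
# lower-triangular factor L = [L11 0; L21 L22]
function covariance_from_factor(L11, L21, L22)
    L = [L11 zero(L11); L21 L22]
    return L * L'
end

P = covariance_from_factor(2.0, 1.0, 3.0)
# Closed form: P11 = L11^2, P12 = L11*L21, P22 = L21^2 + L22^2
```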