DINCAE.DINCAE
— Module
DINCAE (Data-Interpolating Convolutional Auto-Encoder) is a neural network to reconstruct missing data in satellite observations.
For most applications it is sufficient to call the function DINCAE.reconstruct directly.
The code is available at: https://github.com/gher-uliege/DINCAE.jl
DINCAE.NCData
— Method
dd = NCData(lon,lat,time,data_full,missingmask,ndims;
            train = false,
            obs_err_std = fill(1.,size(data_full,3)),
            jitter_std = fill(0.05,size(data_full,3)),
            mask = trues(size(data_full)[1:2]),
)
Return a structure holding the data for training (train = true) or testing (train = false) the neural network. obs_err_std is the error standard deviation of the observations. The variable lon is the longitude in degrees east, lat is the latitude in degrees north, time is a DateTime vector, data_full is a 3-d array with the data and missingmask is a boolean mask where true means the data is missing. jitter_std is the standard deviation of the noise to be added to the data during training.
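As a rough illustration of the inputs described above, the following sketch builds a small synthetic data_full array, derives missingmask from it and perturbs only the valid values with Gaussian noise. The array sizes, the use of NaN as the missing marker, the scalar noise level and the helper name add_jitter are assumptions made for this example; none of them is taken from the DINCAE source.

```julia
using Random

# Hypothetical helper (not part of DINCAE): add Gaussian noise with standard
# deviation `jitter_std` to the valid entries; missing entries are untouched.
function add_jitter(rng, data, missingmask, jitter_std)
    noisy = copy(data)
    for i in eachindex(data)
        if !missingmask[i]
            noisy[i] += jitter_std * randn(rng, eltype(data))
        end
    end
    return noisy
end

rng = MersenneTwister(42)
data_full = rand(rng, Float32, 4, 3, 5)   # assumed layout: lon × lat × time
data_full[2, 2, :] .= NaN32               # one pixel without any observation
missingmask = isnan.(data_full)           # true means the data is missing

noisy = add_jitter(rng, data_full, missingmask, 0.05f0)
```

Because the noise is drawn anew on every call, each training pass would see a slightly different version of the same field, which is the usual motivation for this kind of perturbation.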
DINCAE.interp_adjn!
— Method
All positions should be within the domain, with an exclusive upper bound: 1 <= pos[i][n] < sz[n] for all i and n.
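The condition can be checked before calling the routine. The helper below is a hypothetical sketch (positions_in_domain is not a DINCAE function); it assumes pos is a collection of coordinate tuples and sz is the grid size.

```julia
# Hypothetical helper: true if every position satisfies 1 <= pos[i][n] < sz[n].
function positions_in_domain(pos, sz)
    return all(all(1 <= p[n] < sz[n] for n in eachindex(sz)) for p in pos)
end

sz = (10, 20)
inside = [(1.0, 1.0), (9.5, 19.9)]
outside = [(10.0, 5.0)]          # 10 is not < sz[1], so this is rejected
```

The upper bound is exclusive presumably because an interpolation at position p also reads the grid point to the right of floor(p), which must still exist.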
DINCAE.interpnd!
— Method
interpnd! and interp_adj! are adjoints.
vec must be zero initially.
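To make the adjoint relation concrete, here is a minimal one-dimensional sketch. It assumes linear interpolation and uses the made-up names interp1 and interp1_adj!; the actual DINCAE routines are n-dimensional and their argument order is not shown here. Two operators H and Hᵀ are adjoint when ⟨H A, v⟩ = ⟨A, Hᵀ v⟩ for all A and v.

```julia
# Forward operator: linearly interpolate the gridded field A at positions pos.
function interp1(A, pos)
    out = zeros(eltype(A), length(pos))
    for (k, p) in enumerate(pos)
        i = floor(Int, p)      # left grid index, requires 1 <= p < length(A)
        w = p - i              # weight of the right neighbour
        out[k] = (1 - w) * A[i] + w * A[i + 1]
    end
    return out
end

# Adjoint operator: scatter the values back to the grid with the same weights.
# The result is accumulated into A, so A must be zero on entry.
function interp1_adj!(A, pos, values)
    for (k, p) in enumerate(pos)
        i = floor(Int, p)
        w = p - i
        A[i] += (1 - w) * values[k]
        A[i + 1] += w * values[k]
    end
    return A
end

A = [1.0, 2.0, 4.0, 8.0]
pos = [1.5, 2.25, 3.0]
v = [0.5, -1.0, 2.0]

lhs = sum(interp1(A, pos) .* v)                    # <H A, v>
rhs = sum(A .* interp1_adj!(zeros(4), pos, v))     # <A, Hᵀ v>
```

Since the adjoint adds to its output array instead of overwriting it, a non-zero initial array would leak into the result; this is the likely reason for the zero-initialization requirement.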
DINCAE.load_gridded_nc
— Method
lon,lat,time,data,missingmask,mask = load_gridded_nc(fname,varname; minfrac = 0.05)
Load the variable varname from the NetCDF file fname. The variable lon is the longitude in degrees east, lat is the latitude in degrees north, time is a DateTime vector, data is a 3-d array with the data, missingmask is a boolean mask where true means the data is missing and mask is a boolean mask where true means the data location is valid, e.g. sea points for sea surface temperature.
At a bare minimum, a NetCDF file should have the following variables and attributes:
netcdf file.nc {
dimensions:
        time = UNLIMITED ; // (5266 currently)
        lat = 112 ;
        lon = 112 ;
variables:
        double lon(lon) ;
        double lat(lat) ;
        double time(time) ;
                time:units = "days since 1900-01-01 00:00:00" ;
        int mask(lat, lon) ;
        float SST(time, lat, lon) ;
                SST:_FillValue = -9999.f ;
}
The netCDF mask is 0 for invalid pixels (e.g. land for an ocean application) and 1 for valid pixels (e.g. ocean).
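The sketch below shows, on made-up values, how the quantities in such a file relate to the returned variables: the time axis in "days since 1900-01-01 00:00:00" becomes a DateTime, the fill value marks missing data, and the integer mask becomes a boolean mask. The helper decode_days_since_1900 and all sample values are assumptions for illustration. A NetCDF reader normally performs the time and fill-value decoding itself, so this is not how load_gridded_nc is necessarily implemented.

```julia
using Dates

# Decode a CF-style time value given in "days since 1900-01-01 00:00:00".
# Fractional days are rounded to the nearest millisecond.
function decode_days_since_1900(days)
    return DateTime(1900, 1, 1) + Millisecond(round(Int64, days * 86_400_000))
end

fillvalue = -9999f0
SST = Float32[12.5 fillvalue; 13.0 14.2]   # tiny made-up 2×2 field
mask_nc = [1 0; 1 1]                       # netCDF mask: 1 = valid, 0 = invalid

missingmask = SST .== fillvalue            # true means the data is missing
mask = mask_nc .== 1                       # true means the location is valid
t = decode_days_since_1900(43829.5)
```

Note the opposite conventions: true in missingmask flags a gap, whereas true in mask flags a usable location.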
DINCAE.reconstruct
— Method
reconstruct(Atype,data_all,fnames_rec;...)
Train a neural network to reconstruct missing data using the training data set and periodically run the neural network on the test dataset. The data is assumed to be available on a regular longitude/latitude grid (which is the case of L3 satellite data).
Mandatory parameters:
Atype: array type to use
data_all: list of named tuples. Every tuple should have filename and varname. data_all[1] will be used for training (and perturbed to prevent overfitting). All other entries data_all[2:end] will be reconstructed using the trained network at the epochs defined by save_epochs.
fnames_rec: vector of filenames corresponding to the entries data_all[2:end]
Optional parameters:
epochs: the number of epochs (default 1000)
batch_size: the size of a mini-batch (default 50)
enc_nfilter_internal: number of filters of the internal encoding layers (default [16,24,36,54])
skipconnections: list of layers with skip connections (default 2:(length(enc_nfilter_internal)+1))
clip_grad: maximum allowed gradient. Elements of the gradients larger than this value will be clipped (default 5.0).
regularization_L2_beta: parameter for L2 regularization (default 0, i.e. no regularization)
save_epochs: list of epochs where the results should be saved (default 200:10:epochs)
is3D: switch to apply 2D (is3D == false) or 3D (is3D == true) convolutions (default false)
upsampling_method: interpolation method during upsampling, which can be either :nearest or :bilinear (default :nearest)
ntime_win: number of time instances within the time window. This number should be odd. (default 3)
learning_rate: initial learning rate of the ADAM optimizer (default 0.001)
learning_rate_decay_epoch: the exponential decay rate of the learning rate. After learning_rate_decay_epoch epochs the learning rate is halved. The learning rate is computed as learning_rate * 0.5^(epoch / learning_rate_decay_epoch). learning_rate_decay_epoch can be Inf for a constant learning rate (default)
min_std_err: minimum error standard deviation preventing a division close to zero (default exp(-5) = 0.006737946999085467)
loss_weights_refine: the weights of the individual refinement layers used in the cost function. If loss_weights_refine has a single element, then there is no refinement. (default (1.,))
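The learning-rate formula quoted above can be evaluated directly, which helps when choosing learning_rate_decay_epoch. The sketch below is standalone Julia and does not call DINCAE; the function names learning_rate_at and clip are made up. The symmetric, element-wise clipping is one plausible reading of clip_grad and is an assumption.

```julia
# Learning rate at a given epoch, following the formula quoted above.
learning_rate_at(epoch; learning_rate = 0.001, learning_rate_decay_epoch = Inf) =
    learning_rate * 0.5^(epoch / learning_rate_decay_epoch)

# Element-wise clipping to [-clip_grad, clip_grad] (assumed interpretation).
clip(g, clip_grad = 5.0) = clamp.(g, -clip_grad, clip_grad)

lr0   = learning_rate_at(0;   learning_rate_decay_epoch = 50)
lr50  = learning_rate_at(50;  learning_rate_decay_epoch = 50)   # halved once
lr100 = learning_rate_at(100; learning_rate_decay_epoch = 50)   # halved twice
lr_const = learning_rate_at(500)    # default Inf: the rate stays constant
clipped = clip([-7.0, 0.3, 12.0])
```

With learning_rate_decay_epoch = Inf the exponent is zero for every epoch, so the factor 0.5^0 = 1 leaves the initial learning rate unchanged.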
Note that the optional parameters should also be tuned for a particular application.
Internally the time mean is removed (by default) from the data before it is reconstructed. The time mean is added back when the file is saved. However, the mean is undefined for pixels that the mask defines as valid (sea) but that do not have any valid data in the training dataset.
See DINCAE.load_gridded_nc for more information about the netCDF file.
DINCAE.reconstruct_points
— Method
DINCAE.reconstruct_points(T,Atype,filename,varname,grid,fnames_rec)
Mandatory parameters:
T: Float32 or Float64: float-type used by the neural network
Atype: Array{T} or KnetArray{T}: array-type used by the neural network.
filename: NetCDF file in the format described below.
varname: name of the primary variable in the NetCDF file.
grid: tuple of ranges with the grid in the longitude and latitude direction, e.g. (-180:1:180,-90:1:90).
fnames_rec: NetCDF file names of the reconstruction.
Optional parameters:
jitter_std_pos: standard deviation of the noise to be added to the position of the observations (default (5,5))
auxdata_files: gridded auxiliary data files for a multivariate reconstruction. auxdata_files is an array of named tuples with the fields filename (the file name of the NetCDF file), varname (the NetCDF name of the primary variable) and errvarname (the NetCDF name of the expected standard deviation error).
probability_skip_for_training: for a given time step n, every track from the same time step n will be skipped with this probability during training (default 0.2). This does not affect the tracks from previous (n-1,n-2,...) and following time steps (n+1,n+2,...). The goal of this parameter is to force the neural network to learn to interpolate the data in time.
paramfile: the path of the file (netCDF) where the parameter values are stored (default: nothing).
For example, auxdata_files with a single entry could be:
auxdata_files = [
  (filename = "big-sst-file.nc",
   varname = "SST",
   errvarname = "SST_error")]
The data in the file should already be interpolated on the target grid. The file structure of the NetCDF file is described in DINCAE.load_gridded_nc. The fields defined in this file should not have any missing value (see DIVAnd.ufill).
See DINCAE.reconstruct for other optional parameters.
A minimal example of the NetCDF file is:
netcdf all-sla.train {
dimensions:
        time_instances = 9628 ;
        obs = 7445528 ;
variables:
        int64 size(time_instances) ;
                size:sample_dimension = "obs" ;
        double dates(time_instances) ;
                dates:units = "days since 1900-01-01 00:00:00" ;
        float sla(obs) ;
        float lon(obs) ;
        float lat(obs) ;
        int64 id(obs) ;
        double dtime(obs) ;
                dtime:long_name = "time of measurement" ;
                dtime:units = "days since 1900-01-01 00:00:00" ;
}
The file should contain the variables lon (longitude), lat (latitude), dtime (time of measurement), id (numeric identifier, only used by post-processing scripts) and dates (time instance of the gridded field). The file should be in the contiguous ragged array representation as specified by the CF convention, which allows grouping data points into "features" (e.g. tracks for altimetry). A feature can also contain a single data point.
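In the contiguous ragged array representation, the count variable (here size, with the attribute sample_dimension = "obs") gives the number of observations of each feature, and the observations of all features are stored back to back along the obs dimension. The following sketch recovers the index range of each feature from these counts; the helper name feature_ranges and the sample values are made up for illustration.

```julia
# sizes[k] is the number of observations belonging to time instance k;
# the observations of all time instances are stored back to back along `obs`.
function feature_ranges(sizes)
    stop = cumsum(sizes)
    start = stop .- sizes .+ 1
    return [start[k]:stop[k] for k in eachindex(sizes)]
end

sizes = [3, 1, 2]                           # made-up counts, sum = length(obs)
sla = Float32[0.1, 0.2, 0.3, -0.5, 0.0, 0.4]
ranges = feature_ranges(sizes)
second = sla[ranges[2]]                     # the single point of instance 2
```

The same ranges index lon, lat, dtime and id, since all of them share the obs dimension.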
DINCAE.transform_mσ2_single
— Method
Transform x[:,:,1,:] and x[:,:,2,:] to mean and error variance.
DINCAE.vector2_covariance
— Method
    ⎛ L11   0  ⎞
L = ⎝ L21  L22 ⎠

P = L * L'

    ⎛ P11  P12 ⎞
P = ⎝ P12  P22 ⎠
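Expanding the product gives P11 = L11^2, P12 = L11 * L21 and P22 = L21^2 + L22^2, so P is symmetric by construction and positive definite whenever L11 and L22 are non-zero. The sketch below checks this numerically; the function name covariance_from_cholesky and the sample values are assumptions, and how vector2_covariance itself obtains the entries of L is not covered here.

```julia
using LinearAlgebra

# Build P = L * L' from the three free entries of a lower-triangular L.
function covariance_from_cholesky(L11, L21, L22)
    L = [L11 0.0; L21 L22]
    return L * L'
end

P = covariance_from_cholesky(2.0, 0.5, 1.5)
```

Parametrizing a covariance matrix through its lower-triangular factor is a common way to let an unconstrained model output always map to a valid covariance.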