CommonDataModel.AbstractDatasetType

AbstractDataset is a collection of multidimensional variables (for example a NetCDF or GRIB file)

A data set ds of a type derived from AbstractDataset should implemented at minimum:

  • Base.key(ds): return a list of variable names as strings
  • variable(ds,varname::String): return an array-like data structure (derived from AbstractVariable) of the variables corresponding to varname. This array-like data structure should follow the CF semantics.
  • dimnames(ds): should be an iterable with all dimension names in the data set ds
  • dim(ds,name): dimension value corresponding to name

Optionally a data set can have attributes and groups:

  • attribnames(ds): should be an iterable with all attribute names
  • attrib(ds,name): attribute value corresponding to name
  • groupnames(ds): should be an iterable with all group names
  • group(ds,name): group corresponding to the name

For a writable dataset, one should also implement:

  • defDim: define a dimension
  • defAttrib: define a attribute
  • defVar: define a variable
  • defGroup: define a group
CommonDataModel.AbstractVariableType

AbstractVariable{T,N} is a subclass of AbstractArray{T, N}. A variable v of a type derived from AbstractVariable should implement:

  • name(v): should be the name of variable within the data set
  • dimnames(v): should be a iterable data structure with all dimension names of the variable v
  • dataset(v): the parent dataset containing v
  • Base.size(v): the size of the variable
  • Base.getindex(v,indices...): get the data of v at the provided indices

Optionally a variable can have attributes:

  • attribnames(v): should be an iterable with all attribute names
  • attrib(v,name): attribute value corresponding to name

For a writable dataset, one should also implement:

  • defAttrib: define a attribute
  • Base.setindex!(v,data,indices...): set the data in v at the provided indices
CommonDataModel.AttributesType

A collection of attributes with a Dict-like interface dispatching to attribnames, attrib, defAttrib for keys, getindex and setindex! respectively.

CommonDataModel.CFVariableType

Variable (with applied transformations following the CF convention) attrib can have different attributes as the parent variables (used in GRIBDatasets to map from grib attributes to CF attributes)

CommonDataModel.DimensionsType

A collection of dimensions with a Dict-like interface dispatching to dimnames, dim, defDim for keys, getindex and setindex! respectively.

CommonDataModel.GroupsType

A collection of groups with a Dict-like interface dispatching to groupnames and group for keys and getindex respectively.

Base.collectMethod

collect always returns an array. Even if the result of the indexing is a scalar, it is wrapped into a zero-dimensional array.

Base.delete!Method
Base.delete!(a::Attributes, name)

Delete the attribute name from the attribute list a.

Base.filterMethod
data = CommonDataModel.filter(ncv, indices...; accepted_status_flags = nothing)

Load and filter observations by replacing all variables without an acepted status flag to missing. It is used the attribute ancillary_variables to identify the status flag.

# da["data"] is 2D matrix
good_data = NCDatasets.filter(ds["data"],:,:, accepted_status_flags = ["good_data","probably_good_data"])
Base.getindexMethod
v = getindex(ds::AbstractDataset, varname::SymbolOrString)

Return the variable varname in the dataset ds as a CFVariable. The following CF convention are honored when the variable is indexed:

  • _FillValue or missing_value (which can be a list) will be returned as missing.
  • scale_factor and add_offset are applied (output = scale_factor * data_in_file + add_offset)
  • time variables (recognized by the units attribute and possibly the calendar attribute) are returned usually as DateTime object. Note that CFTime.DateTimeAllLeap, CFTime.DateTimeNoLeap and CF.TimeDateTime360Day cannot be converted to the proleptic gregorian calendar used in julia and are returned as such. (See CFTime.jl for more information about those date types.) If a calendar is defined but not among the ones specified in the CF convention, then the data in the file is not converted into a date structure.

A call getindex(ds, varname) is usually written as ds[varname].

If variable represents a cell boundary, the attributes calendar and units of the related variables are used, if they are not specified. For example:

dimensions:
  time = UNLIMITED; // (5 currently)
  nv = 2;
variables:
  double time(time);
    time:long_name = "time";
    time:units = "hours since 1998-04-019 06:00:00";
    time:bounds = "time_bnds";
  double time_bnds(time,nv);

In this case, the variable time_bnds uses the units and calendar of time because both variables are related thought the bounds attribute following the CF conventions.

See also cfvariable(ds, varname).

Base.getindexMethod
getindex(a::Attributes,name::SymbolOrString)

Return the value of the attribute called name from the attribute list a. Generally the attributes are loaded by indexing, for example:

using NCDatasets
ds = NCDataset("file.nc")
title = ds.attrib["title"]
Base.getindexMethod
group = getindex(g::Groups,groupname::AbstractString)

Return the NetCDF group with the name groupname from the parent group g.

For example:

ds = NCDataset("results.nc", "r");
forecast_group = ds.group["forecast"]
forecast_temp = forecast_group["temperature"]
Base.getindexMethod
var = getindex(ds::Union{AbstractDataset,AbstractVariable},cfname::CFStdName)

Return the NetCDF variable var with the standard name cfname from a dataset. If the first argument is a variable, then the search is limited to all variables with the same dimension names.

Base.keysMethod
Base.keys(a::Attributes)

Return a list of the names of all attributes.

Base.keysMethod
keys(d::Dimensions)

Return a list of all dimension names in NCDataset ds.

Examples

ds = NCDataset("results.nc", "r");
dimnames = keys(ds.dim)
Base.keysMethod
names = keys(g::Groups)

Return the names of all subgroubs of the group g.

Base.reduceMethod
gr = reduce(f,gv::GroupedVariable)

Reduce the grouped variable gv along grouped dimension using the function f. The function f will be called as f(x,dims=d) where x array (an element of gv) and d is an integer of the dimension overwhich one need to reduce x.

Base.setindex!Method
Base.setindex!(a::Attributes,data,name::SymbolOrString)

Set the attribute called name to the value data in the attribute list a. data can be a vector or a scalar. A scalar is handeld as a vector with one element in the NetCDF data model.

Generally the attributes are defined by indexing, for example:

ds = NCDataset("file.nc","c")
ds.attrib["title"] = "my title"
close(ds)
Base.setindex!Method
setindex!(d::Dimensions,len,name::AbstractString)

Defines the dimension called name to the length len, for example:

ds = NCDataset("file.nc","c")
ds.dim["longitude"] = 100

If len is the special value Inf, then the dimension is considered as unlimited, i.e. it will grow as data is added to the NetCDF file.

Base.sizeMethod
sz = size(var::CFVariable)

Return a tuple of integers with the size of the variable var.

Note

Note that the size of a variable can change, i.e. for a variable with an unlimited dimension.

Base.viewMethod
sv = view(v::CommonDataModel.AbstractVariable,indices...)

Returns a view of the variable v where indices are only lazily applied. No data is actually copied or loaded. Modifications to a view sv, also modifies the underlying array v. All attributes of v are also present in sv.

Examples

using NCDatasets
fname = tempname()
data = zeros(Int,10,11)
ds = NCDataset(fname,"c")
ncdata = defVar(ds,"temp",data,("lon","lat"))
ncdata_view = view(ncdata,2:3,2:4)
size(ncdata_view)
# output (2,3)
ncdata_view[1,1] = 1
ncdata[2,2]
# outputs 1 as ncdata is also modified
close(ds)
Base.writeMethod
write(dest::AbstractDataset, src::AbstractDataset; include = keys(src), exclude = [])

Write the variables of src dataset into an empty dest dataset (which must be opened in mode "a" or "c"). The keywords include and exclude configure which variable of src should be included (by default all), or which should be excluded (by default none).

If the first argument is a file name, then the dataset is open in create mode ("c").

This function is useful when you want to save the dataset from a multi-file dataset.

To save a subset, one can use the view function view to virtually slice a dataset:

Example

NCDataset(fname_src) do ds
    write(fname_slice,view(ds, lon = 2:3))
end

All variables in the source file fname_src with a dimension lon will be sliced along the indices 2:3 for the lon dimension. All attributes (and variables without a dimension lon) will be copied over unmodified.

CommonDataModel.ancillaryvariablesMethod
ncvar = CommonDataModel.ancillaryvariables(ncv::CFVariable,modifier)

Return the first ancillary variables from the NetCDF (or other format) variable ncv with the standard name modifier modifier. It can be used for example to access related variable like status flags.

CommonDataModel.attribMethod
CommonDatamodel.attrib(ds::Union{AbstractDataset,AbstractVariable},attribname::SymbolOrString)

Return the value of the attribute attribname in the data set ds.

CommonDataModel.attribnamesMethod
CommonDatamodel.attribnames(ds::Union{AbstractDataset,AbstractVariable})

Return an iterable of all attribute names in ds.

CommonDataModel.boundsMethod
b = bounds(ncvar::NCDatasets.CFVariable)

Return the CFVariable corresponding to the bounds attribute of the variable ncvar. The time units and calendar from the ncvar are used but not the attributes controling the packing of data scale_factor, add_offset and _FillValue.

CommonDataModel.cfvariableMethod
v = cfvariable(ds::NCDataset,varname::SymbolOrString; <attrib> = <value>)

Return the variable varname in the dataset ds as a NCDataset.CFVariable. The keyword argument <attrib> are the attributes (fillvalue, missing_value, scale_factor, add_offset, units and calendar) relevant to the CF conventions. By specifing the value of these attributes, the one can override the value specified in the data set. If the attribute is set to nothing, then the attribute is not loaded and the corresponding transformation is ignored. This function is similar to ds[varname] with the additional flexibility that some variable attributes can be overridden.

Example:

NCDataset("foo.nc","c") do ds
  defVar(ds,"data",[10., 11., 12., 13.], ("time",), attrib = Dict(
      "add_offset" => 10.,
      "scale_factor" => 0.2))
end

# The stored (packed) valued are [0., 5., 10., 15.]
# since 0.2 .* [0., 5., 10., 15.] .+ 10 is [10., 11., 12., 13.]

ds = NCDataset("foo.nc");

@show ds["data"].var[:]
# returns [0., 5., 10., 15.]

@show cfvariable(ds,"data")[:]
# returns [10., 11., 12., 13.]

# neither add_offset nor scale_factor are applied
@show cfvariable(ds,"data", add_offset = nothing, scale_factor = nothing)[:]
# returns [0, 5, 10, 15]

# add_offset is applied but not scale_factor
@show cfvariable(ds,"data", scale_factor = nothing)[:]
# returns [10, 15, 20, 25]

# 0 is declared as the fill value (add_offset and scale_factor are applied as usual)
@show cfvariable(ds,"data", fillvalue = 0)[:]
# returns [missing, 11., 12., 13.]

# Use the time units: days since 2000-01-01
@show cfvariable(ds,"data", units = "days since 2000-01-01")[:]
# returns [DateTime(2000,1,11), DateTime(2000,1,12), DateTime(2000,1,13), DateTime(2000,1,14)]

close(ds)
CommonDataModel.chunkingMethod
storage,chunksizes = chunking(v::MFVariable)
storage,chunksizes = chunking(v::MFCFVariable)

Return the storage type (:contiguous or :chunked) and the chunk sizes of the varable v corresponding to the first file. If the first file in the collection is chunked then this storage attributes are returned. If not the first file is not contiguous, then multi-file variable is still reported as chunked with chunk size equal to the size of the first variable.

CommonDataModel.coordMethod
cv = coord(v::Union{CFVariable,Variable},standard_name)

Find the coordinate of the variable v by the standard name standard_name or some standardized heuristics based on units. If the heuristics fail to detect the coordinate, consider to modify the file to add the standard_name attribute. All dimensions of the coordinate must also be dimensions of the variable v.

Example

using NCDatasets
ds = NCDataset("file.nc")
ncv = ds["SST"]
lon = coord(ncv,"longitude")[:]
lat = coord(ncv,"latitude")[:]
v = ncv[:]
close(ds)
CommonDataModel.datasetMethod
ds = CommonDataModel.dataset(v::AbstractVariable)

Return the data set ds to which a the variable v belongs to.

CommonDataModel.defAttribMethod
CommonDatamodel.defAttrib(ds::Union{AbstractDataset,AbstractVariable},name::SymbolOrString,data)

Create an attribute with the name attrib in the data set or variable ds.

CommonDataModel.defDimMethod
CommonDatamodel.defDim(ds::AbstractDataset,name::SymbolOrString,len)

Create dimension with the name name in the data set ds with the length len. len can be Inf for unlimited dimensions.

CommonDataModel.defGroupMethod
group = CommonDatamodel.defGroup(ds::AbstractDataset,name::SymbolOrString)

Create an empty sub-group with the name name in the data set ds. The group is a sub-type of AbstractDataset.

CommonDataModel.defVarMethod
v = CommonDataModel.defVar(ds::AbstractDataset,src::AbstractVariable)
v = CommonDataModel.defVar(ds::AbstractDataset,name::SymbolOrString,src::AbstractVariable)

Defines and return the variable in the data set ds copied from the variable src. The dimension name, attributes and data are copied from src as well as the variable name (unless provide by name).

CommonDataModel.delAttribMethod
CommonDatamodel.delAttrib(ds::Union{AbstractDataset,AbstractVariable},name::SymbolOrString,data)

Deletes an attribute with the name attrib in the data set or variable ds.

CommonDataModel.dimMethod
CommonDatamodel.dim(ds::AbstractDataset,dimname::SymbolOrString)

Return the length of the dimension dimname in the data set ds.

CommonDataModel.dimnamesMethod
CommonDataModel.dimnames(v::AbstractVariable)

Return an iterable of the dimension names of the variable v.

CommonDataModel.dimnamesMethod
CommonDatamodel.dimnames(ds::AbstractDataset)

Return an iterable of all dimension names in ds. This information can also be accessed using the property ds.dim:

Examples

ds = NCDataset("results.nc", "r");
dimnames = keys(ds.dim)
CommonDataModel.dimnamesMethod
dimnames(v::CFVariable)

Return a tuple of strings with the dimension names of the variable v.

CommonDataModel.dimsMethod
CommonDatamodel.dims(ds::Union{AbstractDataset,AbstractVariable})

Return a dict-like of all dimensions and their corresponding length defined in the the data set ds (or variable).

CommonDataModel.fillvalueMethod
fillvalue(::Type{Int8})
fillvalue(::Type{UInt8})
fillvalue(::Type{Int16})
fillvalue(::Type{UInt16})
fillvalue(::Type{Int32})
fillvalue(::Type{UInt32})
fillvalue(::Type{Int64})
fillvalue(::Type{UInt64})
fillvalue(::Type{Float32})
fillvalue(::Type{Float64})
fillvalue(::Type{Char})
fillvalue(::Type{String})

Default fill-value for the given type from NetCDF.

CommonDataModel.groupMethod
CommonDatamodel.group(ds::AbstractDataset,groupname::SymbolOrString)

Return the sub-group data set with the name groupname.

CommonDataModel.groupbyMethod
gv = CommonDataModel.groupby(v::AbstractVariable,:coordname => group_fun)
gv = CommonDataModel.groupby(v::AbstractVariable,"coordname" => group_fun)

Create a grouped variable gv whose elements composed by all elements in v whose corresponding coordinate variable (with the name coordname) map to the same value once the group function group_fun is applied to the coordinate.

The grouped variable gv and be reduced using the functions sum mean, median, var or std, for example gr = mean(gv). The result gr is a lazy structure representing the outcome of these operations performed over the grouped dimension. Only when the result gr is indexed the actually values are computed.

Broadcasting for gv and gr is overloaded. Broadcasting over all elements of gv means that a mapping function is to be applied to all elements of gv before a possible the reduction. Broadcasting over gr, for example v .- gr mean that gr is broadcasted over the full size of v according to the grouping function.

Example:

using NCDatasets, Dates
using CommonDataModel: @groupby, groupby

# create same test data

time = DateTime(2000,1,1):Day(1):DateTime(2009,12,31);  # 10 years
data = rand(Float32.(-9:99),360,180,length(time));
fname = "test_file.nc"
ds = NCDataset(fname,"c");
defVar(ds,"time",time,("time",));
defVar(ds,"data",data,("lon","lat","time"));

# group over month

gv = @groupby(ds["data"],Dates.Month(time))
# or
# gv = groupby(ds["data"],:time => Dates.Month)
length(gv)
# output 12 as they are all 12 months in this dataset
size(gv[1])
# 360 x 180 x 310 array with all time slices corresponding to the 1st month

# the variable `gv` is equivalent to the following operation
time_month = Dates.Month.(ds[:time][:])
gv2 = [ds[:data][:,:,findall(time_month .== m)] for m in sort(unique(time_month))];

# compute basic statistics

using Statistics
monthly_mean = mean(gv);
size(monthly_mean)
# 360 x 180 x 12 array with the monthly mean

# get a regular julia array
monthly_mean_array = monthly_mean[:,:,:];
typeof(monthly_mean_array)
# Array{Float32, 3}

# substact from data the corresponding monthly mean
monthly_anomalies = data .- mean(gv);

close(ds)
CommonDataModel.groupnamesMethod
CommonDatamodel.groupnames(ds::AbstractDataset)

All the subgroup names of the data set ds. For a data set containing only a single group, this will be an empty vector of String.

CommonDataModel.groupsMethod
CommonDatamodel.groups(ds::AbstractDataset)

Return all sub-group data as a dict-like object.

CommonDataModel.load!Method
CommonDataModel.load!(ncvar::CFVariable, data, buffer, indices)

Loads a NetCDF (or other format) variables ncvar in-place and puts the result in data (an array of eltype(ncvar)) along the specified indices. buffer is a temporary array of the same size as data but the type should be eltype(ncv.var), i.e. the corresponding type in the files (before applying scale_factor, add_offset and masking fill values). Scaling and masking will be applied to the array data.

data and buffer can be the same array if eltype(ncvar) == eltype(ncvar.var).

Example:

# create some test array
Dataset("file.nc","c") do ds
    defDim(ds,"time",3)
    ncvar = defVar(ds,"vgos",Int16,("time",),attrib = ["scale_factor" => 0.1])
    ncvar[:] = [1.1, 1.2, 1.3]
    # store 11, 12 and 13 as scale_factor is 0.1
end


ds = Dataset("file.nc")
ncv = ds["vgos"];
# data and buffer must have the right shape and type
data = zeros(eltype(ncv),size(ncv)); # here Vector{Float64}
buffer = zeros(eltype(ncv.var),size(ncv)); # here Vector{Int16}
NCDatasets.load!(ncv,data,buffer,:,:,:)
close(ds)
CommonDataModel.nameMethod
CommonDatamodel.name(ds::AbstractDataset)

Name of the group of the data set ds. For a data set containing only a single group, this will be always the root group "/".

CommonDataModel.nameMethod
CommonDataModel.name(v::AbstractVariable)

Return the name of the variable v as a string.

CommonDataModel.parentdatasetMethod
pds = CommonDatamodel.parentdataset(ds::AbstractDataset)

The data set pds containing ds as a sub-group. pds is nothing for the root group.

CommonDataModel.pathMethod
CommonDatamodel.path(ds::AbstractDataset)

File path of the data set ds.

CommonDataModel.rollingMethod
gv = CommonDataModel.rolling(v::AbstractVariable,:coordname => n)
gv = CommonDataModel.rolling(v::AbstractVariable,:coordname => rolling_classes)

Create a grouped variable gv whose elements composed by all elements in v grouped by a rolling window of length n along the coordinate variable coordname. One can also specify a vector classes (rolling_classes) with as many elements as they are groups and the elements coorespond to the values of the coordinate. Unlike CommonDataModel.groupby, the groups defined by rolling can overlap.

The grouped variable gv and be reduced using the functions sum mean, median, var or std, for example gr = mean(gv). The result gr is a lazy structure representing the outcome of these operations performed over the grouped dimension. Only when the result gr is indexed the actually values are computed.

Broadcasting for gv is overloaded. Broadcasting over all elements of gv means that a mapping function is to be applied to all elements of gv before a possible the reduction.

This operations is also called "running mean" when using mean as reduction function.

Example:

using NCDatasets, Dates
using CommonDataModel: rolling

# create same test data

time = DateTime(2000,1,1):Day(1):DateTime(2009,12,31);  # 10 years
data = rand(Float32.(-9:99),360,180,length(time));
fname = "test_file.nc"
ds = NCDataset(fname,"c");
defVar(ds,"time",time,("time",));
defVar(ds,"data",data,("lon","lat","time"));

# running weekly mean

gv = rolling(ds["data"],:time => 7)
length(gv)
# output 3653 as a mean will be compute for all time instance (including a
# partial mean and the beginning and the end)
size(gv[1])
# output 360 x 180 x 4: the first time slice will only be a mean of 4 values
size(gv[4])
# output 360 x 180 x 7: data from the first 7 days

# compute basic statistics

using Statistics
weekly_running_mean = mean(gv);
size(weekly_running_mean)
# 360 x 180 x 3653 array with the running weekly mean

# get a regular julia array
weekly_running_mean_array = weekly_running_mean[:,:,:];
typeof(weekly_running_mean_array)
# Array{Float32, 3}

# computing a centred 3 monthly mean taking into account that month do not have the
# same length

rolling_classes = [(t - Dates.Month(1)):Day(1):(t + Dates.Month(1)) for t in time]
extrema(length.(rolling_classes))
# output (60, 63)
data_3monthly = mean(rolling(ds["data"],:time => rolling_classes))[:,:,:];

close(ds)
CommonDataModel.selectMethod
vsubset = CommonDataModel.select(v,param1 => condition1, param2 => condition2,...)
dssubset = CommonDataModel.select(ds,param1 => condition1, param2 => condition2,...)

Return a subset of the variable v (or dataset ds) satisfying the conditions applied to the corresponding parameters. param1, param2 ... are symbols or strings with variable names and condition1, condition2 ... are functions taking as a argument the values of the corresponding parameter and return either false (ignore the corresponding elements) or true (select the corresponding elements for loading).

Examples

Create a sample file with random data:

using NCDatasets, Dates
using CommonDataModel: select, Near

fname = "sample_file.nc"
lon = -180:180
lat = -90:90
time = DateTime(2000,1,1):Day(1):DateTime(2000,1,3)
SST = randn(length(lon),length(lat),length(time));

ds = NCDataset(fname,"c")
defVar(ds,"lon",lon,("lon",));
defVar(ds,"lat",lat,("lat",));
defVar(ds,"time",time,("time",));
defVar(ds,"SST",SST,("lon","lat","time"));


# load by bounding box
v = select(ds["SST"],
           :lon => lon -> 30 <= lon <= 60,
           :lat => lat -> 40 <= lat <= 80)

# You can also select based on `ClosedInterval`s from `IntervalSets.jl`.
# Both 30..60 and 60 ± 20 construct `ClosedInterval`s, see their documentation for details. `∈` can be typed `\in` followed by the TAB-key.

using IntervalSets
v = select(ds["SST"], :lon => ∈(30..60), :lat => ∈(60 ± 20))

# get the indices matching the select query
(lon_indices,lat_indices,time_indices) = parentindices(v)

# get longitude matchting the select query
v_lon = v["lon"]

# find the nearest time instance
v = select(ds["SST"],:time => Near(DateTime(2000,1,4)))

# find the nearest time instance but not earlier or later than 2 hours
# an empty array is returned if no time instance is present

v = select(ds["SST"],:time => Near(DateTime(2000,1,3,1),Hour(2)))

close(ds)

See also @select for more information.

CommonDataModel.show_dimMethod
CommonDatamodel.show_dim(io,dim)

Print a list all dimensions (key/values pairs where key is the dimension names and value the corresponding length) in dim to IO stream io. The IO property :level is used for indentation.

CommonDataModel.time_factorMethod
tf = CommonDataModel.time_factor(v::CFVariable)

The time unit in milliseconds. E.g. seconds would be 1000., days would be 86400000. The result can also be nothing if the variable has no time units.

CommonDataModel.unlimitedMethod
CommonDatamodel.unlimited(ds::AbstractDataset)

Iterator of strings with the name of the unlimited dimension.

CommonDataModel.varbyattribMethod
varbyattrib(ds, attname = attval)

Returns a list of variable(s) which has the attribute attname matching the value attval in the dataset ds. The list is empty if the none of the variables has the match. The output is a list of CFVariables.

Examples

Load all the data of the first variable with standard name "longitude" from the NetCDF file results.nc.

julia> ds = NCDataset("results.nc", "r");
julia> data = varbyattrib(ds, standard_name = "longitude")[1][:]
CommonDataModel.variableMethod
CommonDataModel.variable(ds::AbstractDataset,variablename::SymbolOrString)

Return the variable with the name variablename from the data set ds.

CommonDataModel.varnamesMethod
CommonDataModel.varnames(ds::AbstractDataset)

Return an iterable of all the variable name.

CommonDataModel.@groupbyMacro
gv = CommonDataModel.@groupby(v,group_fun(coordname))

Create a grouped variable gv whose elements are composed by all elements in v whose corresponding coordinate variable (with the name coordname) map to the same value once the group function group_fun is applied to the coordinate variable.

See groupby for more information.

CommonDataModel.@selectMacro
vsubset = CommonDataModel.@select(v,expression)
dssubset = CommonDataModel.@select(ds,expression)

Return a subset of the variable v (or dataset ds) satisfying the condition expression as a view. The condition has the following form:

condition₁ && condition₂ && condition₃ ... conditionₙ

Every condition should involve a single 1D variable (typically a coordinate variable, referred as coord below). If v is a variable, the related 1D variable should have a shared dimension with the variable v. All local variables need to have a $ prefix (see examples below). This macro is experimental and subjected to change.

Every condition can either perform:

  • a nearest match: coord ≈ target_coord (for type \approx followed by the TAB-key). Only the data corresponding to the index closest to target_coord is loaded.

  • a nearest match with tolerance: coord ≈ target_coord ± tolerance. As before, but if the difference between the closest value in coord and target_coord is larger (in absolute value) than tolerance, an empty array is returned.

  • a condition operating on scalar values. For example, a condition equal to 10 <= lon <= 20 loads all data with the longitude between 10 and 20 or abs(lat) > 60 loads all variables with a latitude north of 60° N and south of 60° S (assuming that the has the 1D variables lon and lat for longitude and latitude).

Only the data which satisfies all conditions is loaded. All conditions must be chained with an && (logical and). They should not contain additional parenthesis or other logical operators such as || (logical or).

To convert the view into a regular array one can use collect, Array or regular indexing. As in julia, views of scalars are wrapped into a zero dimensional arrays which can be dereferenced by using []. Modifying a view will modify the underlying file (if the file is opened as writable, otherwise an error is issued).

As for any view, one can use parentindices(vsubset) to get the indices matching a select query.

Examples

Create a sample file with random data:

using NCDatasets, Dates
using CommonDataModel: @select
# or
# using NCDatasets: @select

fname = "sample_file.nc"
lon = -180:180
lat = -90:90
time = DateTime(2000,1,1):Day(1):DateTime(2000,1,3)
SST = randn(length(lon),length(lat),length(time))

ds = NCDataset(fname,"c")
defVar(ds,"lon",lon,("lon",));
defVar(ds,"lat",lat,("lat",));
defVar(ds,"time",time,("time",));
defVar(ds,"SST",SST,("lon","lat","time"));


# load by bounding box
v = @select(ds["SST"],30 <= lon <= 60 && 40 <= lat <= 80)

# substitute a local variable in condition using $
lonr = (30,60) # longitude range
latr = (40,80) # latitude range

v = @select(ds["SST"],$lonr[1] <= lon <= $lonr[2] && $latr[1] <= lat <= $latr[2])

# You can also select based on `ClosedInterval`s from `IntervalSets.jl`.
# Both 30..60 and 60 ± 20 construct `ClosedInterval`s, see their documentation for details.

lon_interval = 30..60
lat_interval = 60 ± 20
v = @select(ds["SST"], lon ∈ $lon_interval && lat ∈ $lat_interval)

# get the indices matching the select query
(lon_indices,lat_indices,time_indices) = parentindices(v)

# get longitude matchting the select query
v_lon = v["lon"]

# find the nearest time instance
v = @select(ds["SST"],time ≈ DateTime(2000,1,4))

# find the nearest time instance but not earlier or later than 2 hours
# an empty array is returned if no time instance is present

v = @select(ds["SST"],time ≈ DateTime(2000,1,3,1) ± Hour(2))

close(ds)

Any 1D variable with the same dimension name can be used in @select. For example, if we have a time series of temperature and salinity, the temperature values can also be selected based on salinity:

using NCDatasets, Dates
using CommonDataModel: @select
fname = "sample_series.nc"
# create a sample time series
time = DateTime(2000,1,1):Day(1):DateTime(2009,12,31)
salinity = randn(length(time)) .+ 35
temperature = randn(length(time))

NCDataset(fname,"c") do ds
    defVar(ds,"time",time,("time",));
    defVar(ds,"salinity",salinity,("time",));
    defVar(ds,"temperature",temperature,("time",));
end

ds = NCDataset(fname)

# load all temperature data from January where the salinity is larger than 35.
v = @select(ds["temperature"],Dates.month(time) == 1 && salinity >= 35)

# this is equivalent to:
v2 = ds["temperature"][findall(Dates.month.(time) .== 1 .&& salinity .>= 35)]

v == v2
# returns true
close(ds)
Note

For optimal performance, one should try to load contiguous data ranges, in particular when the data is loaded over HTTP/OPeNDAP.