API Reference
This section describes all available functions of this package.
Public API
YAXArrays.getAxis — Method
getAxis(desc, c)
Given an Axis description and a cube, returns the corresponding axis of the cube. The Axis description can be:
- the name as a string or symbol.
- an Axis object
YAXArrays.Cubes — Module
The functions provided by YAXArrays are supposed to work on different types of cubes. This module defines the interface for all Data types that
YAXArrays.Cubes.YAXArray — Type
YAXArray{T,N}
An array labelled with named axes that have values associated with them. It can wrap normal arrays or, more typically, DiskArrays.
Fields
- axes: Tuple of Dimensions containing the axes of the cube
- data: length(axes)-dimensional array which holds the data; this can be a lazy DiskArray
- properties: Metadata properties describing the content of the data
- chunks: Representation of the chunking of the data
- cleaner: Cleaner objects to track which objects to tidy up when the YAXArray goes out of scope
YAXArrays.Cubes.caxes — Function
Returns the axes of a Cube
YAXArrays.Cubes.caxes — Method
caxes
Embeds Cube inside a new Cube
YAXArrays.Cubes.concatenatecubes — Method
function concatenateCubes(cubelist, cataxis::CategoricalAxis)
Concatenates a vector of datacubes that have identical axes to a new single cube along the new axis cataxis.
YAXArrays.Cubes.readcubedata — Method
readcubedata(cube)
Given any array implementing the YAXArray interface, returns an in-memory YAXArray from it.
YAXArrays.Cubes.setchunks — Method
setchunks(c::YAXArray,chunks)
Resets the chunks of a YAXArray and returns a new YAXArray. Note that this will not change the chunking of the underlying data itself; it will just make the data "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savecube on the resulting array. The chunks argument can take one of the following forms:
- a DiskArrays.GridChunks object
- a tuple specifying the chunk size along each dimension
- an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes
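To illustrate what a tuple of chunk sizes means, the following plain-Julia sketch computes the index ranges that such a chunking would impose on an array of a given size. It does not call YAXArrays or DiskArrays; the helper chunk_ranges and the example sizes are assumptions made for illustration only.

```julia
# Illustration only: `chunk_ranges` is a hypothetical helper,
# not part of YAXArrays or DiskArrays.
function chunk_ranges(len::Integer, chunk::Integer)
    # Split 1:len into consecutive ranges of at most `chunk` elements.
    [i:min(i + chunk - 1, len) for i in 1:chunk:len]
end

arraysize = (10, 4)   # size of a hypothetical two-dimensional array
chunks = (4, 2)       # chunk size along each dimension

# One vector of index ranges per dimension
grid = map(chunk_ranges, arraysize, chunks)

println(grid[1])
println(grid[2])
```

The last chunk along a dimension is shorter whenever the chunk size does not divide the axis length evenly.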
YAXArrays.Cubes.subsetcube — Function
This function calculates a subset of a cube's data.
YAXArrays.DAT.InDims — Type
InDims(axisdesc...;...)
Creates a description of an Input Data Cube for cube operations. Takes a single or multiple axis descriptions as first arguments. Alternatively a MovingWindow struct can be passed to include neighbour slices of one or more axes in the computation. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.
Keyword arguments
- artype: how the array shall be represented in the inner function. Defaults to Array; alternatives are DataFrame or AsAxisArray
- filter: defines a filter to skip the computation, e.g. when all values are missing. Defaults to AllMissing(); possible values are AnyMissing(), AnyOcean(), StdZero(), NValid(n) (for at least n non-missing elements). It is also possible to provide a custom one-argument function that takes the array and returns true if the computation shall be skipped and false otherwise.
- window_oob_value: if one of the input dimensions is a MovingWindow, this value will be used to fill out-of-bounds areas
YAXArrays.DAT.MovingWindow — Type
MovingWindow(desc, pre, after)
Constructs a MovingWindow object to be passed to an InDims constructor to define that the axis in desc shall participate in the inner function (i.e. shall be looped over), but inside the inner function pre values before and after values after the center value will be passed as well.
For example, passing MovingWindow("Time", 2, 0) will loop over the time axis and always pass the current time step plus the 2 previous steps. So in the inner function the array will have an additional dimension of size 3.
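To make the window shape concrete, the following plain-Julia sketch mimics what an inner function would receive along the windowed axis for MovingWindow("Time", 2, 0). It does not call YAXArrays; the helper window_slice and its oob keyword are hypothetical, with oob standing in for the role that window_oob_value plays in InDims.

```julia
# Illustration only: plain Julia, not the YAXArrays implementation.
# `window_slice` and `oob` are hypothetical names.
function window_slice(x::AbstractVector, i::Integer, pre::Integer, after::Integer; oob=missing)
    # Collect `pre` values before index `i`, the value at `i`, and `after` values after it.
    # Positions outside the vector are filled with `oob`.
    [checkbounds(Bool, x, j) ? x[j] : oob for j in (i - pre):(i + after)]
end

series = [10, 20, 30, 40]

# Two previous steps and no following steps, as in MovingWindow("Time", 2, 0)
w1 = window_slice(series, 1, 2, 0)   # first step: two positions fall out of bounds
w3 = window_slice(series, 3, 2, 0)   # third step: fully inside the series

println(length(w3))  # window length is pre + after + 1
println(w3)
```

At the start of the series the window reaches before the first element, which is the situation the out-of-bounds fill value is meant to cover.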
YAXArrays.DAT.OutDims — Method
OutDims(axisdesc;...)
Creates a description of an Output Data Cube for cube operations. Takes a single or a Vector/Tuple of axes as first argument. Axes can be specified by their name (String), through an Axis type, or by passing a concrete axis.
- axisdesc: List of input axis names
- backend: specifies the dataset backend to write data to, must be either :auto or a key in YAXArrayBase.backendlist
- update: specifies whether the function operates in place or if an output is returned
- artype: specifies the Array type inside the inner function that is mapped over
- chunksize: A Dict specifying the chunk sizes for the output dimensions of the cube, or :input to copy chunk sizes from input cube axes, or :max to not chunk the inner dimensions
- outtype: force the output type to a specific type, defaults to Any, which means that the element type of the first input cube is used
YAXArrays.DAT.CubeTable — Method
CubeTable()
Function to turn a DataCube object into an iterable table. Takes a list of cubes as arguments, each specified as a name=cube expression. For example, CubeTable(data=cube1,country=cube2) would generate a Table with the entries data and country, where data contains the values of cube1 and country the values of cube2. The cubes are matched and broadcasted along their axes like in mapCube.
YAXArrays.DAT.cubefittable — Method
YAXArrays.DAT.fittable — Method
fittable(tab,o,fitsym;by=(),weight=nothing)
Loops through an iterable table tab, thereby fitting an OnlineStat o with the values specified through fitsym. Optionally one can specify a field (or tuple) to group by. Any groupby specifier can either be a symbol denoting the entry to group by or an anonymous function calculating the group from a table row.
For example, the following would calculate a weighted mean over a cube weighted by grid cell area and grouped by country and month:
fittable(iter,WeightedMean,:tair,weight=(i->abs(cosd(i.lat))),by=(i->month(i.time),:country))
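The grouping and weighting logic expressed by that call can be sketched in plain Julia. The block below does not use fittable or OnlineStats; the helper grouped_weighted_mean and the three table rows are hypothetical and exist only to show how a weight function and a tuple of groupby specifiers (a function and a symbol) combine.

```julia
using Dates  # standard library, provides `month`

# Illustration only: a plain-Julia analogue of a grouped, weighted mean.
# `grouped_weighted_mean` and the rows below are hypothetical.
function grouped_weighted_mean(rows, valsym::Symbol; weight, by)
    sums = Dict{Any,Float64}()     # running sum of weight * value per group
    weights = Dict{Any,Float64}()  # running sum of weights per group
    for row in rows
        v = getproperty(row, valsym)
        # A Symbol specifier reads a field, a function specifier is applied to the row.
        key = map(f -> f isa Symbol ? getproperty(row, f) : f(row), by)
        w = weight(row)
        sums[key] = get(sums, key, 0.0) + w * v
        weights[key] = get(weights, key, 0.0) + w
    end
    Dict(k => sums[k] / weights[k] for k in keys(sums))
end

rows = [
    (tair = 10.0, lat = 0.0,  time = Date(2001, 1, 15), country = "A"),
    (tair = 20.0, lat = 60.0, time = Date(2001, 1, 20), country = "A"),
    (tair = 5.0,  lat = 60.0, time = Date(2001, 2, 1),  country = "B"),
]

result = grouped_weighted_mean(rows, :tair;
    weight = i -> abs(cosd(i.lat)),
    by = (i -> month(i.time), :country))

println(result)
```

Each key of the resulting dictionary is a (month, country) tuple, mirroring the order of the specifiers passed in by.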
YAXArrays.DAT.mapCube — Method
mapCube(fun, cube, addargs...;kwargs...)
Map a given function fun over slices of all cubes of the dataset ds.
Use InDims to describe the input dimensions and OutDims to describe the output dimensions of the function.
For Datasets, only one output cube can be specified.
In contrast to the mapCube function for cubes, additional arguments for the inner function should be set as keyword arguments.
For the specific keyword arguments see the docstring of the mapCube function for cubes.
YAXArrays.DAT.mapCube — Method
mapCube(fun, cube, addargs...;kwargs...)
Map a given function fun over slices of the data cube cube. The additional arguments addargs will be forwarded to the inner function fun. Use InDims to describe the input dimensions and OutDims to describe the output dimensions of the function.
Keyword arguments
- max_cache=YAXDefaults.max_cache: Float64 maximum size of blocks that are read into memory in bits, e.g. max_cache=5.0e8. Or a String, e.g. max_cache="10MB" or max_cache="1GB". Defaults to approx 10Mb.
- indims::InDims: List of input cube descriptors of type InDims for each input data cube.
- outdims::OutDims: List of output cube descriptors of type OutDims for each output cube.
- inplace: does the function write to an output array in place or return a single value; defaults to true
- ispar: boolean to determine if parallelisation should be applied, defaults to true if workers are available.
- showprog: boolean indicating if a ProgressMeter shall be shown
- include_loopvars: boolean to indicate if the variables looped over should be added as function arguments
- nthreads: number of threads for the computation, defaults to Threads.nthreads for every worker.
- loopchunksize: determines the chunk sizes of variables which are looped over, a dict
- kwargs: additional keyword arguments are passed to the inner function
The first argument is always the function to be applied, the second is the input cube or a tuple of input cubes if needed.
YAXArrays.Datasets.Dataset — Type
Dataset object which stores an OrderedDict of YAXArrays with Symbol keys, a dictionary of CubeAxes and a dictionary of general properties.
A Dataset can hold cubes with differing axes, but it will share the common axes between the subcubes.
YAXArrays.Datasets.Dataset — Method
Dataset(; properties = Dict{String,Any}, cubes...)
Construct a YAXArray Dataset with global attributes properties and a list of named YAXArrays cubes...
YAXArrays.Datasets.Cube — Method
Cube(ds::Dataset; joinname="Variable")
Construct a single YAXArray from the dataset ds by concatenating the cubes in the dataset on the joinname dimension.
YAXArrays.Datasets.open_dataset — Method
open_dataset(g; driver=:all)
Open the dataset at g with the given driver. The default driver searches the available drivers and tries to detect the usable one from the filename extension.
YAXArrays.Datasets.savecube — Method
savecube(cube,name::String)
Save a YAXArray to the path.
Extended Help
The keyword arguments are:
- name:
- datasetaxis="Variable": special treatment of a categorical axis that gets written into separate zarr arrays
- max_cache: The number of bits that are used as cache for the data handling.
- backend: The backend that is used to save the data. Falls back to searching the backend according to the extension of the path.
- driver: The same setting as backend.
- overwrite::Bool=false: overwrite cube if it already exists
YAXArrays.Datasets.savedataset — Method
savedataset(ds::Dataset; path = "", persist = nothing, overwrite = false, append = false, skeleton=false, backend = :all, driver = backend, max_cache = 5e8, writefac=4.0)
Saves a Dataset into a file at path with the format given by driver, i.e., driver=:netcdf or driver=:zarr.
Setting overwrite = true deletes ALL your data and creates a new file.
YAXArrays.Datasets.to_dataset — Method
to_dataset(c;datasetaxis = "Variable", layername = "layer")
Convert a Data Cube into a Dataset. It is possible to treat one of the Cube's axes as a "DatasetAxis", i.e. the cube will be split into different parts that become variables in the Dataset. If no such axis is specified or found, there will only be a single variable in the dataset with the name layername.
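The idea of splitting a cube along one axis into named variables can be sketched with plain arrays. The block below does not call to_dataset; the helper split_along, the array layout and the variable names are assumptions chosen for illustration.

```julia
# Illustration only: plain arrays and a hypothetical helper `split_along`,
# not the YAXArrays implementation of to_dataset.
function split_along(data::AbstractArray, dim::Integer, names)
    size(data, dim) == length(names) || throw(ArgumentError("one name is needed per slice"))
    # Each slice along `dim` becomes one named entry.
    Dict(Symbol(n) => collect(selectdim(data, dim, i)) for (i, n) in enumerate(names))
end

# A small array whose third dimension plays the role of the "Variable" axis
cube = reshape(collect(1:12), 2, 3, 2)
vars = split_along(cube, 3, ["tair", "precip"])

println(size(vars[:tair]))
```

Every entry has one dimension fewer than the original array, since the split axis is consumed by the variable names.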
Internal API
YAXArrays.YAXDefaults — Constant
Default configuration for YAXArrays, has the following fields:
- workdir[]::String = "./": The default location for temporary cubes.
- recal[]::Bool = false: set to true if you want @loadOrGenerate to always recalculate the results.
- chunksize[]::Any = :input: Set the default output chunksize.
- max_cache[]::Float64 = 1e8: The maximum cache used by mapCube.
- cubedir[]::"": the default location for Cube() without an argument.
- subsetextensions::Array{Any} = []: List of registered functions that convert subsetting input into dimension boundaries.
YAXArrays.findAxis — Method
findAxis(desc, c)
Internal function
Extended Help
Given an Axis description and a cube return the index of the Axis.
The Axis description can be:
- the name as a string or symbol.
- an Axis object
YAXArrays.getOutAxis — Method
getOutAxis
YAXArrays.get_descriptor — Method
get_descriptor(a)
Get the descriptor of an Axis. This is used to dispatch on the descriptor.
YAXArrays.match_axis — Method
match_axis
Internal function
Extended Help
Match the Axis based on the AxisDescriptor.
This is used to find different axes and to treat certain axis descriptions as the same, for example to disregard differences of capitalisation.
YAXArrays.Cubes.CleanMe — Type
mutable struct CleanMe
Struct which describes data paths and their persistency. Non-persistent paths/files are removed at the finalize step.
YAXArrays.Cubes.clean — Method
clean(c::CleanMe)
Finalizer function for the CleanMe struct. The main process removes all directories/files which are not persistent.
YAXArrays.Cubes.copydata — Method
copydata(outar, inar, copybuf)
Internal function which copies the data from the input inar into the output outar at the copybuf positions.
YAXArrays.Cubes.optifunc — Method
optifunc(s, maxbuf, incs, outcs, insize, outsize, writefac)
Internal
This function is going to be minimized to detect the best possible chunk setting for the rechunking of the data.
YAXArrays.DAT.DATConfig — Type
Configuration object of a DAT process. This holds all necessary information to perform the calculations. It contains the following fields:
- incubes::Tuple{Vararg{YAXArrays.DAT.InputCube, NIN}} where NIN: The input data cubes
- outcubes::Tuple{Vararg{YAXArrays.DAT.OutputCube, NOUT}} where NOUT: The output data cubes
- allInAxes::Vector: List of all axes of the input cubes
- LoopAxes::Vector: List of axes that are looped through
- ispar::Bool: Flag whether the computation is parallelized
- loopcachesize::Vector{Int64}:
- allow_irregular_chunks::Bool:
- max_cache::Any: Maximal size of the in-memory cache
- fu::Any: Inner function which is computed
- inplace::Bool: Flag whether the computation happens in place
- include_loopvars::Bool:
- ntr::Any:
- do_gc::Bool: Flag if GC should be called explicitly. Probably necessary for many runs in Julia 1.9
- addargs::Any: Additional arguments for the inner function
- kwargs::Any: Additional keyword arguments for the inner function
YAXArrays.DAT.InputCube — Type
Internal representation of an input cube for DAT operations
- cube: The input data
- desc: The input description given by the user/registration
- axesSmall: List of axes that were actually selected through the description
- icolon
- colonperm
- loopinds: Indices of loop axes that this cube does not contain, i.e. broadcasts
- cachesize: Number of elements to keep in cache along each axis
- window
- iwindow
- windowloopinds
- iall
YAXArrays.DAT.OutputCube — Type
Internal representation of an output cube for DAT operations
Fields
- cube: The actual outcube cube, once it is generated
- cube_unpermuted: The unpermuted output cube
- desc: The description of the output axes as given by users or registration
- axesSmall: The list of output axes determined through the description
- allAxes: List of all the axes of the cube
- loopinds: Index of the loop axes that are broadcasted for this output cube
- innerchunks
- outtype: Elementtype of the outputcube
YAXArrays.DAT.YAXColumn — Type
YAXColumn
A struct representing a single column of a YAXArray partitioned Table
Fields
- inarBC
- inds
YAXArrays.DAT.cmpcachmisses — Method
Function that compares two cache miss specifiers by their importance
YAXArrays.DAT.getFrontPerm — Method
Calculate an axis permutation that brings the wanted dimensions to the front
YAXArrays.DAT.getLoopCacheSize — Method
Calculate the optimal cache size for a DAT operation
YAXArrays.DAT.getOuttype — Method
getOuttype(outtype, cdata)
Internal function
Get the element type for the output cube
YAXArrays.DAT.getloopchunks — Method
getloopchunks(dc::DATConfig)
Internal function
Returns the chunks that can be looped over together for all dimensions.
This computation of the size of the chunks is handled by DiskArrays.approx_chunksize
YAXArrays.DAT.permuteloopaxes — Method
permuteloopaxes(dc)
Internal function
Permute the dimensions of the cube, so that the axes that are looped through are in the first positions. This is necessary for a faster looping through the data.
YAXArrays.Cubes.setchunks — Method
setchunks(c::Dataset,chunks)
Resets the chunks of all or a subset of the YAXArrays in the dataset and returns a new Dataset. Note that this will not change the chunking of the underlying data itself; it will just make the data "look" like it had a different chunking. If you need a persistent on-disk representation of this chunking, use savedataset on the resulting array. The chunks argument can take one of the following forms:
- a NamedTuple or AbstractDict mapping from variable name to a description of the desired variable chunks
- a NamedTuple or AbstractDict mapping from dimension name to a description of the desired variable chunks
- a description of the desired variable chunks applied to all members of the Dataset
where a description of the desired variable chunks can take one of the following forms:
- a DiskArrays.GridChunks object
- a tuple specifying the chunk size along each dimension
- an AbstractDict or NamedTuple mapping one or more axis names to chunk sizes
YAXArrays.Datasets.collectfromhandle — Method
Extracts a YAXArray from a dataset handle that was just created from an arrayinfo
YAXArrays.Datasets.createdataset — Method
function createdataset(DS::Type,axlist; kwargs...)
Creates a new dataset with axes specified in axlist. Each axis must be a subtype of CubeAxis. A new empty Zarr array will be created and can serve as a sink for mapCube operations.
Keyword arguments
- path="": location where the new cube is stored
- T=Union{Float32,Missing}: data type of the target cube
- chunksize = ntuple(i->length(axlist[i]),length(axlist)): chunk sizes of the array
- chunkoffset = ntuple(i->0,length(axlist)): offsets of the chunks
- persist::Bool=true: shall the disk data be garbage-collected when the cube goes out of scope?
- overwrite::Bool=false: overwrite cube if it already exists
- properties=Dict{String,Any}(): additional cube properties
- globalproperties=Dict{String,Any}: global attributes to be added to the dataset
- fillvalue= T>:Missing ? defaultfillval(Base.nonmissingtype(T)) : nothing: fill value
- datasetaxis="Variable": special treatment of a categorical axis that gets written into separate zarr arrays
- layername="layer": Fallback name of the variable stored in the dataset if no datasetaxis is found
YAXArrays.Datasets.getarrayinfo — Method
Extract necessary information to create a YAXArrayBase dataset from a name and YAXArray pair
YAXArrays.Datasets.testrange — Method
Test if data in x can be approximated by a step range
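One possible way to perform such a test is sketched below in plain Julia. This is not the YAXArrays implementation of testrange; the helper approx_steprange and its rtol tolerance are assumptions made for illustration.

```julia
# Illustration only: `approx_steprange` is a hypothetical helper,
# not the YAXArrays implementation of testrange.
function approx_steprange(x::AbstractVector{<:Real}; rtol=1e-8)
    length(x) < 2 && return false
    step = x[2] - x[1]
    # Every consecutive difference must agree with the first one within the tolerance.
    all(i -> isapprox(x[i + 1] - x[i], step; rtol=rtol), 1:length(x) - 1)
end

regular = [0.0, 0.25, 0.5, 0.75]
irregular = [0.0, 0.25, 0.6, 0.75]

println(approx_steprange(regular))
println(approx_steprange(irregular))
```

A vector that passes such a check could be stored compactly as a range (start, step, length) instead of as explicit values.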