DimensionalData

DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions and, optionally, a lookup index. It's a pluggable, generalised version of AxisArrays.jl with a cleaner syntax and additional functionality found in NamedDims.jl. It has similar goals to Python's xarray, and is primarily written for use with spatial data in GeoData.jl.

Broadcasting and most Base methods maintain and sync dimension context.
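As a small illustration of dimensions surviving a broadcast, the sketch below builds an array with the rand constructor described in this README and checks that the result is still a DimArray. The variable names are arbitrary, and the exact printed form of the result may differ between package versions.

```julia
using DimensionalData

A = rand(X(1:3), Y(4))

# Broadcasting over a DimArray is expected to return a DimArray,
# carrying the X and Y dimensions of the input along with it.
B = A .* 2 .+ 1

# The wrapped data is the same as broadcasting over the plain parent array.
same_data = parent(B) == parent(A) .* 2 .+ 1
```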

DimensionalData.jl also implements:

  • comprehensive plot recipes for Plots.jl.
  • a Tables.jl interface with DimTable.
  • multi-layered DimStacks that can be indexed together, and have Base methods applied to all layers.
  • the Adapt.jl interface for use on GPUs, even as GPU kernel arguments.
  • traits for handling a wide range of spatial data types accurately.

Dimensions

Dimensions are wrapper types. They hold the lookup index, details about the grid, and other metadata. They are also used to index into the array. X, Y, Z and Ti are the exported defaults. A generalised Dim type is available to use arbitrary symbols to name dimensions. Custom dimension types can also be defined using the @dim macro.
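A minimal sketch of the two ways to name a dimension beyond the exported defaults: a custom type defined with @dim, and the generalised Dim type with a Symbol. The names Lat and :band are made up for illustration, and the string passed to @dim is assumed to be accepted as a display name.

```julia
using DimensionalData

# Hypothetical custom dimension type; `Lat` is not part of the package.
@dim Lat "Latitude"

# One custom dimension with a lookup index, one Dim{:band} dimension.
A = DimArray(rand(3, 4), (Lat(10:10:30), Dim{:band}(1:4)))

# Plain integers inside a dim wrapper index by position along that dimension.
row = A[Lat(2)]
col = A[Dim{:band}(4)]
```

Both results are one-dimensional DimArrays that keep the remaining dimension.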

Dimensions can be used to construct arrays in rand, ones, zeros and fill with either a range for a lookup index or a number for the dimension length:

julia> using DimensionalData

julia> A = rand(X(1:40), Y(50))
40×50 DimArray{Float64,2} with dimensions:
  X: 1:40 (Sampled - Ordered Regular Points)
  Y
 0.929006   0.116946  0.750017  …  0.172604  0.678835   0.495294
 0.0550038  0.100739  0.427026     0.778067  0.309657   0.831754
 ⋮                              ⋱                       ⋮
 0.647768   0.965682  0.049315     0.220338  0.0326206  0.36705
 0.851769   0.164914  0.555637     0.771508  0.964596   0.30265

We can also use dim wrappers for indexing, so that the dimension order in the underlying array does not need to be known:

julia> A[Y(1), X(1:10)]
10-element DimArray{Float64,1} with dimensions:
  X: 1:10 (Sampled - Ordered Regular Points)
and reference dimensions: Y(1) 
 0.929006
 0.0550038
 0.641773
 ⋮
 0.846251
 0.506362
 0.0492866

And this has no runtime cost:

julia> using BenchmarkTools

julia> A = ones(X(3), Y(3))
3×3 DimArray{Float64,2} with dimensions: X, Y
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0

julia> @btime $A[X(1), Y(2)]
  1.077 ns (0 allocations: 0 bytes)
1.0

julia> @btime parent($A)[1, 2]
  1.078 ns (0 allocations: 0 bytes)
1.0

Dims can be used for indexing and views without knowing dimension order:

julia> A = rand(X(40), Y(50))
40×50 DimArray{Float64,2} with dimensions: X, Y
 0.377696  0.105445  0.543156  …  0.844973  0.163758  0.849367
 ⋮                             ⋱                      ⋮
 0.431454  0.108927  0.137541     0.531587  0.592512  0.598927

julia> A[Y=3]
40-element DimArray{Float64,1} with dimensions: X
and reference dimensions: Y(3)
 0.543156
 ⋮
 0.137541

julia> view(A, Y(), X(1:5))
5×50 DimArray{Float64,2} with dimensions: X, Y
 0.377696  0.105445  0.543156  …  0.844973  0.163758  0.849367
 ⋮                             ⋱                      ⋮
 0.875279  0.133032  0.925045     0.156768  0.736917  0.444683

Dims can also be used in place of the dimension number in all Base and Statistics functions that have a dims argument:

julia> using Statistics

julia> A = rand(X(3), Y(4), Ti(5));

julia> mean(A; dims=Ti)
3×4×1 DimArray{Float64,3} with dimensions: X, Y, Ti (Time)
[:, :, 1]
 0.168058  0.52353   0.563065  0.347025
 0.472786  0.395884  0.307846  0.518926
 0.365028  0.381367  0.423553  0.369339

You can also use symbols to create Dim{X} dimensions. The rand method can't be used directly with Symbols, so use the regular DimArray constructor instead:

julia> A = DimArray(rand(10, 20, 30), (:a, :b, :c));

julia> A[a=2:5, c=9]
4×20 DimArray{Float64,2} with dimensions: Dim{:a}, Dim{:b}
and reference dimensions: Dim{:c}(9)
 0.134354  0.581673  0.422615  …  0.410222   0.687915  0.753441
 0.573664  0.547341  0.835962     0.0353398  0.794341  0.490831
 0.166643  0.133217  0.879084     0.695685   0.956644  0.698638
 0.325034  0.147461  0.149673     0.560843   0.889962  0.75733

Selectors

Selectors find indices in the lookup index for each dimension:

  • At(x): get the index exactly matching the passed in value(s)
  • Near(x): get the closest index to the passed in value(s)
  • Where(f::Function): filter the array axis by a function of the dimension index values.
  • Between(a, b): get all indices between two values, excluding the high value.
  • Contains(x): get indices where the value x falls within the interval, excluding the upper value. Only used for Sampled Intervals; for Points, use At.

(Between and Contains exclude the upper boundary so that adjacent selections never contain the same index.)

Selectors can be used in getindex, setindex! and view to select indices matching the passed in value(s).
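The following sketch exercises At, Near and Where on a small array with known values, so the selected positions can be checked by hand. The array contents and variable names are made up for illustration. Between and Contains are left out because their boundary handling depends on the sampling of the lookup index.

```julia
using DimensionalData

# Parent data is 1:30 laid out column-major, so A[i, 1] == i.
data = collect(reshape(1:30, 10, 3))
A = DimArray(data, (X(10:10:100), Y(1:3)))

exact   = A[X(At(20)), Y(At(1))]     # X == 20 is the second position
nearest = A[X(Near(22)), Y(At(1))]   # 20 is the closest X value to 22
upper   = A[X(Where(x -> x > 50))]   # keeps the rows where X is 60 to 100
```

At fails if no exact match exists, while Near always returns the closest index, so Near is the safer choice for floating-point lookups.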

We can use selectors inside dim wrappers:

julia> using Dates

julia> timespan = DateTime(2001,1):Month(1):DateTime(2001,12)
DateTime("2001-01-01T00:00:00"):Month(1):DateTime("2001-12-01T00:00:00")

julia> A = DimArray(rand(12,10), (Ti(timespan), X(10:10:100)))
12×10 DimArray{Float64,2} with dimensions:
  Ti (Time): DateTime("2001-01-01T00:00:00"):Month(1):DateTime("2001-12-01T00:00:00") (Sampled - Ordered Regular Points)
  X: 10:10:100 (Sampled - Ordered Regular Points)
 0.14106   0.476176  0.311356  0.454908  …  0.464364  0.973193  0.535004
 ⋮                                       ⋱                      ⋮
 0.522759  0.390414  0.797637  0.686718     0.901123  0.704603  0.0740788

julia> A[X(Near(35)), Ti(At(DateTime(2001,5)))]
0.3133109280208961

Without dim wrappers selectors must be in the right order:

julia> using Unitful

julia> A = rand(X((1:10:100)u"m"), Ti((1:5:100)u"s"));

julia> A[Between(10.5u"m", 50.5u"m"), Near(23u"s")]
4-element DimArray{Float64,1} with dimensions:
  X: (11:10:41) m (Sampled - Ordered Regular Points)
and reference dimensions:
  Ti(21 s) (Time): 21 s (Sampled - Ordered Regular Points)
 0.584028
 ⋮
 0.716715

For values other than Int/AbstractArray/Colon (which are set aside for regular indexing) the At selector is assumed, and can be dropped completely:

julia> A = rand(X([:a, :b, :c]), Y([25.6, 25.7, 25.8]));

julia> A[:b, 25.8]
0.61839141062599

Compile-time selectors

When every lookup index is wrapped in Val (only recommended for small arrays), you can index named dimensions with At on arbitrary values at no runtime cost:

julia> A = rand(X(Val((:a, :b, :c))), Y(Val((5.0, 6.0, 7.0))))
3×3 DimArray{Float64,2} with dimensions:
  X: Val{(:a, :b, :c)}() (Categorical - Unordered)
  Y: Val{(5.0, 6.0, 7.0)}() (Categorical - Unordered)
 0.5808   0.835037  0.528461
 0.8924   0.431394  0.506915
 0.66386  0.955305  0.774132

julia> @btime $A[:c, 6.0]
  2.777 ns (0 allocations: 0 bytes)
0.9553052910459472

julia> @btime $A[Val(:c), Val(6.0)]
  1.288 ns (0 allocations: 0 bytes)
0.9553052910459472

Methods where dims can be used containing indices or Selectors:

  • getindex, setindex!, view

Methods where dims, dim types, or Symbols can be used to indicate the array dimension:

  • size, axes, firstindex, lastindex
  • cat, reverse, dropdims
  • reduce, mapreduce
  • sum, prod, maximum, minimum
  • mean, median, extrema, std, var, cor, cov
  • permutedims, adjoint, transpose, Transpose
  • mapslices, eachslice
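A short sketch of a few of the methods listed above, passing dimension types where a dimension number would normally go. The variable names are arbitrary, and passing bare types such as X (rather than instances) is assumed to work for each of these methods, as the heading of this list suggests.

```julia
using DimensionalData, Statistics

A = rand(X(3), Y(4))

ny      = size(A, Y)                          # length along Y
totals  = sum(A; dims=X)                      # reduce over X, keeping a length-1 X
means   = dropdims(mean(A; dims=X); dims=X)   # reduce over X, then drop it
flipped = permutedims(A, (Y, X))              # reorder dimensions by name
```

None of these calls needs to know that X is the first dimension of A, so they keep working if the array is permuted upstream.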

Methods where dims can be used to construct DimArrays:

  • fill, ones, zeros, rand

Warnings

Indexing with unordered or reverse-order arrays has undefined behaviour. It will trash the dimension index and break searchsorted, and nothing will make sense any more. So do it at your own risk. However, indexing with sorted vectors of Int can be useful, so it's allowed. But it will still do strange things to your interval sizes if the dimension span is Irregular.

Alternate Packages

There are a lot of similar Julia packages in this space. AxisArrays.jl, NamedDims.jl and NamedArrays.jl are registered alternatives that each cover some of the functionality provided by DimensionalData.jl. DimensionalData.jl should be able to replicate most of their syntax and functionality.

AxisKeys.jl and AbstractIndices.jl are some other interesting developments. For more detail on why there are so many similar options and where things are headed, read this thread.