Compute YAXArrays
This section describes how to create new YAXArrays by performing operations on them.
- Use arithmetics to add or multiply numbers to each element of an array
- Use map to apply a more complex functions to every element of an array
- Use mapslices to reduce a dimension, e.g. to get the mean over all time steps
- Use mapCube to apply complex functions on an array that may change any dimensions
Let's start by creating an example dataset:
using YAXArrays
using Dates
axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
→ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
↗ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, String} with 1 entry:
:origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
Modify elements of a YAXArray
a[1,2,3]
0.1108917095653551
a[1,2,3] = 42
42
a[1,2,3]
42.0
::: warning
Some arrays, e.g. those saved in a cloud object storage are immutable making any modification of the data impossible.
:::
Arithmetics
Add a value to all elements of an array and save it as a new array:
a2 = a .+ 5
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
→ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
↗ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, String} with 1 entry:
:origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
a2[1,2,3] == a[1,2,3] + 5
true
map
Apply a function on every element of an array individually:
offset = 5
map(a) do x
(x + offset) / 2 * 3
end
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
→ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
↗ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{Symbol, String} with 1 entry:
:origin => "user guide"
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
This keeps all dimensions unchanged. Note, that here we can not access neighboring elements. In this case, we can use mapslices
or mapCube
instead. Each element of the array is processed individually.
The code runs very fast, because map
applies the function lazily. Actual computation will be performed only on demand, e.g. when elements were explicitly requested or further computations were performed.
mapslices
Reduce the time dimension by calculating the average value of all points in time:
import Statistics: mean
mapslices(mean, a, dims="Time")
╭───────────────────────────────────────────╮
│ 10×15 YAXArray{Union{Missing, Float64},2} │
├───────────────────────────────────────────┴──────────────────────────── dims ┐
↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 1.17 KB
└──────────────────────────────────────────────────────────────────────────────┘
There is no time dimension left, because there is only one value left after averaging all time steps. We can also calculate spatial means resulting in one value per time step:
mapslices(mean, a, dims=("lat", "lon"))
╭────────────────────────────────────────────────╮
│ 30-element YAXArray{Union{Missing, Float64},1} │
├────────────────────────────────────────────────┴─────────────────────── dims ┐
↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 240.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
mapCube
mapCube
is the most flexible way to apply a function over subsets of an array. Dimensions may be added or removed.
Here we transform a raster array with spatial dimension lat and lon into a vector array having just one spatial dimension i.e. region. First, create the raster array:
using YAXArrays
using DimensionalData
using Dates
axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
raster_arr = YAXArray(axlist, data)
╭──────────────────────────────╮
│ 30×10×15 YAXArray{Float64,3} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
↓ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points,
→ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
↗ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 35.16 KB
└──────────────────────────────────────────────────────────────────────────────┘
Then, create a Matrix with the same spatial dimensions indicating to which region each point belongs to:
regions_mat = map(Iterators.product(raster_arr.lon, raster_arr.lat)) do (lon, lat)
1 <= lon < 10 && 1 <= lat < 5 && return "A"
1 <= lon < 10 && 5 <= lat < 10 && return "B"
10 <= lon < 15 && 1 <= lat < 5 && return "C"
return "D"
end
regions_mat = DimArray(regions_mat, (raster_arr.lon, raster_arr.lat))
╭──────────────────────────╮
│ 10×15 DimArray{String,2} │
├──────────────────────────┴───────────────────────────────────────────── dims ┐
↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 1.0 1.28571 1.57143 1.85714 … 4.14286 4.42857 4.71429 5.0
1.0 "A" "A" "A" "A" "A" "A" "A" "B"
2.0 "A" "A" "A" "A" "A" "A" "A" "B"
3.0 "A" "A" "A" "A" "A" "A" "A" "B"
4.0 "A" "A" "A" "A" "A" "A" "A" "B"
5.0 "A" "A" "A" "A" … "A" "A" "A" "B"
6.0 "A" "A" "A" "A" "A" "A" "A" "B"
7.0 "A" "A" "A" "A" "A" "A" "A" "B"
8.0 "A" "A" "A" "A" "A" "A" "A" "B"
9.0 "A" "A" "A" "A" "A" "A" "A" "B"
10.0 "C" "C" "C" "C" … "C" "C" "C" "D"
which has the same spatial dimensions as the raster array at any given point in time:
DimArray(raster_arr[time = 1])
╭───────────────────────────╮
│ 10×15 DimArray{Float64,2} │
├───────────────────────────┴──────────────────────────────────────────── dims ┐
↓ lon Sampled{Float64} 1.0:1.0:10.0 ForwardOrdered Regular Points,
→ lat Sampled{Float64} 1.0:0.2857142857142857:5.0 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
└──────────────────────────────────────────────────────────────────────────────┘
↓ → 1.0 1.28571 1.57143 … 4.42857 4.71429 5.0
1.0 0.831485 0.815547 0.263498 0.370928 0.246982 0.740199
2.0 0.435785 0.20776 0.902026 0.199708 0.486121 0.388773
3.0 0.421056 0.524699 0.216875 0.60018 0.872581 0.801736
⋮ ⋱ ⋮
8.0 0.779933 0.528792 0.979828 0.207975 0.201816 0.364037
9.0 0.933193 0.95773 0.577388 0.614951 0.0741381 0.37831
10.0 0.284691 0.24948 0.353963 … 0.807791 0.683428 0.697182
Now we calculate the list of corresponding points for each region. This will be re-used for each point in time during the final mapCube
. In addition, this avoids the allocation of unnecessary memory.
regions = ["A", "B", "C", "D"]
points_of_regions = map(enumerate(regions)) do (i,region)
region => findall(isequal(region), regions_mat)
end |> Dict |> sort
OrderedCollections.OrderedDict{String, Vector{CartesianIndex{2}}} with 4 entries:
"A" => [CartesianIndex(1, 1), CartesianIndex(2, 1), CartesianIndex(3, 1), Car…
"B" => [CartesianIndex(1, 15), CartesianIndex(2, 15), CartesianIndex(3, 15), …
"C" => [CartesianIndex(10, 1), CartesianIndex(10, 2), CartesianIndex(10, 3), …
"D" => [CartesianIndex(10, 15)]
Finally, we can transform the entire raster array:
vector_array = mapCube(
raster_arr,
indims=InDims("lon", "lat"),
outdims=OutDims(Dim{:region}(regions))
) do xout, xin
for (region_pos, points) in enumerate(points_of_regions.vals)
# aggregate values of points in the current region at the current date
xout[region_pos] = sum(view(xin, points))
end
end
╭──────────────────────────────────────────╮
│ 4×30 YAXArray{Union{Missing, Float64},2} │
├──────────────────────────────────────────┴───────────────────────────── dims ┐
↓ region Categorical{String} ["A", "B", "C", "D"] ForwardOrdered,
→ time Sampled{Date} Date("2022-01-01"):Dates.Day(1):Date("2022-01-30") ForwardOrdered Regular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
file size: 960.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
This gives us a vector array with only one spatial dimension, i.e. the region. Note that we still have 30 points in time. The transformation was applied for each date separately.
Hereby, xin
is a 10x15 array representing a map at a given time and xout
is a 4 element vector of missing values initially representing the 4 regions at that date. Then, we set each output element by the sum of all corresponding points
Distributed Computation
All map methods apply a function on all elements of all non-input dimensions separately. This allows to run each map function call in parallel. For example, we can execute each date of a time series in a different CPU thread during spatial aggregation.
The following code does a time mean over all grid points using multiple CPUs of a local machine:
using YAXArrays
using Dates
using Distributed
axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
addprocs(2)
@everywhere begin
using YAXArrays
using Zarr
using Statistics
end
@everywhere function mymean(output, pixel)
@show "doing a mean"
output[:] .= mean(pixel)
end
mapCube(mymean, a, indims=InDims("time"), outdims=OutDims())
In the last example, mapCube
was used to map the mymean
function. mapslices
is a convenient function that can replace mapCube
, where you can omit defining an extra function with the output argument as an input (e.g. mymean
). It is possible to simply use mapslice
mapslices(mean ∘ skipmissing, a, dims="time")
It is also possible to distribute easily the workload on a cluster, with little modification to the code. To do so, we use the ClusterManagers
package.
using Distributed
using ClusterManagers
addprocs(SlurmManager(10))