Group YAXArrays and Datasets
The following examples use the `groupby` function to calculate temporal and spatial averages.
```julia
using YAXArrays, DimensionalData
using NetCDF
using Downloads
using Dates
using Statistics
```
Seasonal Averages from Time Series of Monthly Means
The following reproduces the xarray example by Joe Hamman, where the goal is to calculate seasonal averages. To do this properly, we need a weighted average, because each month has a different number of days.
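A minimal sketch of the weighting, assuming a single December to February season in a no-leap calendar and made-up monthly temperatures (the variable names are illustrative and not part of YAXArrays). Each month's weight is its length divided by the total number of days in the season, so the weights sum to one.

```julia
using Statistics

days  = [31, 31, 28]          # Dec, Jan, Feb in a no-leap (365-day) calendar
temps = [-5.0, -7.0, -3.0]    # hypothetical monthly mean temperatures

weights = days ./ sum(days)   # each month's share of the season's 90 days
weighted_mean   = sum(weights .* temps)
unweighted_mean = mean(temps)

println("weighted:   ", weighted_mean)
println("unweighted: ", unweighted_mean)
```

Because February is shorter, its (in this made-up example, warmer) value gets less weight, so the weighted mean ends up slightly colder than the unweighted one.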
Download the data
```julia
url_path = "https://github.com/pydata/xarray-data/raw/master/rasm.nc"
filename = Downloads.download(url_path, "rasm.nc")
ds_o = Cube(filename)
```
::: warning
The following rebuild should not be necessary in the future; it is also impractical for large datasets. Out-of-memory `groupby` is currently a work in progress. Related to https://github.com/rafaqz/DimensionalData.jl/issues/642
:::
```julia
axs = dims(ds_o) # get the dimensions
data = ds_o.data[:,:,:] # read the data into memory
_FillValue = ds_o.properties["_FillValue"]
data = replace(data, _FillValue => NaN) # mark fill values as NaN
# create new YAXArray
ds = YAXArray(axs, data)
```
GroupBy: seasons
::: details function weighted_seasons(ds) ... end
```julia
function weighted_seasons(ds)
    # calculate weights
    tempo = dims(ds, :Ti)
    month_length = YAXArray((tempo,), daysinmonth.(tempo))
    g_tempo = groupby(month_length, Ti => seasons(; start=December))
    sum_days = sum.(g_tempo, dims=:Ti)
    weights = map(./, g_tempo, sum_days)
    # unweighted seasons
    g_ds = groupby(ds, Ti => seasons(; start=December))
    mean_g = mean.(g_ds, dims=:Ti)
    mean_g = dropdims.(mean_g, dims=:Ti)
    # weighted seasons
    g_dsW = broadcast_dims.(*, weights, g_ds)
    weighted_g = sum.(g_dsW, dims=:Ti)
    weighted_g = dropdims.(weighted_g, dims=:Ti)
    # differences
    diff_g = map(.-, weighted_g, mean_g)
    seasons_g = lookup(mean_g, :Ti)
    return mean_g, weighted_g, diff_g, seasons_g
end
```
:::
Now we continue with the `groupby` operations as usual.
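To see what the grouping produces without downloading anything, the sketch below builds a small hypothetical cube `ds_toy` (random values on a 2×3 grid with 12 monthly time steps in 2001) and groups it with the same call used in `weighted_seasons`. The toy data and the names `dates` and `ds_toy` are assumptions for illustration only.

```julia
using YAXArrays, DimensionalData, Dates

# Hypothetical stand-in for `ds`: random values, 12 monthly steps in 2001
dates = Date(2001, 1, 15):Month(1):Date(2001, 12, 15)
ds_toy = YAXArray((X(1:2), Y(1:3), Ti(dates)), rand(2, 3, 12))

# Group the time axis into seasons, starting the seasonal cycle in December
g_ds = groupby(ds_toy, Ti => seasons(; start=December))

# Months are binned by calendar month, so Dec 2001 lands in the same
# group as Jan and Feb 2001
println(length(g_ds))
```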
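The mean per season is then calculated by applying `mean` over the `Ti` dimension of each group, as in the `mean_g` lines of `weighted_seasons` above.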
dropdims
Note that the time dimension now has length one, so we can use `dropdims` to remove it.
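Using the same hypothetical toy cube (rebuilt here so the block runs on its own), the sketch below computes the unweighted seasonal mean and then drops the length-one time axis, mirroring the `mean_g` lines of `weighted_seasons`. The toy data is an assumption, not the rasm dataset.

```julia
using YAXArrays, DimensionalData, Dates, Statistics

# Hypothetical stand-in for `ds`
dates = Date(2001, 1, 15):Month(1):Date(2001, 12, 15)
ds_toy = YAXArray((X(1:2), Y(1:3), Ti(dates)), rand(2, 3, 12))
g_ds = groupby(ds_toy, Ti => seasons(; start=December))

# Unweighted seasonal mean: each group keeps a Ti axis of length one...
mean_g = mean.(g_ds, dims=:Ti)
# ...which dropdims removes, leaving one 2×3 map per season
mean_g = dropdims.(mean_g, dims=:Ti)
```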
seasons
Because of the `groupby` operation, we obtain new group names, in this case along the time dimension:
```julia
seasons_g = lookup(mean_g, :Ti)
```

```
Categorical{Symbol} Unordered
wrapping: 4-element Vector{Symbol}:
 :Dec_Jan_Feb
 :Mar_Apr_May
 :Jun_Jul_Aug
 :Sep_Oct_Nov
```
Next, we weight this grouping by the number of days in each month.
GroupBy: weight
Create a `YAXArray` for the month length:
```julia
tempo = dims(ds, :Ti)
month_length = YAXArray((tempo,), daysinmonth.(tempo))
```

```
╭──────────────────────────────╮
│ 36-element YAXArray{Int64,1} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ Ti Sampled{CFTime.DateTimeNoLeap} [CFTime.DateTimeNoLeap(1980-09-16T12:00:00), …, CFTime.DateTimeNoLeap(1983-08-17T00:00:00)] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 288.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
```
Now group it by season
Get the number of days per season
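A self-contained sketch of these two steps, assuming a hypothetical time axis for the non-leap year 2001 instead of the rasm time axis: `daysinmonth` gives each month's length, and summing within each season gives that season's total number of days. The grouping and summing calls mirror `weighted_seasons`; `dates` is an illustrative name.

```julia
using YAXArrays, DimensionalData, Dates

# Hypothetical monthly time axis for 2001 (stand-in for dims(ds, :Ti))
dates = Date(2001, 1, 15):Month(1):Date(2001, 12, 15)
month_length = YAXArray((Ti(dates),), daysinmonth.(dates))

# Group the month lengths by season and sum them within each season
g_tempo  = groupby(month_length, Ti => seasons(; start=December))
sum_days = sum.(g_tempo, dims=:Ti)   # one length-one array per season
```

For 2001 the seasonal totals are 90 days for December to February and 91 or 92 days for the other seasons, which is why a plain mean over months is slightly biased.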
weights
Weight the seasonal groups by `sum_days`
Verify that the sum per season is 1
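Under the same toy time-axis assumption, the weights are the month lengths divided by their season's total, and summing the weights within each season should give one. `season_totals` is a hypothetical name used only for this check.

```julia
using YAXArrays, DimensionalData, Dates

# Hypothetical monthly time axis for 2001
dates = Date(2001, 1, 15):Month(1):Date(2001, 12, 15)
month_length = YAXArray((Ti(dates),), daysinmonth.(dates))
g_tempo  = groupby(month_length, Ti => seasons(; start=December))
sum_days = sum.(g_tempo, dims=:Ti)

# Each month's weight = its length / total days in its season
weights = map(./, g_tempo, sum_days)

# Sanity check: the weights within every season add up to 1
season_totals = sum.(weights)
```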
weighted seasons
Now, let's weight the seasons
Apply a `sum` over the time dimension and drop it
Calculate the differences
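As an end-to-end check of the weighted steps, the sketch below uses a hypothetical cube whose values are constant in time. In that case the weighted and unweighted seasonal means must coincide, so every entry of `diff_g` should be zero up to rounding. The calls mirror `weighted_seasons`; the toy data and the names `dates` and `ds_toy` are assumptions.

```julia
using YAXArrays, DimensionalData, Dates, Statistics

# Hypothetical cube: a random 2×3 map repeated unchanged for 12 months of 2001
dates = Date(2001, 1, 15):Month(1):Date(2001, 12, 15)
tempo = Ti(dates)
ds_toy = YAXArray((X(1:2), Y(1:3), tempo), repeat(rand(2, 3), 1, 1, 12))

# Seasonal weights, as in the previous steps
month_length = YAXArray((tempo,), daysinmonth.(dates))
g_tempo = groupby(month_length, Ti => seasons(; start=December))
weights = map(./, g_tempo, sum.(g_tempo, dims=:Ti))

# Unweighted seasonal means
g_ds = groupby(ds_toy, Ti => seasons(; start=December))
mean_g = dropdims.(mean.(g_ds, dims=:Ti), dims=:Ti)

# Weighted seasons: multiply by the weights, sum over time, drop Ti
g_dsW = broadcast_dims.(*, weights, g_ds)
weighted_g = dropdims.(sum.(g_dsW, dims=:Ti), dims=:Ti)

# Differences between weighted and unweighted seasonal means
diff_g = map(.-, weighted_g, mean_g)
```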
All the previous steps are equivalent to calling the `weighted_seasons` function defined above:

```julia
mean_g, weighted_g, diff_g, seasons_g = weighted_seasons(ds)
```
Once all calculations are done, we can plot the results with Makie.jl as follows:
```julia
using CairoMakie
# define plot arguments/attributes
colorrange = (-30,30)
colormap = Reverse(:Spectral)
highclip = :red
lowclip = :grey15
cb_label = ds_o.properties["long_name"] # "Surface air temperature"

with_theme(theme_ggplot2()) do
    hm_o, hm_d, hm_w = nothing, nothing, nothing
    # the figure: 3 rows (unweighted, weighted, difference) × 4 seasons
    fig = Figure(; size = (850,500))
    axs = [Axis(fig[i,j], aspect=DataAspect()) for i in 1:3, j in 1:4]
    for (j, s) in enumerate(seasons_g)
        hm_o = heatmap!(axs[1,j], mean_g[Ti=At(s)]; colorrange, lowclip, highclip, colormap)
        hm_w = heatmap!(axs[2,j], weighted_g[Ti=At(s)]; colorrange, lowclip, highclip, colormap)
        hm_d = heatmap!(axs[3,j], diff_g[Ti=At(s)]; colorrange=(-0.1,0.1), lowclip, highclip,
            colormap=:diverging_bwr_20_95_c54_n256)
    end
    Colorbar(fig[1:2,5], hm_o, label=cb_label)
    Colorbar(fig[3,5], hm_d, label="Tair")
    hidedecorations!.(axs, grid=false, ticks=false, label=false)
    # some labels
    for (j, s) in enumerate(seasons_g)
        axs[1,j].title = string(s)
    end
    Label(fig[0,1:5], "Seasonal Surface Air Temperature", fontsize=18, font=:bold)
    axs[1,1].ylabel = "Unweighted"
    axs[2,1].ylabel = "Weighted"
    axs[3,1].ylabel = "Difference"
    colgap!(fig.layout, 5)
    rowgap!(fig.layout, 5)
    fig
end
```
The result shows good agreement with the results first published by Joe Hamman.