Group YAXArrays and Datasets

The following examples will use the groupby function to calculate temporal and spatial averages.

```julia
using YAXArrays, DimensionalData
using NetCDF
using Downloads
using Dates
using Statistics
```

Seasonal Averages from Time Series of Monthly Means

The following reproduces the xarray example by Joe Hamman.

The goal is to calculate seasonal averages. To do this properly, each monthly mean must be weighted by the number of days in its month, because months have different lengths.
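A small worked example with made-up numbers (not taken from the rasm data) shows why the weighting matters. Suppose one grid cell has monthly means of -10, -12 and -8 °C for December, January and February. In the no-leap calendar used by this dataset these months have 31, 31 and 28 days. The sketch below uses only Base Julia and the Statistics standard library:

```julia
using Statistics

# hypothetical monthly means (°C) for Dec, Jan, Feb at one grid cell
tair = [-10.0, -12.0, -8.0]
# days in each month in a no-leap calendar
days = [31, 31, 28]

unweighted = mean(tair)            # every month counts equally
weights = days ./ sum(days)        # each month's share of the 90-day season
weighted = sum(weights .* tair)    # day-weighted seasonal mean

println("unweighted = ", unweighted)
println("weighted   = ", round(weighted; digits=3))
```

The day-weighted mean (about -10.067 °C) is slightly lower than the plain mean (-10.0 °C). In this example February, the shortest month, is also the warmest, so it receives less weight.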

Download the data

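```julia
url_path = "https://github.com/pydata/xarray-data/raw/master/rasm.nc"
# download the file into the current directory as rasm.nc
filename = Downloads.download(url_path, "rasm.nc")
# open the file as a YAXArray (the values stay on disk until accessed)
ds_o = Cube(filename)
```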

::: warning

The following rebuild should not be necessary in the future, and it is impractical for large datasets. Out-of-memory groupby is currently a work in progress; see https://github.com/rafaqz/DimensionalData.jl/issues/642

:::

```julia
axs = dims(ds_o)                           # get the dimensions
data = ds_o.data[:, :, :]                  # read all values into memory
_FillValue = ds_o.properties["_FillValue"]
data = replace(data, _FillValue => NaN)    # mark missing values as NaN
# create a new in-memory YAXArray with the same dimensions
ds = YAXArray(axs, data)
```

GroupBy: seasons

::: details function weighted_seasons(ds) ... end

```julia
function weighted_seasons(ds)
    # calculate weights
    tempo = dims(ds, :Ti)
    month_length = YAXArray((tempo,), daysinmonth.(tempo))
    g_tempo = groupby(month_length, Ti => seasons(; start=December))
    sum_days = sum.(g_tempo, dims=:Ti)
    weights = map(./, g_tempo, sum_days)
    # unweighted seasons
    g_ds = groupby(ds, Ti => seasons(; start=December))
    mean_g = mean.(g_ds, dims=:Ti)
    mean_g = dropdims.(mean_g, dims=:Ti)
    # weighted seasons
    g_dsW = broadcast_dims.(*, weights, g_ds)
    weighted_g = sum.(g_dsW, dims=:Ti)
    weighted_g = dropdims.(weighted_g, dims=:Ti)
    # differences
    diff_g = map(.-, weighted_g, mean_g)
    seasons_g = lookup(mean_g, :Ti)
    return mean_g, weighted_g, diff_g, seasons_g
end
```

:::

Now we continue with the groupby operations as usual.

First we group the data by season and calculate the unweighted mean of each season.
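A sketch of this step, mirroring the corresponding lines of weighted_seasons and assuming ds from the rebuild above:

```julia
using YAXArrays, DimensionalData, Dates, Statistics

# group the monthly maps by meteorological season, starting in December
g_ds = groupby(ds, Ti => seasons(; start=December))
# unweighted mean of each season; every result keeps a length-one Ti dimension
mean_g = mean.(g_ds, dims=:Ti)
```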

dropdims

Note that after taking the mean, the time dimension of each group has length one, so we can use dropdims to remove it.
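The same behaviour can be seen on a plain Base Julia array. The sketch below uses a hypothetical 3×4 grid with 9 monthly time steps as a stand-in for one season:

```julia
using Statistics

# hypothetical stand-in for one season: a 3×4 grid with 9 time steps
A = rand(3, 4, 9)

m = mean(A; dims=3)         # reducing over dims keeps a singleton third axis
@show size(m)               # (3, 4, 1)

m2 = dropdims(m; dims=3)    # remove the singleton time axis
@show size(m2)              # (3, 4)
```

On the grouped data the same call is broadcast over the four seasons, as in weighted_seasons: mean_g = dropdims.(mean_g, dims=:Ti).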

seasons

The groupby function produces new group labels, in this case along the time dimension:

```julia
seasons_g = lookup(mean_g, :Ti)
```

```
Categorical{Symbol} Unordered
wrapping: 4-element Vector{Symbol}:
 :Dec_Jan_Feb
 :Mar_Apr_May
 :Jun_Jul_Aug
 :Sep_Oct_Nov
```

Next, we weight each group by the number of days in each of its months.

GroupBy: weight

Create a YAXArray for the month length

```julia
tempo = dims(ds, :Ti)
month_length = YAXArray((tempo,), daysinmonth.(tempo))
```

```
╭──────────────────────────────╮
│ 36-element YAXArray{Int64,1} │
├──────────────────────────────┴───────────────────────────────────────── dims ┐
  ↓ Ti Sampled{CFTime.DateTimeNoLeap} [CFTime.DateTimeNoLeap(1980-09-16T12:00:00), …, CFTime.DateTimeNoLeap(1983-08-17T00:00:00)] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────── metadata ┤
  Dict{String, Any}()
├─────────────────────────────────────────────────────────────────── file size ┤
  file size: 288.0 bytes
└──────────────────────────────────────────────────────────────────────────────┘
```

Now group it by season
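A sketch of the grouping, assuming month_length from the previous step. It mirrors the first groupby call in weighted_seasons; using the same seasonal rule as for ds keeps the weight groups aligned with the data groups:

```julia
using YAXArrays, DimensionalData, Dates

# group the 36 month lengths with the same seasonal rule used for the data
g_tempo = groupby(month_length, Ti => seasons(; start=December))
```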

Get the number of days per season
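A one-line sketch, assuming month_length from above (the grouping is repeated so that the line runs on its own). The sums run over every year in the record, not over a single year:

```julia
using YAXArrays, DimensionalData, Dates

# total number of days in each season over the whole record;
# each result keeps a length-one Ti dimension
sum_days = sum.(groupby(month_length, Ti => seasons(; start=December)), dims=:Ti)
```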

weights

Divide each month length by the total number of days in its season (sum_days) to obtain the weights.

Verify that the sum per season is 1
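A sketch of the weight computation and the check, mirroring weighted_seasons and assuming month_length from above. The grouping is repeated so that the snippet runs on its own. map pairs each season's month lengths with that season's total, and ./ divides every month by it:

```julia
using YAXArrays, DimensionalData, Dates

g_tempo = groupby(month_length, Ti => seasons(; start=December))
sum_days = sum.(g_tempo, dims=:Ti)
# weight of each month = its length divided by the total days of its season
weights = map(./, g_tempo, sum_days)
# sanity check: the weights of every season should add up to 1
sum.(weights)
```

Each entry of the last result should equal 1.0 up to floating-point rounding.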

weighted seasons

Now, let's apply the weights to the monthly data of each season.

Then apply a sum over the time dimension and drop it.
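Both steps together, as a sketch that mirrors weighted_seasons and assumes ds and month_length from above. The weights are recomputed so that the snippet runs on its own. Because the weights of a season add up to one, summing the weighted months over time gives the day-weighted seasonal mean:

```julia
using YAXArrays, DimensionalData, Dates

# weights per season, as in the previous step
g_tempo = groupby(month_length, Ti => seasons(; start=December))
weights = map(./, g_tempo, sum.(g_tempo, dims=:Ti))
# the data, grouped with the same seasonal rule
g_ds = groupby(ds, Ti => seasons(; start=December))
# multiply every monthly map by its weight; broadcast_dims matches the
# Ti dimension and broadcasts the weights over the spatial dimensions
g_dsW = broadcast_dims.(*, weights, g_ds)
# sum the weighted months over time, then drop the length-one Ti dimension
weighted_g = dropdims.(sum.(g_dsW, dims=:Ti), dims=:Ti)
```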

Calculate the differences between the weighted and the unweighted seasonal means.
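A sketch of this last step, assuming ds and month_length from above. Both seasonal means are recomputed as in weighted_seasons so that the snippet runs on its own; the difference itself is a season-by-season element-wise subtraction:

```julia
using YAXArrays, DimensionalData, Dates, Statistics

g_ds = groupby(ds, Ti => seasons(; start=December))
g_tempo = groupby(month_length, Ti => seasons(; start=December))
weights = map(./, g_tempo, sum.(g_tempo, dims=:Ti))
# unweighted and weighted seasonal means
mean_g = dropdims.(mean.(g_ds, dims=:Ti), dims=:Ti)
weighted_g = dropdims.(sum.(broadcast_dims.(*, weights, g_ds), dims=:Ti), dims=:Ti)
# weighted minus unweighted, for each season
diff_g = map(.-, weighted_g, mean_g)
```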

All the previous steps are equivalent to calling the function defined at the top:

```julia
mean_g, weighted_g, diff_g, seasons_g = weighted_seasons(ds)
```

Once all calculations are done we can plot the results with Makie.jl as follows:

```julia
using CairoMakie
# define plot arguments/attributes
colorrange = (-30, 30)
colormap = Reverse(:Spectral)
highclip = :red
lowclip = :grey15
cb_label = ds_o.properties["long_name"]  # "Surface air temperature"
with_theme(theme_ggplot2()) do
    hm_o, hm_d, hm_w = nothing, nothing, nothing
    # the figure: 3 rows (unweighted, weighted, difference) × 4 seasons
    fig = Figure(; size = (850, 500))
    axs = [Axis(fig[i, j], aspect=DataAspect()) for i in 1:3, j in 1:4]
    for (j, s) in enumerate(seasons_g)
        hm_o = heatmap!(axs[1, j], mean_g[Ti=At(s)]; colorrange, lowclip, highclip, colormap)
        hm_w = heatmap!(axs[2, j], weighted_g[Ti=At(s)]; colorrange, lowclip, highclip, colormap)
        hm_d = heatmap!(axs[3, j], diff_g[Ti=At(s)]; colorrange=(-0.1, 0.1), lowclip, highclip,
            colormap=:diverging_bwr_20_95_c54_n256)
    end
    Colorbar(fig[1:2, 5], hm_o, label=cb_label)
    Colorbar(fig[3, 5], hm_d, label="Tair")
    hidedecorations!.(axs, grid=false, ticks=false, label=false)
    # some labels: one season name per column
    for (j, s) in enumerate(seasons_g)
        axs[1, j].title = string(s)
    end
    Label(fig[0, 1:5], "Seasonal Surface Air Temperature", fontsize=18, font=:bold)
    axs[1, 1].ylabel = "Unweighted"
    axs[2, 1].ylabel = "Weighted"
    axs[3, 1].ylabel = "Difference"
    colgap!(fig.layout, 5)
    rowgap!(fig.layout, 5)
    fig
end
```

The resulting figure shows good agreement with the results first published by Joe Hamman.