DataFrames.jl

This section of the documentation will help you understand how to work with SpectralIndices.jl using DataFrames.jl as input.

This tutorial relies on the bundled "spectral" dataset stored in data. To load it as a DataFrame we use the following:

using SpectralIndices, DataFrames
df = load_dataset("spectral", DataFrame)
first(df, 5)
5×9 DataFrame
 Row │ SR_B5     ST_B10   SR_B2      SR_B6     class   SR_B4     SR_B7     SR_B3     SR_B1
     │ Float64   Float64  Float64    Float64   String  Float64   Float64   Float64   Float64
─────┼─────────────────────────────────────────────────────────────────────────────────────────
   1 │ 0.269054  297.328  0.100795   0.306206  Urban   0.165764  0.251949  0.132227  0.08985
   2 │ 0.281264  297.108  0.08699    0.267596  Urban   0.160979  0.217917  0.124404  0.0738588
   3 │ 0.28422   297.436  0.0860275  0.258384  Urban   0.140203  0.200098  0.120994  0.0729375
   4 │ 0.254479  297.204  0.103916   0.25958   Urban   0.163976  0.216735  0.135981  0.0877325
   5 │ 0.269535  297.098  0.109306   0.273234  Urban   0.18126   0.219554  0.15035   0.0905925

The columns of this dataset hold Landsat 8 surface reflectance samples for three different classes. The samples were taken over Oporto. The data comes from spyndex, and this tutorial is meant to closely mirror the Python version.

This dataset specifically contains three different classes:

unique(df[!, "class"])
3-element Vector{String}:
 "Urban"
 "Water"
 "Vegetation"

To reflect that, we are going to calculate three different indices: NDVI for vegetation, NDWI for water, and NDBI for urban.

NDVI
NDVI: Normalized Difference Vegetation Index
* Application Domain: vegetation
* Bands/Parameters: Any["N", "R"]
* Formula: (N-R)/(N+R)
* Reference: https://ntrs.nasa.gov/citations/19740022614
NDWI
NDWI: Normalized Difference Water Index
* Application Domain: water
* Bands/Parameters: Any["G", "N"]
* Formula: (G-N)/(G+N)
* Reference: https://doi.org/10.1080/01431169608948714
NDBI
NDBI: Normalized Difference Built-Up Index
* Application Domain: urban
* Bands/Parameters: Any["S1", "N"]
* Formula: (S1-N)/(S1+N)
* Reference: http://dx.doi.org/10.1080/01431160304987

We have multiple ways to feed this data to SpectralIndices.jl to generate our indices. We will try to cover most of them here.

From DataFrame to DataFrame

A straightforward way to compute the indices is to feed a DataFrame to compute_index. To do this, we first need to build a new DataFrame that contains only the required bands. We can find out which bands each index needs by reading its bands field:

NDVI.bands
2-element Vector{Any}:
 "N"
 "R"
NDWI.bands
2-element Vector{Any}:
 "G"
 "N"
NDBI.bands
2-element Vector{Any}:
 "S1"
 "N"

In this case we only need the Green, Red, NIR and SWIR1 bands. Since compute_index expects the columns to have the same names as they have in the bands field, we need to select the specific columns that we want out of the dataset and rename them. We can do this easily with select:

params = select(df, :SR_B3=>:G, :SR_B4=>:R, :SR_B5=>:N, :SR_B6=>:S1)
first(params, 5)
5×4 DataFrame
 Row │ G         R         N         S1
     │ Float64   Float64   Float64   Float64
─────┼────────────────────────────────────────
   1 │ 0.132227  0.165764  0.269054  0.306206
   2 │ 0.124404  0.160979  0.281264  0.267596
   3 │ 0.120994  0.140203  0.28422   0.258384
   4 │ 0.135981  0.163976  0.254479  0.25958
   5 │ 0.15035   0.18126   0.269535  0.273234
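
When the same renaming has to be reused, it can help to keep the column-to-band mapping in one place and to check that every band required by the chosen indices is covered before computing anything. The sketch below is only an illustration: band_map, required, missing_bands and df_small are hypothetical names, and df_small is a two-row stand-in built from the values printed above rather than the full dataset.

```julia
using DataFrames

# Two-row stand-in for the tutorial dataset (values copied from the output above)
df_small = DataFrame(
    SR_B3 = [0.132227, 0.124404],
    SR_B4 = [0.165764, 0.160979],
    SR_B5 = [0.269054, 0.281264],
    SR_B6 = [0.306206, 0.267596],
)

# Hypothetical mapping: Landsat 8 column name => standard band name
band_map = [:SR_B3 => :G, :SR_B4 => :R, :SR_B5 => :N, :SR_B6 => :S1]

# Select and rename in one call by splatting the pairs into select
params_small = select(df_small, band_map...)

# Bands listed in NDVI.bands, NDWI.bands and NDBI.bands
required = ["N", "R", "G", "S1"]

# Fail early if a required band has no matching column
missing_bands = setdiff(required, names(params_small))
isempty(missing_bands) || error("missing bands: ", join(missing_bands, ", "))
```

With the full dataset, the same band_map could be splatted into select(df, band_map...) to obtain the params table used in the next step.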

Now our dataset is ready, and we just need to call the compute_index function:

idx = compute_index(["NDVI", "NDWI", "NDBI"], params)
first(idx, 5)
5×3 DataFrame
 Row │ NDVI      NDWI       NDBI
     │ Float64   Float64    Float64
─────┼─────────────────────────────────
   1 │ 0.237548  -0.340973   0.0645838
   2 │ 0.271989  -0.386671  -0.0249016
   3 │ 0.339326  -0.402815  -0.0476153
   4 │ 0.216278  -0.303482   0.00992348
   5 │ 0.195821  -0.283852   0.0068146

The result is a new DataFrame with the desired indices as columns.
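
As a quick sanity check, the first row can be reproduced by hand from the formulas listed in the index definitions. This is a minimal sketch in plain Julia: the inputs are the rounded values printed for row 1, so agreement is only expected to about five decimal places, and the variable names are our own.

```julia
# Row 1 of the band table, as printed (rounded to six significant digits)
G, R, N, S1 = 0.132227, 0.165764, 0.269054, 0.306206

# Formulas as listed in the index definitions
ndvi = (N - R) / (N + R)      # NDVI: (N-R)/(N+R)
ndwi = (G - N) / (G + N)      # NDWI: (G-N)/(G+N)
ndbi = (S1 - N) / (S1 + N)    # NDBI: (S1-N)/(S1+N)

# Compare with the first row returned by compute_index
expected = (0.237548, -0.340973, 0.0645838)
matches = all(isapprox.((ndvi, ndwi, ndbi), expected; atol = 1e-5))
println("row 1 matches compute_index output: ", matches)
```

Lowercase names are used on purpose, since the all-caps names NDVI, NDWI and NDBI refer to the index objects exported by the package.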

Another way to obtain this is to pass single-column DataFrames as keyword arguments, one per band:

idx = compute_index(["NDVI", "NDWI", "NDBI"];
    G = select(df, :SR_B3=>:G),
    N = select(df, :SR_B5=>:N),
    R = select(df, :SR_B4=>:R),
    S1 = select(df, :SR_B6=>:S1))
first(idx, 5)
5×3 DataFrame
 Row │ NDVI      NDWI       NDBI
     │ Float64   Float64    Float64
─────┼─────────────────────────────────
   1 │ 0.237548  -0.340973   0.0645838
   2 │ 0.271989  -0.386671  -0.0249016
   3 │ 0.339326  -0.402815  -0.0476153
   4 │ 0.216278  -0.303482   0.00992348
   5 │ 0.195821  -0.283852   0.0068146

From DataFrame to Vector

Alternatively, you can build a Dict of band vectors from the DataFrame, going back to an example we saw on the previous page:

params = Dict("G" => df[!, "SR_B3"], "N" => df[!, "SR_B5"], "R" => df[!, "SR_B4"], "S1" => df[!, "SR_B6"])
Dict{String, Vector{Float64}} with 4 entries:
  "S1" => [0.306206, 0.267596, 0.258384, 0.25958, 0.273234, 0.32954, 0.271721, …
  "N"  => [0.269054, 0.281264, 0.28422, 0.254479, 0.269535, 0.277153, 0.26563, …
  "G"  => [0.132227, 0.124404, 0.120994, 0.135981, 0.15035, 0.152303, 0.135885,…
  "R"  => [0.165764, 0.160979, 0.140203, 0.163976, 0.18126, 0.19754, 0.170026, …

The computation is done in the same way:

ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"], params)
3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]

Just be careful with the naming: SpectralIndices.jl brings all the indices defined in indices into the namespace. The all-caps names are reserved for them, which is why the results above are bound to lowercase names. As illustrated at the beginning of this tutorial:

NDVI
NDVI: Normalized Difference Vegetation Index
* Application Domain: vegetation
* Bands/Parameters: Any["N", "R"]
* Formula: (N-R)/(N+R)
* Reference: https://ntrs.nasa.gov/citations/19740022614

The two steps can be merged by providing the values directly as kwargs:

ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"];
    G = df[!, "SR_B3"],
    N = df[!, "SR_B5"],
    R = df[!, "SR_B4"],
    S1 = df[!, "SR_B6"])
3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
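
If a table is needed later on, the returned vectors can be collected back into a DataFrame. The following is a small sketch, not something the package requires: it uses the first three values of each vector (rounded, as printed earlier in this tutorial) together with the class labels of the same rows, and the column names are our own choice.

```julia
using DataFrames

# First three values of each result vector, rounded as printed in this tutorial
ndvi = [0.237548, 0.271989, 0.339326]
ndwi = [-0.340973, -0.386671, -0.402815]
ndbi = [0.0645838, -0.0249016, -0.0476153]

# Class labels of the same three rows
classes = ["Urban", "Urban", "Urban"]

# Lowercase column names keep clear of the exported all-caps index objects
results = DataFrame(class = classes, ndvi = ndvi, ndwi = ndwi, ndbi = ndbi)
```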

You are free to choose whichever method you prefer; there is no meaningful trade-off in speed between them:

@time ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"], params)
3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
@time ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"];
           G = df[!, "SR_B3"],
           N = df[!, "SR_B5"],
           R = df[!, "SR_B4"],
           S1 = df[!, "SR_B6"])
3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
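
One caveat when reading @time output in Julia: the first call of a method also includes compilation time, so a single measurement can make either approach look slower than it is. The sketch below shows the usual pattern of warming up before timing. It uses a hypothetical stand-in function, normalized_difference, on random data rather than compute_index itself, so the absolute numbers say nothing about the package.

```julia
# Hypothetical stand-in for an index computation; not part of SpectralIndices.jl
normalized_difference(a, b) = (a .- b) ./ (a .+ b)

# Random positive "band" values, only used to have something to time
N = rand(10_000) .+ 0.1
R = rand(10_000) .+ 0.1

# The first call compiles the method for these argument types
first_call = @elapsed normalized_difference(N, R)

# The second call runs the already compiled code
second_call = @elapsed normalized_difference(N, R)

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```

When comparing the Dict and kwargs variants, it is therefore fairer to run each call once before measuring it. For more careful measurements, a dedicated package such as BenchmarkTools.jl is commonly used.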