DataFrames.jl

This section of the documentation will help you understand how to work with SpectralIndices.jl using DataFrames.jl as input.

This tutorial relies on a sample dataset stored in data. To load it, run the following:

using SpectralIndices, DataFrames
df = load_dataset("spectral", DataFrame)
first(df, 5)

5×9 DataFrame

Row	SR_B5	ST_B10	SR_B2	SR_B6	class	SR_B4	SR_B7	SR_B3	SR_B1
	Float64	Float64	Float64	Float64	String	Float64	Float64	Float64	Float64
1	0.269054	297.328	0.100795	0.306206	Urban	0.165764	0.251949	0.132227	0.08985
2	0.281264	297.108	0.08699	0.267596	Urban	0.160979	0.217917	0.124404	0.0738588
3	0.28422	297.436	0.0860275	0.258384	Urban	0.140203	0.200098	0.120994	0.0729375
4	0.254479	297.204	0.103916	0.25958	Urban	0.163976	0.216735	0.135981	0.0877325
5	0.269535	297.098	0.109306	0.273234	Urban	0.18126	0.219554	0.15035	0.0905925

The SR_B* columns hold Landsat 8 surface reflectance, ST_B10 holds surface temperature, and class labels each sample with one of three land-cover classes. The samples were taken over Oporto. The data comes from spyndex, and this tutorial is meant to closely mirror the Python version.

This dataset specifically contains three different classes:

unique(df[!, "class"])

3-element Vector{String}:
 "Urban"
 "Water"
 "Vegetation"

To reflect that, we are going to calculate three different indices: NDVI for vegetation, NDWI for water, and NDBI for urban areas.

NDVI

NDVI: Normalized Difference Vegetation Index
* Application Domain: vegetation
* Bands/Parameters: Any["N", "R"]
* Formula: (N-R)/(N+R)
* Reference: https://ntrs.nasa.gov/citations/19740022614

NDWI

NDWI: Normalized Difference Water Index
* Application Domain: water
* Bands/Parameters: Any["G", "N"]
* Formula: (G-N)/(G+N)
* Reference: https://doi.org/10.1080/01431169608948714

NDBI

NDBI: Normalized Difference Built-Up Index
* Application Domain: urban
* Bands/Parameters: Any["S1", "N"]
* Formula: (S1-N)/(S1+N)
* Reference: http://dx.doi.org/10.1080/01431160304987
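All three formulas are normalized differences of two bands, so they can be checked by hand. The following is a minimal sketch in plain Julia, using the band values from the first row of the dataset printed earlier. The helper `normdiff` and the lowercase variable names are illustrative assumptions, not part of SpectralIndices.jl.

```julia
# Band values copied from the first row of the dataset (an "Urban" sample):
# G = SR_B3, R = SR_B4, N = SR_B5, S1 = SR_B6.
G, R, N, S1 = 0.132227, 0.165764, 0.269054, 0.306206

# Hypothetical helper (not a SpectralIndices.jl function):
# normalized difference of two bands.
normdiff(a, b) = (a - b) / (a + b)

ndvi_row1 = normdiff(N, R)   # NDVI formula: (N-R)/(N+R)
ndwi_row1 = normdiff(G, N)   # NDWI formula: (G-N)/(G+N)
ndbi_row1 = normdiff(S1, N)  # NDBI formula: (S1-N)/(S1+N)

println((ndvi_row1, ndwi_row1, ndbi_row1))
```

Up to rounding of the printed inputs, these values should agree with the first row that compute_index returns later in this tutorial.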

There are multiple ways to feed this data to SpectralIndices.jl to generate our indices. We will try to cover most of them here.

From `DataFrame` to `DataFrame`

A straightforward way to compute the indices is to pass a DataFrame to compute_index. To do this, we first need to build a DataFrame whose columns are the required bands. We can check which bands each index needs through its bands field:

NDVI.bands

2-element Vector{Any}:
 "N"
 "R"

NDWI.bands

2-element Vector{Any}:
 "G"
 "N"

NDBI.bands

2-element Vector{Any}:
 "S1"
 "N"

In this case we only need the Green, Red, NIR and SWIR1 bands. Since compute_index expects the columns to have the same names as they have in the bands field, we need to select the relevant columns from the dataset and rename them. We can do this easily with select:

params = select(df, :SR_B3=>:G, :SR_B4=>:R, :SR_B5=>:N, :SR_B6=>:S1)
first(params, 5)

5×4 DataFrame

Row	G	R	N	S1
	Float64	Float64	Float64	Float64
1	0.132227	0.165764	0.269054	0.306206
2	0.124404	0.160979	0.281264	0.267596
3	0.120994	0.140203	0.28422	0.258384
4	0.135981	0.163976	0.254479	0.25958
5	0.15035	0.18126	0.269535	0.273234
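To make the renaming step explicit, here is a small self-contained sketch of how `select` behaves with `old => new` pairs in DataFrames.jl. The table `toy` is a hypothetical stand-in holding the first two rows of three columns from the dataset; it is not the full dataset.

```julia
using DataFrames

# Hypothetical two-row stand-in for the dataset (values copied from rows 1 and 2).
toy = DataFrame(SR_B4 = [0.165764, 0.160979],
                SR_B5 = [0.269054, 0.281264],
                class = ["Urban", "Urban"])

# Each `old => new` pair keeps that column and renames it.
# Columns that are not listed (here `class`) are dropped from the result.
bands = select(toy, :SR_B4 => :R, :SR_B5 => :N)

println(names(bands))  # column names of the new table
println(names(toy))    # the original table is left untouched
```

`select` returns a new DataFrame, so the original column names remain available for later steps.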

Now our dataset is ready, and we just need to call the compute_index function:

idx = compute_index(["NDVI", "NDWI", "NDBI"], params)
first(idx, 5)

5×3 DataFrame

Row	NDVI	NDWI	NDBI
	Float64	Float64	Float64
1	0.237548	-0.340973	0.0645838
2	0.271989	-0.386671	-0.0249016
3	0.339326	-0.402815	-0.0476153
4	0.216278	-0.303482	0.00992348
5	0.195821	-0.283852	0.0068146

The result is a new DataFrame with the desired indices as columns.

Another way to obtain the same result is to pass one single-column DataFrame per band as keyword arguments:

idx = compute_index(["NDVI", "NDWI", "NDBI"];
    G = select(df, :SR_B3=>:G),
    N = select(df, :SR_B5=>:N),
    R = select(df, :SR_B4=>:R),
    S1 = select(df, :SR_B6=>:S1))
first(idx, 5)

5×3 DataFrame

Row	NDVI	NDWI	NDBI
	Float64	Float64	Float64
1	0.237548	-0.340973	0.0645838
2	0.271989	-0.386671	-0.0249016
3	0.339326	-0.402815	-0.0476153
4	0.216278	-0.303482	0.00992348
5	0.195821	-0.283852	0.0068146
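Because the result is an ordinary DataFrame, it can be combined with other columns using standard DataFrames.jl tools. The sketch below puts the class labels next to the indices with `hcat`. It uses hypothetical three-row stand-ins (`labels`, `idx_head`) built from the rows printed above, and it assumes the index rows come back in the same order as the input rows.

```julia
using DataFrames

# Hypothetical stand-ins for the first three rows shown in this tutorial.
labels = DataFrame(class = ["Urban", "Urban", "Urban"])
idx_head = DataFrame(NDVI = [0.237548, 0.271989, 0.339326],
                     NDWI = [-0.340973, -0.386671, -0.402815],
                     NDBI = [0.0645838, -0.0249016, -0.0476153])

# hcat places the tables side by side; both must have the same number of rows.
# Assumption: row i of the indices corresponds to row i of the labels.
labeled = hcat(labels, idx_head)

println(labeled)
```

With the real data, the same pattern would read `hcat(select(df, :class), idx)`, under the same row-order assumption.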

From `DataFrame` to `Vector`

Alternatively, you can build a Dict of band vectors from the DataFrame, going back to an example we saw on the previous page:

params = Dict("G" => df[!, "SR_B3"], "N" => df[!, "SR_B5"], "R" => df[!, "SR_B4"], "S1" => df[!, "SR_B6"])

Dict{String, Vector{Float64}} with 4 entries:
  "S1" => [0.306206, 0.267596, 0.258384, 0.25958, 0.273234, 0.32954, 0.271721, …
  "N"  => [0.269054, 0.281264, 0.28422, 0.254479, 0.269535, 0.277153, 0.26563, …
  "G"  => [0.132227, 0.124404, 0.120994, 0.135981, 0.15035, 0.152303, 0.135885,…
  "R"  => [0.165764, 0.160979, 0.140203, 0.163976, 0.18126, 0.19754, 0.170026, …

The computation is done in the same way:

ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"], params)

3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
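The `params` dictionary used above can also be built from a band-to-column mapping rather than written out entry by entry. This is a sketch on a hypothetical two-row table `toy`; the `mapping` vector is an illustrative name, not part of either package.

```julia
using DataFrames

# Hypothetical two-row stand-in for the dataset (values copied from rows 1 and 2).
toy = DataFrame(SR_B3 = [0.132227, 0.124404],
                SR_B5 = [0.269054, 0.281264])

# Band name expected by the index => column name in the table.
mapping = ["G" => "SR_B3", "N" => "SR_B5"]

# toy[!, col] returns the stored column without copying it.
band_dict = Dict(band => toy[!, col] for (band, col) in mapping)

println(band_dict)
```

Since `df[!, col]` does not copy, the vectors in the dictionary share memory with the table. Use `df[:, col]` instead if independent copies are needed.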

Be careful with naming: SpectralIndices.jl brings into the namespace all the indices as defined in indices, so the all-caps names are reserved for them, as we illustrated at the beginning of this tutorial. Use other names, such as the lowercase ndvi, ndwi and ndbi above, for your results:

NDVI

NDVI: Normalized Difference Vegetation Index
* Application Domain: vegetation
* Bands/Parameters: Any["N", "R"]
* Formula: (N-R)/(N+R)
* Reference: https://ntrs.nasa.gov/citations/19740022614

The two steps can be merged by providing the values directly as kwargs:

ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"];
    G = df[!, "SR_B3"],
    N = df[!, "SR_B5"],
    R = df[!, "SR_B4"],
    S1 = df[!, "SR_B6"])

3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]

You are free to choose whichever method you prefer; there is no meaningful difference in speed:

@time ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"], params)

3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]

@time ndvi, ndwi, ndbi = compute_index(["NDVI", "NDWI", "NDBI"];
           G = df[!, "SR_B3"],
           N = df[!, "SR_B5"],
           R = df[!, "SR_B4"],
           S1 = df[!, "SR_B6"])

3-element Vector{Any}:
 [0.23754793677807357, 0.2719887844338796, 0.33932578974960087, 0.21627773595727137, 0.19582071673377036, 0.16771383579896465, 0.21944767233340506, 0.2251996432295527, 0.1655330261746833, 0.2675545906704802  …  0.810365666144593, 0.8104049969776344, 0.7616768543153676, 0.8027222040013119, 0.7929365431300779, 0.7862750574070626, 0.8080303042462863, 0.8025822103946664, 0.7135886988619672, 0.7672440264304153]
 [-0.3409734444357916, -0.38667135030536093, -0.4028151808767594, -0.3034817907083952, -0.28385153077628394, -0.29071730449057526, -0.32313861250513676, -0.3563320964589312, -0.24060392753715099, -0.34356689100134846  …  -0.7698492602846995, -0.7547124120206541, -0.7128263753013682, -0.7716516398212895, -0.7491201313937117, -0.7510114068441064, -0.7257608604061496, -0.7401234567901236, -0.6752241340558899, -0.7074355283543386]
 [0.06458384035045028, -0.02490161425500128, -0.04761531780788457, 0.009923476645422341, 0.006814596455672831, 0.08634934501415456, 0.01133569522728392, 0.03875665342611921, 0.006910176170362171, -0.0322322650047355  …  -0.47115094032591764, -0.46672499804111056, -0.40825671490715415, -0.5414949557901297, -0.43083696212857336, -0.43525525151156264, -0.4700842430846934, -0.4585879184008887, -0.4050436713235448, -0.44864683453438614]
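One caveat when reading `@time` output in Julia: the first call of a function with new argument types also includes compilation time, so a fair comparison should look at a second call, or use a benchmarking package such as BenchmarkTools.jl. The sketch below shows the effect with a hypothetical stand-in function `normdiff_vec`; it is not a SpectralIndices.jl function, and the timings will vary by machine.

```julia
# Hypothetical stand-in for an index computation on whole columns.
normdiff_vec(a, b) = (a .- b) ./ (a .+ b)

# Synthetic band vectors, used only to have something to time.
n = collect(range(0.2, 0.9; length = 10_000))
r = collect(range(0.1, 0.8; length = 10_000))

t_first  = @elapsed normdiff_vec(n, r)  # includes compiling normdiff_vec for Vector{Float64}
t_second = @elapsed normdiff_vec(n, r)  # runs the already compiled method

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The first measurement is usually the larger of the two, which is why a single `@time` run per method is not enough to rank the methods.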
