A package for reading and using GEOTRACES data in Julia.

Important note

In order to use this software, you must first download the GEOTRACES IDP21 data as a NetCDF file.

You must place it in a Data directory in your local "home" directory. For example, on OSX, the path of my GEOTRACES NetCDF file is:


The GEOTRACES data management committee does not allow third-party distribution of its data and does not provide a public URL pointing directly to the data, which prevents this package from downloading the data for you. The GEOTRACES data are publicly accessible, but they must be downloaded manually.
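To check that the file is where it should be, something like the following sketch may help. It only lists files ending in .nc in the Data directory of your home directory. The helper name netcdf_files and the .nc extension are assumptions made for this illustration; the helper is not part of GEOTRACES.jl.

```julia
# List candidate NetCDF files in a directory (default: ~/Data).
# Hypothetical helper for illustration, not part of GEOTRACES.jl.
# Assumes the downloaded file uses the usual ".nc" extension.
function netcdf_files(dir::AbstractString = joinpath(homedir(), "Data"))
    isdir(dir) || return String[]               # no such directory: nothing to list
    names = filter(endswith(".nc"), readdir(dir))
    return joinpath.(dir, names)                # full paths, in readdir's sorted order
end

files = netcdf_files()
if isempty(files)
    println("No NetCDF file found in ", joinpath(homedir(), "Data"))
else
    foreach(println, files)
end
```

If nothing is printed besides the "No NetCDF file found" message, the download is probably in a different directory.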

To use this package, as with any other registered Julia package, you must first add it to your environment (e.g., with add GEOTRACES at the pkg> prompt), and then

julia> using GEOTRACES

should work.

Get a table of GEOTRACES observations

Simply use the GEOTRACES.observations function. For example, to get cadmium data:

julia> obs = GEOTRACES.observations("Cd")
10118×7 DataFrame
   Row  lat      lon      depth      cruise  station  date                 Cd
        Float32  Float32  Quantity  String  String   DateTime            Quantity
     1  -50.594  308.359     10.0 m  GA02    1        2011-03-05T19:50:30   0.05282 nmol kg⁻¹
     2  -50.594  308.359     25.0 m  GA02    1        2011-03-05T19:50:30   0.06973 nmol kg⁻¹
     3  -50.594  308.359     51.0 m  GA02    1        2011-03-05T19:50:30   0.15567 nmol kg⁻¹
 10116  -44.119  146.221    740.7 m  GS01    (2)      2018-01-11T15:24:00  0.509746 nmol kg⁻¹
 10117  -44.119  146.221    887.9 m  GS01    (2)      2018-01-11T15:24:00  0.649988 nmol kg⁻¹
 10118  -44.119  146.221    939.2 m  GS01    (2)      2018-01-11T15:24:00  0.686353 nmol kg⁻¹
                                                                             10112 rows omitted
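Since the result is a regular DataFrame, the usual DataFrames.jl tools should apply to it. The sketch below uses a small hand-built table as a stand-in for the real one (three rows copied from the output above, with only some of the columns), so it runs without the GEOTRACES file. The column names follow the printed table; everything else is an assumption for illustration.

```julia
using DataFrames, Unitful

# Toy stand-in for the table returned by GEOTRACES.observations("Cd");
# the values are copied from the rows printed above.
obs = DataFrame(
    cruise = ["GA02", "GA02", "GS01"],
    depth  = [10.0, 25.0, 740.7] .* u"m",
    Cd     = Float32[0.05282, 0.06973, 0.509746] .* u"nmol/kg",
)

# Keep only the observations shallower than 100 m.
# The comparison is done with units, so the threshold must carry a unit too.
shallow = filter(:depth => d -> d < 100u"m", obs)

# Keep only the observations from one cruise.
ga02 = filter(:cruise => ==("GA02"), obs)
```

In this toy table, both filters happen to select the same two GA02 rows.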

Variable names made easy (with your help!)

Most GEOTRACES variable names are not very explicit (e.g., var85 for cadmium). For this reason, GEOTRACES.jl provides shortcut names for common tracers/variables, like "Cd" for cadmium. To check which variable a shortcut corresponds to, you can do (sticking with cadmium as an example)

julia> GEOTRACES.variable("Cd")
var85 (698 × 3149)
  Datatype:    Float32
  Dimensions:  N_SAMPLES × N_STATIONS
   long_name            = Cd_D_CONC_BOTTLE
   units                = nmol/kg
   comment              = Concentration of dissolved Cd
   ancillary_variables  = var85_qc var85_err
   C_format             = %.3f
   FORTRAN_format       = F12.3
   _FillValue           = -1.0e10

At this stage, only a few variables have a predefined shortcut (those I have used myself). But suggestions to add new shortcut names are more than welcome! Just open an issue on the repository to ask for one and I'll try to respond ASAP! (PRs are even better: check the varname function for the current list of predefined shortcuts.)

GEOTRACES.jl provides a helper function, matchingvariables, to find variable names. For example, to find nickel variable names, you could start with

julia> GEOTRACES.matchingvariables("ni_")
41-element Vector{Pair{String, String}}:
    "var165" => "Ni_60_58_D_DELTA_FISH"
    "var401" => "Ni_SPT_CONC_PUMP"
    "var351" => "Ni_TP_CONC_BOTTLE"
 "var402_qc" => "Quality flag of Ni_SPL_CONC_PUMP"
    "var402" => "Ni_SPL_CONC_PUMP"
 "var435_qc" => "Quality flag of Ni_TP_CONC_FISH"

Joining tracers

Sometimes, you want to extract data for two or more tracers but only where/when these are observed simultaneously. GEOTRACES.jl does the filtering for you if you ask for them in the same call, thanks to the innerjoin function from DataFrames.jl:

julia> obs = GEOTRACES.observations("Cd", "PO₄", "DFe") # Cd, PO₄, and DFe obs with units
6097×9 DataFrame
  Row  lat       lon      depth      cruise  station  date                 Cd                  PO₄                DFe
       Float32   Float32  Quantity  String  String   DateTime            Quantity           Quantity          Quantity
    1  -50.594   308.359     10.0 m  GA02    1        2011-03-05T19:50:30   0.05282 nmol kg⁻¹    1.012 μmol kg⁻¹      0.52 nmol kg⁻¹
    2  -50.594   308.359     10.0 m  GA02    1        2011-03-05T19:50:30   0.05282 nmol kg⁻¹    1.014 μmol kg⁻¹      0.52 nmol kg⁻¹
    3  -50.594   308.359     25.0 m  GA02    1        2011-03-05T19:50:30   0.06973 nmol kg⁻¹    2.367 μmol kg⁻¹      0.37 nmol kg⁻¹
 6095  -51.4577  148.524   1481.8 m  GPpr11  (25)     2016-04-11T03:50:00     0.821 nmol kg⁻¹    2.458 μmol kg⁻¹     0.439 nmol kg⁻¹
 6096  -65.4472  139.851    295.8 m  GS01    (54)     2018-01-31T19:26:44  0.861593 nmol kg⁻¹  2.22555 μmol kg⁻¹  0.262196 nmol kg⁻¹
 6097  -63.4987  150.0     3435.7 m  GS01    (69)     2018-02-05T12:30:02  0.809597 nmol kg⁻¹  2.27572 μmol kg⁻¹  0.292817 nmol kg⁻¹
                                                                                                                     6091 rows omitted
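For readers unfamiliar with inner joins, the sketch below shows the idea on two tiny hand-made tables. The values, the table names cd_obs and po4_obs, and the choice of station and depth as join keys are all assumptions for illustration; the keys that GEOTRACES.jl actually joins on are not shown here.

```julia
using DataFrames

# Two toy tables keyed by station and depth (values are made up for illustration).
cd_obs  = DataFrame(station = ["1", "1", "2"], depth = [10.0, 25.0, 10.0], Cd  = [0.05, 0.07, 0.12])
po4_obs = DataFrame(station = ["1", "2", "2"], depth = [10.0, 10.0, 50.0], PO4 = [1.01, 0.85, 1.40])

# innerjoin keeps only the (station, depth) pairs that are present in both tables.
both = innerjoin(cd_obs, po4_obs; on = [:station, :depth])
```

Here only the 10 m samples at stations 1 and 2 appear in both tables, so both has two rows. If a key combination occurs more than once in one table, an inner join repeats the matching rows of the other table for each occurrence.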

Arranging observations in transects

If you want the GEOTRACES data organized into cruise transects and profiles, this is supported under the hood by the OceanographyCruises.jl package, so that you can do

julia> Cd = GEOTRACES.transects("Cd")
Transects of Cd
(Cruises GA02, GA03, GA04N, GA10, GA11, GI04, GIPY01, GIPY02, GIPY04, GIPY05, GIPY06, GIPY13, GIpr05, GN01, GN02, GN03, GN04, GP02, GP13, GP16, GP18, GP19, GPc03, GPc06, GPpr01, GPpr02, GPpr07, GPpr08, GPpr11, and GS01.)

to access all the transects that have cadmium concentrations. To explore the data transect by transect, you can append .transects and choose a cruise, e.g.,

julia> Cd_GA02 = Cd.transects[1]
Transect of Cd
Cruise GA02
 Station                 Date       Lat      Lon 
       1  2011-03-05T19:50:30   -50.594  308.359 
       2  2010-05-02T22:44:44   64.0002   325.75 
       2  2011-03-06T23:27:34  -48.9071  311.244 
       3  2010-05-03T22:27:44   62.3452  324.002 
       3  2011-03-08T01:53:49  -46.9243  312.793 
       3  2012-08-03T14:29:29   57.2111  318.401 
       4  2011-03-09T01:55:29  -44.7052  314.461 
                                       53 rows omitted

which contains all the profiles of the GA02 cruise. You can further explore profiles by appending .profiles and selecting a profile, e.g.,

julia> Cd_GA02_profile1 = Cd_GA02.profiles[1]
Depth profile at Station 1 2011-03-05T19:50:30 (50.6S, 308.4E)
 Depth  Value [nmol kg⁻¹] 
  10.0            0.05282 
  25.0            0.06973 
  51.0            0.15567 
  74.0            0.37431 
 100.0            0.46844 
 151.0            0.50468 
 200.0            0.53303 
              17 rows omitted

Finally, you can access the vectors of concentration values (with units!) and depths by appending .values and .depths:

julia> Cd_GA02_profile1.values
24-element Vector{Unitful.Quantity{Float32, 𝐍 𝐌⁻¹, Unitful.FreeUnits{(kg⁻¹, nmol), 𝐍 𝐌⁻¹, nothing}}}:
 0.05282f0 nmol kg⁻¹
 0.06973f0 nmol kg⁻¹
 0.15567f0 nmol kg⁻¹
 0.69459f0 nmol kg⁻¹
 0.69974f0 nmol kg⁻¹
 0.70673f0 nmol kg⁻¹

julia> Cd_GA02_profile1.depths
24-element Vector{Float64}:
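Values that carry units can be converted or stripped with Unitful.jl. The following sketch uses the first three values of the profile above, typed in by hand so that it runs without the GEOTRACES file; the variable names are only illustrative.

```julia
using Unitful

# First three values of the profile printed above, with their units.
cd_values = Float32[0.05282, 0.06973, 0.15567] .* u"nmol/kg"

# Same quantities expressed in pmol/kg (still carrying units).
in_pmol = uconvert.(u"pmol/kg", cd_values)

# Plain numbers without units, in nmol/kg, e.g. for plotting libraries
# or numerical routines that do not understand Unitful quantities.
plain = ustrip.(u"nmol/kg", cd_values)
```

Passing the target unit to ustrip explicitly is a little more verbose than ustrip.(cd_values), but it makes it clear which unit the resulting numbers are in.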

Note: I will simply move the functionality of OceanographyCruises.jl (e.g., finding a transect "cruise track" using a travelling-salesman-problem algorithm) into GEOTRACES.jl and apply functions directly to the DataFrame returned by observations.

I hope you find this tool useful! Suggestions and PRs welcome!