Data Download

Imaging data can be downloaded either as a .zip file containing an imaging series or as a DICOM .dcm file containing a single acquisition within an imaging series.

Imaging series

Selecting the imaging series

The SeriesInstanceUID is needed to download an imaging series. The below example selects one series from the TCGA-THCA collection.

julia> patient_studies = tcia_studies(collection = "TCGA-THCA")
7×9 DataFrame
 Row │ Collection  PatientID     PatientName   PatientSex  StudyInstanceUID    ⋯
     │ String15…   String15…     String15…     String1…    String              ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ TCGA-THCA   TCGA-DE-A4MD  TCGA-DE-A4MD  M           1.3.6.1.4.1.14519.5 ⋯
   2 │ TCGA-THCA   TCGA-DE-A4MA  TCGA-DE-A4MA  F           1.3.6.1.4.1.14519.5
   3 │ TCGA-THCA   TCGA-DE-A4MA  TCGA-DE-A4MA  F           1.3.6.1.4.1.14519.5
   4 │ TCGA-THCA   TCGA-DE-A4MC  TCGA-DE-A4MC  F           1.3.6.1.4.1.14519.5
   5 │ TCGA-THCA   TCGA-DE-A4MB  TCGA-DE-A4MB  F           1.3.6.1.4.1.14519.5 ⋯
   6 │ TCGA-THCA   TCGA-E3-A3DZ  TCGA-E3-A3DZ  F           1.3.6.1.4.1.14519.5
   7 │ TCGA-THCA   TCGA-E3-A3E5  TCGA-E3-A3E5  M           1.3.6.1.4.1.14519.5
                                                               5 columns omitted
julia> chosen_study = patient_studies.StudyInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.291746741815681058731047886323"
julia> imaging_series = tcia_series(study = chosen_study)
2×16 DataFrame
 Row │ PatientID     StudyInstanceUID                   SeriesInstanceUID       ⋯
     │ String15…     String                             String                  ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ TCGA-DE-A4MD  1.3.6.1.4.1.14519.5.2.1.8421.401…  1.3.6.1.4.1.14519.5.2.  ⋯
   2 │ TCGA-DE-A4MD  1.3.6.1.4.1.14519.5.2.1.8421.401…  1.3.6.1.4.1.14519.5.2.
                                                              14 columns omitted
julia> chosen_series = imaging_series.SeriesInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.267009254990923767283017660950"

Downloading the imaging series

Once the SeriesInstanceUID is known, the imaging data can be downloaded as a zip file by:

julia> zip_file = "output_file.zip"; # Can also be a path
julia> tcia_images(series = chosen_series, file = zip_file)
"output_file.zip"
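
The call above does not unpack the archive. As a minimal sketch of extracting it by hand, assuming the system unzip utility is available on the PATH, something like the following could be used. The helper name extract_zip is hypothetical and is not part of the package.

```julia
# Hypothetical helper (not part of the package): unpack a downloaded archive
# into `dest_dir` by calling the system `unzip` utility.
function extract_zip(zip_path::AbstractString, dest_dir::AbstractString)
    # Fail early with a clear message if the archive was never written.
    isfile(zip_path) || throw(ArgumentError("no such file: $zip_path"))
    mkpath(dest_dir)  # create the target directory if it does not exist
    # -o: overwrite existing files, -q: quiet, -d: extraction directory
    run(`unzip -o -q $zip_path -d $dest_dir`)
    return dest_dir
end
```

For the example above, this would be called as extract_zip(zip_file, "output_dir").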

Convenience wrapper

The steps above only download a zip file, which must then be extracted. This can be cumbersome when downloading multiple series, so the download_series() function is provided for convenience.

Note

download_series() assumes that the unzip utility is installed on the system. This can be verified by typing unzip in a terminal or ;unzip in the Julia REPL.
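
The same check can be scripted. The sketch below uses Sys.which from Julia's standard library, which returns the full path of an executable found on the PATH, or nothing if it is missing. The variable name unzip_path and the messages are illustrative only.

```julia
# Look up `unzip` on the PATH; Sys.which returns `nothing` when it is not found.
unzip_path = Sys.which("unzip")

if unzip_path === nothing
    @warn "unzip was not found on the PATH; install it before calling download_series()"
else
    println("unzip found at ", unzip_path)
end
```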

Downloading a single series

The following downloads chosen_series (selected above) and extracts the images into the current directory ./.

julia> download_series(chosen_series, "./")

Downloading multiple series

The wrapper function can download multiple series from a DataFrame by

julia> series = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001")
julia> download_series(series, "./testdf")

or from an array of dictionaries by

julia> seriesjs = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001", format = "json")
julia> download_series(seriesjs, "./testjs")
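
If only some of the series are wanted, the identifiers can be pulled out of the array of dictionaries first. The sketch below assumes that each dictionary carries a "SeriesInstanceUID" key mirroring the DataFrame column of the same name. The data is a hypothetical stand-in with placeholder values, not real output.

```julia
# Hypothetical stand-in for a `format = "json"` result: a vector of dictionaries.
# The keys are assumed to match the DataFrame column names; the values are placeholders.
seriesjs_example = [
    Dict("PatientID" => "EXAMPLE-001", "SeriesInstanceUID" => "1.2.3.1"),
    Dict("PatientID" => "EXAMPLE-001", "SeriesInstanceUID" => "1.2.3.2"),
]

# Collect the series identifiers, one per dictionary.
series_uids = [entry["SeriesInstanceUID"] for entry in seriesjs_example]
```

Each element of series_uids could then be passed on individually, in the same way as chosen_series earlier.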

Single image

Selecting the single image

To download a single image, both its SeriesInstanceUID and SOPInstanceUID must be known. Continuing from the previous example, if we only wanted to download the first image in chosen_series, then:

julia> series_sops = tcia_sop(series = chosen_series)
2×1 DataFrame
 Row │ SOPInstanceUID
     │ String
─────┼───────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…
julia> chosen_sop = series_sops.SOPInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843"

Downloading the single image

Once the SeriesInstanceUID and SOPInstanceUID are known, the DICOM file can be downloaded by:

julia> dicom_file = "output_file.dcm";
julia> tcia_single_image(series = chosen_series, sop = chosen_sop, file = dicom_file)
"output_file.dcm"
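
As a quick sanity check on the result, a file in the standard DICOM file format begins with a 128-byte preamble followed by the four ASCII characters DICM. The sketch below tests for that marker. The helper name looks_like_dicom is hypothetical, and the check assumes the downloaded file includes the standard preamble. It does not validate the rest of the file.

```julia
# Hypothetical helper (not part of the package): check for the "DICM" marker
# that follows the 128-byte preamble of a DICOM file.
function looks_like_dicom(path::AbstractString)
    # Read at most the first 132 bytes; shorter files yield a shorter vector.
    header = open(path, "r") do io
        read(io, 132)
    end
    return length(header) == 132 && header[129:132] == codeunits("DICM")
end
```

For the example above, looks_like_dicom(dicom_file) should return true if the download succeeded and the file has the standard layout.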