CIFAR-100

Description from the original website

The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
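The counts quoted above determine the sizes of the arrays used throughout this page. A small arithmetic sketch; the variable names are illustrative, and the even split of fine classes over superclasses is an assumption not stated in the quoted description:

```julia
# Dataset sizes implied by the description above.
n_classes       = 100
train_per_class = 500
test_per_class  = 100
n_superclasses  = 20

n_train = n_classes * train_per_class   # number of training images
n_test  = n_classes * test_per_class    # number of test images

# Assumption: the fine classes are spread evenly over the superclasses.
fine_per_coarse = n_classes ÷ n_superclasses

println((n_train, n_test, fine_per_coarse))   # (50000, 10000, 5)
```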

Overview

The MLDatasets.CIFAR100 sub-module provides a programmatic interface to download, load, and work with the CIFAR-100 dataset.

using MLDatasets

# load full training set
train_x, train_y_coarse, train_y_fine = CIFAR100.traindata()

# load full test set
test_x, test_y_coarse, test_y_fine = CIFAR100.testdata()

The provided functions also allow for optional arguments, such as the directory dir where the dataset is located, or the specific observation indices that one wants to work with. For more information on the interface take a look at the documentation (e.g. ?CIFAR100.traindata).

Function                            Description
download([dir])                     Trigger interactive download of the dataset
classnames_coarse(; [dir])          Return the 20 super-class names as a vector of strings
classnames_fine(; [dir])            Return the 100 class names as a vector of strings
traintensor([T], [indices]; [dir])  Load the training images as an array of eltype T
trainlabels([indices]; [dir])       Load the labels for the training images
testtensor([T], [indices]; [dir])   Load the test images as an array of eltype T
testlabels([indices]; [dir])        Load the labels for the test images
traindata([T], [indices]; [dir])    Load images and labels of the training data
testdata([T], [indices]; [dir])     Load images and labels of the test data

This module also provides utility functions to make working with the CIFAR-100 dataset in Julia more convenient.

Function                 Description
convert2features(array)  Convert the CIFAR-100 tensor to a flat feature matrix
convert2image(array)     Convert the CIFAR-100 tensor/matrix to a colorant array

You can use the function convert2features to convert the given CIFAR-100 tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions such that traditional ML algorithms can process the dataset.

julia> CIFAR100.convert2features(CIFAR100.traintensor()) # full training data
3072×50000 Array{N0f8,2}:
[...]
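The idea of dropping the spatial dimensions can be sketched with Base Julia alone. This is an illustration on random stand-in data, not necessarily how convert2features is implemented; the variable names are hypothetical:

```julia
# Stand-in for a small batch of images in the 32×32×3×N layout.
X = rand(Float32, 32, 32, 3, 5)

# Collapse the two spatial dimensions and the channel dimension into a
# single feature dimension; each column is then one observation.
features = reshape(X, :, size(X, 4))

# A single image (32×32×3) becomes a feature vector instead.
feature_vector = vec(X[:, :, :, 1])

println(size(features))         # (3072, 5)
println(size(feature_vector))   # (3072,)
```

Because Julia arrays are column-major, the first column of features holds the same values, in the same order, as the flattened first image.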

To visualize an image or a prediction we provide the function convert2image to convert the given CIFAR-100 horizontal-major tensor (or feature matrix) to a vertical-major Colorant array.

julia> CIFAR100.convert2image(CIFAR100.traintensor(1)) # first training image
32×32 Array{RGB{N0f8},2}:
[...]
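The layout change from horizontal-major to vertical-major amounts to swapping the two spatial dimensions. A minimal Base-only sketch on random stand-in data; it stops short of producing an RGB eltype, which convert2image handles, and the variable names are hypothetical:

```julia
# Stand-in for one image in the horizontal-major 32×32×3 layout
# (dimension 1 = x, dimension 2 = y, dimension 3 = RGB channel).
x = rand(Float32, 32, 32, 3)

# Swap the two spatial dimensions so that dimension 1 indexes y,
# which is the vertical-major convention of Julia image arrays.
img = permutedims(x, (2, 1, 3))

# The pixel stored at (x = 5, y = 7) is now found at row 7, column 5.
println(img[7, 5, :] == x[5, 7, :])   # true
```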

API Documentation

Trainingset

MLDatasets.CIFAR100.traintensor (Function)
traintensor([T = N0f8], [indices]; [dir]) -> Array{T}

Return the CIFAR-100 training images corresponding to the given indices as a multi-dimensional array of eltype T. If the corresponding labels are required as well, it is recommended to use CIFAR100.traindata instead.

The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.
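The two value ranges can be illustrated on raw 8-bit data. This sketch uses random stand-in bytes and plain broadcasting; it shows the scaling rule described above, not the package's internal conversion code:

```julia
# Stand-in for raw 8-bit pixel data in the 32×32×3×N layout.
raw = rand(UInt8, 32, 32, 3, 4)

# Integer element type: values stay in the range 0 to 255.
as_int = Int.(raw)

# Floating-point element type: values are rescaled to lie between 0 and 1.
as_float = Float32.(raw) ./ 255f0

println(extrema(as_int))
println(extrema(as_float))
```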

If the parameter indices is omitted or an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.

julia> CIFAR100.traintensor() # load all training images
32×32×3×50000 Array{N0f8,4}:
[...]

julia> CIFAR100.traintensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]

If indices is an Integer, the single image is returned as Array{T,3} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), the second dimension denotes the pixel columns (y), and the third dimension the RGB color channels of the image.

julia> CIFAR100.traintensor(1) # load first training image
32×32×3 Array{N0f8,3}:
[...]
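The difference between an integer index and a vector of indices mirrors ordinary Julia array indexing. A sketch on a random stand-in tensor, with hypothetical variable names:

```julia
# Stand-in for a 4D image tensor holding ten observations.
X = rand(Float32, 32, 32, 3, 10)

single = X[:, :, :, 1]     # integer index: the observation dimension is dropped
batch  = X[:, :, :, 1:3]   # range index: the observation dimension is kept
one    = X[:, :, :, 1:1]   # a one-element range still yields a 4D array

println((ndims(single), ndims(batch), ndims(one)))   # (3, 4, 4)
```

When downstream code expects a 4D batch, passing a one-element range rather than an integer keeps the array shape uniform.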

As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a CIFAR-100 array into a vertical-major Julia image with the appropriate RGB eltype.

julia> CIFAR100.convert2image(CIFAR100.traintensor(1)) # convert to column-major colorant array
32×32 Array{RGB{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.CIFAR100.trainlabels (Function)
trainlabels([indices]; [dir]) -> Yc, Yf

Return the CIFAR-100 trainset labels (coarse and fine) corresponding to the given indices as a tuple of two Int or two Vector{Int}. The variables returned are the coarse label(s) (Yc) and the fine label(s) (Yf) respectively.

Yc, Yf = CIFAR100.trainlabels(); # full training set

The values of the labels denote the zero-based class-index that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names). If indices is omitted, all labels are returned.

julia> Yc, Yf = CIFAR100.trainlabels(1:3) # first three labels
([11, 15, 4], [19, 29, 0])

julia> yc, yf = CIFAR100.trainlabels(1) # first label
(11, 19)

julia> CIFAR100.classnames_coarse()[yc + 1] # corresponding superclass name
"large_omnivores_and_herbivores"

julia> CIFAR100.classnames_fine()[yf + 1] # corresponding class name
"cattle"

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
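Because the labels are zero-based while Julia arrays are one-based, any lookup or encoding has to shift by one. A minimal sketch with a hypothetical four-class name list and made-up labels (the real lists have 20 and 100 entries):

```julia
# Hypothetical class names and zero-based labels, for illustration only.
class_names = ["class_0", "class_1", "class_2", "class_3"]
labels = [3, 0, 2, 0]

# Shift by one before indexing into the one-based name vector.
named = class_names[labels .+ 1]

# One-hot encoding: row k + 1 corresponds to the zero-based class k,
# and column j corresponds to the j-th observation.
onehot = (0:length(class_names)-1) .== permutedims(labels)

println(named)
println(size(onehot))   # (4, 4)
```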

MLDatasets.CIFAR100.traindata (Function)
traindata([T = N0f8], [indices]; [dir]) -> X, Yc, Yf

Return the CIFAR-100 training set corresponding to the given indices as a three-element tuple. If indices is omitted, the full training set is returned. The first element of the three return values (X) will be the images as a multi-dimensional array, the second element (Yc) the corresponding coarse labels as integers, and the third element (Yf) the corresponding fine labels as integers.

The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be between 0 and 255, otherwise the values are scaled to be between 0 and 1. The integer values of the labels denote the zero-based class index that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names).

X, Yc, Yf = CIFAR100.traindata() # full dataset
X, Yc, Yf = CIFAR100.traindata(dir="./CIFAR100") # custom folder
x, yc, yf = CIFAR100.traindata(2) # only second observation

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

Take a look at CIFAR100.traintensor and CIFAR100.trainlabels for more information.
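Since the coarse and fine labels are returned as aligned vectors, the superclass of each fine class can be recovered from the pairs. A sketch on hypothetical label vectors in the same form as Yc and Yf; the dictionary name is illustrative:

```julia
# Hypothetical paired labels, in the same form as the (Yc, Yf) return values.
Yc = [11, 15, 4, 11, 4]
Yf = [19, 29, 0, 19, 0]

fine_to_coarse = Dict{Int,Int}()
for (c, f) in zip(Yc, Yf)
    # Each fine class belongs to exactly one superclass, so a conflicting
    # entry would indicate misaligned label vectors.
    @assert get(fine_to_coarse, f, c) == c
    fine_to_coarse[f] = c
end

println(fine_to_coarse)
```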

Testset

MLDatasets.CIFAR100.testtensor (Function)
testtensor([T = N0f8], [indices]; [dir]) -> Array{T}

Return the CIFAR-100 test images corresponding to the given indices as a multi-dimensional array of eltype T. If the corresponding labels are required as well, it is recommended to use CIFAR100.testdata instead.

The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.

If the parameter indices is omitted or an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.

julia> CIFAR100.testtensor() # load all test images
32×32×3×10000 Array{N0f8,4}:
[...]

julia> CIFAR100.testtensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]

If indices is an Integer, the single image is returned as Array{T,3} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), the second dimension denotes the pixel columns (y), and the third dimension the RGB color channels of the image.

julia> CIFAR100.testtensor(1) # load first test image
32×32×3 Array{N0f8,3}:
[...]

As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a CIFAR-100 array into a vertical-major Julia image with the appropriate RGB eltype.

julia> CIFAR100.convert2image(CIFAR100.testtensor(1)) # convert to column-major colorant array
32×32 Array{RGB{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.CIFAR100.testlabels (Function)
testlabels([indices]; [dir]) -> Yc, Yf

Return the CIFAR-100 testset labels (coarse and fine) corresponding to the given indices as a tuple of two Int or two Vector{Int}. The variables returned are the coarse label(s) (Yc) and the fine label(s) (Yf) respectively.

Yc, Yf = CIFAR100.testlabels(); # full test set

The values of the labels denote the zero-based class-index that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names). If indices is omitted, all labels are returned.

julia> Yc, Yf = CIFAR100.testlabels(1:3) # first three labels
([10, 10, 0], [49, 33, 72])

julia> yc, yf = CIFAR100.testlabels(1) # first label
(10, 49)

julia> CIFAR100.classnames_coarse()[yc + 1] # corresponding superclass name
"large_natural_outdoor_scenes"

julia> CIFAR100.classnames_fine()[yf + 1] # corresponding class name
"mountain"

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.CIFAR100.testdata (Function)
testdata([T = N0f8], [indices]; [dir]) -> X, Yc, Yf

Return the CIFAR-100 test set corresponding to the given indices as a three-element tuple. If indices is omitted, the full test set is returned. The first element of the three return values (X) will be the images as a multi-dimensional array, the second element (Yc) the corresponding coarse labels as integers, and the third element (Yf) the corresponding fine labels as integers.

The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be between 0 and 255, otherwise the values are scaled to be between 0 and 1. The integer values of the labels denote the zero-based class index that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names).

X, Yc, Yf = CIFAR100.testdata() # full dataset
X, Yc, Yf = CIFAR100.testdata(dir="./CIFAR100") # custom folder
x, yc, yf = CIFAR100.testdata(2) # only second observation

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

Take a look at CIFAR100.testtensor and CIFAR100.testlabels for more information.

Utilities

See CIFAR10.convert2features and CIFAR10.convert2image

MLDatasets.CIFAR100.download (Function)
download([dir]; [i_accept_the_terms_of_use])

Trigger the (interactive) download of the full dataset into "dir". If no dir is provided the dataset will be downloaded into "~/.julia/datadeps/CIFAR100".

This function will display an interactive dialog unless either the keyword parameter i_accept_the_terms_of_use or the environment variable DATADEPS_ALWAYS_ACCEPT is set to true. Note that using the data responsibly and respecting copyright/terms-of-use remains your responsibility.
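For non-interactive use (for example in a script or CI job), the environment variable can be set from within Julia before the first download is triggered. A minimal sketch that only sets the variable for the current session:

```julia
# Accept DataDeps download prompts for the rest of this Julia session.
# Environment variable values are strings, so "true" is passed as text.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

println(ENV["DATADEPS_ALWAYS_ACCEPT"])   # true
```

Alternatively, going by the signature above, the keyword can be passed for a single call, as in CIFAR100.download(i_accept_the_terms_of_use = true). Either way, the terms of use still apply.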

MLDatasets.CIFAR100.classnames_coarse (Function)
classnames_coarse(; [dir]) -> Vector{String}

Return the 20 names for the CIFAR100 superclasses as a vector of strings. Note that these strings are read from the actual resource file.

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.CIFAR100.classnames_fine (Function)
classnames_fine(; [dir]) -> Vector{String}

Return the 100 names for the CIFAR100 classes as a vector of strings. Note that these strings are read from the actual resource file.

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

References