CIFAR-100
Description from the original website
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
This dataset is just like the CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in the CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).
Overview
The MLDatasets.CIFAR100 sub-module provides a programmatic interface to download, load, and work with the CIFAR-100 dataset.
using MLDatasets
# load full training set
train_x, train_y_coarse, train_y_fine = CIFAR100.traindata()
# load full test set
test_x, test_y_coarse, test_y_fine = CIFAR100.testdata()
The provided functions also allow for optional arguments, such as the directory dir where the dataset is located, or the specific observation indices that one wants to work with. For more information on the interface take a look at the documentation (e.g. ?CIFAR100.traindata).
Function | Description |
---|---|
download([dir]) | Trigger interactive download of the dataset |
classnames_coarse(; [dir]) | Return the 20 super-class names as a vector of strings |
classnames_fine(; [dir]) | Return the 100 class names as a vector of strings |
traintensor([T], [indices]; [dir]) | Load the training images as an array of eltype T |
trainlabels([indices]; [dir]) | Load the labels for the training images |
testtensor([T], [indices]; [dir]) | Load the test images as an array of eltype T |
testlabels([indices]; [dir]) | Load the labels for the test images |
traindata([T], [indices]; [dir]) | Load images and labels of the training data |
testdata([T], [indices]; [dir]) | Load images and labels of the test data |
This module also provides utility functions to make working with the CIFAR-100 dataset in Julia more convenient.
Function | Description |
---|---|
convert2features(array) | Convert the CIFAR-100 tensor to a flat feature matrix |
convert2image(array) | Convert the CIFAR-100 tensor/matrix to a colorant array |
You can use the function convert2features to convert the given CIFAR-100 tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions such that traditional ML algorithms can process the dataset.
julia> CIFAR100.convert2features(CIFAR100.traintensor()) # full training data
3072×50000 Array{N0f8,2}:
[...]
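To make the shape change concrete, the following is a minimal sketch on synthetic data, using only Base Julia. It assumes that dropping the spatial dimensions amounts to a plain reshape of the 32×32×3×N tensor; the variable names are illustrative and the actual implementation of convert2features may differ.

```julia
# Synthetic stand-in for a CIFAR-100 tensor: 32×32 pixels, 3 channels, 5 images.
X = rand(UInt8, 32, 32, 3, 5)

# Collapse the three per-image dimensions into a single feature axis,
# leaving one column per image.
features = reshape(X, 32 * 32 * 3, :)

# A single image (a 3D array) becomes a feature vector.
feature_vec = vec(X[:, :, :, 1])

println(size(features))
println(length(feature_vec))
```

Because reshape preserves the linear element order, the first column of features holds the same values as the flattened first image.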
To visualize an image or a prediction we provide the function convert2image to convert the given CIFAR-100 horizontal-major tensor (or feature matrix) to a vertical-major Colorant array.
julia> CIFAR100.convert2image(CIFAR100.traintensor(1)) # first training image
32×32 Array{RGB{N0f8},2}:
[...]
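The layout change can be illustrated without any image packages. The sketch below only swaps the two spatial axes of a synthetic (x, y, channel) array; it is an assumption about what the horizontal-major to vertical-major step looks like, and it deliberately leaves out the wrapping of the three channels into an RGB element type that convert2image also performs.

```julia
# Synthetic horizontal-major image with dimensions (x, y, channel).
# Distinct values make it easy to see where each element ends up.
img_hm = reshape(collect(1:32*32*3), 32, 32, 3)

# Swap the two spatial axes to obtain a (y, x, channel) layout.
img_vm = permutedims(img_hm, (2, 1, 3))

println(size(img_vm))
```

After the swap, the element at position (x, y) in the original sits at position (y, x) in the result, for every channel.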
API Documentation
Trainingset
MLDatasets.CIFAR100.traintensor — Function
traintensor([T = N0f8], [indices]; [dir]) -> Array{T}
Return the CIFAR-100 training images corresponding to the given indices as a multi-dimensional array of eltype T. If the corresponding labels are required as well, it is recommended to use CIFAR100.traindata instead.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.
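The two value ranges can be sketched with a few hand-picked byte values. This is an illustration in Base Julia, assuming the scaling is a division by 255; it does not call the package, and the variable names are made up for the example.

```julia
# Four raw 8-bit channel intensities, including both extremes.
raw = UInt8[0, 51, 128, 255]

# Integer eltype: the values stay in the range 0 to 255.
as_int = Int.(raw)

# Floating-point eltype: the values are scaled into the range 0 to 1.
as_float = Float32.(raw) ./ 255

println(as_int)
println(as_float)
```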
If the parameter indices is omitted or an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.
julia> CIFAR100.traintensor() # load all training images
32×32×3×50000 Array{N0f8,4}:
[...]
julia> CIFAR100.traintensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]
If indices is an Integer, the single image is returned as Array{T,3} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), the second dimension denotes the pixel columns (y), and the third dimension the RGB color channels of the image.
julia> CIFAR100.traintensor(1) # load first training image
32×32×3 Array{N0f8,3}:
[...]
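The difference between an integer index and a vector of indices mirrors ordinary Julia array indexing. The following sketch uses a synthetic array in place of the real data; it shows the resulting dimensionality only and makes no claim about how the package selects observations internally.

```julia
# Synthetic stand-in for ten images in the documented 32×32×3×N layout.
X = zeros(UInt8, 32, 32, 3, 10)

# An integer index drops the observation dimension: a 3D array.
single = X[:, :, :, 1]

# A range or vector keeps it: a 4D array, even for one element.
batch = X[:, :, :, 1:3]
one_as_batch = X[:, :, :, [1]]

println(ndims(single), " ", ndims(batch), " ", ndims(one_as_batch))
```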
As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a CIFAR-100 array into a vertical-major Julia image with the appropriate RGB eltype.
julia> CIFAR100.convert2image(CIFAR100.traintensor(1)) # convert to column-major colorant array
32×32 Array{RGB{N0f8},2}:
[...]
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
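The lookup order described above can be sketched as a small search function. Note that resolve_dir is a hypothetical helper written for this illustration; it is not part of MLDatasets or DataDeps, and it takes the list of search directories as an argument instead of reading DataDeps.default_loadpath.

```julia
# Hypothetical helper: return the first "<base>/CIFAR100" that exists,
# otherwise fall back to the given default location.
function resolve_dir(loadpath, fallback; name = "CIFAR100")
    for base in loadpath
        candidate = joinpath(base, name)
        isdir(candidate) && return candidate
    end
    return fallback
end

# Demonstration with a temporary directory that contains a CIFAR100 subfolder.
tmp = mktempdir()
mkpath(joinpath(tmp, "CIFAR100"))
default_dir = joinpath(homedir(), ".julia", "datadeps", "CIFAR100")

found = resolve_dir([joinpath(tmp, "missing"), tmp], default_dir)
not_found = resolve_dir([joinpath(tmp, "missing")], default_dir)

println(found)
println(not_found)
```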
MLDatasets.CIFAR100.trainlabels — Function
trainlabels([indices]; [dir]) -> Yc, Yf
Return the CIFAR-100 trainset labels (coarse and fine) corresponding to the given indices as a tuple of two Int or two Vector{Int}. The variables returned are the coarse label(s) (Yc) and the fine label(s) (Yf) respectively.
Yc, Yf = CIFAR100.trainlabels(); # full training set
The values of the labels denote the zero-based class-index that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names). If indices is omitted, all labels are returned.
julia> Yc, Yf = CIFAR100.trainlabels(1:3) # first three labels
([11, 15, 4], [19, 29, 0])
julia> yc, yf = CIFAR100.trainlabels(1) # first label
(11, 19)
julia> CIFAR100.classnames_coarse()[yc + 1] # corresponding superclass name
"large_omnivores_and_herbivores"
julia> CIFAR100.classnames_fine()[yf + 1] # corresponding class name
"cattle"
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
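Since the labels returned by trainlabels are zero-based while Julia vectors are one-based, looking up names for several labels at once needs a broadcasted offset. The sketch below uses placeholder names rather than the real class names, which are read from the dataset's resource file; fine_names and selected are illustrative variable names.

```julia
# Placeholder name list standing in for the result of classnames_fine().
fine_names = ["class_$(i)" for i in 0:99]

# Zero-based fine labels, as in the trainlabels(1:3) example.
yf = [19, 29, 0]

# Shift every label by one to obtain valid Julia indices.
selected = fine_names[yf .+ 1]

println(selected)
```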
MLDatasets.CIFAR100.traindata — Function
traindata([T = N0f8], [indices]; [dir]) -> X, Yc, Yf
Returns the CIFAR-100 trainset corresponding to the given indices as a three-element tuple. If indices is omitted the full training set is returned. The first element of the three return values (X) will be the images as a multi-dimensional array, the second element (Yc) the corresponding coarse labels as integers, and the third element (Yf) the fine labels respectively.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. The integer values of the labels are the zero-based class indices that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names).
X, Yc, Yf = CIFAR100.traindata() # full training set
X, Yc, Yf = CIFAR100.traindata(dir="./CIFAR100") # custom folder
x, yc, yf = CIFAR100.traindata(2) # only second observation
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
Take a look at CIFAR100.traintensor and CIFAR100.trainlabels for more information.
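A common next step after loading the labels is to turn them into one-hot targets. The following is a minimal sketch in Base Julia that accounts for the zero-based label convention; onehot_zero_based is a hypothetical helper for this example and not a function of the package.

```julia
# Hypothetical helper: encode zero-based labels as columns of a one-hot matrix.
function onehot_zero_based(labels::AbstractVector{<:Integer}, nclasses::Integer)
    Y = zeros(Float32, nclasses, length(labels))
    for (j, y) in enumerate(labels)
        # Label y belongs in row y + 1 because Julia indices start at one.
        Y[y + 1, j] = 1
    end
    return Y
end

# Zero-based fine labels standing in for the third return value of traindata.
yf = [19, 29, 0]
Y = onehot_zero_based(yf, 100)

println(size(Y))
```

Each column contains exactly one non-zero entry, located at the row given by the label plus one.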
Testset
MLDatasets.CIFAR100.testtensor — Function
testtensor([T = N0f8], [indices]; [dir]) -> Array{T}
Return the CIFAR-100 test images corresponding to the given indices as a multi-dimensional array of eltype T. If the corresponding labels are required as well, it is recommended to use CIFAR100.testdata instead.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.
If the parameter indices is omitted or an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.
julia> CIFAR100.testtensor() # load all test images
32×32×3×10000 Array{N0f8,4}:
[...]
julia> CIFAR100.testtensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]
If indices is an Integer, the single image is returned as Array{T,3} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), the second dimension denotes the pixel columns (y), and the third dimension the RGB color channels of the image.
julia> CIFAR100.testtensor(1) # load first test image
32×32×3 Array{N0f8,3}:
[...]
As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a CIFAR-100 array into a vertical-major Julia image with the appropriate RGB eltype.
julia> CIFAR100.convert2image(CIFAR100.testtensor(1)) # convert to column-major colorant array
32×32 Array{RGB{N0f8},2}:
[...]
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.CIFAR100.testlabels — Function
testlabels([indices]; [dir]) -> Yc, Yf
Return the CIFAR-100 testset labels (coarse and fine) corresponding to the given indices as a tuple of two Int or two Vector{Int}. The variables returned are the coarse label(s) (Yc) and the fine label(s) (Yf) respectively.
Yc, Yf = CIFAR100.testlabels(); # full test set
The values of the labels denote the zero-based class-index that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names). If indices is omitted, all labels are returned.
julia> Yc, Yf = CIFAR100.testlabels(1:3) # first three labels
([10, 10, 0], [49, 33, 72])
julia> yc, yf = CIFAR100.testlabels(1) # first label
(10, 49)
julia> CIFAR100.classnames_coarse()[yc + 1] # corresponding superclass name
"large_natural_outdoor_scenes"
julia> CIFAR100.classnames_fine()[yf + 1] # corresponding class name
"mountain"
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
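Because every fine class belongs to exactly one superclass, the paired coarse and fine label vectors are enough to recover a fine-to-coarse lookup table. The sketch below builds such a table from a few label pairs; the first three pairs are taken from the trainlabels(1:3) example and the fourth is a synthetic repeat added to exercise the consistency check.

```julia
# Paired zero-based labels (coarse, fine); the last pair repeats the first.
yc = [11, 15, 4, 11]
yf = [19, 29, 0, 19]

fine_to_coarse = Dict{Int,Int}()
for (c, f) in zip(yc, yf)
    # get! stores c the first time f is seen and returns the stored value afterwards.
    known = get!(fine_to_coarse, f, c)
    known == c || error("fine class $f appears under two superclasses")
end

println(fine_to_coarse[19])
```

Applied to the full label vectors, the same loop would be expected to yield 100 entries, one per fine class.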
MLDatasets.CIFAR100.testdata — Function
testdata([T = N0f8], [indices]; [dir]) -> X, Yc, Yf
Returns the CIFAR-100 testset corresponding to the given indices as a three-element tuple. If indices is omitted the full testset is returned. The first element of the three return values (X) will be the images as a multi-dimensional array, the second element (Yc) the corresponding coarse labels as integers, and the third element (Yf) the fine labels respectively.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. The integer values of the labels are the zero-based class indices that they represent (see CIFAR100.classnames_coarse and CIFAR100.classnames_fine for the corresponding names).
X, Yc, Yf = CIFAR100.testdata() # full test set
X, Yc, Yf = CIFAR100.testdata(dir="./CIFAR100") # custom folder
x, yc, yf = CIFAR100.testdata(2) # only second observation
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
Take a look at CIFAR100.testtensor and CIFAR100.testlabels for more information.
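One quick sanity check on loaded labels is to count the observations per class; the dataset description states 100 test images for each of the 100 classes. The sketch below runs on synthetic labels with that balance, and class_counts is a hypothetical helper written for this example.

```julia
# Hypothetical helper: count how often each zero-based label occurs.
function class_counts(labels::AbstractVector{<:Integer}, nclasses::Integer)
    counts = zeros(Int, nclasses)
    for y in labels
        counts[y + 1] += 1   # shift the zero-based label to a one-based index
    end
    return counts
end

# Synthetic, perfectly balanced labels: each of 0..99 appears 100 times.
labels = repeat(0:99, 100)
counts = class_counts(labels, 100)

println(length(labels))
println(all(counts .== 100))
```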
Utilities
See CIFAR10.convert2features and CIFAR10.convert2image.
MLDatasets.CIFAR100.download — Function
download([dir]; [i_accept_the_terms_of_use])
Trigger the (interactive) download of the full dataset into "dir". If no dir is provided the dataset will be downloaded into "~/.julia/datadeps/CIFAR100".
This function will display an interactive dialog unless either the keyword parameter i_accept_the_terms_of_use or the environment variable DATADEPS_ALWAYS_ACCEPT is set to true. Note that using the data responsibly and respecting copyright/terms-of-use remains your responsibility.
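For non-interactive use, such as a script or a CI job, the environment variable can be set from within Julia before the download is triggered. The following is a sketch under the assumption that the variable is read when the download starts; the package calls are commented out because they need network access and an installed MLDatasets.

```julia
# Accept the terms of use up front so that no interactive dialog is shown.
# Only do this after having read and agreed to the dataset's terms.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# With the variable set, the download can be triggered explicitly:
# using MLDatasets
# CIFAR100.download()

println(ENV["DATADEPS_ALWAYS_ACCEPT"])
```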
MLDatasets.CIFAR100.classnames_coarse — Function
classnames_coarse(; [dir]) -> Vector{String}
Return the 20 names for the CIFAR100 superclasses as a vector of strings. Note that these strings are read from the actual resource file.
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.CIFAR100.classnames_fine — Function
classnames_fine(; [dir]) -> Vector{String}
Return the 100 names for the CIFAR100 classes as a vector of strings. Note that these strings are read from the actual resource file.
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing CIFAR100 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR100. In the case that dir does not yet exist, a download prompt will be triggered. You can also use CIFAR100.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
References
Authors: Alex Krizhevsky, Vinod Nair, Geoffrey Hinton
Website: https://www.cs.toronto.edu/~kriz/cifar.html
[Krizhevsky, 2009] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images", Tech Report, 2009.