CIFAR-10
Description from the original website
The CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
Overview
The MLDatasets.CIFAR10 sub-module provides a programmatic interface to download, load, and work with the CIFAR-10 dataset.
using MLDatasets
# load full training set
train_x, train_y = CIFAR10.traindata()
# load full test set
test_x, test_y = CIFAR10.testdata()
The provided functions also accept optional arguments, such as the directory dir where the dataset is located, or the specific observation indices that one wants to work with. For more information on the interface, take a look at the documentation (e.g. ?CIFAR10.traindata).
Function | Description |
---|---|
download([dir]) | Trigger interactive download of the dataset |
classnames() | Return the class names as a vector of strings |
traintensor([T], [indices]; [dir]) | Load the training images as an array of eltype T |
trainlabels([indices]; [dir]) | Load the labels for the training images |
testtensor([T], [indices]; [dir]) | Load the test images as an array of eltype T |
testlabels([indices]; [dir]) | Load the labels for the test images |
traindata([T], [indices]; [dir]) | Load images and labels of the training data |
testdata([T], [indices]; [dir]) | Load images and labels of the test data |
This module also provides utility functions to make working with the CIFAR-10 dataset in Julia more convenient.
Function | Description |
---|---|
convert2features(array) | Convert the CIFAR-10 tensor to a flat feature matrix |
convert2image(array) | Convert the CIFAR-10 tensor/matrix to a colorant array |
You can use the function convert2features to convert the given CIFAR-10 tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions so that traditional ML algorithms can process the dataset.
julia> CIFAR10.convert2features(CIFAR10.traintensor()) # full training data
3072×50000 Array{N0f8,2}:
[...]
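As a rough illustration of what dropping the spatial dimensions means, the sketch below flattens a synthetic array of the same shape using only Base Julia. It assumes the flattening is a plain reshape that preserves memory order; the random tensor is a stand-in and not the actual dataset.

```julia
# Synthetic stand-in for a CIFAR-10 tensor: 32×32 pixels, 3 channels, 5 images.
# (Random data; the real tensor would come from CIFAR10.traintensor.)
tensor = rand(Float32, 32, 32, 3, 5)

# Collapse the two spatial dimensions and the channel dimension into one axis.
# Each column of the result is one observation with 32*32*3 = 3072 features.
features = reshape(tensor, 32 * 32 * 3, :)

# A single image (3D array) flattens to a feature vector instead.
single = vec(tensor[:, :, :, 1])

size(features)  # (3072, 5)
length(single)  # 3072
```

Because the reshape keeps the element order, the first column of the feature matrix holds exactly the values of the first image.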
To visualize an image or a prediction, we provide the function convert2image to convert the given CIFAR-10 horizontal-major tensor (or feature matrix) to a vertical-major Colorant array.
julia> CIFAR10.convert2image(CIFAR10.traintensor(1)) # first training image
32×32 Array{RGB{N0f8},2}:
[...]
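The layout change from horizontal-major to vertical-major can be sketched in Base Julia as follows. This is only an illustration under stated assumptions: the small non-square array is synthetic, and plain (r, g, b) tuples stand in for RGB colorants so that no extra package is needed.

```julia
# Synthetic horizontal-major image: dimension 1 is x (width), dimension 2 is y
# (height), dimension 3 holds the R, G, B channels.
W, H = 4, 3
tensor = rand(Float32, W, H, 3)

# Build a vertical-major H×W matrix whose elements are (r, g, b) tuples.
# Rows of the result correspond to y, columns to x.
image = [(tensor[x, y, 1], tensor[x, y, 2], tensor[x, y, 3]) for y in 1:H, x in 1:W]

size(image)  # (3, 4)
```

The pixel at row 2, column 3 of the result is built from tensor[3, 2, :], which is the swap of the first two dimensions that the conversion performs.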
API Documentation
Trainingset
MLDatasets.CIFAR10.traintensor (Function)
traintensor([T = N0f8], [indices]; [dir]) -> Array{T}
Return the CIFAR-10 training images corresponding to the given indices as a multi-dimensional array of eltype T. If the corresponding labels are required as well, it is recommended to use CIFAR10.traindata instead.
The images are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.
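The relation between the two value ranges can be sketched with plain Base types. The UInt8 vector below is a made-up example rather than real pixel data, and the division by 255 is an assumption about how the scaling works, not a copy of the package's code.

```julia
# Three raw 8-bit intensities as they would appear with an integer eltype.
raw = UInt8[0, 128, 255]

# Dividing by 255 maps the integer range 0:255 onto the interval [0, 1].
scaled = Float32.(raw) ./ 255f0

# Going back: multiply by 255 and round to the nearest integer.
restored = round.(UInt8, scaled .* 255f0)
```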
If the parameter indices is omitted or an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.
julia> CIFAR10.traintensor() # load all training images
32×32×3×50000 Array{N0f8,4}:
[...]
julia> CIFAR10.traintensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]
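Since the fourth dimension indexes the observations, a subset of images can be selected by slicing along it. The sketch below splits a synthetic tensor into mini-batches; the array contents and the batchsize value are illustrative assumptions.

```julia
# Synthetic stand-in for a 32×32×3×N image tensor with N = 10 observations.
images = rand(Float32, 32, 32, 3, 10)
batchsize = 4

# Split the observation indices into consecutive chunks of at most `batchsize`
# and slice the tensor along its fourth dimension for each chunk.
batches = [images[:, :, :, idx] for idx in Iterators.partition(1:size(images, 4), batchsize)]

map(b -> size(b, 4), batches)  # [4, 4, 2]
```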
If indices is an Integer, the single image is returned as Array{T,3} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), the second dimension denotes the pixel columns (y), and the third dimension the RGB color channels of the image.
julia> CIFAR10.traintensor(1) # load first training image
32×32×3 Array{N0f8,3}:
[...]
As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a CIFAR-10 array into a vertical-major Julia image with the appropriate RGB eltype.
julia> CIFAR10.convert2image(CIFAR10.traintensor(1)) # convert to column-major colorant array
32×32 Array{RGB{N0f8},2}:
[...]
The corresponding resource files of the dataset are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing CIFAR10 subfolder. If no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR10. If dir does not yet exist, a download prompt will be triggered. You can also use CIFAR10.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.CIFAR10.trainlabels (Function)
trainlabels([indices]; [dir])
Returns the CIFAR-10 training set labels corresponding to the given indices as an Int or Vector{Int}. The values of the labels denote the zero-based class index that they represent (see CIFAR10.classnames for the corresponding names). If indices is omitted, all labels are returned.
julia> CIFAR10.trainlabels() # full training set
50000-element Array{Int64,1}:
6
9
⋮
1
1
julia> CIFAR10.trainlabels(1:3) # first three labels
3-element Array{Int64,1}:
6
9
9
julia> CIFAR10.trainlabels(1) # first label
6
julia> CIFAR10.classnames()[CIFAR10.trainlabels(1) + 1] # corresponding name
"frog"
The corresponding resource files of the dataset are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing CIFAR10 subfolder. If no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR10. If dir does not yet exist, a download prompt will be triggered. You can also use CIFAR10.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
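Because the labels are zero-based while Julia arrays are one-based, looking up names for several labels at once needs a shift by one. The sketch below uses a hand-written class_names vector taken from the dataset website (CIFAR10.classnames() is expected to return the same list) and a made-up labels vector. The one-hot step is an illustrative extra, not part of the package interface.

```julia
# The ten CIFAR-10 class names in label order (label 0 is "airplane").
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Example zero-based labels (hypothetical values for illustration).
labels = [6, 9, 9, 3]

# Shift every label by one before indexing into the one-based vector.
label_names = class_names[labels .+ 1]  # ["frog", "truck", "truck", "cat"]

# One-hot encoding: row k of column j is true where labels[j] == k - 1.
onehot = (0:9) .== permutedims(labels)
```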
MLDatasets.CIFAR10.traindata (Function)
traindata([T = N0f8], [indices]; [dir]) -> images, labels
Returns the CIFAR-10 training set corresponding to the given indices as a two-element tuple. If indices is omitted, the full training set is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.
The images are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. The integer values of the labels are the zero-based class indices (see CIFAR10.classnames for the corresponding names).
train_x, train_y = CIFAR10.traindata() # full dataset
train_x, train_y = CIFAR10.traindata(2) # only second observation
train_x, train_y = CIFAR10.traindata(dir="./CIFAR10") # custom folder
The corresponding resource files of the dataset are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing CIFAR10 subfolder. If no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR10. If dir does not yet exist, a download prompt will be triggered. You can also use CIFAR10.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
Take a look at CIFAR10.traintensor and CIFAR10.trainlabels for more information.
Testset
MLDatasets.CIFAR10.testtensor (Function)
testtensor([T = N0f8], [indices]; [dir]) -> Array{T}
Return the CIFAR-10 test images corresponding to the given indices as a multi-dimensional array of eltype T. If the corresponding labels are required as well, it is recommended to use CIFAR10.testdata instead.
The images are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.
If the parameter indices is omitted or an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.
julia> CIFAR10.testtensor() # load all test images
32×32×3×10000 Array{N0f8,4}:
[...]
julia> CIFAR10.testtensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]
If indices is an Integer, the single image is returned as Array{T,3} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), the second dimension denotes the pixel columns (y), and the third dimension the RGB color channels of the image.
julia> CIFAR10.testtensor(1) # load first test image
32×32×3 Array{N0f8,3}:
[...]
As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a CIFAR-10 array into a vertical-major Julia image with the appropriate RGB eltype.
julia> CIFAR10.convert2image(CIFAR10.testtensor(1)) # convert to column-major colorant array
32×32 Array{RGB{N0f8},2}:
[...]
The corresponding resource files of the dataset are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing CIFAR10 subfolder. If no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR10. If dir does not yet exist, a download prompt will be triggered. You can also use CIFAR10.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.CIFAR10.testlabels (Function)
testlabels([indices]; [dir])
Returns the CIFAR-10 test set labels corresponding to the given indices as an Int or Vector{Int}. The values of the labels denote the zero-based class index that they represent (see CIFAR10.classnames for the corresponding names). If indices is omitted, all labels are returned.
julia> CIFAR10.testlabels() # full test set
10000-element Array{Int64,1}:
3
8
⋮
1
7
julia> CIFAR10.testlabels(1:3) # first three labels
3-element Array{Int64,1}:
3
8
8
julia> CIFAR10.testlabels(1) # first label
3
julia> CIFAR10.classnames()[CIFAR10.testlabels(1) + 1] # corresponding name
"cat"
The corresponding resource files of the dataset are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing CIFAR10 subfolder. If no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR10. If dir does not yet exist, a download prompt will be triggered. You can also use CIFAR10.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.CIFAR10.testdata (Function)
testdata([T = N0f8], [indices]; [dir]) -> images, labels
Returns the CIFAR-10 test set corresponding to the given indices as a two-element tuple. If indices is omitted, the full test set is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.
The images are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. The integer values of the labels are the zero-based class indices (see CIFAR10.classnames for the corresponding names).
test_x, test_y = CIFAR10.testdata() # full dataset
test_x, test_y = CIFAR10.testdata(2) # only second observation
test_x, test_y = CIFAR10.testdata(dir="./CIFAR10") # custom folder
The corresponding resource files of the dataset are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing CIFAR10 subfolder. If no such subfolder is found, dir will default to ~/.julia/datadeps/CIFAR10. If dir does not yet exist, a download prompt will be triggered. You can also use CIFAR10.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
Take a look at CIFAR10.testtensor and CIFAR10.testlabels for more information.
Utilities
MLDatasets.CIFAR10.download (Function)
download([dir]; [i_accept_the_terms_of_use])
Trigger the (interactive) download of the full dataset into "dir". If no dir is provided, the dataset will be downloaded into "~/.julia/datadeps/CIFAR10".
This function will display an interactive dialog unless either the keyword parameter i_accept_the_terms_of_use or the environment variable DATADEPS_ALWAYS_ACCEPT is set to true. Note that using the data responsibly and respecting copyright/terms-of-use remains your responsibility.
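For non-interactive environments, the environment variable can be set from within Julia before the first download is triggered. The following is a minimal sketch; the "./CIFAR10" directory in the comments is only an example path.

```julia
# Accept DataDeps download prompts for this Julia session, e.g. on a CI server.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# With the variable set, a later call such as
#     CIFAR10.download("./CIFAR10")
# should proceed without the interactive dialog. Alternatively, per call:
#     CIFAR10.download("./CIFAR10"; i_accept_the_terms_of_use = true)
```

Either way, the terms of use of the dataset still apply.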
MLDatasets.CIFAR10.classnames (Function)
classnames() -> Vector{String}
Return the 10 names for the CIFAR-10 classes as a vector of strings.
MLDatasets.CIFAR10.convert2features (Function)
convert2features(array)
Convert the given CIFAR-10 tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions such that traditional ML algorithms can process the dataset.
julia> CIFAR10.convert2features(CIFAR10.traintensor(Float32)) # full training data
3072×50000 Array{Float32,2}:
[...]
julia> CIFAR10.convert2features(CIFAR10.traintensor(Float32,1)) # first observation
3072-element Array{Float32,1}:
[...]
MLDatasets.CIFAR10.convert2image (Function)
convert2image(array) -> Array{RGB}
Convert the given CIFAR-10 horizontal-major tensor (or feature vector/matrix) to a vertical-major RGB array.
julia> CIFAR10.convert2image(CIFAR10.traintensor()) # full training dataset
32×32×50000 Array{RGB{N0f8},3}:
[...]
julia> CIFAR10.convert2image(CIFAR10.traintensor(1)) # first training image
32×32 Array{RGB{N0f8},2}:
[...]
References
Authors: Alex Krizhevsky, Vinod Nair, Geoffrey Hinton
Website: https://www.cs.toronto.edu/~kriz/cifar.html
[Krizhevsky, 2009] Alex Krizhevsky. "Learning Multiple Layers of Features from Tiny Images", Tech Report, 2009.