Fashion-MNIST
Description from the official website
Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
Overview
The MLDatasets.FashionMNIST sub-module provides a programmatic interface to download, load, and work with the Fashion-MNIST dataset.
using MLDatasets
# load full training set
train_x, train_y = FashionMNIST.traindata()
# load full test set
test_x, test_y = FashionMNIST.testdata()
The provided functions also accept optional arguments, such as the directory dir where the dataset is located, or the specific observation indices that one wants to work with. For more information on the interface, take a look at the documentation (e.g. ?FashionMNIST.traindata).
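The optional arguments can be combined as in the following minimal sketch. It assumes the MLDatasets version documented on this page and uses only the signatures listed in the table below; the first call may trigger the download prompt if the dataset is not yet present, and the folder name is only an example.

```julia
using MLDatasets

# First 100 training observations, with pixel values as Float32 in [0, 1]
train_x, train_y = FashionMNIST.traindata(Float32, 1:100)

# A single observation: one image matrix and one integer label
x1, y1 = FashionMNIST.traindata(1)

# Read the test set from a custom folder instead of the default DataDeps location
test_x, test_y = FashionMNIST.testdata(dir = "./FashionMNIST")
```

According to the docstrings, train_x should then be a 28×28×100 array and x1 a 28×28 matrix.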
Function | Description |
---|---|
download([dir]) | Trigger (interactive) download of the dataset |
classnames() | Return the class names as a vector of strings |
traintensor([T], [indices]; [dir]) | Load the training images as an array of eltype T |
trainlabels([indices]; [dir]) | Load the labels for the training images |
testtensor([T], [indices]; [dir]) | Load the test images as an array of eltype T |
testlabels([indices]; [dir]) | Load the labels for the test images |
traindata([T], [indices]; [dir]) | Load images and labels of the training data |
testdata([T], [indices]; [dir]) | Load images and labels of the test data |
This module also provides utility functions to make working with the Fashion-MNIST dataset in Julia more convenient.
Function | Description |
---|---|
convert2features(array) | Convert the Fashion-MNIST tensor to a flat feature matrix |
convert2image(array) | Convert the Fashion-MNIST tensor/matrix to a colorant array |
You can use the function convert2features to convert the given Fashion-MNIST tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions so that traditional ML algorithms can process the dataset.
julia> FashionMNIST.convert2features(FashionMNIST.traintensor()) # full training data
784×60000 Array{N0f8,2}:
[...]
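Conceptually, dropping the spatial dimensions is a reshape. The sketch below is not the library's implementation; it uses a random array as a stand-in for the image tensor so that it runs without the dataset, and the variable names are illustrative.

```julia
# Stand-in for a 28×28×N image tensor (random values instead of real images)
x = rand(Float32, 28, 28, 5)

# Collapse the two spatial dimensions into a single feature dimension
features = reshape(x, :, size(x, 3))

# Each column of the feature matrix is one flattened image
@assert size(features) == (784, 5)
@assert features[:, 1] == vec(x[:, :, 1])
```

Because reshape shares memory with its argument, no pixel data is copied in this sketch.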
To visualize an image or a prediction, we provide the function convert2image to convert the given Fashion-MNIST horizontal-major tensor (or feature matrix) to a vertical-major Colorant array. The values are also color corrected according to the website's description, which means that the clothing items appear dark on a white background.
julia> FashionMNIST.convert2image(FashionMNIST.traintensor(1)) # first training image
28×28 Array{Gray{N0f8},2}:
[...]
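The two steps described above, swapping the axes and correcting the colors, can be illustrated on plain numbers. This is a conceptual sketch only: it assumes that the color correction amounts to inverting the intensities, it works on Float32 values rather than the Gray colorants that convert2image returns, and the random matrix is a stand-in for a real image.

```julia
# Stand-in for one horizontal-major image with intensities in [0, 1]
x = rand(Float32, 28, 28)

# Swap the two axes (horizontal-major to vertical-major) and invert the
# intensities so that a dark background becomes a light one
img = 1 .- permutedims(x)

@assert size(img) == (28, 28)
@assert img[2, 1] == 1 - x[1, 2]
```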
API Documentation
MLDatasets.FashionMNIST - Module
Fashion-MNIST
- Authors: Han Xiao, Kashif Rasul, Roland Vollgraf
- Website: https://github.com/zalandoresearch/fashion-mnist
Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. It can serve as a drop-in replacement for MNIST.
Interface
- FashionMNIST.traintensor, FashionMNIST.trainlabels, FashionMNIST.traindata
- FashionMNIST.testtensor, FashionMNIST.testlabels, FashionMNIST.testdata
Utilities
Also, the FashionMNIST module re-exports convert2features and convert2image from the MNIST module.
Training set
MLDatasets.FashionMNIST.traintensor - Function
traintensor([T = N0f8], [indices]; [dir]) -> Array{T}
Returns the Fashion-MNIST training images corresponding to the given indices as a multi-dimensional array of eltype T.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be between 0 and 255; otherwise the values are scaled to be between 0 and 1.
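The effect of the element type on the value range can be sketched with plain Base Julia. The three byte values below are made up for illustration, and the division by 255 is an assumption about how the scaling is carried out, not a copy of the library code.

```julia
# Three made-up raw 8-bit intensities
raw = UInt8[0, 51, 255]

# Integer eltype: the values keep their 0..255 range
as_int = Int.(raw)

# Floating-point eltype: the values are scaled into [0, 1]
as_float = Float32.(raw) ./ 255

@assert as_int == [0, 51, 255]
@assert as_float ≈ Float32[0, 0.2, 1]
```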
If the parameter indices is omitted or an AbstractVector, the images are returned as a 3D array (i.e. an Array{T,3}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, and the third dimension denotes the index of the image.
julia> FashionMNIST.traintensor() # load all training images
28×28×60000 Array{N0f8,3}:
[...]
julia> FashionMNIST.traintensor(Float32, 1:3) # first three images as Float32
28×28×3 Array{Float32,3}:
[...]
If indices is an Integer, the single image is returned as Matrix{T} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), and the second dimension denotes the pixel columns (y) of the image.
julia> FashionMNIST.traintensor(1) # load first training image
28×28 Array{N0f8,2}:
[...]
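The relationship between the 3D tensor and a single image matrix can be explored without downloading anything. A minimal sketch, using a random array as a stand-in for the tensor returned by traintensor (the variable names are illustrative):

```julia
# Stand-in for a 28×28×N tensor; random values replace the real images
x = rand(Float32, 28, 28, 3)

img_copy = x[:, :, 2]          # an Integer in the third position copies out a 28×28 matrix
img_view = view(x, :, :, 2)    # a view addresses the same image without copying
subset   = x[:, :, 1:2]        # a range keeps the third dimension

@assert size(img_copy) == (28, 28)
@assert size(subset) == (28, 28, 2)
@assert img_copy == img_view
```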
As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a FashionMNIST array into a vertical-major Julia image with the corrected color values.
julia> FashionMNIST.convert2image(FashionMNIST.traintensor(1)) # convert to column-major colorant array
28×28 Array{Gray{N0f8},2}:
[...]
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.FashionMNIST.trainlabels - Function
trainlabels([indices]; [dir])
Returns the Fashion-MNIST training set labels corresponding to the given indices as an Int or Vector{Int}. The values of the labels denote the zero-based class index that they represent (see FashionMNIST.classnames for the corresponding names). If indices is omitted, all labels are returned.
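Because the labels are zero-based while Julia arrays are one-based, mapping a whole label vector to names needs a shift by one. The sketch below hard-codes the class names in the order given in the Fashion-MNIST README so that it runs without the dataset; this list is an assumption, and the strings returned by FashionMNIST.classnames() may be spelled slightly differently.

```julia
# Class names in label order (assumed from the Fashion-MNIST README)
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Example zero-based labels
labels = [9, 0, 0]

# Shift by one to account for Julia's 1-based indexing
label_names = class_names[labels .+ 1]

@assert label_names == ["Ankle boot", "T-shirt/top", "T-shirt/top"]
```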
julia> FashionMNIST.trainlabels() # full training set
60000-element Array{Int64,1}:
9
0
⋮
0
5
julia> FashionMNIST.trainlabels(1:3) # first three labels
3-element Array{Int64,1}:
9
0
0
julia> y = FashionMNIST.trainlabels(1) # first label
9
julia> FashionMNIST.classnames()[y + 1] # corresponding name
"Ankle boot"
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.FashionMNIST.traindata - Function
traindata([T = N0f8], [indices]; [dir]) -> images, labels
Returns the Fashion-MNIST training set corresponding to the given indices as a two-element tuple. If indices is omitted, the full training set is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be between 0 and 255; otherwise the values are scaled to be between 0 and 1. The integer values of the labels are the zero-based class indices (see FashionMNIST.classnames for the corresponding names).
train_x, train_y = FashionMNIST.traindata() # full dataset
train_x, train_y = FashionMNIST.traindata(2) # only second observation
train_x, train_y = FashionMNIST.traindata(dir="./FashionMNIST") # custom folder
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
Take a look at FashionMNIST.traintensor and FashionMNIST.trainlabels for more information.
Test set
MLDatasets.FashionMNIST.testtensor - Function
testtensor([T = N0f8], [indices]; [dir]) -> Array{T}
Returns the Fashion-MNIST test images corresponding to the given indices as a multi-dimensional array of eltype T.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be between 0 and 255; otherwise the values are scaled to be between 0 and 1.
If the parameter indices is omitted or an AbstractVector, the images are returned as a 3D array (i.e. an Array{T,3}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, and the third dimension denotes the index of the image.
julia> FashionMNIST.testtensor() # load all test images
28×28×10000 Array{N0f8,3}:
[...]
julia> FashionMNIST.testtensor(Float32, 1:3) # first three images as Float32
28×28×3 Array{Float32,3}:
[...]
If indices is an Integer, the single image is returned as Matrix{T} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), and the second dimension denotes the pixel columns (y) of the image.
julia> FashionMNIST.testtensor(1) # load first test image
28×28 Array{N0f8,2}:
[...]
As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a FashionMNIST array into a vertical-major Julia image with the corrected color values.
julia> FashionMNIST.convert2image(FashionMNIST.testtensor(1)) # convert to column-major colorant array
28×28 Array{Gray{N0f8},2}:
[...]
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.FashionMNIST.testlabels - Function
testlabels([indices]; [dir])
Returns the Fashion-MNIST test set labels corresponding to the given indices as an Int or Vector{Int}. The values of the labels denote the zero-based class index that they represent (see FashionMNIST.classnames for the corresponding names). If indices is omitted, all labels are returned.
julia> FashionMNIST.testlabels() # full test set
10000-element Array{Int64,1}:
9
2
⋮
1
5
julia> FashionMNIST.testlabels(1:3) # first three labels
3-element Array{Int64,1}:
9
2
1
julia> y = FashionMNIST.testlabels(1) # first label
9
julia> FashionMNIST.classnames()[y + 1] # corresponding name
"Ankle boot"
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
MLDatasets.FashionMNIST.testdata - Function
testdata([T = N0f8], [indices]; [dir]) -> images, labels
Returns the Fashion-MNIST test set corresponding to the given indices as a two-element tuple. If indices is omitted, the full test set is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.
The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be between 0 and 255; otherwise the values are scaled to be between 0 and 1. The integer values of the labels are the zero-based class indices (see FashionMNIST.classnames for the corresponding names).
test_x, test_y = FashionMNIST.testdata() # full dataset
test_x, test_y = FashionMNIST.testdata(2) # only second observation
test_x, test_y = FashionMNIST.testdata(dir="./FashionMNIST") # custom folder
The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted, the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
Take a look at FashionMNIST.testtensor and FashionMNIST.testlabels for more information.
Utilities
MLDatasets.FashionMNIST.download - Function
download([dir]; [i_accept_the_terms_of_use])
Trigger the (interactive) download of the full dataset into "dir". If no dir is provided, the dataset will be downloaded into "~/.julia/datadeps/FashionMNIST".
This function will display an interactive dialog unless either the keyword parameter i_accept_the_terms_of_use or the environment variable DATADEPS_ALWAYS_ACCEPT is set to true. Note that using the data responsibly and respecting copyright/terms-of-use remains your responsibility.
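For scripts or CI jobs where no one can answer the dialog, the download can be made non-interactive. This is a minimal sketch based only on the signature and the environment variable named above; it assumes the MLDatasets version documented on this page, and running it performs a real download.

```julia
# Accept the terms up front so that no dialog is shown
# (set this before the download is triggered)
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

using MLDatasets

# Download into the default location
FashionMNIST.download()

# Alternatively, download into a custom folder and accept the terms
# through the keyword parameter instead of the environment variable:
# FashionMNIST.download("./FashionMNIST"; i_accept_the_terms_of_use = true)
```

Setting either option means you have read and accepted the dataset's terms of use yourself.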
MLDatasets.FashionMNIST.classnames - Function
classnames() -> Vector{String}
Return the 10 names for the Fashion-MNIST classes as a vector of strings.
Also, the FashionMNIST module re-exports convert2features and convert2image from the MNIST module.
References
Authors: Han Xiao, Kashif Rasul, Roland Vollgraf
Website: https://github.com/zalandoresearch/fashion-mnist
[Han Xiao et al. 2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms." arXiv:1708.07747