Fashion-MNIST

Description from the official website

Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

Overview

The MLDatasets.FashionMNIST sub-module provides a programmatic interface to download, load, and work with the Fashion-MNIST dataset.

using MLDatasets

# load full training set
train_x, train_y = FashionMNIST.traindata()

# load full test set
test_x,  test_y  = FashionMNIST.testdata()

The provided functions also allow for optional arguments, such as the directory dir where the dataset is located, or the specific observation indices that one wants to work with. For more information on the interface take a look at the documentation (e.g. ?FashionMNIST.traindata).

Function                               Description
download([dir])                        Trigger (interactive) download of the dataset
classnames()                           Return the class names as a vector of strings
traintensor([T], [indices]; [dir])     Load the training images as an array of eltype T
trainlabels([indices]; [dir])          Load the labels for the training images
testtensor([T], [indices]; [dir])      Load the test images as an array of eltype T
testlabels([indices]; [dir])           Load the labels for the test images
traindata([T], [indices]; [dir])       Load images and labels of the training data
testdata([T], [indices]; [dir])        Load images and labels of the test data

This module also provides utility functions to make working with the Fashion-MNIST dataset in Julia more convenient.

Function                               Description
convert2features(array)                Convert the Fashion-MNIST tensor to a flat feature matrix
convert2image(array)                   Convert the Fashion-MNIST tensor/matrix to a colorant array

You can use the function convert2features to convert the given Fashion-MNIST tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions such that traditional ML algorithms can process the dataset.

julia> FashionMNIST.convert2features(FashionMNIST.traintensor()) # full training data
784×60000 Array{N0f8,2}:
[...]
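
Conceptually, dropping the spatial dimensions amounts to a reshape. The following is a minimal sketch on a synthetic array using only Base Julia; it illustrates the idea and is not the library's actual implementation. The names tensor and features are illustrative.

```julia
# Synthetic stand-in for a Fashion-MNIST tensor: 5 images of 28×28 pixels.
tensor = rand(Float32, 28, 28, 5)

# Collapse the two spatial dimensions into a single feature dimension.
# The last dimension (the image index) is kept as the column index.
features = reshape(tensor, 28 * 28, :)

size(features)                             # (784, 5)

# Column i holds image i, flattened in memory order.
features[:, 2] == vec(tensor[:, :, 2])     # true
```

Each column of the resulting matrix is one observation, which is the shape many traditional ML routines in Julia expect.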

To visualize an image or a prediction, we provide the function convert2image, which converts the given Fashion-MNIST horizontal-major tensor (or feature matrix) to a vertical-major Colorant array. The values are also color corrected according to the website's description, which means that the clothing items appear dark on a white background.

julia> FashionMNIST.convert2image(FashionMNIST.traintensor(1)) # first training image
28×28 Array{Gray{N0f8},2}:
[...]
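
The layout change can be pictured as a swap of the two spatial dimensions combined with an intensity inversion. The sketch below uses plain Float32 arrays from Base Julia rather than a Colorant array, and it assumes that "color corrected" means inverting the intensities. It is an illustration under those assumptions, not the code behind convert2image.

```julia
# Synthetic 28×28 image in horizontal-major layout: first index is x, second is y.
img = zeros(Float32, 28, 28)
img[3, 10] = 1.0f0            # a single bright pixel at x = 3, y = 10

# Assumed color correction: invert intensities so the background becomes white (1.0).
inverted = 1 .- img

# Swap the two spatial dimensions to obtain a vertical-major (row, column) matrix.
vertical = permutedims(inverted, (2, 1))

vertical[10, 3]   # 0.0f0: the pixel is now dark and sits at row 10, column 3
vertical[1, 1]    # 1.0f0: the background is white
```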

API Documentation

MLDatasets.FashionMNIST - Module

Fashion-MNIST

  • Authors: Han Xiao, Kashif Rasul, Roland Vollgraf
  • Website: https://github.com/zalandoresearch/fashion-mnist

Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. It can serve as a drop-in replacement for MNIST.

Interface

Utilities

The FashionMNIST module also re-exports convert2features and convert2image from the MNIST module.

Trainingset

MLDatasets.FashionMNIST.traintensor - Function
traintensor([T = N0f8], [indices]; [dir]) -> Array{T}

Returns the Fashion-MNIST training images corresponding to the given indices as a multi-dimensional array of eltype T.

The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.

If the parameter indices is omitted or is an AbstractVector, the images are returned as a 3D array (i.e. an Array{T,3}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, and the third dimension denotes the index of the image.

julia> FashionMNIST.traintensor() # load all training images
28×28×60000 Array{N0f8,3}:
[...]

julia> FashionMNIST.traintensor(Float32, 1:3) # first three images as Float32
28×28×3 Array{Float32,3}:
[...]
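
The effect of the element type on the value range can be illustrated without loading the dataset. This sketch assumes raw pixel intensities stored as UInt8 and reproduces the two ranges described above. The variable names are hypothetical, and the division by 255 is an assumption about how such a scaling could be done, not a statement about the library internals.

```julia
# Hypothetical raw pixel intensities as stored bytes.
raw = UInt8[0, 51, 255]

# Integer element type: the values keep their 0 to 255 range.
as_int = Int.(raw)                   # [0, 51, 255]

# Floating point element type: the values are scaled into the 0 to 1 range.
as_float = Float32.(raw) ./ 255f0

extrema(as_float)                    # (0.0f0, 1.0f0)
```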

If indices is an Integer, the single image is returned as Matrix{T} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), and the second dimension denotes the pixel columns (y) of the image.

julia> FashionMNIST.traintensor(1) # load first training image
28×28 Array{N0f8,2}:
[...]

As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a FashionMNIST array into a vertical-major Julia image with the corrected color values.

julia> FashionMNIST.convert2image(FashionMNIST.traintensor(1)) # convert to column-major colorant array
28×28 Array{Gray{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
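
To check ahead of time whether loading would trigger a download prompt, one can look for the default folder named above. This is a simplified sketch using only Base Julia: it inspects just the documented fallback location and deliberately ignores the search through DataDeps.default_loadpath, so treat it as an approximation. The variable name default_dir is illustrative.

```julia
# Documented fallback location: ~/.julia/datadeps/FashionMNIST
default_dir = joinpath(homedir(), ".julia", "datadeps", "FashionMNIST")

if isdir(default_dir)
    println("Found existing dataset folder: ", default_dir)
else
    println("No dataset folder at ", default_dir, "; loading may trigger a download prompt.")
end
```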

MLDatasets.FashionMNIST.trainlabels - Function
trainlabels([indices]; [dir])

Returns the Fashion-MNIST training set labels corresponding to the given indices as an Int or Vector{Int}. The values of the labels denote the zero-based class index that they represent (see FashionMNIST.classnames for the corresponding names). If indices is omitted, all labels are returned.

julia> FashionMNIST.trainlabels() # full training set
60000-element Array{Int64,1}:
 9
 0
 ⋮
 0
 5

julia> FashionMNIST.trainlabels(1:3) # first three labels
3-element Array{Int64,1}:
 9
 0
 0

julia> y = FashionMNIST.trainlabels(1) # first label
9

julia> FashionMNIST.classnames()[y + 1] # corresponding name
"Ankle boot"

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.
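
Because the labels are zero-based while Julia arrays are one-based, a whole vector of labels can be mapped to names by adding one before indexing, as in the single-label example above. The sketch below uses a local vector class_names filled with the class names listed in the dataset's README; the exact strings returned by FashionMNIST.classnames() may differ slightly, so this vector is an assumption used for illustration only.

```julia
# Class names as listed in the Fashion-MNIST README (index 1 corresponds to label 0).
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# The first three training labels shown in the example above.
labels = [9, 0, 0]

# Shift the zero-based labels by one to index into the one-based vector.
label_names = class_names[labels .+ 1]
# ["Ankle boot", "T-shirt/top", "T-shirt/top"]
```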

MLDatasets.FashionMNIST.traindata - Function
traindata([T = N0f8], [indices]; [dir]) -> images, labels

Returns the Fashion-MNIST training set corresponding to the given indices as a two-element tuple. If indices is omitted, the full training set is returned. The first element of the returned tuple contains the images as a multi-dimensional array, and the second element contains the corresponding labels as integers.

The images are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, all values lie between 0 and 255; otherwise they are scaled to lie between 0 and 1. The integer labels are zero-based class indices (see FashionMNIST.classnames for the corresponding names).

train_x, train_y = FashionMNIST.traindata() # full dataset
train_x, train_y = FashionMNIST.traindata(2) # only second observation
train_x, train_y = FashionMNIST.traindata(dir="./FashionMNIST") # custom folder

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

Take a look at FashionMNIST.traintensor and FashionMNIST.trainlabels for more information.

Testset

MLDatasets.FashionMNIST.testtensor - Function
testtensor([T = N0f8], [indices]; [dir]) -> Array{T}

Returns the Fashion-MNIST test images corresponding to the given indices as a multi-dimensional array of eltype T.

The image(s) is/are returned in the native horizontal-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.

If the parameter indices is omitted or is an AbstractVector, the images are returned as a 3D array (i.e. an Array{T,3}), in which the first dimension corresponds to the pixel rows (x) of the image, the second dimension to the pixel columns (y) of the image, and the third dimension denotes the index of the image.

julia> FashionMNIST.testtensor() # load all test images
28×28×10000 Array{N0f8,3}:
[...]

julia> FashionMNIST.testtensor(Float32, 1:3) # first three images as Float32
28×28×3 Array{Float32,3}:
[...]

If indices is an Integer, the single image is returned as Matrix{T} in horizontal-major layout, which means that the first dimension denotes the pixel rows (x), and the second dimension denotes the pixel columns (y) of the image.

julia> FashionMNIST.testtensor(1) # load first test image
28×28 Array{N0f8,2}:
[...]

As mentioned above, the images are returned in the native horizontal-major layout to preserve the original feature ordering. You can use the utility function convert2image to convert a FashionMNIST array into a vertical-major Julia image with the corrected color values.

julia> FashionMNIST.convert2image(FashionMNIST.testtensor(1)) # convert to column-major colorant array
28×28 Array{Gray{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.FashionMNIST.testlabels - Function
testlabels([indices]; [dir])

Returns the Fashion-MNIST test set labels corresponding to the given indices as an Int or Vector{Int}. The values of the labels denote the zero-based class index that they represent (see FashionMNIST.classnames for the corresponding names). If indices is omitted, all labels are returned.

julia> FashionMNIST.testlabels() # full test set
10000-element Array{Int64,1}:
 9
 2
 ⋮
 1
 5

julia> FashionMNIST.testlabels(1:3) # first three labels
3-element Array{Int64,1}:
 9
 2
 1

julia> y = FashionMNIST.testlabels(1) # first label
9

julia> FashionMNIST.classnames()[y + 1] # corresponding name
"Ankle boot"

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.FashionMNIST.testdata - Function
testdata([T = N0f8], [indices]; [dir]) -> images, labels

Returns the Fashion-MNIST test set corresponding to the given indices as a two-element tuple. If indices is omitted, the full test set is returned. The first element of the returned tuple contains the images as a multi-dimensional array, and the second element contains the corresponding labels as integers.

The images are returned in the native horizontal-major memory layout as a single numeric array of eltype T. If T <: Integer, all values lie between 0 and 255; otherwise they are scaled to lie between 0 and 1. The integer labels are zero-based class indices (see FashionMNIST.classnames for the corresponding names).

test_x, test_y = FashionMNIST.testdata() # full dataset
test_x, test_y = FashionMNIST.testdata(2) # only second observation
test_x, test_y = FashionMNIST.testdata(dir="./FashionMNIST") # custom folder

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing FashionMNIST subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/FashionMNIST. In the case that dir does not yet exist, a download prompt will be triggered. You can also use FashionMNIST.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

Take a look at FashionMNIST.testtensor and FashionMNIST.testlabels for more information.

Utilities

MLDatasets.FashionMNIST.download - Function
download([dir]; [i_accept_the_terms_of_use])

Trigger the (interactive) download of the full dataset into "dir". If no dir is provided the dataset will be downloaded into "~/.julia/datadeps/FashionMNIST".

This function will display an interactive dialog unless either the keyword parameter i_accept_the_terms_of_use or the environment variable DATADEPS_ALWAYS_ACCEPT is set to true. Note that using the data responsibly and respecting copyright/terms-of-use remains your responsibility.
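
For non-interactive settings such as a CI job, the environment variable mentioned above can be set from within Julia before the dataset is first requested. This is a minimal sketch using only Base Julia. The commented-out lines show how it would be combined with the download function documented here and are left inactive so that the snippet runs without the package or network access.

```julia
# Accept the terms of use up front so that no interactive dialog is shown.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# With MLDatasets installed, a subsequent download should then not prompt:
# using MLDatasets
# FashionMNIST.download()
```

Setting the variable does not change the point made above: responsible use of the data and compliance with its terms remain up to you.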

References

  • Authors: Han Xiao, Kashif Rasul, Roland Vollgraf

  • Website: https://github.com/zalandoresearch/fashion-mnist

  • [Han Xiao et al. 2017] Han Xiao, Kashif Rasul, and Roland Vollgraf. "Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms." arXiv:1708.07747