SVHN format 2 · MLDatasets.jl

Function	Description
`download([dir])`	Trigger interactive download of the dataset
`classnames()`	Return the class names (the ten digits) as a vector of integers
`traintensor([T], [indices]; [dir])`	Load the training images as an array of eltype `T`
`trainlabels([indices]; [dir])`	Load the labels for the training images
`traindata([T], [indices]; [dir])`	Load images and labels of the training data
`testtensor([T], [indices]; [dir])`	Load the test images as an array of eltype `T`
`testlabels([indices]; [dir])`	Load the labels for the test images
`testdata([T], [indices]; [dir])`	Load images and labels of the test data
`extratensor([T], [indices]; [dir])`	Load the extra images as an array of eltype `T`
`extralabels([indices]; [dir])`	Load the labels for the extra training images
`extradata([T], [indices]; [dir])`	Load images and labels of the extra training data

Function	Description
`convert2features(array)`	Convert the SVHN tensor to a flat feature matrix
`convert2image(array)`	Convert the SVHN tensor/matrix to a colorant array

MLDatasets.SVHN2 — Module

The Street View House Numbers (SVHN) Dataset

Authors: Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng
Website: http://ufldl.stanford.edu/housenumbers

SVHN was obtained from house numbers in Google Street View images. As such, the images are quite diverse in terms of orientation and image background. Similar to MNIST, SVHN has 10 classes (the digits 0-9), but unlike MNIST there is more data and the images are a little bigger (32x32 instead of 28x28) with an additional RGB color channel. The dataset is split up into three subsets: 73257 digits for training, 26032 digits for testing, and 531131 additional digits to use as extra training data.

Interface

SVHN2.traintensor, SVHN2.trainlabels, SVHN2.traindata
SVHN2.testtensor, SVHN2.testlabels, SVHN2.testdata
SVHN2.extratensor, SVHN2.extralabels, SVHN2.extradata

Utilities

SVHN2.download
SVHN2.classnames
SVHN2.convert2features
SVHN2.convert2image

MLDatasets.SVHN2.traintensor — Function

traintensor([T = N0f8], [indices]; [dir]) -> Array{T}

Return the SVHN training images corresponding to the given indices as a multi-dimensional array of eltype T.

The image(s) is/are returned in the native vertical-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.

If the parameter indices is omitted or is an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel columns (y) of the image, the second dimension to the pixel rows (x) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.

julia> SVHN2.traintensor() # load all training images
32×32×3×73257 Array{N0f8,4}:
[...]

julia> SVHN2.traintensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]

If indices is an Integer, the single image is returned as Array{T,3} in vertical-major layout, which means that the first dimension denotes the pixel columns (y), the second dimension denotes the pixel rows (x), and the third dimension the RGB color channels of the image.

julia> SVHN2.traintensor(1) # load first training image
32×32×3 Array{N0f8,3}:
[...]
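
The dimension order described above can be tried out without downloading anything. The following is a minimal sketch that uses a random UInt8 array as a stand-in for the result of SVHN2.traintensor; the name batch, its size of 5 images, and the assumption that the channels are ordered red, green, blue are illustrative only.

```julia
# Stand-in for a small batch of SVHN images:
# first dim × second dim × channels × images (random data, not real SVHN).
batch = rand(UInt8, 32, 32, 3, 5)

# Selecting one image by its index in the fourth dimension gives a 3D array.
img = batch[:, :, :, 2]
@show size(img)            # (32, 32, 3)

# Each color channel is a 32×32 slice along the third dimension
# (channel order red, green, blue is assumed here).
red, green, blue = img[:, :, 1], img[:, :, 2], img[:, :, 3]
@show size(red)            # (32, 32)

# A single value is addressed as (first dim, second dim, channel, image index).
@show batch[10, 20, 1, 2] == red[10, 20]   # true
```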

As mentioned above, the color channel is encoded in the third dimension. You can use the utility function convert2image to convert an SVHN array into a Julia image with the appropriate RGB eltype.

julia> SVHN2.convert2image(SVHN2.traintensor(1))
32×32 Array{RGB{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.trainlabels — Function

trainlabels([indices]; [dir])

Returns the SVHN training labels corresponding to the given indices as an Int or Vector{Int}. Each label is a one-based index into the vector returned by SVHN2.classnames: the digits 1 through 9 are stored as themselves and the digit 0 is stored as 10. If indices is omitted, all labels are returned.

julia> SVHN2.trainlabels() # full training set
73257-element Array{Int64,1}:
[...]

julia> SVHN2.trainlabels(1:3) # first three labels
3-element Array{Int64,1}:
[...]

julia> SVHN2.trainlabels(1) # first label
[...]

julia> SVHN2.classnames()[SVHN2.trainlabels(1)] # corresponding class
[...]
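
To make the lookup in the last example concrete, here is a small sketch that decodes labels into digits. It assumes the encoding documented for SVHN2.traindata (digits 1 to 9 stored as themselves, 0 stored as 10) and a class vector ordered to match; the label values and the names labels, decoded, and classes are made up for illustration.

```julia
# Hypothetical labels in the assumed SVHN2 encoding: 1..9 stand for
# themselves and 10 stands for the digit 0.
labels = [1, 9, 10, 2, 10]

# mod(label, 10) maps 10 to 0 and leaves 1..9 unchanged.
decoded = mod.(labels, 10)
@show decoded              # [1, 9, 0, 2, 0]

# Equivalent lookup through a class vector ordered like the labels
# (assumed here to be [1, 2, ..., 9, 0]).
classes = [1:9; 0]
@show classes[labels] == decoded   # true
```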

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.traindata — Function

traindata([T = N0f8], [indices]; [dir]) -> images, labels

Returns the SVHN trainset corresponding to the given indices as a two-element tuple. If indices is omitted the full trainset is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.

The image(s) is/are returned in the native vertical-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. You can use the utility function convert2image to convert an SVHN array into a Julia image with the appropriate RGB eltype. The integer values of the labels correspond 1-to-1 to the digit that they represent, with the exception of 0, which is encoded as 10.

Note that because of the nature of how the dataset is stored on disk, SVHN2.traindata will always load the full trainset, regardless of which observations are requested. In the case indices are provided by the user, it will simply result in a sub-setting. This option is just provided for convenience.

images, labels = SVHN2.traindata() # full dataset
images, labels = SVHN2.traindata(2) # only second observation
images, labels = SVHN2.traindata(dir="./SVHN") # custom folder
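
Since the whole split is read from disk in any case, it may be cheaper to load it once and take several subsets in memory rather than calling the function repeatedly with different indices. The sketch below shows that kind of sub-setting on random stand-in data; the array sizes and the index vector idx are assumptions for illustration.

```julia
# Stand-ins for a fully loaded split (random data, 8 observations).
images = rand(UInt8, 32, 32, 3, 8)
labels = rand(1:10, 8)

# Observations live in the last dimension of the image array, so a subset is
# taken by indexing that dimension and the label vector with the same indices.
idx = [2, 5, 7]
sub_images = images[:, :, :, idx]
sub_labels = labels[idx]

@show size(sub_images)     # (32, 32, 3, 3)
@show length(sub_labels)   # 3
```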

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.testtensor — Function

testtensor([T = N0f8], [indices]; [dir]) -> Array{T}

Return the SVHN test images corresponding to the given indices as a multi-dimensional array of eltype T.

The image(s) is/are returned in the native vertical-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.
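
The relationship between the two value ranges can be sketched with plain Julia. This example assumes that the scaling for non-integer eltypes amounts to dividing the 8-bit intensity by 255, which is how the default eltype N0f8 interprets its underlying byte; the names raw, scaled, and restored are illustrative.

```julia
# Raw 8-bit intensities as they would appear for an integer eltype (0 to 255).
raw = UInt8[0, 64, 128, 255]

# Assumed scaling for non-integer eltypes: divide by 255 so values lie in [0, 1].
scaled = Float32.(raw) ./ 255f0
@show extrema(scaled)      # (0.0f0, 1.0f0)

# The mapping is reversible up to rounding.
restored = round.(UInt8, scaled .* 255f0)
@show restored == raw      # true
```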

If the parameter indices is omitted or is an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel columns (y) of the image, the second dimension to the pixel rows (x) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.

julia> SVHN2.testtensor() # load all test images
32×32×3×26032 Array{N0f8,4}:
[...]

julia> SVHN2.testtensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]

If indices is an Integer, the single image is returned as Array{T,3} in vertical-major layout, which means that the first dimension denotes the pixel columns (y), the second dimension denotes the pixel rows (x), and the third dimension the RGB color channels of the image.

julia> SVHN2.testtensor(1) # load first test image
32×32×3 Array{N0f8,3}:
[...]

As mentioned above, the color channel is encoded in the third dimension. You can use the utility function convert2image to convert an SVHN array into a Julia image with the appropriate RGB eltype.

julia> SVHN2.convert2image(SVHN2.testtensor(1))
32×32 Array{RGB{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.testlabels — Function

testlabels([indices]; [dir])

Returns the SVHN test labels corresponding to the given indices as an Int or Vector{Int}. Each label is a one-based index into the vector returned by SVHN2.classnames: the digits 1 through 9 are stored as themselves and the digit 0 is stored as 10. If indices is omitted, all labels are returned.

julia> SVHN2.testlabels() # full test set
26032-element Array{Int64,1}:
[...]

julia> SVHN2.testlabels(1:3) # first three labels
3-element Array{Int64,1}:
[...]

julia> SVHN2.testlabels(1) # first label
[...]

julia> SVHN2.classnames()[SVHN2.testlabels(1)] # corresponding class
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.testdata — Function

testdata([T = N0f8], [indices]; [dir]) -> images, labels

Returns the SVHN testset corresponding to the given indices as a two-element tuple. If indices is omitted the full testset is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.

The image(s) is/are returned in the native vertical-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. You can use the utility function convert2image to convert an SVHN array into a Julia image with the appropriate RGB eltype. The integer values of the labels correspond 1-to-1 to the digit that they represent, with the exception of 0, which is encoded as 10.

Note that because of the nature of how the dataset is stored on disk, SVHN2.testdata will always load the full testset, regardless of which observations are requested. In the case indices are provided by the user, it will simply result in a sub-setting. This option is just provided for convenience.

images, labels = SVHN2.testdata() # full dataset
images, labels = SVHN2.testdata(2) # only second observation
images, labels = SVHN2.testdata(dir="./SVHN") # custom folder

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.extratensor — Function

extratensor([T = N0f8], [indices]; [dir]) -> Array{T}

Return the SVHN extra training images corresponding to the given indices as a multi-dimensional array of eltype T.

The image(s) is/are returned in the native vertical-major memory layout as a single numeric array. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1.

If the parameter indices is omitted or is an AbstractVector, the images are returned as a 4D array (i.e. an Array{T,4}), in which the first dimension corresponds to the pixel columns (y) of the image, the second dimension to the pixel rows (x) of the image, the third dimension to the RGB color channels, and the fourth dimension to the index of the image.

julia> SVHN2.extratensor() # load all extra training images
32×32×3×531131 Array{N0f8,4}:
[...]

julia> SVHN2.extratensor(Float32, 1:3) # first three images as Float32
32×32×3×3 Array{Float32,4}:
[...]

If indices is an Integer, the single image is returned as Array{T,3} in vertical-major layout, which means that the first dimension denotes the pixel columns (y), the second dimension denotes the pixel rows (x), and the third dimension the RGB color channels of the image.

julia> SVHN2.extratensor(1) # load first extra training image
32×32×3 Array{N0f8,3}:
[...]

As mentioned above, the color channel is encoded in the third dimension. You can use the utility function convert2image to convert an SVHN array into a Julia image with the appropriate RGB eltype.

julia> SVHN2.convert2image(SVHN2.extratensor(1))
32×32 Array{RGB{N0f8},2}:
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.extralabels — Function

extralabels([indices]; [dir])

Returns the SVHN extra training labels corresponding to the given indices as an Int or Vector{Int}. Each label is a one-based index into the vector returned by SVHN2.classnames: the digits 1 through 9 are stored as themselves and the digit 0 is stored as 10. If indices is omitted, all labels are returned.

julia> SVHN2.extralabels() # full extra training set
531131-element Array{Int64,1}:
[...]

julia> SVHN2.extralabels(1:3) # first three labels
3-element Array{Int64,1}:
[...]

julia> SVHN2.extralabels(1) # first label
[...]

julia> SVHN2.classnames()[SVHN2.extralabels(1)] # corresponding class
[...]

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.extradata — Function

extradata([T = N0f8], [indices]; [dir]) -> images, labels

Returns the SVHN extra trainset corresponding to the given indices as a two-element tuple. If indices is omitted the full extra trainset is returned. The first element of the return values will be the images as a multi-dimensional array, and the second element the corresponding labels as integers.

The image(s) is/are returned in the native vertical-major memory layout as a single numeric array of eltype T. If T <: Integer, then all values will be within 0 and 255, otherwise the values are scaled to be between 0 and 1. You can use the utility function convert2image to convert an SVHN array into a Julia image with the appropriate RGB eltype. The integer values of the labels correspond 1-to-1 to the digit that they represent, with the exception of 0, which is encoded as 10.

Note that because of the nature of how the dataset is stored on disk, SVHN2.extradata will always load the full extra trainset, regardless of which observations are requested. In the case indices are provided by the user, it will simply result in a sub-setting. This option is just provided for convenience.

images, labels = SVHN2.extradata() # full dataset
images, labels = SVHN2.extradata(2) # only second observation
images, labels = SVHN2.extradata(dir="./SVHN") # custom folder
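
Because the extra split is meant as additional training data, one possible use is to append it to the regular training split. The sketch below does this on random stand-in arrays; the variable names and the tiny sizes are assumptions for illustration. With the real data, the extra images alone take roughly 1.6 GB at one byte per value (531131 × 32 × 32 × 3), so the combined array may not fit comfortably in memory for wider eltypes.

```julia
# Stand-ins for the two splits (random data; sizes chosen for illustration).
train_x, train_y = rand(UInt8, 32, 32, 3, 4), rand(1:10, 4)
extra_x, extra_y = rand(UInt8, 32, 32, 3, 6), rand(1:10, 6)

# Images are stacked along the observation (fourth) dimension,
# labels are appended in the same order.
all_x = cat(train_x, extra_x; dims = 4)
all_y = vcat(train_y, extra_y)

@show size(all_x)          # (32, 32, 3, 10)
@show length(all_y)        # 10
```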

The corresponding resource file(s) of the dataset is/are expected to be located in the specified directory dir. If dir is omitted the directories in DataDeps.default_loadpath will be searched for an existing SVHN2 subfolder. In case no such subfolder is found, dir will default to ~/.julia/datadeps/SVHN2. In the case that dir does not yet exist, a download prompt will be triggered. You can also use SVHN2.download([dir]) explicitly for pre-downloading (or re-downloading) the dataset. Please take a look at the documentation of the package DataDeps.jl for more detail and configuration options.

MLDatasets.SVHN2.download — Function

download([dir]; [i_accept_the_terms_of_use])

Trigger the (interactive) download of the full dataset into "dir". If no dir is provided the dataset will be downloaded into "~/.julia/datadeps/SVHN2".

This function will display an interactive dialog unless either the keyword parameter i_accept_the_terms_of_use or the environment variable DATADEPS_ALWAYS_ACCEPT is set to true. Note that using the data responsibly and respecting copyright/terms-of-use remains your responsibility.
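
For non-interactive use (for example on a build server), the prompt can be suppressed from within Julia. This is a minimal sketch; it assumes the variable is read by DataDeps.jl under the name DATADEPS_ALWAYS_ACCEPT and that it is set before the download is triggered. The calls that need MLDatasets installed are left as comments.

```julia
# Accept DataDeps download prompts for this Julia session.
# This has to happen before the download is triggered.
ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

# With MLDatasets installed, the download could then be started with:
#   using MLDatasets
#   MLDatasets.SVHN2.download()
# or, per call, with the keyword documented above:
#   MLDatasets.SVHN2.download(i_accept_the_terms_of_use = true)
```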

MLDatasets.SVHN2.classnames — Function

classnames() -> Vector{Int}

Return the 10 digits for the SVHN classes as a vector of integers.

MLDatasets.SVHN2.convert2features — Function

convert2features(array)

Convert the given SVHN tensor to a feature matrix (or feature vector in the case of a single image). The purpose of this function is to drop the spatial dimensions such that traditional ML algorithms can process the dataset.

julia> SVHN2.convert2features(SVHN2.traindata(Float32)[1]) # full training data
3072×73257 Array{Float32,2}:
[...]

julia> SVHN2.convert2features(SVHN2.traindata(Float32,1)[1]) # first observation
3072-element Array{Float32,1}:
[...]
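
The shapes above follow from 32 × 32 × 3 = 3072 values per image. As a rough sketch of the idea, the flattening can be written as a plain reshape; whether the library orders the features exactly this way is an assumption, and the stand-in array and variable names are illustrative.

```julia
# Stand-in for a batch of 4 images (first dim × second dim × channels × images).
images = rand(Float32, 32, 32, 3, 4)

# Collapse the three per-image dimensions into one feature dimension.
features = reshape(images, :, size(images, 4))
@show size(features)       # (3072, 4)

# A single image becomes a feature vector.
first_vec = vec(images[:, :, :, 1])
@show length(first_vec)    # 3072

# Column k of the matrix holds the same values as image k.
@show features[:, 1] == first_vec   # true
```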

MLDatasets.SVHN2.convert2image — Function

convert2image(array) -> Array{RGB}

Convert the given SVHN tensor (or feature vector/matrix) to a RGB array.

julia> SVHN2.convert2image(SVHN2.traindata()[1]) # full training dataset
32×32×73257 Array{RGB{N0f8},3}:
[...]

julia> SVHN2.convert2image(SVHN2.traindata(1)[1]) # first training image
32×32 Array{RGB{N0f8},2}:
[...]
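
Conceptually, the conversion folds the channel dimension into the element type, which is why the result has one dimension fewer than the input. The sketch below mimics that with plain tuples standing in for RGB colorants, so that it runs without any image packages; it is not the library's implementation, and it does not address whether the spatial dimensions are also reordered.

```julia
# Stand-in for one image in the documented layout: 32 × 32 × 3 (channels last).
img = rand(UInt8, 32, 32, 3)

# Pack the three channel values of every pixel into one element.
# A plain tuple stands in for an RGB colorant here.
pixels = [(img[i, j, 1], img[i, j, 2], img[i, j, 3])
          for i in axes(img, 1), j in axes(img, 2)]

@show size(pixels)         # (32, 32)
@show pixels[3, 7] == (img[3, 7, 1], img[3, 7, 2], img[3, 7, 3])   # true
```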
