DataLoaders.BatchViewCollatedType
BatchViewCollated(data, size; droplast = false)

A batch view of container data with collated batches of size size.

DataLoaders.BufferGetObsParallelType
BufferGetObsParallel(data; useprimary = false)

Like MLDataPattern.BufferGetObs, but preloads observations into a ring buffer using multi-threaded workers.

DataLoaders.RingBufferType
RingBuffer(size, buf)

A Channel-like data structure that rotates through size buffers. put!s work by mutating one of the buffers:

put!(ringbuffer) do buf
    rand!(buf)
end

The result can then be take!n:

res = take!(ringbuffer)
Invalidation

Only one result is valid at a time! On the next take!, the previous result will be reused as a buffer and mutated by a subsequent put!.
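This invalidation behavior falls out of the rotation scheme itself. The following is a minimal, self-contained sketch of that scheme, for illustration only: MiniRing is a hypothetical name, and this is not DataLoaders' actual RingBuffer implementation.

```julia
# Buffers cycle between a pool of free buffers and a channel of results.
struct MiniRing
    free::Channel{Vector{Float64}}     # buffers available for writing
    results::Channel{Vector{Float64}}  # buffers holding finished results
end

function MiniRing(n::Int, len::Int)
    free = Channel{Vector{Float64}}(n)
    foreach(_ -> put!(free, zeros(len)), 1:n)
    MiniRing(free, Channel{Vector{Float64}}(n))
end

function Base.put!(f!, ring::MiniRing)
    buf = take!(ring.free)    # grab a writable buffer
    f!(buf)                   # mutate it in place
    put!(ring.results, buf)   # publish the result
end

function Base.take!(ring::MiniRing)
    res = take!(ring.results)
    put!(ring.free, res)  # recycle: a later put! may overwrite `res`,
    res                   # which is why only one result is valid at a time
end

ring = MiniRing(2, 3)
put!(ring) do buf
    fill!(buf, 1.0)
end
res = take!(ring)  # valid only until the buffer cycles back around
```

The key design point the sketch shows: take! recycles the returned buffer immediately, so holding on to an old result while continuing to put! is unsafe.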

Base.put!Method
put!(f!, ringbuffer::RingBuffer)

Apply f! to a buffer in ringbuffer and put the resulting buffer into the results channel.

x = rand(10, 10)
ringbuffer = RingBuffer(1, x)
put!(ringbuffer) do buf
    @test x == buf
    copy!(buf, rand(10, 10))
end
x_ = take!(ringbuffer)
@test !(x ≈ x_)
DataLoaders.DataLoaderFunction
DataLoader(
    data,
    batchsize = 1;
    partial = true,
    collate = true,
    buffered = collate,
    parallel = Threads.nthreads() > 1,
    useprimary = false,
)

Create an efficient iterator of batches over data container data.

Arguments

Positional

  • data: A data container supporting the LearnBase data access pattern
  • batchsize = 1: Number of samples to batch together. Disable batching by setting to nothing.

Keyword

  • partial::Bool = true: Whether to include the last batch when nobs(data) is not evenly divisible by batchsize. If false, all batches have the same size, but some observations may be dropped.
  • collate::Bool = true: Whether to collate observations into batches. If false, a batch is a vector of individual observations.
  • buffered::Bool = collate: If buffered is true, loads data inplace using getobs!. See Data containers for details on buffered loading.
  • parallel::Bool = Threads.nthreads() > 1: Whether to load data in parallel. Default is true if more than one thread is available.
  • useprimary::Bool = false: If false, keeps the primary thread free when loading data in parallel. Ignored if parallel is false.

Examples

Creating a data loader with batch size 16 and iterating over it:

data = (rand(128, 10000), rand(1, 10000))
dataloader = DataLoader(data, 16)

for (xs, ys) in dataloader
    # use the batch, e.g. compute a training step on (xs, ys)
end

Creating a data loader that uses buffers to load batches:

data = rand(100, 64)
dataloader = DataLoader(data, 16, buffered=true)

size(first(dataloader)) == (100, 16)

Turning off collating:

dataloader = DataLoader(data, 16, collate=false)

# Batches are a vector of observations
length(first(dataloader)) == 16
DataLoaders.batchindicesMethod
batchindices(n, size, i)

Get the indices of batch i with batch size size of a collection with n elements.

Might be a partial batch if i is the last batch and n is not divisible by size.
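The index arithmetic can be sketched as a one-liner. This is a re-derivation for illustration; batchindices_sketch is a hypothetical name, not the internal function.

```julia
# Batch i covers indices (size*(i-1) + 1) through size*i,
# clamped to n for the final (possibly partial) batch.
batchindices_sketch(n, size, i) = (size * (i - 1) + 1):min(size * i, n)

batchindices_sketch(10, 4, 1)  # 1:4
batchindices_sketch(10, 4, 3)  # 9:10, a partial batch since 10 % 4 != 0
```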

DataLoaders.eachobsparallelMethod
eachobsparallel(data; useprimary = false, buffered = true)

Parallel data iterator for data container data. Loads data on all available threads (except the first if useprimary is false).

If buffered is true, uses getobs! to load samples inplace.

See also MLDataPattern.eachobs

Order

eachobsparallel does not guarantee that the samples are returned in the order they appear in the data container.
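For instance, iterating over a matrix data container (a sketch, assuming DataLoaders.jl is installed; observations are along the last dimension, as in the DataLoader examples above):

```julia
using DataLoaders  # assumption: the package is available

data = rand(100, 64)  # a matrix data container with 64 observations
for obs in eachobsparallel(data; buffered = false)
    # each obs is a single observation (here a length-100 column);
    # iteration order need not match the container order
end
```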

DataLoaders.obsslicesFunction
obsslices(batch, batchdim = BatchDimLast())

Iterate over views of all observations in a batch. batch can be a batched array, a tuple of batches, or a dict of batches.

batch = rand(10, 10, 4)  # batch size is 4
iter = obsslices(batch, BatchDimLast())
@assert size(first(iter)) == (10, 10)

iter2 = obsslices(batch, BatchDimFirst())
@assert size(first(iter2)) == (10, 4)