DataLoaders.BatchViewCollatedType
BatchViewCollated(data, size; droplast = false)

A batch view of container data with collated batches of size size.

DataLoaders.BufferGetObsParallelType
BufferGetObsParallel(data; useprimary = false)

Like MLDataPattern.BufferGetObs, but preloads observations into a ring buffer using multi-threaded workers.

DataLoaders.RingBufferType
RingBuffer(size, buf)

A Channel-like data structure that rotates through size buffers. put!s work by mutating one of the buffers:

put!(ringbuffer) do buf
    rand!(buf)
end

The result can then be take!n:

res = take!(ringbuffer)
Invalidation

Only one result is valid at a time! On the next take!, the previous result will be reused as a buffer and mutated by a subsequent put!.
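This invalidation behavior falls out of the rotation scheme itself. The following is a minimal, self-contained sketch of that scheme, for illustration only: MiniRing is a hypothetical name, and this is not DataLoaders' actual RingBuffer implementation.

```julia
# Buffers cycle between a pool of free buffers and a channel of results.
struct MiniRing
    free::Channel{Vector{Float64}}     # buffers available for writing
    results::Channel{Vector{Float64}}  # buffers holding finished results
end

function MiniRing(n::Int, len::Int)
    free = Channel{Vector{Float64}}(n)
    foreach(_ -> put!(free, zeros(len)), 1:n)
    MiniRing(free, Channel{Vector{Float64}}(n))
end

function Base.put!(f!, ring::MiniRing)
    buf = take!(ring.free)    # grab a writable buffer
    f!(buf)                   # mutate it in place
    put!(ring.results, buf)   # publish the result
end

function Base.take!(ring::MiniRing)
    res = take!(ring.results)
    put!(ring.free, res)  # recycle: a later put! may overwrite `res`,
    res                   # which is why only one result is valid at a time
end

ring = MiniRing(2, 3)
put!(ring) do buf
    fill!(buf, 1.0)
end
res = take!(ring)  # valid only until the buffer cycles back around
```

The key design point the sketch shows: take! recycles the returned buffer immediately, so holding on to an old result while continuing to put! is unsafe.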

Base.put!Method
put!(f!, ringbuffer::RingBuffer)

Apply f! to a buffer in ringbuffer and put the resulting buffer into the results channel.

x = rand(10, 10)
ringbuffer = RingBuffer(1, x)
put!(ringbuffer) do buf
    @test x == buf
    copy!(buf, rand(10, 10))
end
x_ = take!(ringbuffer)
@test !(x ≈ x_)
DataLoaders.DataLoaderFunction
DataLoader(
    data,
    batchsize = 1;
    partial = true,
    collate = true,
    buffered = collate,
    parallel = Threads.nthreads() > 1,
    useprimary = false,
)

Create an efficient iterator of batches over data container data.

Arguments

Positional

  • data: A data container supporting the LearnBase data access pattern
  • batchsize = 1: Number of samples to batch together. Disable batching by setting to nothing.

Keyword

  • partial::Bool = true: Whether to include the last batch when nobs(data) is not evenly divisible by batchsize. If false, all batches have the same size, but some observations may be dropped.
  • collate::Bool = true: Whether to collate observations into batches. If false, a batch is a vector of individual observations.
  • buffered::Bool = collate: If buffered is true, loads data inplace using getobs!. See Data containers for details on buffered loading.
  • parallel::Bool = Threads.nthreads() > 1: Whether to load data in parallel. Default is true if more than one thread is available.
  • useprimary::Bool = false: If false, keeps the primary thread free when loading data in parallel. Ignored if parallel is false.

Examples

Creating a data loader with batch size 16 and iterating over it:

data = (rand(128, 10000), rand(1, 10000))
dataloader = DataLoader(data, 16)

for (xs, ys) in dataloader
    # use the batch, e.g. compute a training step on (xs, ys)
end

Creating a data loader that uses buffers to load batches:

data = rand(100, 64)
dataloader = DataLoader(data, 16, buffered=true)

size(first(dataloader)) == (100, 16)

Turning off collating:

dataloader = DataLoader(data, 16, collate=false)

# Batches are a vector of observations
length(first(dataloader)) == 16
DataLoaders.batchindicesMethod
batchindices(n, size, i)

Get the indices of batch i with batch size size of a collection with n elements.

Might be a partial batch if i is the last batch and n is not divisible by size.
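The index arithmetic can be sketched as a one-liner. This is a re-derivation for illustration; batchindices_sketch is a hypothetical name, not the internal function.

```julia
# Batch i covers indices (size*(i-1) + 1) through size*i,
# clamped to n for the final (possibly partial) batch.
batchindices_sketch(n, size, i) = (size * (i - 1) + 1):min(size * i, n)

batchindices_sketch(10, 4, 1)  # 1:4
batchindices_sketch(10, 4, 3)  # 9:10, a partial batch since 10 % 4 != 0
```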

DataLoaders.eachobsparallelMethod
eachobsparallel(data; useprimary = false, buffered = true)

Parallel data iterator for data container data. Loads data on all available threads (except the first if useprimary is false).

If buffered is true, uses getobs! to load samples inplace.

See also MLDataPattern.eachobs

Order

eachobsparallel does not guarantee that the samples are returned in the order they appear in the data container.
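For instance, iterating over a matrix data container (a sketch, assuming DataLoaders.jl is installed; observations are along the last dimension, as in the DataLoader examples above):

```julia
using DataLoaders  # assumption: the package is available

data = rand(100, 64)  # a matrix data container with 64 observations
for obs in eachobsparallel(data; buffered = false)
    # each obs is a single observation (here a length-100 column);
    # iteration order need not match the container order
end
```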

DataLoaders.obsslicesFunction
obsslices(batch, batchdim = BatchDimLast())

Iterate over views of all observations in a batch. batch can be a batched array, a tuple of batches, or a dict of batches.

batch = rand(10, 10, 4)  # batch size is 4
iter = obsslices(batch, BatchDimLast())
@assert size(first(iter)) == (10, 10)

iter2 = obsslices(batch, BatchDimFirst())
@assert size(first(iter2)) == (10, 4)