BatchViewCollated(data, size; droplast = false)

A batch view of container data with collated batches of size size.
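To illustrate what "collated" means here, the following minimal sketch stacks individual observations along a new trailing dimension. The helper sketch_collate is a hypothetical name used only for this example. It is not the package's implementation, which may also handle tuples and other containers.

```julia
# Hypothetical helper: stack same-sized arrays along a new last dimension.
sketch_collate(obs::AbstractVector{<:AbstractArray}) =
    cat(obs...; dims = ndims(first(obs)) + 1)

# Four observations, each a vector of length 3
obs = [fill(Float64(i), 3) for i in 1:4]
batch = sketch_collate(obs)

@assert size(batch) == (3, 4)          # observations lie along the last dimension
@assert batch[:, 2] == fill(2.0, 3)    # the second column is the second observation
```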

BufferGetObsParallel(data; useprimary = false)

Like MLDataPattern.BufferGetObs but preloads observations into a buffer ring with multi-threaded workers.

RingBuffer(size, buf)

A Channel-like data structure that rotates through size buffers. put!s work by mutating one of the buffers:

put!(ringbuffer) do buf
    fill!(buf, 1)  # illustrative body: mutate the buffer in place
end

The result can then be take!n:

res = take!(ringbuffer)

Only one result is valid at a time! On the next take!, the previous result is reused as a buffer and may be mutated by a later put!, so copy it first if you need to keep it.
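The rotation described above can be sketched with two channels, one holding free buffers and one holding filled results. This is only an illustration under assumptions: SketchRingBuffer, sketch_put!, and sketch_take! are hypothetical names, and the package's actual RingBuffer may be implemented differently.

```julia
# Hypothetical sketch of a ring buffer; not the package's source.
mutable struct SketchRingBuffer{T}
    buffers::Channel{T}  # buffers that are free to be written into
    results::Channel{T}  # filled buffers waiting to be taken
    current::T           # buffer handed out by the most recent take
end

function SketchRingBuffer(size::Int, buf::T) where {T}
    buffers = Channel{T}(size + 1)
    results = Channel{T}(size)
    for _ in 1:size
        put!(buffers, deepcopy(buf))
    end
    return SketchRingBuffer{T}(buffers, results, buf)
end

function sketch_put!(f!, rb::SketchRingBuffer)
    buf = take!(rb.buffers)      # blocks until a free buffer is available
    put!(rb.results, f!(buf))    # f! is assumed to mutate and return the buffer
end

function sketch_take!(rb::SketchRingBuffer)
    put!(rb.buffers, rb.current) # recycle the previously returned result
    rb.current = take!(rb.results)
    return rb.current
end

rb = SketchRingBuffer(1, zeros(3))
sketch_put!(buf -> fill!(buf, 1.0), rb)
first_res = sketch_take!(rb)
@assert first_res == ones(3)

sketch_put!(buf -> fill!(buf, 2.0), rb)
second_res = sketch_take!(rb)
@assert second_res == fill(2.0, 3)

# The first result's memory is now free and gets overwritten by the next put
sketch_put!(buf -> fill!(buf, 3.0), rb)
@assert first_res == fill(3.0, 3)
```

The last assertion shows why only the most recent result should be treated as valid: earlier results share memory with buffers that later writes reuse.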

put!(f!, ringbuffer::RingBuffer)

Apply f! to a buffer in ringbuffer and put into the results channel.

x = rand(10, 10)
ringbuffer = RingBuffer(1, x)
put!(ringbuffer) do buf
    @test x == buf
    copy!(buf, rand(10, 10))
end
x_ = take!(ringbuffer)
@test !(x ≈ x_)

DataLoader(
    data,
    batchsize = 1;
    partial = true,
    collate = true,
    buffered = collate,
    parallel = Threads.nthreads() > 1,
    useprimary = false,
)

Create an efficient iterator of batches over data container data.



Positional arguments:

  • data: A data container supporting the LearnBase data access pattern.
  • batchsize = 1: Number of samples to batch together. Disable batching by setting to nothing.

Keyword arguments:

  • partial::Bool = true: Whether to include the last batch when nobs(data) is not divisible by batchsize. false ensures all batches have the same size, but some samples might be dropped.
  • collate::Bool = true: Whether to collate each batch. If false, a batch is a vector of observations.
  • buffered::Bool = collate: If buffered is true, loads data inplace using getobs!. See Data containers for details on buffered loading.
  • parallel::Bool = Threads.nthreads() > 1: Whether to load data in parallel using multiple threads. Default is true if more than one thread is available.
  • useprimary::Bool = false: If false, keep the main thread free when loading data in parallel. Is ignored if parallel is false.
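As a rough illustration of the partial keyword, the number of batches can be computed with ceiling or floor division. The helper sketch_nbatches is a hypothetical name for this example and assumes that partial only decides whether the final, smaller batch is kept.

```julia
# Hypothetical helper: number of batches for n observations.
# Assumes partial = true keeps the final, smaller batch and
# partial = false drops it.
sketch_nbatches(n::Int, batchsize::Int; partial::Bool = true) =
    partial ? cld(n, batchsize) : fld(n, batchsize)

@assert sketch_nbatches(100, 16; partial = true) == 7    # 6 full batches + 1 batch of 4
@assert sketch_nbatches(100, 16; partial = false) == 6   # the last 4 samples are dropped
@assert sketch_nbatches(64, 16) == 4                     # divisible: no difference
```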


Creating a data loader with batch size 16 and iterating over it:

data = (rand(128, 10000), rand(1, 10000))
dataloader = DataLoader(data, 16)

for (xs, ys) in dataloader
    @assert size(xs) == (128, 16)
    @assert size(ys) == (1, 16)
end

Creating a data loader that uses buffers to load batches:

data = rand(100, 64)
dataloader = DataLoader(data, 16, buffered=true)

size(first(dataloader)) == (100, 16)

Turning off collating:

dataloader = DataLoader(data, 16, collate=false)

# Batches are a vector of observations
length(first(dataloader)) == 16

batchindices(n, size, i)

Get the indices of batch i with batch size size of a collection with n elements.

Might be a partial batch if i is the last batch and n is not divisible by size.
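The index arithmetic can be sketched as follows. sketch_batchindices is a hypothetical re-implementation written for illustration, assuming 1-based batch numbers and a range as the return value. It is not the package's source.

```julia
# Hypothetical re-implementation for illustration only.
function sketch_batchindices(n::Int, size::Int, i::Int)
    from = (i - 1) * size + 1        # first index of batch i
    to = min(n, from + size - 1)     # clamp the last batch to n
    return from:to
end

@assert sketch_batchindices(10, 4, 1) == 1:4
@assert sketch_batchindices(10, 4, 2) == 5:8
@assert sketch_batchindices(10, 4, 3) == 9:10   # partial last batch
```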

eachobsparallel(data; useprimary = false, buffered = true)

Parallel data iterator for data container data. Loads data on all available threads (except the first if useprimary is false).

If buffered is true, uses getobs! to load samples inplace.

See also MLDataPattern.eachobs.

Warning: eachobsparallel does not guarantee that samples are returned in their original order.
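The ordering behaviour follows naturally from loading on several threads. The sketch below is a simplified stand-in, not the package's implementation: sketch_parallel_obs and its load argument are hypothetical names. Each observation is loaded in its own task and pushed into a channel as soon as it is ready.

```julia
# Hypothetical sketch: load n observations in parallel tasks.
# `load(i)` stands in for fetching observation i from a data container.
function sketch_parallel_obs(load, n::Int)
    return Channel{Any}(n) do ch
        @sync for i in 1:n
            Threads.@spawn put!(ch, load(i))
        end
    end  # the channel is closed once all tasks have finished
end

results = collect(sketch_parallel_obs(i -> i^2, 8))

# Every observation arrives exactly once, but the order may vary,
# so the comparison is made on sorted values.
@assert sort(results) == [i^2 for i in 1:8]
```

If a consumer depends on order, one option is to have load return the index alongside the observation and sort afterwards.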

obsslices(batch, batchdim = BatchDimLast())

Iterate over views of all observations in a batch. batch can be a batched array, a tuple of batches, or a dict of batches.

batch = rand(10, 10, 4)  # batch size is 4
iter = obsslices(batch, BatchDimLast())
@assert size(first(iter)) == (10, 10)

iter2 = obsslices(batch, BatchDimFirst())
@assert size(first(iter2)) == (10, 4)
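For plain arrays, a similar iteration can be sketched with Base's selectdim, which returns views rather than copies. sketch_obsslices is a hypothetical helper that takes the batch dimension as an integer. It only covers the array case, not tuples or dicts of batches.

```julia
# Hypothetical helper: lazily iterate over views along dimension `dim`.
sketch_obsslices(batch::AbstractArray, dim::Int = ndims(batch)) =
    (selectdim(batch, dim, i) for i in axes(batch, dim))

batch = rand(10, 10, 4)

# Batch dimension last: 4 observations of size (10, 10)
@assert size(first(sketch_obsslices(batch))) == (10, 10)
@assert length(collect(sketch_obsslices(batch))) == 4

# Batch dimension first: 10 observations of size (10, 4)
@assert size(first(sketch_obsslices(batch, 1))) == (10, 4)

# The slices are views, so writing to one changes the batch
first(sketch_obsslices(batch)) .= 0
@assert all(iszero, batch[:, :, 1])
```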