DataLoaders.BatchViewCollated — Type

```julia
BatchViewCollated(data, size; droplast = false)
```

A batch view of container `data` with collated batches of size `size`.
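A minimal usage sketch, assuming `BatchViewCollated` implements the `LearnBase` `nobs`/`getobs` access pattern over batches (it may not be exported, and the exact batch layout is an assumption based on the collating convention used elsewhere in this package):

```julia
using DataLoaders: BatchViewCollated
using LearnBase: nobs, getobs

data = rand(10, 100)                 # 100 observations of size 10
bv = BatchViewCollated(data, 16)

nobs(bv)       # number of batches; with droplast = false the last one is partial
getobs(bv, 1)  # presumably a collated array with the batch dimension last
```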
DataLoaders.BufferGetObsParallel — Type

```julia
BufferGetObsParallel(data; useprimary = false)
```

Like `MLDataPattern.BufferGetObs`, but preloads observations into a ring buffer with multi-threaded workers.
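A sketch of how such an iterator would typically be consumed; whether `BufferGetObsParallel` is exported and what each yielded observation looks like are assumptions here:

```julia
using DataLoaders: BufferGetObsParallel

data = rand(10, 1000)                # 1000 observations of size 10
iter = BufferGetObsParallel(data)    # workers fill the ring buffer in the background

for obs in iter
    # `obs` is a single observation; since buffers are preallocated and
    # rotated, copy it if you need to keep it past the current iteration
end
```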
DataLoaders.RingBuffer — Type

```julia
RingBuffer(size, buf)
```

A `Channel`-like data structure that rotates through `size` buffers. `put!`s work by mutating one of the buffers:

```julia
put!(ringbuffer) do buf
    rand!(buf)
end
```

The result can then be `take!`n:

```julia
res = take!(ringbuffer)
```

Only one result is valid at a time! On the next `take!`, the previous result will be reused as a buffer and be mutated by a `put!`.
Base.put! — Method

```julia
put!(f!, ringbuffer::RingBuffer)
```

Apply `f!` to a buffer in `ringbuffer` and put the result into the results channel.

```julia
x = rand(10, 10)
ringbuffer = RingBuffer(1, x)
put!(ringbuffer) do buf
    @test x == buf
    copy!(buf, rand(10, 10))
end
x_ = take!(ringbuffer)
@test !(x ≈ x_)
```
DataLoaders.DataLoader — Function

```julia
DataLoader(
    data,
    batchsize = 1;
    partial = true,
    collate = true,
    buffered = collate,
    parallel = Threads.nthreads() > 1,
    useprimary = false,
)
```

Create an efficient iterator of batches over data container `data`.

Arguments

Positional:

- `data`: A data container supporting the `LearnBase` data access pattern.
- `batchsize = 1`: Number of samples to batch together. Disable batching by setting to `nothing`.

Keyword:

- `partial::Bool = true`: Whether to include the last batch when `nobs(dataset)` is not divisible by `batchsize`. `false` ensures all batches have the same size, but some samples might be dropped.
- `buffered::Bool = collate`: If `buffered` is `true`, loads data inplace using `getobs!`. See Data containers for details on buffered loading.
- `parallel::Bool = Threads.nthreads() > 1`: Whether to load data in parallel, keeping the primary thread free. Default is `true` if more than one thread is available.
- `useprimary::Bool = false`: If `false`, keep the main thread free when loading data in parallel. Is ignored if `parallel` is `false`.
Examples

Creating a data loader with batch size 16 and iterating over it:

```julia
data = (rand(128, 10000), rand(1, 10000))
dataloader = DataLoader(data, 16)
for (xs, ys) in dataloader
end
```

Creating a data loader that uses buffers to load batches:

```julia
data = rand(100, 64)
dataloader = DataLoader(data, 16, buffered = true)
size(first(dataloader)) == (100, 16)
```

Turning off collating:

```julia
dataloader = DataLoader(data, 16, collate = false)
# Batches are a vector of observations
length(first(dataloader)) == 16
```
DataLoaders.batchindices — Method

```julia
batchindices(n, size, i)
```

Get the indices of batch `i` with batch size `size` of a collection with `n` elements. Might be a partial batch if `i` is the last batch and `n` is not divisible by `size`.
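The underlying arithmetic can be sketched as follows; `batchrange` is a hypothetical stand-in, since the exact return type of `batchindices` is not shown here:

```julia
# Batch i spans (i - 1) * size + 1 up to min(i * size, n);
# the `min` clamps the last, possibly partial, batch.
batchrange(n, size, i) = ((i - 1) * size + 1):min(i * size, n)

batchrange(10, 4, 1)  # 1:4
batchrange(10, 4, 3)  # 9:10, a partial batch
```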
DataLoaders.collate — Method

```julia
collate(samples)
```

Collates a vector of samples into a single batch. See collating.
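A sketch of what collation typically produces; the shapes in the comments assume the batch-dimension-last convention and are not verified output:

```julia
using DataLoaders: collate

# 4 samples, each a tuple of (input, target) arrays
samples = [(rand(10), rand(1)) for _ in 1:4]
batch = collate(samples)
# Under the batch-dim-last convention, `batch` should be a tuple
# with batch[1] of size (10, 4) and batch[2] of size (1, 4)
```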
DataLoaders.eachobsparallel — Method

```julia
eachobsparallel(data; useprimary = false, buffered = true)
```

Parallel data iterator for data container `data`. Loads data on all available threads (except the first if `useprimary` is `false`). If `buffered` is `true`, uses `getobs!` to load samples inplace.

See also `MLDataPattern.eachobs`.

`eachobsparallel` does not guarantee that the samples are returned in the correct order.
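A minimal consumption sketch, assuming `data` is any `LearnBase`-compatible container (here a plain matrix with observations along the last dimension):

```julia
using DataLoaders: eachobsparallel

data = rand(10, 1000)  # 1000 observations of size 10
for obs in eachobsparallel(data)
    # `obs` is one observation; arrival order is not guaranteed, and
    # with buffered = true the underlying buffer may be reused, so
    # copy(obs) if you need to retain it
end
```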
DataLoaders.obsslices — Function

```julia
obsslices(batch, batchdim = BatchDimLast())
```

Iterate over views of all observations in a `batch`. `batch` can be a batched array, a tuple of batches, or a dict of batches.

```julia
batch = rand(10, 10, 4)  # batch size is 4
iter = obsslices(batch, BatchDimLast())
@assert size(first(iter)) == (10, 10)

iter2 = obsslices(batch, BatchDimFirst())
@assert size(first(iter2)) == (10, 4)
```