Docstrings · BandwidthBenchmark.jl

BandwidthBenchmark.bwbench — Method

Measure the memory bandwidth using the following streaming kernels with corresponding data access pattern (Notation: S - store, L - load, WA - write allocate). All variables are vectors, s is a scalar:

init (S1, WA): Initilize an array: `a = s`. Store only.
sum (L1): Vector reduction: `s += a`. Load only.
copy (L1, S1, WA): Classic memcopy: `a = b`.
update (L1, S1): Update vector: `a = a * scalar`. Also load + store but without write allocate.
triad (L2, S1, WA): Stream triad: `a = b + c * scalar`.
daxpy (L2, S1): Daxpy: `a = a + b * scalar`.
striad (L3, S1, WA): Schoenauer triad: `a = b + c * d`.
sdaxpy (L3, S1): Schoenauer triad without write allocate: `a = a + b * c`.

Keyword arguments:

N (default: 120_000_000): length of vectors
nthreads (default: Threads.nthreads()): number of Julia threads to use
niter (default: 10): # of times we repeat the measurement
alignment (default: 64): array alignment
verbose (default: false): print result table + thread information etc.
write_allocate (default: false): include write allocate compensation factors

BandwidthBenchmark.bwscaling — Method

Uses bwbench to measure the memory bandwidth for an increasing number of threads (1:max_nthreads). Returns a matrix whose rows correspond to the number of threads and different columns hold the bandwidth results for each kernel.

BandwidthBenchmark.bwscaling_memory_domains — Method

Similar to bwscaling but measures the memory bandwidth scaling within and across memory domains. Returns a DataFrame in which each row contains the kernel name, the number of threads per memory domain, the number of domains considered, and the measured memory bandwidth (in MB/s).

Keyword arguments

domains: memory domains to consider (logical indices, i.e. starting at 1)
max_nthreads: maximal number of threads per memory domain to consider