BandwidthBenchmark.bwbench (Method)

Measures the memory bandwidth using the following streaming kernels, each listed with its data access pattern (notation: S = store, L = load, WA = write allocate; the digit gives the number of streams, e.g. L2 = two load streams). All variables are vectors except `s` and `scalar`, which are scalars:

init (S1, WA): Initialize an array: `a = s`. Store only.
sum (L1): Vector reduction: `s += a`. Load only.
copy (L1, S1, WA): Classic memcopy: `a = b`.
update (L1, S1): Update vector: `a = a * scalar`. Like copy, one load and one store, but without write allocate.
triad (L2, S1, WA): Stream triad: `a = b + c * scalar`.
daxpy (L2, S1): Daxpy: `a = a + b * scalar`.
striad (L3, S1, WA): Schoenauer triad: `a = b + c * d`.
sdaxpy (L3, S1): Schoenauer triad without write allocate: `a = a + b * c`.
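
The access-pattern notation above determines how much data each kernel moves per vector element, which is what turns a measured runtime into a bandwidth. The following is a minimal serial sketch of that accounting, not the package's implementation: the names `init!`, `triad!`, `daxpy!`, `traffic_bytes`, and `bandwidth_mbs` are hypothetical, the element type is assumed to be `Float64`, and 1 MB is assumed to be 10^6 bytes. Write allocate is modeled as one additional load per WA store stream, since a store to a cache line that is not yet in cache first reads that line from memory.

```julia
# Serial sketches of three kernels (no threading, no alignment control).
init!(a, s) = (a .= s; a)                      # init:  S1, WA
triad!(a, b, c, s) = (a .= b .+ c .* s; a)     # triad: L2, S1, WA
daxpy!(a, b, s) = (a .= a .+ b .* s; a)        # daxpy: L2, S1

# Bytes moved for vectors of length N: one word per load and store stream,
# plus one extra word per WA stream if write allocate is counted.
function traffic_bytes(N; loads, stores, wa = 0, write_allocate = false, T = Float64)
    words = loads + stores + (write_allocate ? wa : 0)
    return words * sizeof(T) * N
end

# Time one kernel call and convert it to MB/s (assuming 1 MB = 10^6 bytes).
function bandwidth_mbs(kernel!, args...; loads, stores, wa = 0, write_allocate = false)
    v = first(args)                  # the output vector is the first argument
    kernel!(args...)                 # warm-up run, also triggers compilation
    t = @elapsed kernel!(args...)
    bytes = traffic_bytes(length(v); loads, stores, wa, write_allocate, T = eltype(v))
    return bytes / t / 1e6
end

N = 1_000_000
a, b, c = zeros(N), ones(N), fill(2.0, N)
triad!(a, b, c, 3.0)                 # every element becomes 1 + 2 * 3
rate = bandwidth_mbs(triad!, a, b, c, 3.0; loads = 2, stores = 1, wa = 1)
```

With `write_allocate = true`, the triad is counted as four words per element instead of three, so the same runtime yields a higher reported bandwidth.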

Keyword arguments:

  • N (default: 120_000_000): length of vectors
  • nthreads (default: Threads.nthreads()): number of Julia threads to use
  • niter (default: 10): number of times the measurement is repeated
  • alignment (default: 64): array alignment
  • verbose (default: false): print result table + thread information etc.
  • write_allocate (default: false): include write allocate compensation factors
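
As a usage sketch of the keyword arguments listed above: the calls below use only the documented keywords, with illustrative values chosen to keep the run short. They assume that the package is installed in the active environment and that Julia was started with several threads (for example `julia -t 4`). The form of the return value is not described here, so the example relies on `verbose = true` for its output.

```julia
using BandwidthBenchmark  # assumption: the package is installed

# Shorter vectors and fewer repetitions than the defaults, for a quick check.
bwbench(; N = 10_000_000, niter = 5, verbose = true)

# Same run, but with write allocate compensation factors included.
bwbench(; N = 10_000_000, niter = 5, verbose = true, write_allocate = true)
```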

BandwidthBenchmark.bwscaling (Method)

Uses bwbench to measure the memory bandwidth for an increasing number of threads (1:max_nthreads). Returns a matrix in which each row corresponds to a thread count and each column holds the bandwidth results for one kernel.

BandwidthBenchmark.bwscaling_memory_domains (Method)

Similar to bwscaling but measures the memory bandwidth scaling within and across memory domains. Returns a DataFrame in which each row contains the kernel name, the number of threads per memory domain, the number of domains considered, and the measured memory bandwidth (in MB/s).

Keyword arguments:

  • domains: memory domains to consider (logical indices, i.e. starting at 1)
  • max_nthreads: maximum number of threads per memory domain to consider