BandwidthBenchmark.bwbench
— Method
Measure the memory bandwidth using the following streaming kernels, each listed with its data access pattern (notation: S = store, L = load, WA = write allocate). All variables are vectors; `s` is a scalar:
init (S1, WA): Initialize an array: `a = s`. Store only.
sum (L1): Vector reduction: `s += a`. Load only.
copy (L1, S1, WA): Classic memcopy: `a = b`.
update (L1, S1): Update vector: `a = a * scalar`. Also load + store but without write allocate.
triad (L2, S1, WA): Stream triad: `a = b + c * scalar`.
daxpy (L2, S1): Daxpy: `a = a + b * scalar`.
striad (L3, S1, WA): Schoenauer triad: `a = b + c * d`.
sdaxpy (L3, S1): Schoenauer triad without write allocate: `a = a + b * c`.
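To make the access patterns concrete, the eight kernels can be sketched as in-place broadcasts. This is a minimal illustration, not the package's implementation: the function names (`init!`, `vsum`, `copy_kernel!`, and so on) are hypothetical, and the real kernels are presumably threaded and tuned.

```julia
# Hypothetical stand-ins for the benchmark kernels; `s` is a scalar,
# `a`, `b`, `c`, `d` are vectors of equal length.
init!(a, s)           = (a .= s)              # S1, WA: store only
vsum(a)               = sum(a)                # L1: load only
copy_kernel!(a, b)    = (a .= b)              # L1, S1, WA
update!(a, s)         = (a .= a .* s)         # L1, S1: target is also read
triad!(a, b, c, s)    = (a .= b .+ c .* s)    # L2, S1, WA
daxpy!(a, b, s)       = (a .= a .+ b .* s)    # L2, S1
striad!(a, b, c, d)   = (a .= b .+ c .* d)    # L3, S1, WA
sdaxpy!(a, b, c)      = (a .= a .+ b .* c)    # L3, S1
```

The kernels without WA are the ones whose target `a` also appears on the right-hand side: the cache line is loaded anyway, so the store causes no additional write-allocate traffic.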
Keyword arguments:
`N` (default: `120_000_000`): length of vectors
`nthreads` (default: `Threads.nthreads()`): number of Julia threads to use
`niter` (default: `10`): number of times the measurement is repeated
`alignment` (default: `64`): array alignment
`verbose` (default: `false`): print result table, thread information, etc.
`write_allocate` (default: `false`): include write allocate compensation factors
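The following sketch shows how `N`, `niter`, and `write_allocate` could enter a bandwidth figure for the copy kernel. It is an assumption-laden illustration, not the package's code: the helper `copy_bandwidth` is hypothetical, it is single-threaded, and taking the best time over all repetitions is a common STREAM-style convention that the package may or may not follow.

```julia
# Hypothetical helper, not part of BandwidthBenchmark.jl.
function copy_bandwidth(; N = 1_000_000, niter = 10, write_allocate = false)
    a = zeros(Float64, N)
    b = ones(Float64, N)
    # copy is (L1, S1, WA): one load stream and one store stream, plus one
    # extra stream for the write allocate when compensation is requested
    nstreams = write_allocate ? 3 : 2
    bytes = nstreams * N * sizeof(Float64)
    tmin = Inf
    for _ in 1:niter
        t = @elapsed(a .= b)      # seconds for one sweep over the data
        tmin = min(tmin, t)       # keep the best of the repetitions
    end
    return bytes / tmin / 1e6     # MB/s
end

bw = copy_bandwidth(N = 200_000, niter = 5)
println("copy bandwidth (sketch): ", round(bw; digits = 1), " MB/s")
```

With the package itself, the corresponding call would use the keyword arguments listed above, for example `bwbench(; N = 120_000_000, niter = 10, write_allocate = true)`.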
BandwidthBenchmark.bwscaling
— Method
Uses `bwbench` to measure the memory bandwidth for an increasing number of threads (`1:max_nthreads`). Returns a matrix whose rows correspond to the number of threads and whose columns hold the bandwidth results for each kernel.
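The layout of the returned matrix can be illustrated with a small sketch. Both `scaling_matrix` and the `measure` callback are hypothetical: `measure(n)` stands in for a benchmark run with `n` threads that returns one bandwidth value per kernel, and no real measurement is performed here.

```julia
# Hypothetical sketch of assembling a threads-by-kernels result matrix.
function scaling_matrix(measure, max_nthreads::Integer, nkernels::Integer)
    results = Matrix{Float64}(undef, max_nthreads, nkernels)
    for n in 1:max_nthreads
        # row n holds the per-kernel bandwidths measured with n threads
        results[n, :] = measure(n)
    end
    return results
end

# Stub measurement used only to show the shape; the values carry no meaning.
stub(n) = fill(float(n), 8)
m = scaling_matrix(stub, 4, 8)
println(size(m))   # (4, 8): 4 thread counts, 8 kernels
```

Indexing such a matrix as `m[n, k]` then gives the bandwidth of kernel `k` with `n` threads.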
BandwidthBenchmark.bwscaling_memory_domains
— Method
Similar to `bwscaling`, but measures the memory bandwidth scaling within and across memory domains. Returns a `DataFrame` in which each row contains the kernel name, the number of threads per memory domain, the number of domains considered, and the measured memory bandwidth (in MB/s).
Keyword arguments:
`domains`: memory domains to consider (logical indices, i.e. starting at 1)
`max_nthreads`: maximal number of threads per memory domain to consider