# GFlops.jl

When code performance is an issue, it is sometimes useful to get absolute
performance measurements in order to objectivise what is "slow" or
"fast". `GFlops.jl` leverages the power of `Cassette.jl` to automatically count
the number of floating-point operations in a piece of code. When combined with
the accuracy of `BenchmarkTools`, this allows for easy and absolute performance
measurements.

## Installation

This package is registered and can therefore simply be installed with

    pkg> add GFlops

## Example use

This simple example shows how to track the number of operations in a vector summation:

    julia> using GFlops

    julia> x = rand(1000);

    julia> @count_ops sum($x)
    Flop Counter: 999 flop
    ┌─────┬─────────┐
    │     │ Float64 │
    ├─────┼─────────┤
    │ add │     999 │
    └─────┴─────────┘

    julia> @gflops sum($x);
    8.86 GFlops, 12.76% peak (9.99e+02 flop, 1.13e-07 s, 0 alloc: 0 bytes)
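
The figures on the `@gflops` line are related by simple arithmetic. As a minimal sketch using only the numbers printed above (the helper `gflops_rate` is a hypothetical name for illustration, not part of `GFlops.jl`), the rate is the flop count divided by the elapsed time. The printed time is rounded, so the result only approximately matches the reported 8.86 GFlops:

```julia
# Hypothetical helper (not part of GFlops.jl): rate in GFlops, i.e. 1e9 flop/s
gflops_rate(flop, seconds) = flop / seconds / 1e9

flop    = 999        # additions counted for sum(x) with 1000 elements
seconds = 1.13e-7    # elapsed time, as printed (rounded) in the output above

rate = gflops_rate(flop, seconds)
println(round(rate, digits = 2), " GFlops")
```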

`GFlops.jl` internally tracks several types of floating-point operations, for
both 32-bit and 64-bit operands. Pretty-printing a Flop Counter only
shows non-zero entries, but any individual counter can be accessed:

    julia> function mixed_dot(x, y)
               acc = 0.0
               @inbounds @simd for i in eachindex(x, y)
                   acc += x[i] * y[i]
               end
               acc
           end
    mixed_dot (generic function with 1 method)

    julia> x = rand(Float32, 1000); y = rand(Float32, 1000);

    julia> cnt = @count_ops mixed_dot($x, $y)
    Flop Counter: 2000 flop
    ┌─────┬─────────┬─────────┐
    │     │ Float32 │ Float64 │
    ├─────┼─────────┼─────────┤
    │ add │       0 │    1000 │
    │ mul │    1000 │       0 │
    └─────┴─────────┴─────────┘

    julia> fieldnames(GFlops.Counter)
    (:fma32, :fma64, :muladd32, :muladd64, :add32, :add64, :sub32, ...)

    julia> cnt.add64
    1000

    julia> @gflops mixed_dot($x, $y);
    9.91 GFlops, 13.36% peak (2.00e+03 flop, 2.02e-07 s, 0 alloc: 0 bytes)
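
The split between the `Float32` and `Float64` columns follows from Julia's promotion rules: the product of two `Float32` values stays a `Float32`, while adding it to the `Float64` accumulator promotes the sum to `Float64`. A small sketch in plain Julia (no `GFlops.jl` needed) that checks this:

```julia
x = rand(Float32, 4); y = rand(Float32, 4)

# The product of two Float32 values is a Float32: a 32-bit multiplication
p = x[1] * y[1]
println(typeof(p))

# Adding it to a Float64 accumulator promotes to Float64: a 64-bit addition
acc = 0.0 + p
println(typeof(acc))
```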

## Caveats

### Fused Multiplication and Addition: FMA & MulAdd

On systems which support them, FMAs and MulAdds compute two operations (an
addition and a multiplication) in one instruction. `@count_ops` counts each
individual FMA/MulAdd as one operation, which makes it easier to interpret
counters. However, `@gflops` will count two floating-point operations for each
FMA or MulAdd, in accordance with the way high-performance benchmarks usually
behave:

    julia> x = 0.5; coeffs = rand(10);

    # 9 MulAdds but 18 flop
    julia> cnt = @count_ops evalpoly($x, $coeffs)
    Flop Counter: 18 flop
    ┌────────┬─────────┐
    │        │ Float64 │
    ├────────┼─────────┤
    │ muladd │       9 │
    └────────┴─────────┘

    julia> @gflops evalpoly($x, $coeffs);
    0.87 GFlops, 1.63% peak (1.80e+01 flop, 2.06e-08 s, 0 alloc: 0 bytes)
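
To make the two conventions concrete, here is a minimal sketch of the weighting, assuming a plain `NamedTuple` as a stand-in for the counter (the names `counts`, `weight` and `total_flop` are hypothetical and are not the package's API). FMA and MulAdd entries are weighted by two, every other entry by one:

```julia
# Hypothetical stand-in for a flop counter: operation name => number of instructions
counts = (muladd64 = 9, add64 = 0, mul64 = 0)

# Fused operations perform one multiplication and one addition each
function weight(name::Symbol)
    s = String(name)
    return (startswith(s, "fma") || startswith(s, "muladd")) ? 2 : 1
end

# Total flop: each entry multiplied by its weight
total_flop(c) = sum(weight(k) * v for (k, v) in pairs(c))

println(total_flop(counts))  # 18, matching the evalpoly example
```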

### Non-Julia code

`GFlops.jl` does not see what happens outside the realm of Julia code. In
particular, it does not see operations performed in external libraries, such as
BLAS calls:

    julia> using LinearAlgebra

    julia> x = rand(1000); y = rand(1000);

    julia> @count_ops dot($x, $y)
    Flop Counter: 0 flop

This is a known issue; we'll try and find a way to circumvent the problem.
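
In the meantime, one possible workaround is to count operations on a pure-Julia implementation of the same computation, on the assumption that it performs the same arithmetic as the library routine. The function `julia_dot` below is a hypothetical helper written for this illustration, modelled on `mixed_dot` above:

```julia
# Hypothetical pure-Julia dot product, written so that the arithmetic
# happens in Julia code rather than inside an external library
function julia_dot(x, y)
    # Accumulate in the promoted element type of both inputs
    acc = zero(promote_type(eltype(x), eltype(y)))
    @inbounds @simd for i in eachindex(x, y)
        acc += x[i] * y[i]
    end
    return acc
end

println(julia_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

Passing it to `@count_ops julia_dot($x, $y)` should then report the multiplications and additions, although the count reflects this implementation rather than what the BLAS routine actually executes.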