
StochasticRounding

This package exports Float16sr and BFloat16sr, two number formats that behave like their deterministic counterparts Float16 and BFloat16 but use stochastic rounding. A result that is not exactly representable is rounded to one of its two neighbouring representable numbers, and the closer it lies to a neighbour the more likely it is rounded to that neighbour, with probabilities linear in the distance. This makes the rounding exact in expectation (see the example in "Usage" below). Although no hardware implementation is currently known to be available, Graphcore is working on IPUs with stochastic rounding. In software, stochastic rounding makes arithmetic slower than the current Float16/BFloat16 implementations, by about x3 for Float16 and about x15 for BFloat16 (see "Performance"). Xoroshiro128Plus, a random number generator from the Xorshift family, is used through the RandomNumbers.jl package.

Stochastic rounding is applied only to arithmetic operations. Type conversions and subnormal numbers use standard round-to-nearest instead.
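The rounding rule described above can be sketched in a few lines of plain Julia. This is only an illustration under stated assumptions, not the package's implementation: `sr_bfloat16_sketch` is a hypothetical helper, it uses Julia's default RNG rather than Xoroshiro128Plus, it returns a Float32 whose lower 16 bits are zero instead of a BFloat16sr, and it does not treat Inf, NaN, overflow or subnormals specially. It relies on BFloat16 having the same bit layout as the upper 16 bits of a Float32.

```julia
using Random

# Hypothetical helper for illustration; not part of StochasticRounding.jl.
# Assumes x is finite and well inside the Float32 range (no Inf/NaN/overflow handling).
function sr_bfloat16_sketch(x::Float32, rng::AbstractRNG = Random.default_rng())
    ui = reinterpret(UInt32, x)        # raw bits; the upper 16 bits form a BFloat16
    ui += UInt32(rand(rng, UInt16))    # add uniform noise in 0:65535 to the bits about to be dropped
    ui &= 0xffff0000                   # truncate; a carry into bit 16 rounds away from zero
    return reinterpret(Float32, ui)
end

x = 0.1f0
samples = [sr_bfloat16_sketch(x) for _ in 1:100_000]
println(unique(samples))                          # the two BFloat16 neighbours of x
println(sum(Float64, samples) / length(samples))  # close to x
```

The carry happens with probability equal to the dropped bits divided by 2^16, so the chance of rounding away from zero grows linearly with the distance from the neighbour closer to zero, and the average over many roundings approaches x.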

Usage

julia> using StochasticRounding
julia> a = BFloat16sr(1.0)
BFloat16sr(1.0)
julia> a/3
BFloat16sr(0.33398438)
julia> a/3
BFloat16sr(0.33203125)

As 1/3 is not exactly representable, the result is rounded to 0.33398438 with a chance of about 66.7% and to 0.33203125 with a chance of about 33.3%, so that the expected result is 0.33333... and therefore exact. Use BFloat16_chance_roundup(x::Float32) to get the chance that x will be rounded up.
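The two chances quoted above follow directly from the distances to the neighbouring BFloat16 values. A minimal check in plain Julia, without the package, might look as follows; the variable names are illustrative only, and the arithmetic is done in Float64 for convenience.

```julia
x  = 1/3            # the exact result, approximated in Float64
lo = 0.33203125     # nearest BFloat16 below 1/3
hi = 0.333984375    # nearest BFloat16 above 1/3 (printed as 0.33398438)

p_up = (x - lo) / (hi - lo)             # chance of rounding up, linear in the distance from lo
expected = p_up * hi + (1 - p_up) * lo  # expectation of the rounded result

println(p_up)       # approximately 2/3
println(expected)   # approximately 1/3
```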

Performance

julia> using StochasticRounding, BenchmarkTools
julia> A = rand(Float32,1000,1000);
julia> B = BFloat16.(A);
julia> C = BFloat16sr.(A);
julia> D = Float16.(A);
julia> E = Float16sr.(A);
julia> @btime +($A,$A);                # Float32
  304.975 μs (2 allocations: 3.81 MiB)

julia> @btime +($B,$B);                # BFloat16
  569.064 μs (2 allocations: 1.91 MiB)

julia> @btime +($C,$C);                # BFloat16sr
  8.354 ms (8 allocations: 1.91 MiB)

julia> @btime +($D,$D);                # Float16
  7.377 ms (2 allocations: 1.91 MiB)

julia> @btime +($E,$E);                # Float16sr
  23.423 ms (8 allocations: 1.91 MiB)

In this benchmark, stochastic rounding imposes a slowdown of roughly x15 for BFloat16 (8.354 ms vs 569.064 μs) and roughly x3 for Float16 (23.423 ms vs 7.377 ms).