Float8s.jl
Finally, a number type that you can count on your fingers. Super Mario and Zelda would be proud.
Comes in two flavours, each with 1 sign bit: Float8 has 3 exponent bits and 4 fraction bits, and Float8_4 has 4 exponent bits and 3 fraction bits.
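To make the two bit layouts concrete, here is a minimal Julia sketch that decodes a raw 8-bit pattern by hand. It assumes an IEEE-754-style layout (exponent bias 2^(e-1)-1, subnormals when the exponent field is zero, and an all-ones exponent field reserved for Inf/NaN). The function name `decode_minifloat` is hypothetical and not part of the Float8s.jl API.

```julia
# Hypothetical decoder for an 8-bit float: 1 sign bit, `ebits` exponent bits,
# `fbits` fraction bits. Assumes an IEEE-754-style layout (not taken from Float8s.jl).
function decode_minifloat(bits::UInt8; ebits::Int=3, fbits::Int=4)
    @assert ebits + fbits == 7 "1 sign bit + ebits + fbits must fill 8 bits"
    bias = 2^(ebits - 1) - 1                       # 3 for Float8, 7 for Float8_4
    sign = (bits >> 7) == 0x01 ? -1.0 : 1.0
    efield = Int((bits >> fbits) & ((0x01 << ebits) - 0x01))
    ffield = Int(bits & ((0x01 << fbits) - 0x01))
    if efield == 2^ebits - 1                       # all-ones exponent: Inf or NaN (assumed)
        return ffield == 0 ? sign * Inf : NaN
    elseif efield == 0                             # subnormal (assumed): no implicit leading 1
        return sign * (ffield / 2^fbits) * 2.0^(1 - bias)
    else                                           # normal number: implicit leading 1
        return sign * (1 + ffield / 2^fbits) * 2.0^(efield - bias)
    end
end

decode_minifloat(0x49)                     # 0 100 1001 with 3 exponent bits
decode_minifloat(0x40; ebits=4, fbits=3)   # 0 1000 000 with 4 exponent bits
```

Under these assumptions, 0x49 decodes to 3.125 and 0x6f to 15.5, the largest finite value with 3 exponent bits.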
Both rely on conversion to Float32 to perform any arithmetic operation, similar to Float16.
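The Float32 round trip can be sketched generically: convert both operands up, compute in Float32, and round the result back down. The helper `via_float32` is hypothetical, and it is demonstrated on Julia's built-in Float16 so that it runs without Float8s.jl installed.

```julia
# Hypothetical helper: evaluate a binary operation on two low-precision values
# by widening them to Float32, computing there, and converting back to T.
function via_float32(op, a::T, b::T) where {T<:AbstractFloat}
    return T(op(Float32(a), Float32(b)))
end

via_float32(+, Float16(1.5), Float16(2.25))   # exact in Float16
via_float32(*, Float16(300), Float16(300))    # 90000 exceeds floatmax(Float16), so Inf16
```

Only the final conversion back to T rounds (or overflows), which is why such types need no arithmetic circuitry of their own.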
Example use
julia> using Float8s
julia> a = Float8(4)
Float8(4.0)
julia> b = Float8(3.14159)
Float8(3.125)
julia> a+b
Float8(7.0)
julia> sqrt(a)
Float8(2.0)
julia> a^2
Inf8
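The rounding and overflow in this session can be reproduced with a small simulation. This sketch assumes round-to-nearest, ties-to-even and the same IEEE-style layout as a standard minifloat (so the largest finite value with 3 exponent bits and 4 fraction bits is 15.5). `round_to_minifloat` is a hypothetical helper, not the package's implementation.

```julia
# Hypothetical simulation: round a Float64 to the nearest value representable
# with `ebits` exponent and `fbits` fraction bits (IEEE-style layout assumed).
function round_to_minifloat(x::Float64; ebits::Int=3, fbits::Int=4)
    bias = 2^(ebits - 1) - 1
    emin = 1 - bias                              # smallest normal exponent
    emax = bias                                  # largest finite exponent (all-ones field reserved)
    maxfinite = (2 - 2.0^-fbits) * 2.0^emax      # 15.5 for ebits=3, fbits=4
    (isnan(x) || isinf(x) || iszero(x)) && return x
    a = abs(x)
    e = max(exponent(a), emin)                   # subnormals share the spacing at emin
    ulp = 2.0^(e - fbits)                        # spacing of representable values near a
    r = round(a / ulp) * ulp                     # Julia's round ties to even by default
    r > maxfinite && return copysign(Inf, x)     # overflow after rounding
    return copysign(r, x)
end

round_to_minifloat(3.14159)         # b
round_to_minifloat(4.0 + 3.125)     # a + b
round_to_minifloat(4.0^2)           # a^2
```

With these assumptions, 4 + 3.125 = 7.125 lies exactly halfway between 7.0 and 7.25 and rounds to the even neighbour 7.0, while 4^2 = 16 exceeds 15.5 and becomes Inf.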
Most arithmetic operations are implemented. If you need an additional feature, please raise an issue.
Installation
Float8s.jl is not yet registered; for the time being, install it with
(v1.3) pkg> add https://github.com/milankl/Float8s.jl
Benchmarking
julia> using BenchmarkTools
julia> A = Float8.(randn(300,300));
julia> @btime Float32.($A);
413.303 μs (2 allocations: 351.64 KiB)
julia> 413.303/300^2*1000
4.592255555555555
Conversions from Float8 to Float32 take about 4.6 ns per element. Conversions in the other direction are roughly 2x slower (about 10.6 ns per element) and also slower than Float32 to Float16 conversions (about 7.5 ns per element).
julia> A = Float32.(randn(300,300));
julia> @btime Float16.($A);
674.123 μs (2 allocations: 175.89 KiB)
julia> @btime Float8.($A);
955.196 μs (2 allocations: 88.02 KiB)
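The per-element costs quoted above follow from dividing each @btime total by the 300^2 array elements. A small Julia helper (the name `ns_per_element` is hypothetical) reproduces that arithmetic for all three timings.

```julia
# Convert a @btime total in microseconds for n elements into nanoseconds per element.
ns_per_element(t_us, n) = t_us * 1000 / n    # 1 μs = 1000 ns

n = 300^2
for (label, t) in [("Float8  -> Float32", 413.303),
                   ("Float32 -> Float16", 674.123),
                   ("Float32 -> Float8 ", 955.196)]
    println(rpad(label, 20), round(ns_per_element(t, n), digits=2), " ns")
end
```

This gives roughly 4.6, 7.5 and 10.6 ns per element, so Float32 to Float8 conversion is about 2.3x slower than the reverse direction on this machine.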