Sparsity

We present sparsity handling with DifferentiationInterface.jl.

using BenchmarkTools
using DifferentiationInterface
import ForwardDiff, Zygote

Sparse AD pays off when Jacobian or Hessian matrices contain many zeros. Let us therefore define functions whose derivative matrices have this property.

f_sparse_vector(x::AbstractVector) = diff(x .^ 2) + diff(reverse(x .^ 2))
f_sparse_scalar(x::AbstractVector) = sum(f_sparse_vector(x) .^ 2)
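To see why these derivative matrices are sparse, note that each output of f_sparse_vector only involves four inputs: x[i] and x[i+1] (from the first diff) plus x[n-i] and x[n+1-i] (from the reversed diff). A quick sanity check on a small input (definitions repeated so the snippet is self-contained):

```julia
# Same definitions as above, repeated so this snippet runs on its own.
f_sparse_vector(x::AbstractVector) = diff(x .^ 2) + diff(reverse(x .^ 2))
f_sparse_scalar(x::AbstractVector) = sum(f_sparse_vector(x) .^ 2)

# Each output element touches at most four inputs, so the Jacobian is sparse.
f_sparse_vector(float.(1:4))  # [-4.0, 0.0, 4.0]
f_sparse_scalar(float.(1:4))  # 32.0
```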

Dense backends

When we use the jacobian or hessian operator with a dense backend, we get a dense matrix with plenty of zeros.

dense_first_order_backend = AutoForwardDiff()
J_dense = jacobian(f_sparse_vector, dense_first_order_backend, float.(1:10))
9×10 Matrix{Float64}:
 -2.0   4.0   0.0   0.0    0.0    0.0    0.0    0.0   18.0  -20.0
  0.0  -4.0   6.0   0.0    0.0    0.0    0.0   16.0  -18.0    0.0
  0.0   0.0  -6.0   8.0    0.0    0.0   14.0  -16.0    0.0    0.0
  0.0   0.0   0.0  -8.0   10.0   12.0  -14.0    0.0    0.0    0.0
  0.0   0.0   0.0   0.0    0.0    0.0    0.0    0.0    0.0    0.0
  0.0   0.0   0.0   8.0  -10.0  -12.0   14.0    0.0    0.0    0.0
  0.0   0.0   6.0  -8.0    0.0    0.0  -14.0   16.0    0.0    0.0
  0.0   4.0  -6.0   0.0    0.0    0.0    0.0  -16.0   18.0    0.0
  2.0  -4.0   0.0   0.0    0.0    0.0    0.0    0.0  -18.0   20.0
dense_second_order_backend = SecondOrder(AutoForwardDiff(), AutoZygote())
H_dense = hessian(f_sparse_scalar, dense_second_order_backend, float.(1:10))
10×10 Matrix{Float64}:
  144.0   -32.0     0.0     0.0     0.0  …     0.0      0.0   -144.0    160.0
  -32.0    96.0   -96.0     0.0     0.0        0.0   -256.0    576.0   -320.0
    0.0   -96.0   256.0  -192.0     0.0     -336.0    768.0   -432.0      0.0
    0.0     0.0  -192.0   480.0  -320.0      896.0   -512.0      0.0      0.0
    0.0     0.0     0.0  -320.0   368.0     -560.0      0.0      0.0      0.0
    0.0     0.0     0.0  -384.0   480.0  …  -672.0      0.0      0.0      0.0
    0.0     0.0  -336.0   896.0  -560.0     1536.0   -896.0      0.0      0.0
    0.0  -256.0   768.0  -512.0     0.0     -896.0   2016.0  -1152.0      0.0
 -144.0   576.0  -432.0     0.0     0.0        0.0  -1152.0   2560.0  -1440.0
  160.0  -320.0     0.0     0.0     0.0        0.0      0.0  -1440.0   1728.0

The results are correct, but the procedure is wasteful: most of the computed entries are zero. With a sparse backend, the runtime scales with the number of nonzero elements instead of the total number of elements.
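The mechanism behind this speedup is column coloring: columns of the Jacobian whose sparsity patterns do not overlap can share a single directional derivative, which is then "decompressed" into the individual columns. Here is a self-contained sketch of the idea, using finite differences as a stand-in for an AD pushforward and a brute-force pattern detection; the real packages do both steps far more efficiently and robustly.

```julia
f_sparse_vector(x) = diff(x .^ 2) + diff(reverse(x .^ 2))  # repeated for self-containment

# Greedy column coloring: two columns may share a color
# only if they have no nonzero row in common.
function greedy_column_coloring(pattern::AbstractMatrix{Bool})
    n = size(pattern, 2)
    colors = zeros(Int, n)
    for j in 1:n
        forbidden = Set{Int}()
        for k in 1:(j - 1)
            if any(pattern[:, j] .& pattern[:, k])
                push!(forbidden, colors[k])
            end
        end
        c = 1
        while c in forbidden
            c += 1
        end
        colors[j] = c
    end
    return colors
end

x = float.(1:10)
h = 1e-6
m, n = length(f_sparse_vector(x)), length(x)

# Brute-force sparsity pattern via one finite difference per column
# (a real detector like TracerSparsityDetector works symbolically instead).
J = zeros(m, n)
for j in 1:n
    e = zeros(n)
    e[j] = 1.0
    J[:, j] = (f_sparse_vector(x + h * e) - f_sparse_vector(x)) / h
end
pattern = abs.(J) .> 1e-3

colors = greedy_column_coloring(pattern)
ncolors = maximum(colors)  # much smaller than n

# One directional derivative per color instead of one per column.
J_rec = zeros(m, n)
for c in 1:ncolors
    d = Float64.(colors .== c)                     # sum of basis vectors in this color class
    Jd = (f_sparse_vector(x + h * d) - f_sparse_vector(x)) / h  # one compressed "pushforward"
    for j in findall(==(c), colors), i in findall(pattern[:, j])
        J_rec[i, j] = Jd[i]                        # row i is touched by only one column per color
    end
end
```

With this structure, only a handful of function evaluations recover all 9×10 entries, which is why the cost tracks the number of colors (roughly, the nonzero structure) rather than the input dimension.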

Sparse backends

To create a sparse backend, combine a dense backend, a sparsity detector and a coloring algorithm inside AutoSparse. The following are reasonable defaults:

using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

sparse_first_order_backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

sparse_second_order_backend = AutoSparse(
    SecondOrder(AutoForwardDiff(), AutoZygote());
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

Now the resulting matrices are sparse:

jacobian(f_sparse_vector, sparse_first_order_backend, float.(1:10))
9×10 SparseArrays.SparseMatrixCSC{Float64, Int64} with 34 stored entries:
 -2.0   4.0    ⋅     ⋅      ⋅      ⋅      ⋅      ⋅    18.0  -20.0
   ⋅   -4.0   6.0    ⋅      ⋅      ⋅      ⋅    16.0  -18.0     ⋅ 
   ⋅     ⋅   -6.0   8.0     ⋅      ⋅    14.0  -16.0     ⋅      ⋅ 
   ⋅     ⋅     ⋅   -8.0   10.0   12.0  -14.0     ⋅      ⋅      ⋅ 
   ⋅     ⋅     ⋅     ⋅     0.0    0.0     ⋅      ⋅      ⋅      ⋅ 
   ⋅     ⋅     ⋅    8.0  -10.0  -12.0   14.0     ⋅      ⋅      ⋅ 
   ⋅     ⋅    6.0  -8.0     ⋅      ⋅   -14.0   16.0     ⋅      ⋅ 
   ⋅    4.0  -6.0    ⋅      ⋅      ⋅      ⋅   -16.0   18.0     ⋅ 
  2.0  -4.0    ⋅     ⋅      ⋅      ⋅      ⋅      ⋅   -18.0   20.0
hessian(f_sparse_scalar, sparse_second_order_backend, float.(1:10))
10×10 SparseArrays.SparseMatrixCSC{Float64, Int64} with 52 stored entries:
  144.0   -32.0      ⋅       ⋅       ⋅   …      ⋅        ⋅    -144.0    160.0
  -32.0    96.0   -96.0      ⋅       ⋅          ⋅    -256.0    576.0   -320.0
     ⋅    -96.0   256.0  -192.0      ⋅      -336.0    768.0   -432.0       ⋅ 
     ⋅       ⋅   -192.0   480.0  -320.0      896.0   -512.0       ⋅        ⋅ 
     ⋅       ⋅       ⋅   -320.0   368.0     -560.0       ⋅        ⋅        ⋅ 
     ⋅       ⋅       ⋅   -384.0   480.0  …  -672.0       ⋅        ⋅        ⋅ 
     ⋅       ⋅   -336.0   896.0  -560.0     1536.0   -896.0       ⋅        ⋅ 
     ⋅   -256.0   768.0  -512.0      ⋅      -896.0   2016.0  -1152.0       ⋅ 
 -144.0   576.0  -432.0      ⋅       ⋅          ⋅   -1152.0   2560.0  -1440.0
  160.0  -320.0      ⋅       ⋅       ⋅          ⋅        ⋅   -1440.0   1728.0

Sparse preparation

In the examples above, we didn't use preparation. Sparse preparation is more costly than dense preparation, because it must detect the sparsity pattern and compute a coloring, but it is also more essential: once preparation is done, sparse differentiation makes far fewer calls to the underlying function, and is therefore much faster than dense differentiation. The speedup becomes very visible in large dimensions.

n = 1000
jac_extras_dense = prepare_jacobian(f_sparse_vector, dense_first_order_backend, randn(n));
jac_extras_sparse = prepare_jacobian(f_sparse_vector, sparse_first_order_backend, randn(n));
@benchmark jacobian($f_sparse_vector, $dense_first_order_backend, $(randn(n)), $jac_extras_dense) evals=1
BenchmarkTools.Trial: 244 samples with 1 evaluation.
 Range (min … max):  16.023 ms … 53.052 ms   GC (min … max): 0.00% … 28.46%
 Time  (median):     16.292 ms               GC (median):    0.00%
 Time  (mean ± σ):   19.034 ms ±  5.489 ms   GC (mean ± σ):  5.76% ±  8.21%

  █      ▁                                                 
  ██▅██▄▆▇██▇▄▅▆▁▁▁▁▅▅▁▁▅▄▄▄▄▆▄▄▁▁▁▅▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▅
  16 ms        Histogram: log(frequency) by time      47.4 ms <

 Memory estimate: 57.72 MiB, allocs estimate: 1515.
@benchmark jacobian($f_sparse_vector, $sparse_first_order_backend, $(randn(n)), $jac_extras_sparse) evals=1
BenchmarkTools.Trial: 8903 samples with 1 evaluation.
 Range (min … max):  525.935 μs … 11.698 ms   GC (min … max): 0.00% … 90.62%
 Time  (median):     539.940 μs                GC (median):    0.00%
 Time  (mean ± σ):   558.431 μs ± 268.709 μs   GC (mean ± σ):  2.09% ±  4.13%

     ▅██▅                                              
  ▂▄███████▆▅▅▅▆▆▆▅▄▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▁▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂ ▃
  526 μs           Histogram: frequency by time          643 μs <

 Memory estimate: 511.77 KiB, allocs estimate: 51.
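Preparation works the same way for Hessians. A minimal sketch, mirroring the prepare_jacobian / jacobian calls above and assuming the analogous prepare_hessian signature (backends and function as defined earlier in this tutorial):

```julia
using DifferentiationInterface
import ForwardDiff, Zygote
using SparseConnectivityTracer: TracerSparsityDetector
using SparseMatrixColorings: GreedyColoringAlgorithm

f_sparse_scalar(x) = sum((diff(x .^ 2) + diff(reverse(x .^ 2))) .^ 2)

sparse_second_order_backend = AutoSparse(
    SecondOrder(AutoForwardDiff(), AutoZygote());
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

n = 1000
x = randn(n)
# Costly step: pattern detection and coloring happen once here.
hess_extras_sparse = prepare_hessian(f_sparse_scalar, sparse_second_order_backend, x)
# Cheap step: reusable for subsequent inputs of the same size.
H = hessian(f_sparse_scalar, sparse_second_order_backend, x, hess_extras_sparse)
```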