Advanced Tensor Storage Formats

julia> A = [0.0 0.0 4.4; 1.1 0.0 0.0; 2.2 0.0 5.5; 3.3 0.0 0.0]
4×3 Matrix{Float64}:
 0.0  0.0  4.4
 1.1  0.0  0.0
 2.2  0.0  5.5
 3.3  0.0  0.0

julia> A_fbr = Tensor(Dense(Dense(Element(0.0))), A)
Dense [:,1:3]
├─ [:, 1]: Dense [1:4]
│  ├─ [1]: 0.0
│  ├─ [2]: 1.1
│  ├─ [3]: 2.2
│  └─ [4]: 3.3
├─ [:, 2]: Dense [1:4]
│  ├─ [1]: 0.0
│  ├─ [2]: 0.0
│  ├─ [3]: 0.0
│  └─ [4]: 0.0
└─ [:, 3]: Dense [1:4]
   ├─ [1]: 4.4
   ├─ [2]: 0.0
   ├─ [3]: 5.5
   └─ [4]: 0.0

We refer to a node in the tree as a subfiber. All of the nodes at the same level are stored in the same data structure and disambiguated by an integer position. In the above example, there are three levels: the rootmost level contains only one subfiber, the root; the middle level has 3 subfibers, one for each column; and the leafmost level has 12 subfibers, one for each element of the array. For example, the root level is A_fbr.lvl; the subfibers of the middle level are stored in A_fbr.lvl.lvl, and we can represent the third of them (column 3) as Finch.SubFiber(A_fbr.lvl.lvl, 3). The subfibers of the leaf level are stored in A_fbr.lvl.lvl.lvl, and we can represent the 9th of them as Finch.SubFiber(A_fbr.lvl.lvl.lvl, 9). For instructional purposes, you can also call a subfiber on an index with parentheses to select among its children.

julia> Finch.SubFiber(A_fbr.lvl.lvl, 3)
Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0

julia> A_fbr[:, 3]
Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0

julia> A_fbr(3)
Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0

julia> Finch.SubFiber(A_fbr.lvl.lvl.lvl, 9)
4.4

julia> A_fbr[1, 3]
4.4

julia> A_fbr(3)(1)
4.4
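
Note how position 9 in the leaf level lines up with row 1 of column 3: both levels here are dense, so the subfibers are laid out one column after another, and the element at row i, column j sits at position (j - 1) * 4 + i = (3 - 1) * 4 + 1 = 9.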

When we print the tree in text, positions are numbered from top to bottom. However, if we visualize our tree with the root at the top, positions range from left to right:

[Figure: Dense Format Index Tree]

Because our array is sparse (mostly zero, or another fill value), it would be more efficient to store only the nonzero values. In Finch, each level is represented with a different format. A sparse level stores only non-fill values. This time, we'll construct the tensor with a SparseList level (a sparse list of the nonzeros) in place of the inner Dense level:

julia> A_fbr = Tensor(Dense(SparseList(Element(0.0))), A)
Dense [:,1:3]
├─ [:, 1]: SparseList (0.0) [1:4]
│  ├─ [2]: 1.1
│  ├─ [3]: 2.2
│  └─ [4]: 3.3
├─ [:, 2]: SparseList (0.0) [1:4]
└─ [:, 3]: SparseList (0.0) [1:4]
   ├─ [1]: 4.4
   └─ [3]: 5.5

[Figure: CSC Format Index Tree]

Our Dense(SparseList(Element(0.0))) format is also known as "CSC" and is equivalent to SparseMatrixCSC. The Tensor constructor will perform a zero-cost copy between Finch fibers and sparse matrices, when available.
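For example (a sketch; this assumes the SparseArrays standard library is loaded), we could build the same tensor from a SparseMatrixCSC:

julia> using SparseArrays

julia> S = sparse(A);  # a SparseMatrixCSC with the same entries as A

julia> A_csc = Tensor(Dense(SparseList(Element(0.0))), S);

CSC is an excellent general-purpose representation when we expect most of the columns to have a few nonzeros. However, when most of the columns are entirely fill (a situation known as hypersparsity), it is better to compress the root level as well: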

julia> A_fbr = Tensor(SparseList(SparseList(Element(0.0))), A)
SparseList (0.0) [:,1:3]
├─ [:, 1]: SparseList (0.0) [1:4]
│  ├─ [2]: 1.1
│  ├─ [3]: 2.2
│  └─ [4]: 3.3
└─ [:, 3]: SparseList (0.0) [1:4]
   ├─ [1]: 4.4
   └─ [3]: 5.5

[Figure: DCSC Format Index Tree]

Here we see that the entirely zero column has also been compressed. The SparseList(SparseList(Element(0.0))) format is also known as "DCSC".

The "COO" (or "Coordinate") format is often used in practice for ease of interchange between libraries. In an N-dimensional array A, COO stores N lists of indices I_1, ..., I_N where A[I_1[p], ..., I_N[p]] is the p^th stored value in column-major numbering. In Finch, COO is represented as a multi-index level, which can handle more than one index at once. We use curly brackets to declare the number of indices handled by the level:

julia> A_fbr = Tensor(SparseCOO{2}(Element(0.0)), A)
SparseCOO{2} (0.0) [:,1:3]
├─ [2, 1]: 1.1
├─ [3, 1]: 2.2
├─ [4, 1]: 3.3
├─ [1, 3]: 4.4
└─ [3, 3]: 5.5

[Figure: COO Format Index Tree]

The COO format is compact and straightforward, but it doesn't support random access. For random access, one should use the SparseHash format. A full listing of supported formats follows a rough description of the common internals shared by levels, relating to types and storage.

Types and Storage of Levels

All levels have a postype, typically denoted as Tp in the constructors. It is used for internal pointer types, and is accessible via the postype function:

Finch.postype — Function
postype(lvl)

Return a position type with the same flavor as those used to store the positions of the fibers contained in lvl. The name position descends from the pos or position or pointer arrays found in many definitions of CSR or CSC. In Finch, positions should be data used to access either a subfiber or some other similar auxiliary data. Thus, we often end up iterating over positions.
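
For example (a sketch; the exact type depends on how the tensor was constructed), on a typical 64-bit machine the default position type is Int64:

julia> Finch.postype(A_fbr.lvl)
Int64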

Additionally, many levels have a Vp or Vi in their constructors; these stand for a vector with element type Tp or Ti, respectively. More generally, levels are parameterized by the types that they use for storage. By default, all levels use Vector, but a user could change any or all of the storage types of a tensor so that the tensor would be stored on a GPU or CPU or some combination thereof, or even just via a vector with a different allocation mechanism. The storage type should behave like AbstractArray and needs to implement the usual abstract array functions and Base.resize!. See the tests for an example.
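
As a rough sketch (the type and its name are hypothetical, not part of Finch), a custom storage vector only needs to implement the AbstractArray interface plus Base.resize!:

# A hypothetical storage type: a plain Vector wrapper that could be extended
# to count allocations, log accesses, and so on.
struct TrackedVector{T} <: AbstractVector{T}
    data::Vector{T}
end

# The usual abstract array interface...
Base.size(v::TrackedVector) = size(v.data)
Base.getindex(v::TrackedVector, i::Int) = v.data[i]
Base.setindex!(v::TrackedVector, x, i::Int) = (v.data[i] = x; v)

# ...plus resize!, which levels use to grow their internal buffers.
Base.resize!(v::TrackedVector, n::Integer) = (resize!(v.data, n); v)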

When levels are constructed in short form, as in the examples above, the index, position, and storage types are inferred from the level below. All the levels at the bottom of a Tensor (Element, Pattern, Repeater) specify an index type, position type, and storage type even if they don't need them; these types are used by the levels above them that take them as parameters.

moveto: Copying Fibers to a New Storage Type

If one needs to copy a tensor to another tensor with a different storage type, one can use the moveto function, described below.

Finch.moveto — Function
moveto(arr, device)

If the array is not on the given device, it creates a new version of this array on that device and copies the data into it, according to the device trait.
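
For example (a hypothetical sketch; the actual device trait comes from the backend you are targeting):

julia> A_dev = Finch.moveto(A_fbr, device)  # `device` is a device trait, e.g. one provided by a GPU backend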