MoYe


The MoYe.jl library draws significant inspiration from NVIDIA's CuTe and is built with similar underlying structures.

The name Mo Ye is derived from an ancient Chinese legend of swordsmiths.

Installation

pkg> add MoYe
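
For scripts or CI, where the `pkg>` REPL mode is not available, the same installation can be done through Julia's standard `Pkg` API. This is a generic Pkg call, not something specific to MoYe:

```julia
using Pkg

# Equivalent to `pkg> add MoYe` in the REPL's package mode.
Pkg.add("MoYe")

# Load the package once it is installed.
using MoYe
```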

Quick Start

julia> data = [i for i in 1:48];
julia> a = MoYeArray(data, @Layout((6,8)))
6×8 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{6}, Static.StaticInt{8}}, Tuple{Static.StaticInt{1}, Static.StaticInt{6}}}}:
 1   7  13  19  25  31  37  43
 2   8  14  20  26  32  38  44
 3   9  15  21  27  33  39  45
 4  10  16  22  28  34  40  46
 5  11  17  23  29  35  41  47
 6  12  18  24  30  36  42  48

julia> subtile_a = @tile a static((3,4)) (1, 2) # partition a into subtiles of shape 3 x 4, returns the subtile at (1,2)
3×4 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{3}, Static.StaticInt{4}}, Tuple{Static.StaticInt{1}, Static.StaticInt{6}}}}:
 25  31  37  43
 26  32  38  44
 27  33  39  45

julia> workitems_a = @parallelize subtile_a static((3,2)) (1,1) # 3 x 2 threads, returns what thread (1,1) is working on
1×2 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{1}, Static.StaticInt{2}}, Tuple{Static.StaticInt{0}, Static.StaticInt{12}}}}:
 25  37

julia> for i in eachindex(workitems_a)
           workitems_a[i] = 0
       end

julia> a
6×8 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{6}, Static.StaticInt{8}}, Tuple{Static.StaticInt{1}, Static.StaticInt{6}}}}:
 1   7  13  19   0  31   0  43
 2   8  14  20  26  32  38  44
 3   9  15  21  27  33  39  45
 4  10  16  22  28  34  40  46
 5  11  17  23  29  35  41  47
 6  12  18  24  30  36  42  48
 
julia> @tile subtile_a static((3,1)) (1, 2) # if you want, you can always tile a subtile
3×1 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{3}, Static.StaticInt{1}}, Tuple{Static.StaticInt{1}, Static.StaticInt{0}}}}:
 31
 32
 33
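
To see where the numbers above come from, here is a minimal sketch in plain Julia that reproduces the same index arithmetic without MoYe. It assumes the 6×8 column-major layout with strides `(1, 6)` shown in the printed types. `layout_index` is a hypothetical helper written for this illustration, not a MoYe function, and the results here are copies rather than views into `data`.

```julia
# Hypothetical helper (not part of MoYe): linear position of a 1-based
# coordinate under a layout with the given strides.
layout_index(coord, stride) = 1 + sum((c - 1) * s for (c, s) in zip(coord, stride))

data = collect(1:48)
stride = (1, 6)                  # strides of the 6×8 column-major layout

# Tile (1, 2) of shape 3×4 starts at row 1, column 5 of the full array.
tile_shape = (3, 4)
tile_coord = (1, 2)
origin = ntuple(d -> (tile_coord[d] - 1) * tile_shape[d] + 1, 2)

# Gather the 3×4 subtile element by element through the layout.
subtile = [data[layout_index((origin[1] + i - 1, origin[2] + j - 1), stride)]
           for i in 1:tile_shape[1], j in 1:tile_shape[2]]

# With 3×2 threads over a 3×4 subtile, each thread owns a 1×2 slice.
# Thread (1, 1) takes row 1 and every second column, starting at column 1.
workitems = subtile[1:3:3, 1:2:4]
```

Two columns of the subtile are 12 elements apart in memory (2 columns × stride 6), which matches the `StaticInt{12}` stride printed for `workitems_a`.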

Tile Iterator

julia> data = collect(1:36);

julia> A = MoYeArray(data, @Layout((4,9)))
4×9 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{4}, Static.StaticInt{9}}, Tuple{Static.StaticInt{1}, Static.StaticInt{4}}}} with indices static(1):static(4)×static(1):static(9):
 1  5   9  13  17  21  25  29  33
 2  6  10  14  18  22  26  30  34
 3  7  11  15  19  23  27  31  35
 4  8  12  16  20  24  28  32  36

julia> tiled_A = zipped_divide(A, (@Layout(2), @Layout(3))) # 2 × 3 tile
6×6 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Tuple{Static.StaticInt{2}, Static.StaticInt{3}}, Tuple{Static.StaticInt{2}, Static.StaticInt{3}}}, Tuple{Tuple{Static.StaticInt{1}, Static.StaticInt{4}}, Tuple{Static.StaticInt{2}, Static.StaticInt{12}}}}} with indices static(1):static(6)×static(1):static(6):
  1   3  13  15  25  27
  2   4  14  16  26  28
  5   7  17  19  29  31
  6   8  18  20  30  32
  9  11  21  23  33  35
 10  12  22  24  34  36

julia> for i in axes(tiled_A, 2)
           @show view(tiled_A, :, i)
       end
view(tiled_A, :, i) = [1, 2, 5, 6, 9, 10]
view(tiled_A, :, i) = [3, 4, 7, 8, 11, 12]
view(tiled_A, :, i) = [13, 14, 17, 18, 21, 22]
view(tiled_A, :, i) = [15, 16, 19, 20, 23, 24]
view(tiled_A, :, i) = [25, 26, 29, 30, 33, 34]
view(tiled_A, :, i) = [27, 28, 31, 32, 35, 36]
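
The column ordering of `tiled_A` can be reproduced with ordinary Julia arrays. The sketch below is not how MoYe implements `zipped_divide`; it only rebuilds the same 6×6 arrangement by copying, assuming 2 × 3 tiles over a 4×9 column-major array. Each column holds one tile flattened in column-major order, and tiles are visited down the rows of the tile grid first.

```julia
A = reshape(collect(1:36), 4, 9)
tile_rows, tile_cols = 2, 3

# Flatten every 2×3 tile into a vector. The outer loop walks tile columns,
# the inner loop walks tile rows, so row-adjacent tiles come first.
tiles = [vec(A[r:r+tile_rows-1, c:c+tile_cols-1])
         for c in 1:tile_cols:size(A, 2) for r in 1:tile_rows:size(A, 1)]

# Place the flattened tiles side by side: one tile per column.
tiled = reduce(hcat, tiles)

for i in axes(tiled, 2)
    println(tiled[:, i])
end
```

Unlike this copy-based version, the `MoYeArray` returned by `zipped_divide` in the session above is backed by a `ViewEngine` over the original data.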

Current Status

Tensor Core MMA: High-level programming on tensor cores has been implemented, as shown in the example file. Integration with the ldmatrix instruction is not yet implemented.

Contributions from the community are very much welcome and encouraged. If you're interested in helping out, please don't hesitate to get in touch or submit a pull request.

Notes on WMMA

Supporting WMMA is not a priority here because it is considered an outdated class of API.