MoYe
The MoYe.jl library draws significant inspiration from NVIDIA's CuTe and is built with similar underlying structures.
The name Mo Ye is derived from an ancient Chinese legend of swordsmiths.
Installation
pkg> add MoYe
Quick Start
julia> data = [i for i in 1:48];
julia> a = MoYeArray(data, @Layout((6,8)))
6×8 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{6}, Static.StaticInt{8}}, Tuple{Static.StaticInt{1}, Static.StaticInt{6}}}}:
1 7 13 19 25 31 37 43
2 8 14 20 26 32 38 44
3 9 15 21 27 33 39 45
4 10 16 22 28 34 40 46
5 11 17 23 29 35 41 47
6 12 18 24 30 36 42 48
julia> subtile_a = @tile a static((3,4)) (1, 2) # partition a into subtiles of shape 3 x 4, returns the subtile at (1,2)
3×4 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{3}, Static.StaticInt{4}}, Tuple{Static.StaticInt{1}, Static.StaticInt{6}}}}:
25 31 37 43
26 32 38 44
27 33 39 45
julia> workitems_a = @parallelize subtile_a static((3,2)) (1,1) # 3 x 2 threads, returns what thread (1,1) is working on
1×2 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{1}, Static.StaticInt{2}}, Tuple{Static.StaticInt{0}, Static.StaticInt{12}}}}:
25 37
julia> for i in eachindex(workitems_a)
           workitems_a[i] = 0
       end
julia> a
6×8 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{6}, Static.StaticInt{8}}, Tuple{Static.StaticInt{1}, Static.StaticInt{6}}}}:
1 7 13 19 0 31 0 43
2 8 14 20 26 32 38 44
3 9 15 21 27 33 39 45
4 10 16 22 28 34 40 46
5 11 17 23 29 35 41 47
6 12 18 24 30 36 42 48
julia> @tile subtile_a static((3,1)) (1, 2) # if you want, you can always tile a subtile
3×1 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{3}, Static.StaticInt{1}}, Tuple{Static.StaticInt{1}, Static.StaticInt{0}}}}:
31
32
33
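To make the index arithmetic behind the session above easier to follow, here is a minimal plain-Julia sketch that reproduces the same subtile and work-item values without MoYe. It assumes the column-major strides (1, 6) shown in the printed layout type. The helpers `layout_offset` and `elem` are hypothetical names introduced for illustration; they are not part of MoYe.jl, and this is not how the library implements `@tile` or `@parallelize`.

```julia
# Plain-Julia sketch; `layout_offset` and `elem` are hypothetical helpers, not MoYe.jl functions.
# `layout_offset` maps a 1-based coordinate to a 1-based linear offset using per-mode strides.
layout_offset(coord, stride) = 1 + sum((c - 1) * s for (c, s) in zip(coord, stride))

data = collect(1:48)
stride = (1, 6)          # strides of the 6×8 column-major layout

tile_shape = (3, 4)
tile_coord = (1, 2)      # the same tile the Quick Start selects

# 1-based coordinate in the full array of the tile's top-left element: row 1, column 5
origin = (tile_coord .- 1) .* tile_shape .+ 1

# Element (i, j) of the subtile, looked up through the parent layout's strides
elem(i, j) = data[layout_offset((origin[1] + i - 1, origin[2] + j - 1), stride)]
subtile = [elem(i, j) for i in 1:tile_shape[1], j in 1:tile_shape[2]]

# 3×2 threads over the 3×4 subtile: thread (1, 1) owns every 3rd row and every 2nd column
thread_items = subtile[1:3:end, 1:2:end]
```

Under these assumptions, `subtile` holds the values 25 to 27, 31 to 33, 37 to 39 and 43 to 45, and `thread_items` is the 1×2 matrix `[25 37]`, which matches the stride of 12 printed for `workitems_a`. Unlike the MoYe views, these are copies, so writing to them does not modify `data`.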
Tile Iterator
julia> data = collect(1:36);
julia> A = MoYeArray(data, @Layout((4,9)))
4×9 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Static.StaticInt{4}, Static.StaticInt{9}}, Tuple{Static.StaticInt{1}, Static.StaticInt{4}}}} with indices static(1):static(4)×static(1):static(9):
1 5 9 13 17 21 25 29 33
2 6 10 14 18 22 26 30 34
3 7 11 15 19 23 27 31 35
4 8 12 16 20 24 28 32 36
julia> tiled_A = zipped_divide(A, (@Layout(2), @Layout(3))) # 2 × 3 tile
6×6 MoYeArray{Int64, 2, ViewEngine{Int64, Ptr{Int64}}, Layout{2, Tuple{Tuple{Static.StaticInt{2}, Static.StaticInt{3}}, Tuple{Static.StaticInt{2}, Static.StaticInt{3}}}, Tuple{Tuple{Static.StaticInt{1}, Static.StaticInt{4}}, Tuple{Static.StaticInt{2}, Static.StaticInt{12}}}}} with indices static(1):static(6)×static(1):static(6):
1 3 13 15 25 27
2 4 14 16 26 28
5 7 17 19 29 31
6 8 18 20 30 32
9 11 21 23 33 35
10 12 22 24 34 36
julia> for i in axes(tiled_A, 2)
           @show view(tiled_A, :, i)
       end
view(tiled_A, :, i) = [1, 2, 5, 6, 9, 10]
view(tiled_A, :, i) = [3, 4, 7, 8, 11, 12]
view(tiled_A, :, i) = [13, 14, 17, 18, 21, 22]
view(tiled_A, :, i) = [15, 16, 19, 20, 23, 24]
view(tiled_A, :, i) = [25, 26, 29, 30, 33, 34]
view(tiled_A, :, i) = [27, 28, 31, 32, 35, 36]
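The nested shape and stride printed for `tiled_A` can be read as two modes: the first enumerates positions inside one 2 × 3 tile, the second enumerates the tiles. The plain-Julia sketch below rebuilds the six tiles from those printed numbers alone. `mode_offsets` is a hypothetical helper written for this illustration, not a MoYe.jl function, and the sketch makes no claim about how `zipped_divide` is implemented.

```julia
# Plain-Julia sketch; `mode_offsets` is a hypothetical helper, not part of MoYe.jl.
# It enumerates one (shape, stride) mode in column-major order and returns 0-based offsets.
function mode_offsets(shape, stride)
    coords = Iterators.product(map(n -> 1:n, shape)...)
    return vec([sum((c .- 1) .* stride) for c in coords])
end

data = collect(1:36)

tile_offsets = mode_offsets((2, 3), (1, 4))    # mode 1: positions inside one tile
rest_offsets = mode_offsets((2, 3), (2, 12))   # mode 2: where each tile starts

# One vector per tile, in the same order as the columns of the tiled array
tiles = [[data[1 + r + t] for t in tile_offsets] for r in rest_offsets]
```

With the strides above, `tiles[1]` is `[1, 2, 5, 6, 9, 10]` and `tiles[6]` is `[27, 28, 31, 32, 35, 36]`, the same sequences the loop prints for the first and last columns of `tiled_A`.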
Current Status
Tensor Core MMA: traits-level support (see mma_unpack)
One of the future goals is to support composable tiled MMA. Contributions from the community are welcome and encouraged. If you're interested in helping out, please get in touch or submit a pull request.
Notes on WMMA
Supporting WMMA is not a priority for MoYe, as it is considered an outdated class of API.