# COSMA.jl: communication-optimal matrix-matrix multiplication for DistributedArrays.jl over MPI

COSMA.jl provides wrappers for eth-cscs/COSMA to do communication-optimal matrix-matrix multiplication for `DArray`s of element types `Float32`, `Float64`, `ComplexF32` and `ComplexF64`.

Install via the package manager:

    using Pkg
    Pkg.add("COSMA")

A typical prerequisite is to use MPIClusterManagers.jl to set up some MPI ranks and to load the package everywhere:

    using MPIClusterManagers, DistributedArrays, Distributed
    manager = MPIManager(np = 6)
    addprocs(manager)
    @everywhere using COSMA
    # Only on the host: configure the mapping of Julia's pids to MPI ranks
    # (hopefully this can be removed in a later release)
    COSMA.use_manager(manager)

Next create some distributed matrices and multiply them:

    using LinearAlgebra
    # Float64 matrices, automatically distributed over the MPI ranks
    A = drand(100, 100)
    B = drand(100, 100)
    # Use DistributedArrays to allocate the new matrix C and multiply using COSMA
    C = A * B
    # Or allocate your own distributed target matrix C:
    A_complex = drand(ComplexF32, 100, 100)
    B_complex = drand(ComplexF32, 100, 100)
    C_complex = dzeros(ComplexF32, 100, 100)
    mul!(C_complex, A_complex, B_complex)
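To sanity-check a distributed product, one option is to gather the operands on the host and compare against a dense reference. The following is a minimal sketch rather than part of the package's own test suite. It repeats the setup calls shown above, and it assumes the matrices are small enough to fit in the host's memory and that `np = 4` ranks are available on your machine:

```julia
using MPIClusterManagers, DistributedArrays, Distributed, LinearAlgebra

# Same setup as above; np = 4 is an arbitrary choice for this sketch
manager = MPIManager(np = 4)
addprocs(manager)
@everywhere using COSMA
COSMA.use_manager(manager)

# Distributed operands and their distributed product
A = drand(100, 100)
B = drand(100, 100)
C = A * B

# Array(...) gathers a DArray into an ordinary local Array on the host,
# so the right-hand side is a plain dense multiplication
C_ref = Array(A) * Array(B)

# Compare approximately: floating-point summation order may differ
@assert size(C) == (100, 100)
@assert Array(C) ≈ C_ref
```

Gathering defeats the purpose of distributing the data, so a check like this is only sensible for small test sizes.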

## Using a custom MPI implementation

COSMA.jl depends on MPI.jl, which ships MPICH as its default MPI library. If you need a system-specific MPI implementation, follow the configuration instructions in the MPI.jl documentation.

## Notes about Julia's DArray type

COSMA fully supports the matrix distribution used by Julia's `DArray`, and is in fact more general: a `DArray` stores only a single local block per MPI rank, whereas COSMA supports an arbitrary number of blocks per rank.
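The one-block-per-process layout of a `DArray` can be inspected directly with DistributedArrays.jl. The sketch below is illustrative only: it assumes four plain local worker processes added with `addprocs(4)` instead of MPI ranks, and it does not involve COSMA at all.

```julia
using Distributed
addprocs(4)                        # assumption: plain local workers, not MPI ranks
@everywhere using DistributedArrays

A = drand(100, 100)

# procs(A) is the grid of process ids that hold a piece of A
owners = vec(procs(A))

# Ask each owner which index ranges its single local block covers
blocks = Dict(pid => remotecall_fetch(localindices, pid, A) for pid in owners)

for pid in owners
    rows, cols = blocks[pid]
    println("pid ", pid, " owns rows ", rows, ", columns ", cols)
end

# Each process appears once, and the blocks together cover the whole matrix
@assert allunique(owners)
@assert sum(length(r) * length(c) for (r, c) in values(blocks)) == length(A)
```

Each worker reports exactly one contiguous range of rows and columns, which is the restriction described above.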