## LLᵀ and LLᴴ

LinearAlgebra.choleskyMethod
solver = cholesky(A::CuSparseMatrixCSR{T,Cint}; view::Char='F')

Compute the LLᴴ factorization of a sparse matrix A on an NVIDIA GPU. The type T can be Float32, Float64, ComplexF32 or ComplexF64.

Input argument

• A: a sparse Hermitian positive definite matrix stored in the CuSparseMatrixCSR format.

Keyword argument

*view: A character that specifies which triangle of the sparse matrix is provided. Possible options are L for the lower triangle, U for the upper triangle, and F for the full matrix.

Output argument

LinearAlgebra.cholesky!Method
solver = cholesky!(solver::CudssSolver{T}, A::CuSparseMatrixCSR{T,Cint})

Compute the LLᴴ factorization of a sparse matrix A on an NVIDIA GPU, reusing the symbolic factorization stored in solver. The type T can be Float32, Float64, ComplexF32 or ComplexF64.

using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra
using SparseArrays

T = ComplexF64
R = real(T)
n = 100
p = 5
A_cpu = sprand(T, n, n, 0.01)
A_cpu = A_cpu * A_cpu' + I
B_cpu = rand(T, n, p)

A_gpu = CuSparseMatrixCSR(A_cpu |> triu)
B_gpu = CuMatrix(B_cpu)
X_gpu = similar(B_gpu)

F = cholesky(A_gpu, view='U')
X_gpu = F \ B_gpu

R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
norm(R_gpu)

# In-place LLᴴ
d_gpu = rand(R, n) |> CuVector
A_gpu = A_gpu + Diagonal(d_gpu)
cholesky!(F, A_gpu)

C_cpu = rand(T, n, p)
C_gpu = CuMatrix(C_cpu)
ldiv!(X_gpu, F, C_gpu)

R_gpu = C_gpu - ( CuSparseMatrixCSR(A_cpu) + Diagonal(d_gpu) ) * X_gpu
norm(R_gpu)
Note

If we only store one triangle of A_gpu, we can also use the wrappers Symmetric and Hermitian instead of using the keyword argument view in cholesky. For real matrices, both wrappers are allowed but only Hermitian can be used for complex matrices.

H_gpu = Hermitian(A_gpu, :U)
F = cholesky(H_gpu)

## LDLᵀ and LDLᴴ

LinearAlgebra.ldltMethod
solver = ldlt(A::CuSparseMatrixCSR{T,Cint}; view::Char='F')

Compute the LDLᴴ factorization of a sparse matrix A on an NVIDIA GPU. The type T can be Float32, Float64, ComplexF32 or ComplexF64.

Input argument

• A: a sparse Hermitian matrix stored in the CuSparseMatrixCSR format.

Keyword argument

*view: A character that specifies which triangle of the sparse matrix is provided. Possible options are L for the lower triangle, U for the upper triangle, and F for the full matrix.

Output argument

LinearAlgebra.ldlt!Method
solver = ldlt!(solver::CudssSolver{T}, A::CuSparseMatrixCSR{T,Cint})

Compute the LDLᴴ factorization of a sparse matrix A on an NVIDIA GPU, reusing the symbolic factorization stored in solver. The type T can be Float32, Float64, ComplexF32 or ComplexF64.

using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra
using SparseArrays

T = Float64
R = real(T)
n = 100
p = 5
A_cpu = sprand(T, n, n, 0.05) + I
A_cpu = A_cpu + A_cpu'
B_cpu = rand(T, n, p)

A_gpu = CuSparseMatrixCSR(A_cpu |> tril)
B_gpu = CuMatrix(B_cpu)
X_gpu = similar(B_gpu)

F = ldlt(A_gpu, view='L')
X_gpu = F \ B_gpu

R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
norm(R_gpu)

# In-place LDLᵀ
d_gpu = rand(R, n) |> CuVector
A_gpu = A_gpu + Diagonal(d_gpu)
ldlt!(F, A_gpu)

C_cpu = rand(T, n, p)
C_gpu = CuMatrix(C_cpu)
ldiv!(X_gpu, F, C_gpu)

R_gpu = C_gpu - ( CuSparseMatrixCSR(A_cpu) + Diagonal(d_gpu) ) * X_gpu
norm(R_gpu)
Note

If we only store one triangle of A_gpu, we can also use the wrappers Symmetric and Hermitian instead of using the keyword argument view in ldlt. For real matrices, both wrappers are allowed but only Hermitian can be used for complex matrices.

S_gpu = Symmetric(A_gpu, :L)
F = ldlt(S_gpu)

## LU

LinearAlgebra.luMethod
solver = lu(A::CuSparseMatrixCSR{T,Cint})

Compute the LU factorization of a sparse matrix A on an NVIDIA GPU. The type T can be Float32, Float64, ComplexF32 or ComplexF64.

Input argument

• A: a sparse square matrix stored in the CuSparseMatrixCSR format.

Output argument

LinearAlgebra.lu!Method
solver = lu!(solver::CudssSolver{T}, A::CuSparseMatrixCSR{T,Cint})

Compute the LU factorization of a sparse matrix A on an NVIDIA GPU, reusing the symbolic factorization stored in solver. The type T can be Float32, Float64, ComplexF32 or ComplexF64.

using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra
using SparseArrays

T = Float64
n = 100
A_cpu = sprand(T, n, n, 0.05) + I
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)

F = lu(A_gpu)
x_gpu = F \ b_gpu

r_gpu = b_gpu - A_gpu * x_gpu
norm(r_gpu)

# In-place LU
d_gpu = rand(T, n) |> CuVector
A_gpu = A_gpu + Diagonal(d_gpu)
lu!(F, A_gpu)

c_cpu = rand(T, n)
c_gpu = CuVector(c_cpu)
ldiv!(x_gpu, F, c_gpu)

r_gpu = c_gpu - A_gpu * x_gpu
norm(r_gpu)