CUDSS.jl documentation
Overview
CUDSS.jl is a Julia interface to the NVIDIA cuDSS library. NVIDIA cuDSS provides three factorizations (LDU, LDLᵀ, LLᵀ) for solving sparse linear systems on GPUs. For more details on using cuDSS, refer to the official cuDSS documentation.
Installation
julia> ]
pkg> add CUDSS
pkg> test CUDSS
Types
CUDSS.CudssMatrix (Type)
matrix = CudssMatrix(v::CuVector{T})
matrix = CudssMatrix(A::CuMatrix{T})
matrix = CudssMatrix(A::CuSparseMatrixCSR{T,Cint}, structure::String, view::Char; index::Char='O')
The type T can be Float32, Float64, ComplexF32 or ComplexF64.
CudssMatrix is a wrapper for CuVector, CuMatrix and CuSparseMatrixCSR. It is used to pass the matrix of the linear system, as well as the solution and the right-hand side, to cuDSS.
structure specifies the structure of a sparse matrix:
"G": General matrix – LDU factorization;
"S": Real symmetric matrix – LDLᵀ factorization;
"H": Complex Hermitian matrix – LDLᴴ factorization;
"SPD": Symmetric positive-definite matrix – LLᵀ factorization;
"HPD": Hermitian positive-definite matrix – LLᴴ factorization.
view specifies which part of a sparse matrix is used:
'L': Lower-triangular matrix; all values above the main diagonal are ignored;
'U': Upper-triangular matrix; all values below the main diagonal are ignored;
'F': Full matrix.
index specifies the indexing base of the sparse matrix indices:
'Z': 0-based indexing;
'O': 1-based indexing.
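The three constructors above can be illustrated with a short sketch. It assumes a CUDA-capable GPU and that CUDA.jl is installed alongside CUDSS.jl. The variable names, matrix size and density are arbitrary choices made for illustration, not part of the API.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Symmetric sparse matrix assembled on the CPU (size and density are arbitrary).
A_cpu = sprand(T, n, n, 0.05) + I
A_cpu = A_cpu + A_cpu'

# Keep only the lower triangle and move it to the GPU in CSR format.
A_gpu = CuSparseMatrixCSR(tril(A_cpu))

# Sparse wrapper: real symmetric structure, lower-triangular view, 1-based indices.
matrix = CudssMatrix(A_gpu, "S", 'L'; index='O')

# Dense wrappers for one right-hand side and for a block of three right-hand sides.
b = CudssMatrix(CUDA.rand(T, n))
B = CudssMatrix(CUDA.rand(T, n, 3))
```

Because only the lower triangle is stored, the view is 'L'. Passing 'F' would require the full symmetric matrix to be stored on the GPU.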
CUDSS.CudssConfig (Type)
config = CudssConfig()
CudssConfig stores configuration settings for the solver.
CUDSS.CudssData (Type)
data = CudssData()
data = CudssData(cudss_handle::cudssHandle_t)
CudssData holds internal data (e.g., the arrays of the LU factors).
CUDSS.CudssSolver (Type)
solver = CudssSolver(A::CuSparseMatrixCSR{T,Cint}, structure::String, view::Char; index::Char='O')
solver = CudssSolver(matrix::CudssMatrix{T}, config::CudssConfig, data::CudssData)
The type T can be Float32, Float64, ComplexF32 or ComplexF64.
CudssSolver contains all structures required to solve linear systems with cuDSS. The first constructor of CudssSolver takes the same parameters as the sparse constructor of CudssMatrix.
structure specifies the structure of a sparse matrix:
"G": General matrix – LDU factorization;
"S": Real symmetric matrix – LDLᵀ factorization;
"H": Complex Hermitian matrix – LDLᴴ factorization;
"SPD": Symmetric positive-definite matrix – LLᵀ factorization;
"HPD": Hermitian positive-definite matrix – LLᴴ factorization.
view specifies which part of a sparse matrix is used:
'L': Lower-triangular matrix; all values above the main diagonal are ignored;
'U': Upper-triangular matrix; all values below the main diagonal are ignored;
'F': Full matrix.
index specifies the indexing base of the sparse matrix indices:
'Z': 0-based indexing;
'O': 1-based indexing.
CudssSolver can also be constructed from the three structures CudssMatrix, CudssConfig and CudssData if needed.
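The two construction paths can be sketched side by side. This is a minimal example that assumes a CUDA-capable GPU. The test matrix and the names solver_direct and solver_manual are illustrative only.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# General (non-symmetric) sparse matrix stored in CSR format on the GPU.
A_gpu = CuSparseMatrixCSR(sprand(T, n, n, 0.05) + I)

# Path 1: build the solver directly from the sparse matrix.
solver_direct = CudssSolver(A_gpu, "G", 'F')

# Path 2: build the three structures explicitly, then assemble the solver.
matrix = CudssMatrix(A_gpu, "G", 'F')
config = CudssConfig()
data = CudssData()
solver_manual = CudssSolver(matrix, config, data)
```

The second path is mostly useful when the configuration or data object has to be prepared separately before the solver is assembled.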
Functions
CUDSS.cudss_set (Function)
cudss_set(matrix::CudssMatrix{T}, v::CuVector{T})
cudss_set(matrix::CudssMatrix{T}, A::CuMatrix{T})
cudss_set(matrix::CudssMatrix{T}, A::CuSparseMatrixCSR{T,Cint})
cudss_set(solver::CudssSolver{T}, A::CuSparseMatrixCSR{T,Cint})
cudss_set(solver::CudssSolver, parameter::String, value)
cudss_set(config::CudssConfig, parameter::String, value)
cudss_set(data::CudssData, parameter::String, value)
The type T can be Float32, Float64, ComplexF32 or ComplexF64.
The available configuration parameters are:
"reordering_alg": Algorithm for the reordering phase ("default", "algo1", "algo2" or "algo3");
"factorization_alg": Algorithm for the factorization phase ("default", "algo1", "algo2" or "algo3");
"solve_alg": Algorithm for the solving phase ("default", "algo1", "algo2" or "algo3");
"matching_type": Type of matching;
"solve_mode": Optional modifier applied to the system matrix (transpose or adjoint);
"ir_n_steps": Number of iterative refinement steps;
"ir_tol": Iterative refinement tolerance;
"pivot_type": Type of pivoting ('C', 'R' or 'N');
"pivot_threshold": Pivoting threshold, used to determine whether a diagonal element is subject to pivoting;
"pivot_epsilon": Pivoting epsilon, the absolute value used to replace singular diagonal elements;
"max_lu_nnz": Upper limit on the number of nonzero entries in the LU factors for non-symmetric matrices;
"hybrid_mode": Memory mode, either 0 (default, device-only) or 1 (hybrid, host/device);
"hybrid_device_memory_limit": User-defined device memory limit (number of bytes) for the hybrid memory mode;
"use_cuda_register_memory": A flag to enable (1) or disable (0) the use of cudaHostRegister() by the hybrid memory mode.
The available data parameters are:
"user_perm": User permutation to be used instead of running the reordering algorithms;
"comm": Communicator for the multi-GPU multi-node mode.
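As a hedged illustration of the parameter-setting methods, the sketch below sets two configuration parameters, once through a solver and once through a standalone CudssConfig. It assumes a CUDA-capable GPU. The values (one refinement step, the "default" reordering algorithm) are arbitrary examples drawn from the list above. The data parameters "user_perm" and "comm" are left out because the value types they expect are not stated here.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100
A_gpu = CuSparseMatrixCSR(sprand(T, n, n, 0.05) + I)
solver = CudssSolver(A_gpu, "G", 'F')

# Set configuration parameters through the solver, before running any phase.
cudss_set(solver, "ir_n_steps", 1)             # one step of iterative refinement
cudss_set(solver, "reordering_alg", "default") # algorithm selected by its string name

# The same kind of parameter can be set on a standalone configuration object.
config = CudssConfig()
cudss_set(config, "ir_n_steps", 1)
```

Setting parameters before the analysis phase keeps the configuration consistent across all subsequent phases.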
CUDSS.cudss_get (Function)
value = cudss_get(solver::CudssSolver, parameter::String)
value = cudss_get(config::CudssConfig, parameter::String)
value = cudss_get(data::CudssData, parameter::String)
The available configuration parameters are:
"reordering_alg": Algorithm for the reordering phase;
"factorization_alg": Algorithm for the factorization phase;
"solve_alg": Algorithm for the solving phase;
"matching_type": Type of matching;
"solve_mode": Optional modifier applied to the system matrix (transpose or adjoint);
"ir_n_steps": Number of iterative refinement steps;
"ir_tol": Iterative refinement tolerance;
"pivot_type": Type of pivoting;
"pivot_threshold": Pivoting threshold, used to determine whether a diagonal element is subject to pivoting;
"pivot_epsilon": Pivoting epsilon, the absolute value used to replace singular diagonal elements;
"max_lu_nnz": Upper limit on the number of nonzero entries in the LU factors for non-symmetric matrices;
"hybrid_mode": Memory mode, either 0 (default, device-only) or 1 (hybrid, host/device);
"hybrid_device_memory_limit": User-defined device memory limit (number of bytes) for the hybrid memory mode;
"use_cuda_register_memory": A flag to enable (1) or disable (0) the use of cudaHostRegister() by the hybrid memory mode.
The available data parameters are:
"info": Device-side error information;
"lu_nnz": Number of nonzero entries in the LU factors;
"npivots": Number of pivots encountered during the factorization;
"inertia": Tuple of the positive and negative indices of inertia, for symmetric and Hermitian matrix types that are not positive definite;
"perm_reorder_row": Reordering permutation for the rows;
"perm_reorder_col": Reordering permutation for the columns;
"perm_row": Final row permutation (which includes the effects of both reordering and pivoting);
"perm_col": Final column permutation (which includes the effects of both reordering and pivoting);
"diag": Diagonal of the factorized matrix;
"hybrid_device_memory_min": Minimal amount of device memory (number of bytes) required in the hybrid memory mode.
The data parameters "info", "lu_nnz", "perm_reorder_row", "perm_reorder_col" and "hybrid_device_memory_min" require the phase "analysis" to have been performed by cudss. The data parameters "npivots", "inertia" and "diag" require the phases "analysis" and "factorization" to have been performed by cudss. The data parameters "perm_row" and "perm_col" are available but not yet functional.
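The phase requirements can be sketched as follows: each query is issued only after the phase it depends on has run. The example assumes a CUDA-capable GPU and uses a real symmetric matrix so that "inertia" is meaningful. The phase name "analysis" follows the list of phases accepted by cudss. All variable names are illustrative, and no particular return values are claimed.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

# Real symmetric matrix; only its lower triangle is stored on the GPU.
A_cpu = sprand(T, n, n, 0.05) + I
A_cpu = A_cpu + A_cpu'
A_gpu = CuSparseMatrixCSR(tril(A_cpu))
x_gpu = CUDA.zeros(T, n)
b_gpu = CUDA.rand(T, n)

solver = CudssSolver(A_gpu, "S", 'L')

# Configuration parameters can be read back at any time.
n_steps = cudss_get(solver, "ir_n_steps")

# "lu_nnz" requires the analysis phase.
cudss("analysis", solver, x_gpu, b_gpu)
lu_nnz = cudss_get(solver, "lu_nnz")

# "npivots" and "inertia" additionally require the factorization phase.
cudss("factorization", solver, x_gpu, b_gpu)
npivots = cudss_get(solver, "npivots")
inertia = cudss_get(solver, "inertia")
```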
CUDSS.cudss (Function)
cudss(phase::String, solver::CudssSolver{T}, x::CuVector{T}, b::CuVector{T})
cudss(phase::String, solver::CudssSolver{T}, X::CuMatrix{T}, B::CuMatrix{T})
cudss(phase::String, solver::CudssSolver{T}, X::CudssMatrix{T}, B::CudssMatrix{T})
The type T can be Float32, Float64, ComplexF32 or ComplexF64.
The available phases are "analysis", "factorization", "refactorization" and "solve". The phases "solve_fwd", "solve_diag" and "solve_bwd" are available but not yet functional.
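A typical sequence of phases might look like the sketch below, which solves a general sparse system and then reuses the symbolic analysis for a second matrix with the same sparsity pattern. It assumes a CUDA-capable GPU. The matrices, right-hand sides and variable names are made up for illustration, and the residual norm is computed only as a sanity check.

```julia
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

T = Float64
n = 100

A_cpu = sprand(T, n, n, 0.05) + I
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CUDA.zeros(T, n)
b_gpu = CUDA.rand(T, n)

solver = CudssSolver(A_gpu, "G", 'F')

# Symbolic analysis, numerical factorization, then triangular solves.
cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)

# Residual of the computed solution.
r_gpu = b_gpu - A_gpu * x_gpu
res = norm(r_gpu)

# New values on the same sparsity pattern: only the stored diagonal changes.
A2_gpu = CuSparseMatrixCSR(A_cpu + Diagonal(rand(T, n)))
cudss_set(solver, A2_gpu)

# Refactorize with the new values and solve for a new right-hand side.
c_gpu = CUDA.rand(T, n)
cudss("refactorization", solver, x_gpu, c_gpu)
cudss("solve", solver, x_gpu, c_gpu)
```

The "refactorization" phase skips the analysis, which is why the second matrix must keep the sparsity pattern of the first.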