Sparse iterative linear solvers — Conjugate Gradient (CGD) and
BiCGSTAB — accelerated on the Apple GPU via Metal, exposed as PyTorch
extensions. Solve A x = b for large sparse A on Apple Silicon, with CSR and
block-CSR (BCSR) sparse formats.
The solver runs the sparse matrix–vector products and the full iterative loop on the GPU using fused Metal kernels and SIMD-group reductions, minimizing CPU↔GPU synchronization.
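For readers new to the CSR format, the following is a minimal pure-Python sketch of a COO to CSR conversion and the row-wise matrix–vector product that the solvers repeat each iteration. It is a CPU reference for illustration only, not the package's Metal kernel; the helper names `coo_to_csr` and `csr_matvec` are hypothetical, and duplicate COO entries are assumed absent.

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    # Count entries per row, then prefix-sum to get each row's start offset.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]
    # Scatter every entry into the next free slot of its row.
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1].copy()
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data


def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x with one sparse dot product per row."""
    y = []
    for i in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


# The 3x3 matrix from the quick-start example in this README.
indptr, indices, data = coo_to_csr(
    [0, 1, 2, 0], [0, 1, 2, 1], [4.0, 4.0, 4.0, -1.0], n_rows=3
)
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0])
```

On the GPU, each row's dot product is the kind of work that can be handed to a SIMD-group and reduced in parallel, which is why the row-pointer layout suits this workload.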
Packaged release of the working solvers from the
mps repo. v0.1.
- Apple Silicon Mac (Metal GPU)
- Xcode command-line tools, which provide the Metal compiler (xcode-select --install)
- Python ≥ 3.10, PyTorch ≥ 2.0
pip install git+https://github.com/sparseforge/mpsparse

The Metal kernels are compiled at build time and the resulting .metallib
files are bundled into the package, so they are located automatically at
runtime; there is no need to run from a particular directory.
import torch
import mpsparse
# A sparse matrix in COO (coordinate) form.
rows = torch.tensor([0, 1, 2, 0])
cols = torch.tensor([0, 1, 2, 1])
vals = torch.tensor([4.0, 4.0, 4.0, -1.0])
A = mpsparse.from_coo(rows, cols, vals, shape=(3, 3)) # CSR by default
b = torch.randn(3)
x = A.solve(b, method="bicgstab") # or method="cgd" for SPD matrices

- method="cgd": Conjugate Gradient, for symmetric positive-definite A.
- method="bicgstab": BiCGSTAB, for general (nonsymmetric) A.
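To make concrete what method="cgd" computes, here is a textbook unpreconditioned Conjugate Gradient loop in pure Python. It is a sketch of the standard algorithm under the assumption that A is symmetric positive-definite, not the fused Metal implementation; the function name `conjugate_gradient`, its `tol` and `max_iter` parameters, and the test matrix are all illustrative choices.

```python
import math


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for SPD A, given a callable computing A @ v."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break          # residual norm is small enough
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def spd_matvec(v):
    # A = [[4, -1, 0], [-1, 4, -1], [0, -1, 4]], a small SPD example.
    return [4 * v[0] - v[1], -v[0] + 4 * v[1] - v[2], -v[1] + 4 * v[2]]


b = [1.0, 2.0, 3.0]
x = conjugate_gradient(spd_matvec, b)
residual = max(abs(bi - axi) for bi, axi in zip(b, spd_matvec(x)))
```

Each iteration needs one matrix–vector product and a few dot products. The quick-start matrix above is not symmetric (it has an entry at (0, 1) but none at (1, 0)), which is why that example uses method="bicgstab".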
Pass format="bcsr" for block-structured matrices. Block padding is handled
automatically — your matrix size need not be a multiple of block_size:
A = mpsparse.from_coo(rows, cols, vals, shape=(n, n), format="bcsr", block_size=4)
x = A.solve(b, method="bicgstab")

y = A.matvec(x) # sparse matrix–vector product A @ x (CSR)
A.shape # (rows, cols)

The raw extension classes (mpsparse.csr_tensor, mpsparse.bcsr_tensor) are
also exposed for advanced use, but from_coo is the recommended entry point.
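The block padding mentioned for format="bcsr" can be pictured with a little arithmetic. The sketch below shows the usual approach of rounding the dimension up to the next multiple of block_size and mapping each entry to a block and an offset within it. How mpsparse fills the padded region is not stated here, so treat this as an assumption about the general technique; `padded_size` and `block_coords` are hypothetical helper names.

```python
def padded_size(n, block_size):
    """Smallest multiple of block_size that is >= n (ceiling division)."""
    return -(-n // block_size) * block_size


def block_coords(row, col, block_size):
    """Map an entry (row, col) to its block index and its offset inside that block."""
    block = (row // block_size, col // block_size)
    offset = (row % block_size, col % block_size)
    return block, offset


n, block_size = 10, 4
n_padded = padded_size(n, block_size)        # rows/cols after padding
blocks_per_side = n_padded // block_size     # block rows (and block columns)
block, offset = block_coords(5, 9, block_size)
```

With n = 10 and block_size = 4 the matrix is treated as 12 x 12, that is 3 x 3 blocks, and the entry at (5, 9) falls in block (1, 2) at offset (1, 1).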
Benchmarked against PyTorch CPU solvers on two matrix families: structured block-sparse matrices (as in pruned ML models) and 5-point Laplacian stencils (as in finite-difference simulations). Speedups grow with problem size.
[Benchmark plots: CGD, BiCGSTAB, Block-CSR]
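As an illustration of the second matrix family, the sketch below builds COO triplets for a 5-point Laplacian stencil on an m x m grid in pure Python. It assumes zero (Dirichlet) boundary values and row-major grid numbering, and is not claimed to match the benchmark script; `laplacian_5pt_coo` is a hypothetical helper.

```python
def laplacian_5pt_coo(m):
    """COO triplets of the 5-point Laplacian on an m x m grid (Dirichlet boundary)."""
    rows, cols, vals = [], [], []
    for i in range(m):
        for j in range(m):
            k = i * m + j                      # row-major index of grid point (i, j)
            rows.append(k)
            cols.append(k)
            vals.append(4.0)                   # centre of the stencil
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < m and 0 <= nj < m:
                    rows.append(k)
                    cols.append(ni * m + nj)
                    vals.append(-1.0)          # coupling to an in-grid neighbour
    return rows, cols, vals


rows, cols, vals = laplacian_5pt_coo(3)
nnz = len(vals)
```

The resulting matrix is symmetric positive-definite, so it suits method="cgd". Wrapping the three lists in torch.tensor and passing them to mpsparse.from_coo with shape=(m * m, m * m) should follow the quick-start pattern, though the dtypes the package expects are not specified here.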
git clone https://github.com/sparseforge/mpsparse
cd mpsparse
pip install -e ".[dev]" --no-build-isolation # uses your existing torch
pytest
python examples/solve_laplacian.py

Repository layout:
src/mpsparse/
core.py # from_coo / SparseMatrix — the friendly API
spmv.cpp # CSR extension (mv, cgd, bicgstab)
bcsr.cpp # BCSR extension (cgd, bicgstab)
kernels/*.metal # Metal compute kernels
third_party/metal-cpp/ # vendored Apple metal-cpp headers
CGD and BiCGSTAB are the working core. The longer-term goals (tracked in the
mps dev repo) are a sparse direct solver (QR / LU) and a faster on-GPU
COO→CSR conversion.
See LICENSE.




