mpsparse

Sparse iterative linear solvers — Conjugate Gradient (CGD) and BiCGSTAB — accelerated on the Apple GPU via Metal, exposed as PyTorch extensions. Solve A x = b for large sparse A on Apple Silicon, with CSR and block-CSR (BCSR) sparse formats.

The solver runs the sparse matrix–vector products and the full iterative loop on the GPU using fused Metal kernels and SIMD-group reductions, minimizing CPU↔GPU synchronization.
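
For orientation, the loop that the paragraph above describes can be sketched in plain Python. This is a textbook Conjugate Gradient on the CPU, not the Metal implementation; the helper names (`coo_matvec`, `dot`, `conjugate_gradient`) and the tolerance defaults are illustrative assumptions. Each iteration needs one matrix–vector product and two dot products, which are the reductions a GPU version would want to keep on-device.

```python
import math

def coo_matvec(rows, cols, vals, x):
    """Compute y = A @ x for A given as COO triplets (plain Python lists)."""
    y = [0.0] * len(x)
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y

def dot(u, v):
    """Inner product; on a GPU this is a parallel reduction."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Textbook CG for symmetric positive-definite A. Returns (x, iterations)."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(max_iter):
        if math.sqrt(rs_old) <= tol * b_norm:
            return x, k              # relative residual is small enough
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter

# Small SPD test problem: tridiagonal matrix with stencil [-1, 4, -1].
rows = [0, 0, 1, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [1.0, 2.0, 3.0]
x, iters = conjugate_gradient(lambda v: coo_matvec(rows, cols, vals, v), b)
```

In a naive GPU port, every `dot` would force a round trip to the CPU to read the scalar back; keeping the whole loop on the GPU avoids that.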

This is the packaged release (v0.1) of the working solvers from the mps repo.

Requirements

Apple Silicon Mac (Metal GPU)
Xcode command-line tools — provides the Metal compiler (xcode-select --install)
Python ≥ 3.10, PyTorch ≥ 2.0

Install

pip install git+https://github.com/sparseforge/mpsparse

The Metal kernels are compiled at build time and the resulting .metallib files are bundled into the package, so they are located automatically at runtime — no need to run from a particular directory.

Usage

import torch
import mpsparse

# A sparse matrix in COO (coordinate) form.
rows = torch.tensor([0, 1, 2, 0])
cols = torch.tensor([0, 1, 2, 1])
vals = torch.tensor([4.0, 4.0, 4.0, -1.0])

A = mpsparse.from_coo(rows, cols, vals, shape=(3, 3))   # CSR by default

b = torch.randn(3)
x = A.solve(b, method="bicgstab")        # or method="cgd" for SPD matrices

method="cgd" — Conjugate Gradient, for symmetric positive-definite A.
method="bicgstab" — BiCGSTAB, for general (nonsymmetric) A.
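
If you are unsure which method applies, a cheap first check is whether the COO triplets are symmetric. The sketch below is a hypothetical helper (`is_symmetric_coo` and `pick_method` are not part of mpsparse) written in plain Python. It assumes duplicate (row, col) entries are summed, which may or may not match what from_coo does. Symmetry is necessary but not sufficient for positive definiteness, so a "cgd" result here is only a hint.

```python
def is_symmetric_coo(rows, cols, vals, tol=0.0):
    """Return True if the COO triplets describe a symmetric matrix."""
    entries = {}
    for r, c, v in zip(rows, cols, vals):
        # Assumption: duplicate coordinates are accumulated.
        entries[(r, c)] = entries.get((r, c), 0.0) + v
    # Every stored entry must match its mirror (missing mirrors count as 0).
    return all(abs(v - entries.get((c, r), 0.0)) <= tol
               for (r, c), v in entries.items())

def pick_method(rows, cols, vals):
    """Suggest a method string; 'cgd' still requires A to be positive definite."""
    return "cgd" if is_symmetric_coo(rows, cols, vals) else "bicgstab"

# The matrix from the usage example has an entry at (0, 1) but none at (1, 0).
usage_method = pick_method([0, 1, 2, 0], [0, 1, 2, 1], [4.0, 4.0, 4.0, -1.0])

# Adding the mirrored entry makes it symmetric.
sym_method = pick_method([0, 1, 2, 0, 1], [0, 1, 2, 1, 0],
                         [4.0, 4.0, 4.0, -1.0, -1.0])
```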

Block-CSR

Pass format="bcsr" for block-structured matrices. Block padding is handled automatically — your matrix size need not be a multiple of block_size:

A = mpsparse.from_coo(rows, cols, vals, shape=(n, n), format="bcsr", block_size=4)
x = A.solve(b, method="bicgstab")
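
To make the padding idea concrete, here is a minimal sketch of one common approach: round the dimension up to the next multiple of block_size and scatter the entries into zero-filled dense tiles. How mpsparse pads internally is not described here, so treat `padded_size` and `coo_to_blocks` as hypothetical names and the zero-fill as an assumption.

```python
def padded_size(n, block_size):
    """Smallest multiple of block_size that is >= n."""
    return -(-n // block_size) * block_size   # ceiling division

def coo_to_blocks(rows, cols, vals, block_size):
    """Group COO entries into dense block_size x block_size tiles.

    Returns a dict keyed by (block_row, block_col); positions that no
    entry touches stay at 0.0, including any padding region.
    """
    blocks = {}
    for r, c, v in zip(rows, cols, vals):
        key = (r // block_size, c // block_size)
        tile = blocks.setdefault(
            key, [[0.0] * block_size for _ in range(block_size)])
        tile[r % block_size][c % block_size] += v
    return blocks

# A 6x6 matrix with block_size=4 is stored as a 2x2 grid of 4x4 tiles.
n, block_size = 6, 4
n_padded = padded_size(n, block_size)
blocks = coo_to_blocks([0, 5, 5], [0, 5, 1], [1.0, 2.0, 3.0], block_size)
```

Only tiles that contain at least one entry are materialized, which is what makes the format pay off for block-structured matrices.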

Other operations

y = A.matvec(x)        # sparse matrix–vector product A @ x  (CSR)
A.shape                # (rows, cols)
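
For readers new to CSR, the following plain-Python sketch shows what a COO→CSR conversion and a CSR matrix–vector product compute. It is a CPU reference for understanding and for checking results, not the extension's code; `coo_to_csr` and `csr_matvec` are illustrative names. The conversion keeps entries in input order within each row and does not merge duplicates.

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    indptr = [0] * (n_rows + 1)
    for r in rows:                      # count entries per row
        indptr[r + 1] += 1
    for i in range(n_rows):             # prefix sum gives row start offsets
        indptr[i + 1] += indptr[i]
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1].copy()      # next free position in each row
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data

def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x, one row at a time."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

# The 3x3 matrix from the usage example.
indptr, indices, data = coo_to_csr([0, 1, 2, 0], [0, 1, 2, 1],
                                   [4.0, 4.0, 4.0, -1.0], n_rows=3)
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0])
```

A reference like this is handy for verifying a solve: compute b minus the product of A and the returned x, and check that its norm is small.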

The raw extension classes (mpsparse.csr_tensor, mpsparse.bcsr_tensor) are also exposed for advanced use, but from_coo is the recommended entry point.

Benchmarks

Benchmarked against PyTorch CPU solvers on two matrix families: structured block-sparse matrices (as in pruned ML models) and 5-point Laplacian stencils (as in finite-difference simulations). Speedups grow with problem size.
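
As an illustration of the second matrix family, a 5-point Laplacian can be generated as COO triplets in a few lines. The sketch assumes the usual sign convention (4 on the diagonal, -1 for each grid neighbor) and boundaries where out-of-grid neighbors are simply dropped; the exact scaling and boundary treatment used in the benchmarks are not stated, and `laplacian_5pt_coo` is a hypothetical helper name.

```python
def laplacian_5pt_coo(nx, ny):
    """COO triplets for the 5-point Laplacian on an nx-by-ny grid.

    Grid point (i, j) maps to row j * nx + i. Neighbors that fall
    outside the grid are omitted.
    """
    rows, cols, vals = [], [], []
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            rows.append(k); cols.append(k); vals.append(4.0)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < nx and 0 <= jj < ny:
                    rows.append(k); cols.append(jj * nx + ii); vals.append(-1.0)
    return rows, cols, vals

# A 3x3 grid gives a 9x9 matrix: 9 diagonal entries plus 2 per grid edge.
rows, cols, vals = laplacian_5pt_coo(3, 3)
```

The resulting matrix is symmetric positive definite, so it suits method="cgd"; the triplets can be wrapped in torch tensors and passed to from_coo with shape=(nx * ny, nx * ny).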

[Benchmark plot: CGD]

[Benchmark plot: BiCGSTAB]

[Benchmark plot: Block-CSR]

Development

git clone https://github.com/sparseforge/mpsparse
cd mpsparse
pip install -e ".[dev]" --no-build-isolation   # uses your existing torch
pytest
python examples/solve_laplacian.py

Repository layout:

src/mpsparse/
  core.py          # from_coo / SparseMatrix — the friendly API
  spmv.cpp         # CSR extension  (mv, cgd, bicgstab)
  bcsr.cpp         # BCSR extension (cgd, bicgstab)
  kernels/*.metal  # Metal compute kernels
third_party/metal-cpp/   # vendored Apple metal-cpp headers

Roadmap

CGD and BiCGSTAB are the working core. The longer-term goals (tracked in the mps dev repo) are a sparse direct solver (QR / LU) and a faster on-GPU COO→CSR conversion.

License

See LICENSE.
