mpsparse

Sparse iterative linear solvers — Conjugate Gradient (CGD) and BiCGSTAB — accelerated on the Apple GPU via Metal, exposed as PyTorch extensions. Solve A x = b for large sparse A on Apple Silicon, with CSR and block-CSR (BCSR) sparse formats.

The solvers run the sparse matrix–vector products and the full iterative loop on the GPU, using fused Metal kernels and SIMD-group reductions to minimize CPU↔GPU synchronization.

This is v0.1, a packaged release of the working solvers from the mps repo.

Requirements

  • Apple Silicon Mac (Metal GPU)
  • Xcode command-line tools — provides the Metal compiler (xcode-select --install)
  • Python ≥ 3.10, PyTorch ≥ 2.0

Install

pip install git+https://github.com/sparseforge/mpsparse

The Metal kernels are compiled at build time and the resulting .metallib files are bundled into the package, so they are located automatically at runtime — no need to run from a particular directory.

Usage

import torch
import mpsparse

# A sparse matrix in COO (coordinate) form.
rows = torch.tensor([0, 1, 2, 0])
cols = torch.tensor([0, 1, 2, 1])
vals = torch.tensor([4.0, 4.0, 4.0, -1.0])

A = mpsparse.from_coo(rows, cols, vals, shape=(3, 3))   # CSR by default

b = torch.randn(3)
x = A.solve(b, method="bicgstab")        # or method="cgd" for SPD matrices

The method argument selects the solver:

  • method="cgd" — Conjugate Gradient, for symmetric positive-definite A.
  • method="bicgstab" — BiCGSTAB, for general (nonsymmetric) A.
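
As a CPU cross-check for method="cgd", the following is a minimal sketch of textbook conjugate gradient in pure Python, operating directly on COO triplets. It is not the library's Metal implementation, and the names coo_matvec, dot and cg_reference are hypothetical helpers introduced only for illustration. Note that the Usage matrix above has an entry at (0, 1) with no mirror at (1, 0), so it is not symmetric; the sketch uses a symmetric tridiagonal matrix instead.

```python
def coo_matvec(rows, cols, vals, x):
    """y = A @ x for A given as COO triplets (duplicate entries are summed)."""
    y = [0.0] * len(x)
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg_reference(rows, cols, vals, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient; assumes A is symmetric positive-definite."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:          # stop on small residual 2-norm
            break
        Ap = coo_matvec(rows, cols, vals, p)
        alpha = rs_old / dot(p, Ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Symmetric positive-definite example: 3x3 tridiagonal, 4 on the diagonal, -1 beside it.
rows = [0, 0, 1, 1, 1, 2, 2]
cols = [0, 1, 0, 1, 2, 1, 2]
vals = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = [1.0, 2.0, 3.0]

x = cg_reference(rows, cols, vals, b)
residual = [bi - yi for bi, yi in zip(b, coo_matvec(rows, cols, vals, x))]
```

Comparing x against the output of A.solve(b, method="cgd") on the same triplets, up to a floating-point tolerance, is one way to sanity-check an installation.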

Block-CSR

Pass format="bcsr" for block-structured matrices. Block padding is handled automatically — your matrix size need not be a multiple of block_size:

A = mpsparse.from_coo(rows, cols, vals, shape=(n, n), format="bcsr", block_size=4)
x = A.solve(b, method="bicgstab")
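
The README does not describe how the padding is done internally. One common approach, shown here purely as an assumption, rounds the dimension up to the next multiple of block_size and puts 1.0 on the diagonal of the added rows so the padded system stays non-singular. The helper pad_to_block_multiple is hypothetical and not part of mpsparse.

```python
def pad_to_block_multiple(rows, cols, vals, n, block_size):
    """Pad an n x n COO matrix up to the next multiple of block_size.

    Assumed scheme (not confirmed by the library): the extra rows and columns
    get a unit diagonal, so the padded unknowns decouple from the real ones.
    """
    padded_n = -(-n // block_size) * block_size   # ceiling division
    extra = list(range(n, padded_n))
    return (
        list(rows) + extra,
        list(cols) + extra,
        list(vals) + [1.0] * len(extra),
        padded_n,
    )


# The 3x3 matrix from the Usage section, padded for block_size=4.
p_rows, p_cols, p_vals, padded_n = pad_to_block_multiple(
    [0, 1, 2, 0], [0, 1, 2, 1], [4.0, 4.0, 4.0, -1.0], n=3, block_size=4
)

# A size that is already a multiple of block_size is left unchanged.
q_rows, q_cols, q_vals, q_n = pad_to_block_multiple([0], [0], [2.0], n=8, block_size=4)
```

Under this scheme the right-hand side would be extended with zeros and the extra entries of the solution discarded after the solve.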

Other operations

y = A.matvec(x)        # sparse matrix–vector product A @ x  (CSR format only)
A.shape                # (rows, cols)

The raw extension classes (mpsparse.csr_tensor, mpsparse.bcsr_tensor) are also exposed for advanced use, but from_coo is the recommended entry point.
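
For readers unfamiliar with the CSR layout, here is a minimal pure-Python sketch of a COO→CSR conversion and the row-wise product that a call like A.matvec(x) computes. The function names coo_to_csr and csr_matvec and the array names indptr, indices and data are illustrative assumptions; they are not the library's internals.

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    # Count the entries in each row, then prefix-sum into row start offsets.
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]

    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1]          # slicing copies: next free slot per row
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        indices[k] = c
        data[k] = v
        next_slot[r] += 1
    return indptr, indices, data


def csr_matvec(indptr, indices, data, x):
    """y = A @ x, one dot product per row over that row's stored entries."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]


# The 3x3 matrix from the Usage section.
indptr, indices, data = coo_to_csr([0, 1, 2, 0], [0, 1, 2, 1], [4.0, 4.0, 4.0, -1.0], 3)
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0])
```

Each row's entries occupy the contiguous slice indptr[i]:indptr[i + 1], which is what makes the per-row product easy to parallelize.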

Benchmarks

The solvers are benchmarked against PyTorch CPU solvers on two matrix families: structured block-sparse matrices (as in pruned ML models) and 5-point Laplacian stencils (as in finite-difference simulations). Speedups grow with problem size.
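
To make the second matrix family concrete, the sketch below builds COO triplets for a 5-point Laplacian on an n × n grid. The scaling (4 on the diagonal, -1 for each grid neighbor, Dirichlet boundaries) is the standard textbook form and is an assumption here; the benchmark scripts may construct or scale their matrices differently. The name laplacian_5pt_coo is hypothetical.

```python
def laplacian_5pt_coo(n):
    """COO triplets for the 5-point Laplacian on an n x n grid.

    Grid point (i, j) maps to unknown k = i * n + j. Neighbors that fall
    outside the grid are dropped (homogeneous Dirichlet boundary).
    """
    rows, cols, vals = [], [], []
    for i in range(n):
        for j in range(n):
            k = i * n + j
            rows.append(k)
            cols.append(k)
            vals.append(4.0)
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    rows.append(k)
                    cols.append(ii * n + jj)
                    vals.append(-1.0)
    return rows, cols, vals, n * n


lap_rows, lap_cols, lap_vals, size = laplacian_5pt_coo(3)
```

Wrapping the three lists in torch.tensor and passing them to mpsparse.from_coo with shape=(size, size) follows the pattern in the Usage section. The resulting matrix is symmetric positive-definite, so method="cgd" applies.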

CGD

[Figure: CGD on ML block-sparse matrices] [Figure: CGD on physics stencil matrices]

BiCGSTAB

[Figure: BiCGSTAB timing]

Block-CSR

[Figure: BCSR timing] [Figure: BCSR speedup]

Development

git clone https://github.com/sparseforge/mpsparse
cd mpsparse
pip install -e ".[dev]" --no-build-isolation   # uses your existing torch
pytest
python examples/solve_laplacian.py

Repository layout:

src/mpsparse/
  core.py          # from_coo / SparseMatrix — the friendly API
  spmv.cpp         # CSR extension  (mv, cgd, bicgstab)
  bcsr.cpp         # BCSR extension (cgd, bicgstab)
  kernels/*.metal  # Metal compute kernels
third_party/metal-cpp/   # vendored Apple metal-cpp headers

Roadmap

CGD and BiCGSTAB are the working core. The longer-term goals (tracked in the mps dev repo) are a sparse direct solver (QR / LU) and a faster on-GPU COO→CSR conversion.

License

See LICENSE.