Release 0.2.0 by dwgoon · Pull Request #6 · dwgoon/sfa

dwgoon · 2026-06-06T09:20:39Z

Summary

First feature release after 0.1.0. Bumps the version from 0.2.0.dev0
to 0.2.0 and turns on the PyPI publish job; after merge, pushing the
v0.2.0 tag from main will trigger the release upload.

What this branch carries over from 0.2.0.dev0 (highlights):

- Native CUDA backend (sfa._cuda._native) for compute_influence
  and SignalPropagation.propagate_iterative. AOT-compiled SASS for
  SM 7.0 - SM 12.0 plus PTX fallback. Distributed as sfa-cu128
  (CUDA 12.8) and sfa-cu132 (CUDA 13.2) wheels in addition to the
  pure-Python sfa package.
- CPU LAPACK closed-form fast path for compute_influence (scipy.linalg.solve),
  with an optional _blas_ctypes direct call into MKL / OpenBLAS and
  threadpoolctl-based thread limiting.
- device= and dtype= kwargs throughout the public API, plus the
  use_tf32 toggle for the Tensor Core path.
- Benchmark suite under benchmarks/, with the small-network table
  (vs v0.1.0 in fp64) and large-network table (GPU only, multiple
  precisions) reproduced in the README.
- tests/verification.py portable post-install check (renamed from
  the previous tests/smoke.py).
- Multi-OS CI: tests.yml covers Ubuntu / Windows / macOS-14 across
  Python 3.10-3.13; wheels.yml produces a universal CPU wheel
  plus per-OS / per-CUDA / per-Python CUDA wheels and sdist. The
  full wheels.yml matrix was dry-run green at run 27057416779
  (6/6 cells, ~21 min wallclock).
- Docs: rewritten Quick start, NVIDIA capitalization for FP64 / FP32
  / FP16 / TF32, expanded INSTALL.md with conda and conda-free build
  paths, hardware/experimental-setup tables for Performance
  benchmarks.
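The relationship between the iterative propagation and the closed-form fast path named in the highlights can be sketched on a toy network. This is a minimal illustration, assuming the update rule x(t+1) = alpha * W * x(t) + (1 - alpha) * b, whose fixed point solves (I - alpha * W) x = (1 - alpha) * b. The function names, the `alpha` default, and the 3-node matrix are hypothetical and are not the sfa API; a small pure-Python Gaussian elimination stands in for scipy.linalg.solve to keep the sketch dependency-free.

```python
def propagate_iterative(W, b, alpha=0.5, tol=1e-12, max_iter=10_000):
    """Iterate x <- alpha * W x + (1 - alpha) * b until the update is below tol."""
    n = len(b)
    x = list(b)
    for _ in range(max_iter):
        x_new = [
            alpha * sum(W[i][j] * x[j] for j in range(n)) + (1 - alpha) * b[i]
            for i in range(n)
        ]
        if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x


def solve_closed_form(W, b, alpha=0.5):
    """Solve (I - alpha * W) x = (1 - alpha) * b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix [I - alpha * W | (1 - alpha) * b].
    A = [
        [(1.0 if i == j else 0.0) - alpha * W[i][j] for j in range(n)]
        + [(1 - alpha) * b[i]]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


# Toy 3-node cycle (assumption): W[i][j] is the weight of the link j -> i.
W = [[0.0, 0.0, -0.5],
     [1.0, 0.0, 0.0],
     [0.0, 0.5, 0.0]]
b = [1.0, 0.0, 0.0]

x_iter = propagate_iterative(W, b)
x_direct = solve_closed_form(W, b)
```

On this network both routes should agree to within the iteration tolerance, which is the property a direct solve relies on.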

Test plan

- CI (tests.yml) green on the 0.2.0 branch before merge
- After merge to main, tag v0.2.0 from main
- Verify wheels.yml on the tag push: all 6 cells (CPU universal wheel, sdist, four CUDA wheels) + publish
- Verify pypi.org pages for sfa, sfa-cu128, sfa-cu132 at 0.2.0
- Smoke install in a clean venv: pip install sfa-cu132; python -c "import sfa; print(sfa.__version__)"
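The smoke-install step could be tightened from a printed version to an explicit check. The sketch below is one possible shape; `is_final_release` and `check_version` are hypothetical helpers, not part of sfa or tests/verification.py, and the X.Y.Z pattern is an assumption about how final releases are numbered.

```python
import re


def is_final_release(version: str) -> bool:
    """Return True for plain X.Y.Z versions, False for dev or pre-release strings."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version) is not None


def check_version(installed: str, expected: str = "0.2.0") -> None:
    """Raise AssertionError unless `installed` is the expected final release."""
    if not is_final_release(installed):
        raise AssertionError(f"{installed!r} is not a final release version")
    if installed != expected:
        raise AssertionError(f"expected {expected}, found {installed}")
```

In the clean venv this would be called as `check_version(sfa.__version__)` after `import sfa`.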

Required before tag push

PyPI trusted-publisher relationships must be configured at pypi.org
for all three release projects, otherwise the publish step will
fail authentication:

sfa
sfa-cu128
sfa-cu132

For each: Owner dwgoon, Repository sfa, Workflow filename
wheels.yml, Environment empty.

- Move the time units (ms, s) out of the column headers and onto each cell in the Small networks and Large networks tables. The numbers now carry their own unit, so a partial copy of the table no longer loses the units; column headers are reserved for precision modes.
- Rename the heading "Benchmarks" -> "Performance benchmarks" so the section is unambiguous in the table of contents.
- Rename the conda environment shipped in environment-cuda.yml from "sfa-cu132" to "sfa". The env name was an arbitrary label; matching the project name makes the install snippets read more naturally (`conda activate sfa` instead of `conda activate sfa-cu132`). PyPI package names (`sfa-cu128`, `sfa-cu132`, `sfa-cu133`) are unrelated to the conda env name and remain unchanged.
- Update README, INSTALL.md, and doc/install.md to use the new env name everywhere it appears.
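Attaching the unit to each cell, as the first item describes, might look like the following when the benchmark tables are generated. `format_time`, the 1-second threshold, and the number of decimals are assumptions for illustration, not taken from the benchmark scripts.

```python
def format_time(seconds: float) -> str:
    """Render a duration with its unit attached: ms below one second, s otherwise."""
    if seconds < 1.0:
        return f"{seconds * 1e3:.1f} ms"
    return f"{seconds:.2f} s"


# Each cell carries its own unit, so copying a single row keeps the units.
row = [format_time(t) for t in (0.25, 2.5)]
```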

Run 27055508425 surfaced one CPU and four CUDA failure modes when manually triggering wheels.yml. This rewrites the workflow so the full matrix is expected to pass:

CPU wheel - cibuildwheel rejected the pure-Python wheel
- Symptom: 'Build failed because a pure Python wheel was generated.'
- Fix: drop cibuildwheel for the CPU target and use `python -m build` to produce one universal sfa-<ver>-py3-none-any.whl. The 3-OS x 4-py matrix collapses to a single job; pure-Python wheels are not platform or interpreter specific.

CUDA wheels - Jimver/cuda-toolkit was too old for CUDA 13.x
- Symptom: 'Error: Version not available: 13.2.0 / 13.3.0' on every cu132 / cu133 cell.
- Fix: bump Jimver/cuda-toolkit v0.2.21 -> v0.2.35 (2026-03-29 release; default CUDA is 13.2 there).

CUDA wheels - sub-package names rejected by Ubuntu apt
- Symptom: 'Unable to locate package cuda-cublas-12-8 / cuda-cublas_dev-12-8 / cuda-nvrtc_dev-12-8'.
- Cause: Jimver prefixes every `sub-packages` entry with `cuda-`, but Ubuntu's CUDA apt repos ship cuBLAS and NVRTC as `libcublas-*` / `libnvrtc-*`. They must live under the separate `non-cuda-sub-packages` input, which is passed through verbatim.
- Fix:
    sub-packages: '["nvcc", "cudart", "cudart-dev"]'
    non-cuda-sub-packages: '["libcublas", "libcublas-dev", "libnvrtc", "libnvrtc-dev"]'

CUDA wheels - cibuildwheel rejected CIBW_ENVIRONMENT
- Symptom: 'cibuildwheel: Malformed environment option ...'.
- Cause: cibuildwheel parses CIBW_ENVIRONMENT with bashlex. The unquoted semicolons inside SFA_CUDA_ARCH=sm_70;sm_75;... are interpreted as Bash statement terminators. Also, the original block unconditionally exported Linux-only CUDA_PATH=/usr/local/cuda and PATH=/usr/local/cuda/bin:$PATH, which broke the Windows runs since Jimver on Windows installs CUDA under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vXX.Y.
- Fix: quote every value that contains ';' or '<' (SFA_CUDA_ARCH, SFA_CUDA_RUNTIME_REQUIRES), and split the workflow env into CIBW_ENVIRONMENT_LINUX (with the bind-mount paths) and CIBW_ENVIRONMENT_WINDOWS (which inherits CUDA_PATH from the Jimver step automatically).

CUDA wheels - cuBLAS / NVRTC headers and libs were scattered
- Cause: the `network` method drops cuBLAS / NVRTC headers in /usr/include and shared libs in /usr/lib/x86_64-linux-gnu/, while setup.py only looks under $CUDA_HOME/{include,lib64}/. Inside the manylinux container the additional host dirs are not mounted, so the build would have failed at link time even after the previous fixes.
- Fix: add a Linux-only staging step that copies cublas*.h, nvrtc.h, libcublas*.so*, libnvrtc*.so* into /usr/local/cuda/{include,lib64}/ before cibuildwheel runs. A single `-v /usr/local/cuda:/usr/local/cuda:ro` bind mount then exposes everything the build needs to the container.

Publish job 'needs' updated to reference build_cpu_wheel (singular). The `if: false` gate stays in place; PyPI upload is still off.
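The quoting fix for CIBW_ENVIRONMENT can be illustrated with Python's `shlex`, which follows POSIX shell quoting rules closely enough to show the idea (cibuildwheel itself uses bashlex, so this is an approximation). `build_cibw_environment` is a hypothetical helper and the two values are shortened, made-up stand-ins for the real lists in wheels.yml.

```python
import shlex


def build_cibw_environment(variables: dict[str, str]) -> str:
    """Join NAME=value pairs with spaces, shell-quoting each value."""
    return " ".join(
        f"{name}={shlex.quote(value)}" for name, value in variables.items()
    )


# Illustrative values only; ';' and '<' are the characters that need quoting.
env = build_cibw_environment({
    "SFA_CUDA_ARCH": "sm_75;sm_80",
    "SFA_CUDA_RUNTIME_REQUIRES": "nvidia-cublas-cu12<13",
})

# Parsing the string back yields exactly two assignments, so the semicolon
# is no longer treated as a statement terminator.
parsed = shlex.split(env)
```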

Second wheels-build dry run (27056380944) made progress (CPU + sdist now pass) but the 6 CUDA cells still failed at the CUDA toolkit install:

- Linux: 'Unable to locate package libnvrtc-12-8 / libnvrtc-dev-12-8'. Only cuBLAS lives in Ubuntu's `lib*` package family on the NVIDIA apt repo. NVRTC ships with the standard `cuda-` prefix (cuda-nvrtc-12-8, cuda-nvrtc-dev-12-8), so it belongs back in Jimver's `sub-packages` input, not in `non-cuda-sub-packages`.
- Windows: the NVIDIA Windows installer rejected `cudart-dev_12.8` (exit code 3772776473). Windows uses unprefixed names with underscores (cublas_dev, nvrtc_dev, ...) and does not split out a separate cudart-dev sub-package - the headers ship inside cudart.

Fix:

- Move the toolkit install to two OS-conditional steps, each with the sub-package naming convention that matches its target installer.
- Linux: sub-packages now ["nvcc", "cudart", "cudart-dev", "nvrtc", "nvrtc-dev"] (all cuda- prefixed) and non-cuda-sub-packages reduced to just ["libcublas", "libcublas-dev"].
- Windows: sub-packages ["nvcc", "cudart", "cublas", "cublas_dev", "nvrtc", "nvrtc_dev"] - the working configuration before the cudart-dev typo was introduced.
- Drop NVRTC from the Linux staging step. With NVRTC pulled in via the cuda- prefix it lands in $CUDA_HOME/{include,lib64} directly; only cuBLAS (still installed as a lib* package) needs to be moved out of /usr/include and /usr/lib/x86_64-linux-gnu/ so the bind mount of /usr/local/cuda sees everything.

CPU wheel and sdist are unchanged.
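The Linux naming rule described above (entries in `sub-packages` gain a `cuda-` prefix, entries in `non-cuda-sub-packages` pass through verbatim, and both get a version suffix) can be modelled in a few lines. This is a sketch of the convention as inferred from the error messages in this commit, not the action's actual implementation; `apt_package_names` is a hypothetical name.

```python
def apt_package_names(cuda_version, sub_packages, non_cuda_sub_packages):
    """Model the apt package names requested for a given CUDA version."""
    major, minor = cuda_version.split(".")[:2]
    suffix = f"{major}-{minor}"
    # Entries in sub-packages gain the cuda- prefix.
    prefixed = [f"cuda-{name}-{suffix}" for name in sub_packages]
    # Entries in non-cuda-sub-packages are used as written.
    verbatim = [f"{name}-{suffix}" for name in non_cuda_sub_packages]
    return prefixed + verbatim


linux_packages = apt_package_names(
    "12.8.0",
    ["nvcc", "cudart", "cudart-dev", "nvrtc", "nvrtc-dev"],
    ["libcublas", "libcublas-dev"],
)
```

Under this model, listing `libnvrtc` in `non-cuda-sub-packages` yields `libnvrtc-12-8`, the name apt could not locate.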

…l, drop cu133

Third dry run (27056467626) made it past CUDA install on Linux for cu128 / cu132 (good) but surfaced three new blocking issues. Fixing each so the matrix can come up green:

Issue 1 - cu133 cells fail at install with 'Version not available: 13.3.0'
- Jimver/cuda-toolkit v0.2.35 does not have CUDA 13.3 in its version table yet. Drop the sfa-cu133 row from the matrix until a newer Jimver release supports it; re-add at that point. Update docs to match: README, INSTALL.md, doc/install.md, the SFA_PACKAGE_NAME example, the conda-env note about which CUDA majors CI tests, and the `pip install` snippet now reference cu132 instead of cu133.

Issue 2 - Linux Build wheels fails inside auditwheel
- Error: 'Cannot repair wheel, because required library "libcudart.so.12" could not be located'.
- auditwheel was trying to vendor the NVIDIA runtime shared libs into the wheel. We don't want that - the wheel declares pinned PyPI dependencies on nvidia-cublas-cuXX / nvidia-cuda-runtime-cuXX / nvidia-cuda-nvrtc-cuXX through SFA_CUDA_RUNTIME_REQUIRES, so the libs arrive via pip at install time.
- Fix: override CIBW_REPAIR_WHEEL_COMMAND_LINUX to pass --exclude for libcudart, libcublas, libcublasLt, libnvrtc, and libnvrtc-builtins in both soname.12 and soname.13 forms (covers cu128 and cu132).

Issue 3 - Windows Build wheels fails with 'nvcc fatal : Cannot find compiler cl.exe in PATH'
- cibuildwheel spawns the build in a subprocess that does not inherit the Developer Command Prompt environment, so cl.exe is not visible to nvcc even though MSVC is installed on the runner.
- Fix: insert ilammy/msvc-dev-cmd@v1 step (Windows only) after the Jimver toolkit step; it exports VCINSTALLDIR / PATH and friends so any subsequent process can find cl.exe.

CPU wheel, sdist, CUDA install on Linux (cu128/cu132), and CUDA install on Windows (cu128/cu132) are unchanged.
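The repair-command override from Issue 2 is repetitive enough that generating it shows the pattern more clearly than the final string does. The sketch below assumes the sonames follow the `<lib>.so.<major>` form mentioned in the fix; the exact sonames in the workflow may differ, and `repair_command` is a hypothetical helper. `{dest_dir}` and `{wheel}` are the placeholders cibuildwheel substitutes in repair commands.

```python
NVIDIA_LIBS = ["libcudart", "libcublas", "libcublasLt", "libnvrtc", "libnvrtc-builtins"]


def repair_command(libs=NVIDIA_LIBS, majors=(12, 13)):
    """Build an auditwheel repair command that leaves the NVIDIA libs unvendored."""
    excludes = [f"--exclude {lib}.so.{major}" for lib in libs for major in majors]
    # The trailing part is a plain string, so the cibuildwheel placeholders
    # {dest_dir} and {wheel} are kept literally.
    return "auditwheel repair " + " ".join(excludes) + " -w {dest_dir} {wheel}"


cmd = repair_command()
```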

Fourth dry run (27056657014) graduated cu128-windows to a full pass (11 minutes including the test phase). Three other CUDA cells failed with two new error classes:

Issue 1 - 'nvcc fatal: Unsupported gpu architecture compute_70' on both cu132-ubuntu and cu132-windows
- CUDA 13 nvcc no longer accepts -gencode for sm_70. The deprecation warning had been visible since CUDA 12 ('Support for offline compilation for architectures prior to sm_75 will be removed in a future release'); CUDA 13 is that release.
- Fix: remove sm_70 from the cu132 archs list. cu128 keeps sm_70 because CUDA 12.8 still supports it. Volta users (V100, Titan V, Quadro GV100, etc.) install sfa-cu128.

Issue 2 - cu128-ubuntu test phase failed compiling scipy from source
- The sfa wheel itself built and was repaired (auditwheel exclude rules worked). The failure occurred when cibuildwheel ran the test command in a fresh venv: cp310 installed scipy 1.15.3 from a manylinux2014 wheel and succeeded, but cp311 picked up scipy 1.16+, which has dropped manylinux2014 wheels. pip then fell back to an sdist build, the manylinux2014 container had no OpenBLAS, and meson aborted with 'Dependency OpenBLAS not found'.
- Fix: set CIBW_MANYLINUX_X86_64_IMAGE to manylinux_2_28. That is the base image scipy 1.16+ targets and matches what the rest of the scientific-Python wheel matrix has converged to. The bind-mount of /usr/local/cuda continues to work the same way.

After these two changes the expected matrix outcome is:

- CPU universal wheel    : pass (already passing)
- sdist                  : pass (already passing)
- cuda-sfa-cu128-ubuntu  : should pass (manylinux_2_28 scipy)
- cuda-sfa-cu128-windows : pass (already passing)
- cuda-sfa-cu132-ubuntu  : should pass (no sm_70, manylinux_2_28)
- cuda-sfa-cu132-windows : should pass (no sm_70)
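The per-toolkit architecture filtering from Issue 1 could be expressed as below. This is a sketch under stated assumptions: `archs_for` and `gencode_flags` are hypothetical helpers rather than code from setup.py, the three-entry list is an illustrative subset of the real arch list, and emitting PTX for the newest listed architecture is an assumed way of providing the PTX fallback.

```python
def archs_for(cuda_major, archs):
    """Drop sm_70 for CUDA 13 and newer, where nvcc no longer accepts it."""
    if cuda_major >= 13:
        return [a for a in archs if a != "sm_70"]
    return list(archs)


def gencode_flags(archs):
    """Build nvcc -gencode flags: SASS per arch, plus PTX for the last arch."""
    flags = []
    for arch in archs:
        cc = arch.removeprefix("sm_")
        flags.append(f"-gencode=arch=compute_{cc},code=sm_{cc}")
    # PTX fallback for the newest listed architecture (assumption).
    last = archs[-1].removeprefix("sm_")
    flags.append(f"-gencode=arch=compute_{last},code=compute_{last}")
    return flags


base = ["sm_70", "sm_75", "sm_80"]  # illustrative subset, not the full list
cu128_flags = gencode_flags(archs_for(12, base))
cu132_flags = gencode_flags(archs_for(13, base))
```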

Fifth dry run (27057065902) graduated 5 of 6 cells; cu132-windows was the lone holdout, dying with

    CUDA\v13.2\include\cuda_runtime.h(82): fatal error C1083: Cannot open include file: 'crt/host_config.h': No such file or directory

CUDA 13's Windows installer split the runtime developer headers (including crt/host_config.h, which cuda_runtime.h pulls in to wire nvcc up to the host MSVC) into a separate `cudart_dev` sub-package. The CUDA 12.8 Windows installer kept those headers inside `cudart`, so the prior config covered cu128-windows by accident but had no chance against cu132.

Fix: add cudart_dev to the Windows sub-packages list. The package also exists on CUDA 12.8 (where it is a thin no-op overlay), so the same list works for both wheels.

Linux is unaffected: the Linux toolkit install already lists "cudart-dev" alongside "cudart", and those headers landed in /usr/local/cuda/include/crt/ as expected.

Expected outcome of the next run: all 6 cells green.

Sixth dry run (27057337822) regressed both Windows cells: adding `cudart_dev` to the Windows sub-packages list (which I thought would just be a no-op on CUDA 12.8) instead broke the install on BOTH cu128 and cu132 because that sub-package name does not exist on Windows for either CUDA version. Linux jobs were unaffected.

Root cause of the underlying problem: NVIDIA's Windows installer reorganised which sub-package carries crt/host_config.h between CUDA 12 and CUDA 13, and the new owner is not consistently called `cudart_dev`. Different secondary sources name it differently and none of those names work for both 12.8 and 13.2.

Fix: skip the sub-package guessing game entirely on Windows by switching to method: 'local'. Jimver then downloads the full NVIDIA installer .exe and runs it silently, which lays down the complete include tree (crt/host_config.h included) regardless of how NVIDIA re-tags individual chunks in future point releases.

Cost: one extra ~3 GB download per Windows cell. cu128-windows had been completing in 11 minutes on the network method, so the local method should land it somewhere around 14-16 minutes - well within the GitHub Actions job budget.

Linux keeps the network method + curated sub-packages + cuBLAS staging step. That combination was verified green in the previous run (cu128-ubuntu 4m55s, cu132-ubuntu 4m57s).

Bump version 0.2.0.dev0 -> 0.2.0 in pyproject.toml and sfa/__init__.py, and enable the publish-to-pypi job in wheels.yml (the prior `if: false` guard is replaced by a tag-ref check) so that pushing a v0.2.0 tag from main triggers wheel build + PyPI upload.

The wheels.yml matrix itself is unchanged; the same six cells (CPU universal, sdist, sfa-cu128 / sfa-cu132 on ubuntu and windows) that were just verified green in dry run 27057416779 will run again on the tag push, this time producing 0.2.0 artifacts and shipping them to PyPI through the configured trusted-publisher relationships for sfa, sfa-cu128, and sfa-cu132.
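The logic of a tag-ref gate can be sketched as a predicate. The actual condition in wheels.yml is not shown in this commit, so `should_publish` is a hypothetical illustration; comparing the tag against the package version is an extra safeguard assumed here, not something the commit states.

```python
def should_publish(github_ref: str, package_version: str) -> bool:
    """Return True only for a v-prefixed tag ref that matches the package version."""
    prefix = "refs/tags/v"
    return (
        github_ref.startswith(prefix)
        and github_ref[len(prefix):] == package_version
    )


# A branch push or a mismatched tag does not publish; the release tag does.
decisions = [
    should_publish("refs/heads/main", "0.2.0"),
    should_publish("refs/tags/v0.2.0.dev0", "0.2.0"),
    should_publish("refs/tags/v0.2.0", "0.2.0"),
]
```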

dwgoon added 8 commits June 6, 2026 15:53

dwgoon merged commit 1528f39 into main Jun 6, 2026
8 checks passed

dwgoon merged 8 commits into main from 0.2.0
