Merged
252 changes: 172 additions & 80 deletions .github/workflows/wheels.yml

Large diffs are not rendered by default.

24 changes: 11 additions & 13 deletions INSTALL.md
Original file line number Diff line number Diff line change
@@ -10,7 +10,6 @@ NVIDIA driver new enough for that CUDA version.
| `sfa` | none | - | Linux, macOS, Windows |
| `sfa-cu128` | 12.8.x | 570 (Linux / Win) | Linux, Windows |
| `sfa-cu132` | 13.2.x | 580 | Linux, Windows |
| `sfa-cu133` | 13.3.x | 580 | Linux, Windows |

All CUDA wheels share the same AOT-compiled SASS matrix (SM 7.0
through SM 12.0: Volta, Turing, Ampere, Ada, Hopper, Blackwell), plus
@@ -37,14 +36,13 @@ that is the maximum CUDA version your driver supports.

| Package | CUDA bundled | Minimum NVIDIA driver | When to pick |
|-------------|--------------|------------------------|-----------------------------------------------------------|
| `sfa-cu133` | 13.3.x | 580 | Newest hardware / drivers; default for fresh installs. |
| `sfa-cu132` | 13.2.x | 580 | Matches the `sfa-cu132` conda env used for development. |
| `sfa-cu132` | 13.2.x | 580 | Newest CUDA stack; matches `environment-cuda.yml`. |
| `sfa-cu128` | 12.8.x | 570 | Older driver (CUDA 12 line); broadest backwards compat. |

Example (install the newest one):

```bash
pip install sfa-cu133
pip install sfa-cu132
```

Requires Python 3.10+. macOS is not supported because Apple ended
@@ -81,15 +79,15 @@ the host compiler, and `conda` will not install it for you.
git clone https://github.com/dwgoon/sfa.git && cd sfa

conda env create -f environment-cuda.yml
conda activate sfa-cu132
conda activate sfa
pip install -e . # builds the CUDA extension via the env's nvcc

# CPU-only variant (skip CUDA even if nvcc is on PATH):
SFA_BUILD_CUDA=0 pip install -e .
```

This is also how the project maintainers build on Windows: the
`sfa-cu132` env provides `nvcc` and cuBLAS, while system MSVC handles
This is also how the project maintainers build on Windows: the `sfa`
env provides `nvcc` and cuBLAS, while system MSVC handles
`bindings.cpp`. The resulting extension is e.g.
`sfa/_cuda/_native.cp312-win_amd64.pyd`.

@@ -98,8 +96,8 @@ is what the maintainers test against. The same workflow works for any
CUDA major / minor that has a `cuda-toolkit` build on the `nvidia`
channel: edit the two `cuda-version` / `cuda-toolkit` pins in lockstep
(see [What `environment-cuda.yml` provides](#what-environment-cudayml-provides)
below) and rename the env on the first line of the file. CUDA 12.8 and
13.3 environments have been tested in CI.
below) and rename the env on the first line of the file. CUDA 12.8
and 13.2 environments have been tested in CI.

### Option B: conda-free build (system CUDA + system C++ compiler)

@@ -180,7 +178,7 @@ and falls through to a CPU-only build (printing
### What `environment-cuda.yml` provides

The shipped conda environment file creates a self-contained build
environment named `sfa-cu132` that does **not** require any
environment named `sfa` that does **not** require any
system-wide CUDA install. Everything the build needs - the CUDA
compiler, the CUDA runtime, cuBLAS headers and import libs, plus the
Python build and runtime dependencies - is pulled in from the
@@ -199,14 +197,14 @@ Concretely, the file pins:

The `cuda-toolkit` meta-package pulls in `nvcc`, `cudart`, `nvrtc`,
`cccl`, `cupti`, the profiler API, and the rest of the CUDA dev
toolchain. After `conda activate sfa-cu132`, `nvcc` is on `PATH` and
toolchain. After `conda activate sfa`, `nvcc` is on `PATH` and
`setup.py`'s CUDA-extension build picks it up automatically.

Notes for adjusting the file:

- To target a different CUDA major version, change the two `nvidia::`
pins (`cuda-version` and `cuda-toolkit`) in lockstep. The env name
on the first line (`sfa-cu132`) is just a label; rename it freely.
on the first line (`sfa`) is just a label; rename it freely.
- A host C++ compiler is still required (MSVC on Windows, GCC on
Linux). The toolchain itself is not bundled by `cuda-toolkit`;
conda will not install it for you.
@@ -220,7 +218,7 @@ Notes for adjusting the file:
|----------------------|------------------------------------------------------------------------|
| `SFA_BUILD_CUDA` | `0` to force a pure-Python install. Default: build if `nvcc` is found. |
| `SFA_CUDA_ARCH` | Semicolon-separated SM list, e.g. `sm_89` (dev) or `sm_70;sm_80;sm_89`. Default: the full wheel-wide AOT matrix. |
| `SFA_PACKAGE_NAME` | Override the PyPI name (used by CI to produce e.g. `sfa-cu132` or `sfa-cu133` from the same source tree). |
| `SFA_PACKAGE_NAME` | Override the PyPI name (used by CI to produce e.g. `sfa-cu128` or `sfa-cu132` from the same source tree). |
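
As a rough illustration of how the three variables in this table could be interpreted at build time, here is a minimal Python sketch. It is not the project's actual `setup.py` logic: the function name `read_build_config`, the `nvcc_found` parameter, and the returned dictionary keys are hypothetical, and `None` stands in for "use the default wheel-wide AOT matrix".

```python
import os
from typing import Mapping, Optional


def read_build_config(env: Optional[Mapping[str, str]] = None,
                      nvcc_found: bool = False) -> dict:
    """Interpret the SFA_* build variables (illustrative sketch only)."""
    env = os.environ if env is None else env

    # SFA_BUILD_CUDA=0 forces a pure-Python install; otherwise the CUDA
    # extension is built only if nvcc was found (passed in by the caller).
    build_cuda = env.get("SFA_BUILD_CUDA") != "0" and nvcc_found

    # SFA_CUDA_ARCH is a semicolon-separated SM list, e.g. "sm_70;sm_80;sm_89".
    # None here means "fall back to the default AOT matrix" (an assumption).
    raw_arch = env.get("SFA_CUDA_ARCH", "")
    archs = [a.strip() for a in raw_arch.split(";") if a.strip()] or None

    # SFA_PACKAGE_NAME overrides the distribution name; "sfa" is the name
    # declared in pyproject.toml.
    package_name = env.get("SFA_PACKAGE_NAME", "sfa")

    return {"build_cuda": build_cuda, "archs": archs, "package_name": package_name}


if __name__ == "__main__":
    example = {"SFA_CUDA_ARCH": "sm_70;sm_80;sm_89", "SFA_PACKAGE_NAME": "sfa-cu132"}
    print(read_build_config(example, nvcc_found=True))
```

Passing the environment and the `nvcc` check in as arguments keeps the sketch testable without touching the real process environment.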

## Verify the install

37 changes: 18 additions & 19 deletions README.md
@@ -43,7 +43,6 @@ set of CUDA optimized `sfa-cuXYZ` versions:
| `sfa` | none | - | Linux, macOS, Windows |
| `sfa-cu128` | 12.8.x | 570 (Linux / Win) | Linux, Windows |
| `sfa-cu132` | 13.2.x | 580 | Linux, Windows |
| `sfa-cu133` | 13.3.x | 580 | Linux, Windows |

Each CUDA wheel ships ahead-of-time compiled SASS for NVIDIA SM 7.0
through SM 12.0 (Volta, Turing, Ampere, Ada, Hopper, Blackwell) plus a
@@ -67,7 +66,7 @@ supports.
Example (install the newest one):

```bash
pip install sfa-cu133
pip install sfa-cu132
```

> [!IMPORTANT]
@@ -87,7 +86,7 @@ self-contained env):
```bash
git clone https://github.com/dwgoon/sfa.git && cd sfa
conda env create -f environment-cuda.yml
conda activate sfa-cu132
conda activate sfa
pip install -e .
```

@@ -280,7 +279,7 @@ S_gpu = compute_influence(
)
```

## Benchmarks
## Performance benchmarks

### Hardware setup

@@ -329,24 +328,24 @@ S_gpu = compute_influence(

### Small networks

| # Nodes | # Edges | CPU iter (FP64) ms | CPU LAPACK (FP64) ms | CUDA (FP64) ms |
|---------|----------|--------------------|----------------------|-----------------------|
| 32 | 992 | 0.1 ± 0.0 | 0.2 ± 0.0 (0.4x) | 1.3 ± 0.2 (0.06x) |
| 64 | ~4.0 K | 0.2 ± 0.0 | 0.2 ± 0.0 (0.8x) | 1.4 ± 0.1 (0.13x) |
| 128 | ~16.3 K | 2.5 ± 0.0 | 0.4 ± 0.0 (**7.2x**) | 1.9 ± 0.1 (1.3x) |
| 256 | ~65.3 K | 6.9 ± 0.2 | 2.4 ± 0.1 (**2.8x**) | 3.1 ± 0.8 (2.2x) |
| 512 | ~262 K | 38.8 ± 1.7 | 190 ± 46 (0.2x) | 6.4 ± 0.2 (**6.0x**) |
| 1024 | ~1.05 M | 180 ± 8 | 486 ± 89 (0.4x) | 47 ± 10 (**3.8x**) |
| 2048 | ~4.19 M | 2140 ± 320 | 3880 ± 2990 (0.6x) | 245 ± 2 (**8.7x**) |
| 4096 | ~16.8 M | 12520 ± 2380 | 5690 ± 1390 (2.2x) | 4320 ± 580 (**2.9x**) |
| # Nodes | # Edges | CPU iter (FP64) | CPU LAPACK (FP64) | CUDA (FP64) |
|---------|----------|--------------------|---------------------------|-----------------------------|
| 32 | 992 | 0.1 ± 0.0 ms | 0.2 ± 0.0 ms (0.4x) | 1.3 ± 0.2 ms (0.06x) |
| 64 | ~4.0 K | 0.2 ± 0.0 ms | 0.2 ± 0.0 ms (0.8x) | 1.4 ± 0.1 ms (0.13x) |
| 128 | ~16.3 K | 2.5 ± 0.0 ms | 0.4 ± 0.0 ms (**7.2x**) | 1.9 ± 0.1 ms (1.3x) |
| 256 | ~65.3 K | 6.9 ± 0.2 ms | 2.4 ± 0.1 ms (**2.8x**) | 3.1 ± 0.8 ms (2.2x) |
| 512 | ~262 K | 38.8 ± 1.7 ms | 190 ± 46 ms (0.2x) | 6.4 ± 0.2 ms (**6.0x**) |
| 1024 | ~1.05 M | 180 ± 8 ms | 486 ± 89 ms (0.4x) | 47 ± 10 ms (**3.8x**) |
| 2048 | ~4.19 M | 2140 ± 320 ms | 3880 ± 2990 ms (0.6x) | 245 ± 2 ms (**8.7x**) |
| 4096 | ~16.8 M | 12520 ± 2380 ms | 5690 ± 1390 ms (2.2x) | 4320 ± 580 ms (**2.9x**) |

### Large networks

| # Nodes | # Edges | CPU LAPACK (FP64) s | CUDA TF32 (FP32) s | CUDA FP32 (no TF32) s | CUDA FP16 s |
|---------|---------|---------------------|----------------------|-----------------------|--------------------------|
| 5000 | ~25 M | 5.10 ± 2.24 | 0.366 ± 0.027 (14x) | 0.356 ± 0.034 (14x) | 0.349 ± 0.037 (**15x**) |
| 10000 | ~100 M | 17.60 ± 0.57 | 1.55 ± 0.05 (11x) | 4.07 ± 0.06 (4.3x) | 1.13 ± 0.16 (**16x**) |
| 20000 | ~400 M | 70.88 ± 0.79 | 9.13 ± 0.10 (7.8x) | 16.30 ± 0.28 (4.3x) | 4.28 ± 0.02 (**17x**) |
| # Nodes | # Edges | CPU LAPACK (FP64) | CUDA TF32 (FP32) | CUDA FP32 (no TF32) | CUDA FP16 |
|---------|---------|-------------------|--------------------------|----------------------------|----------------------------|
| 5000 | ~25 M | 5.10 ± 2.24 s | 0.366 ± 0.027 s (14x) | 0.356 ± 0.034 s (14x) | 0.349 ± 0.037 s (**15x**) |
| 10000 | ~100 M | 17.60 ± 0.57 s | 1.55 ± 0.05 s (11x) | 4.07 ± 0.06 s (4.3x) | 1.13 ± 0.16 s (**16x**) |
| 20000 | ~400 M | 70.88 ± 0.79 s | 9.13 ± 0.10 s (7.8x) | 16.30 ± 0.28 s (4.3x) | 4.28 ± 0.02 s (**17x**) |
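
The parenthesized factors in these tables appear to be the baseline mean divided by the mean of the path in that column. That reading is inferred from the numbers, not stated in the diff; for the large-network table the baseline would be CPU LAPACK (FP64). A small Python sketch of that arithmetic, with the hypothetical helper name `speedup`:

```python
def speedup(baseline_mean: float, candidate_mean: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_mean <= 0:
        raise ValueError("candidate_mean must be positive")
    return baseline_mean / candidate_mean


if __name__ == "__main__":
    # Means taken from the 20000-node row of the large-network table (seconds).
    cpu_lapack = 70.88
    for label, mean in [("TF32", 9.13), ("FP32", 16.30), ("FP16", 4.28)]:
        print(f"{label}: {speedup(cpu_lapack, mean):.2f}x")
```

The small-network factors were presumably computed from unrounded timings, so recomputing them from the rounded means shown in the table may differ in the last digit.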

- CPU paths show noticeably higher variance than GPU paths (CPU
LAPACK FP64 stddev reaches ~25-77% of the mean at small `N`),
12 changes: 5 additions & 7 deletions doc/install.md
@@ -9,8 +9,7 @@ one** into a given environment.
|---------------|--------|---------------------|-----------------------------|
| `sfa` | none | - | Linux, macOS, Windows |
| `sfa-cu128` | 12.8.x | 570 (Linux / Win) | Linux, Windows |
| `sfa-cu132` | 13.2.x | 580 | Linux, Windows |
| `sfa-cu133` | 13.3.x | 580 | Linux, Windows (newest) |
| `sfa-cu132` | 13.2.x | 580 | Linux, Windows (newest) |

## Requirements

@@ -33,10 +32,9 @@ Run `nvidia-smi` and look at the "CUDA Version" column. That is the
number:

```text
nvidia-smi -> "CUDA Version: 13.3" -> any of sfa-cu128 / cu132 / cu133
nvidia-smi -> "CUDA Version: 13.0" -> sfa-cu128
nvidia-smi -> "CUDA Version: 12.8" -> sfa-cu128
nvidia-smi -> "CUDA Version: 12.6" -> upgrade your driver or use `sfa` (CPU)
nvidia-smi -> "CUDA Version: 13.2" or higher -> sfa-cu132 or sfa-cu128
nvidia-smi -> "CUDA Version: 12.8" - 13.1 -> sfa-cu128
nvidia-smi -> "CUDA Version: 12.6" -> upgrade your driver or use `sfa` (CPU)
```
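
The updated mapping (13.2 or higher, 12.8 through 13.1, anything older) can be expressed as a short Python sketch. The function names `parse_cuda_version` and `pick_package` are hypothetical, and the sketch takes the `nvidia-smi` header text as a string instead of running the tool; when both `sfa-cu132` and `sfa-cu128` would work, it returns the newer one.

```python
import re
from typing import Optional, Tuple


def parse_cuda_version(smi_text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text containing 'CUDA Version: X.Y'."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def pick_package(cuda_version: Optional[Tuple[int, int]]) -> str:
    """Map the driver's maximum supported CUDA version to a wheel name."""
    if cuda_version is None:
        return "sfa"        # no usable driver info: CPU-only package
    if cuda_version >= (13, 2):
        return "sfa-cu132"  # sfa-cu128 would also work here
    if cuda_version >= (12, 8):
        return "sfa-cu128"
    return "sfa"            # driver too old: upgrade it or stay on CPU


if __name__ == "__main__":
    header = "Driver Version: 580.00    CUDA Version: 13.0"  # made-up sample text
    print(pick_package(parse_cuda_version(header)))
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings or floats, where `13.10` would sort below `13.2`.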

When in doubt, start with `sfa-cu128` for the widest driver coverage
@@ -102,7 +100,7 @@ and the runtime Python deps:

```bash
conda env create -f environment-cuda.yml
conda activate sfa-cu132
conda activate sfa
pip install -e .
```

2 changes: 1 addition & 1 deletion environment-cuda.yml
@@ -1,4 +1,4 @@
name: sfa-cu132
name: sfa
channels:
- nvidia
- conda-forge
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
# wheels under the SFA_PACKAGE_NAME env var.
[project]
name = "sfa"
version = "0.2.0.dev0"
version = "0.2.0"
description = "Signal flow analysis"
readme = "README.md"
license = { text = "MIT" }
2 changes: 1 addition & 1 deletion sfa/__init__.py
@@ -1,4 +1,4 @@
__version__ = "0.2.0.dev0"
__version__ = "0.2.0"

from .base import *
from .containers import AlgorithmSet