Add setup.py to build the _C extension (CUDA and ROCm) by jeffdaily · Pull Request #12 · Luo-Yihao/FaithC

jeffdaily · 2026-06-17T14:57:36Z

The tree ships the _C sources (bindings.cpp, kernels.cu) and ops.py hard-imports them (from . import _C), but the only build config is a pyproject.toml that declares a pure-Python package with no extension module. A clean source install (pip install -e ., or pixi) therefore never compiles _C, so from . import _C fails. This adds the missing build wiring.

setup.py builds _C with PyTorch's CUDAExtension and BuildExtension. On a CUDA PyTorch it compiles the original CUDA sources unchanged; on a ROCm PyTorch BuildExtension hipifies the same sources automatically, so one source tree builds for both backends. This complements the ROCm kernel/runtime support already in main: that made the kernels HIP-clean, and this makes the extension actually build (on CUDA and ROCm alike).
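As a rough illustration of the wiring described above, a setup.py of this shape might look like the following. This is a minimal sketch, not the file in this PR: the source paths, the `extension_kwargs` and `main` helper names, and the absence of extra compile flags are all assumptions. The torch import is deferred into `main()` so the module can be inspected without a GPU toolchain.

```python
# Sketch of a setup.py that builds faithcontour._C with PyTorch's helpers.
# The paths below are assumed from the file names mentioned in the PR text.
SOURCES = [
    "src/faithcontour/_C/bindings.cpp",
    "src/faithcontour/_C/kernels.cu",
]


def extension_kwargs():
    """Arguments for CUDAExtension; identical for CUDA and ROCm builds."""
    return {"name": "faithcontour._C", "sources": list(SOURCES)}


def main():
    # Imported lazily: torch must already be installed in the build
    # environment, which is why the PR builds with --no-build-isolation.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        ext_modules=[CUDAExtension(**extension_kwargs())],
        # On a ROCm PyTorch, BuildExtension hipifies the .cu sources itself.
        cmdclass={"build_ext": BuildExtension},
    )
```

A real setup.py would end with a call to `main()`; it is left out here so that importing the sketch has no side effects.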

The kernels use only atomicAdd, __syncthreads, dynamic shared memory and float math, with no warp-level intrinsics, so they are wavefront-size agnostic and need no per-architecture changes.

Two further changes make the build correct on Windows (both are no-ops on Linux and CUDA):

- The int64 index/candidate buffers were typed long, which is 32-bit on Windows (LLP64) while torch::kInt64 tensors are 64-bit; the kernel signatures, data_ptr<>() calls and casts now use int64_t, which is correct on every platform and identical to long on Linux LP64.
- setup.py adds a Windows-only /ALTERNATENAME link directive. c10.dll, built with clang-cl, does not export the c10::ValueError(SourceLocation, std::string) constructor inherited via using Error::Error;, so headers pulled in through <torch/extension.h> that expand TORCH_CHECK_VALUE fail to link (LNK2001); the directive aliases the missing import thunk to the exported c10::Error(SourceLocation, std::string) constructor. The same root cause was fixed upstream in pytorch/pytorch#175340, "[c10] Fix missing symbol exports for ValueError/NotImplementedError on Windows" (explicit exported constructors for the affected c10::Error subclasses); this alias keeps the extension building on PyTorch releases from before that fix.
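The two Windows points above can be sketched in a few lines of Python. The first part uses ctypes to show why long is unsafe for 64-bit buffers; the second shows how a platform-gated linker directive could be assembled. The helper name `windows_link_args` and its parameters are hypothetical, and the real mangled symbol names are deliberately not reproduced here, so callers pass them in.

```python
import ctypes
import sys

# sizeof(long) is 4 on Windows (LLP64) and 8 on 64-bit Linux (LP64);
# a fixed-width int64 is 8 bytes everywhere, matching torch::kInt64.
LONG_BYTES = ctypes.sizeof(ctypes.c_long)
INT64_BYTES = ctypes.sizeof(ctypes.c_int64)


def windows_link_args(missing_symbol, exported_symbol, platform=sys.platform):
    """Return MSVC linker flags aliasing one symbol to another.

    Hypothetical helper: both arguments are the decorated (mangled)
    names, which this sketch does not attempt to spell out.
    """
    if platform != "win32":
        return []  # no-op on Linux, as in the PR
    return [f"/ALTERNATENAME:{missing_symbol}={exported_symbol}"]
```

In a setup.py, the returned list would typically be passed as `extra_link_args` to the extension.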

Building

# CUDA (unchanged)
pip install -e . --no-build-isolation

# ROCm (set the arch(es) for your GPU)
PYTORCH_ROCM_ARCH=gfx90a pip install -e . --no-build-isolation
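After either install command, a quick way to confirm that the extension was actually compiled is to try importing it. The helper below is a small sketch; the function name `extension_is_built` is hypothetical, and it assumes that a missing or unloadable _C surfaces as an ImportError, as the description above says happens on an unbuilt tree.

```python
import importlib


def extension_is_built(package="faithcontour"):
    """Return True if `<package>._C` imports, False if the import fails."""
    try:
        importlib.import_module(f"{package}._C")
    except ImportError:
        # Covers a missing package, a missing _C module, and a shared
        # object that fails to load.
        return False
    return True
```

Calling `extension_is_built()` in the environment used for the build gives a yes/no answer without running any kernels.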

The README's Manual Setup section documents the ROCm path alongside the existing CUDA instructions. .gitignore is extended to cover the hipify build artifacts (*.hip, *.prehip, *.so.*).

Validation

Built on an AMD Instinct MI250X (gfx90a, ROCm 7.2): a clean setup.py build_ext hipifies, compiles and links _C (the .so carries a native gfx90a code object). A synthetic-tensor harness drives all four _C bindings on the GPU and compares against a pure-torch CPU reference. The atomicAdd output-slotting kernels are compared as order-independent (a, t) pair sets; the deterministic kernels are compared exactly and checked for rerun stability. All checks pass, with Moller-Trumbore dot-product drift of 3.5e-7 within the kernels' eps thresholds. A gfx90a;gfx1100 multi-architecture binary also builds with both code objects present.
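The order-independent comparison used for the atomicAdd kernels can be illustrated as follows. This is not the harness from the PR; it is a minimal sketch that uses plain Python lists instead of tensors and assumes each output row is an integer (a, t) pair. Because atomicAdd assigns output slots in a nondeterministic order, the rows are compared as sets rather than position by position.

```python
def same_pair_set(gpu_pairs, ref_pairs):
    """Compare two collections of (a, t) pairs while ignoring row order."""
    gpu = {tuple(p) for p in gpu_pairs}
    ref = {tuple(p) for p in ref_pairs}
    return gpu == ref


# Same pairs in a different slot order count as a match.
gpu_out = [(3, 1), (0, 2), (5, 5)]
cpu_ref = [(0, 2), (5, 5), (3, 1)]
match = same_pair_set(gpu_out, cpu_ref)
```

With tensors, the rows would first be moved to the CPU and converted with `.tolist()` before being passed in.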

The end-to-end demo additionally depends on atom3d and torch_scatter on the GPU; bringing those up on ROCm is left as a follow-up, so this change covers the _C kernel layer those higher-level paths call into.

The faithcontour._C extension has no build wiring on main: there is no setup.py and pyproject declares only a pure-Python package, so a source install never compiles _C and "from . import _C" fails at import.

This adds a setup.py with a torch CUDAExtension/BuildExtension over _C/{bindings.cpp,kernels.cu}. On a CUDA PyTorch it builds the original CUDA sources unchanged; on a ROCm PyTorch BuildExtension hipifies the same sources automatically, so the extension builds for AMD GPUs with no source changes. This complements the existing ROCm kernel/runtime support already on main by providing the missing compiled artifact.

setup.py also carries a Windows-only /ALTERNATENAME linker directive: c10.dll does not export the inherited c10::ValueError(SourceLocation, string) constructor that <torch/extension.h> references, so the import thunk is aliased to the exported c10::Error(SourceLocation, string) (ValueError IS-A Error with no extra data members).

kernels.cu converts the int64 index and candidate buffer types and casts from long to int64_t. This is a no-op on LP64 Linux (long == int64_t) but is required on Windows LLP64 where long is 32-bit while the torch int64 tensors backing these buffers are 64-bit.

.gitignore adds the hipify byproducts (*.hip, *.prehip) and versioned shared objects (*.so.*) so a ROCm build leaves the tree clean.

This work was authored with the assistance of Claude, an AI assistant.

Test Plan:
```
rm -f src/faithcontour/_C/kernels.hip src/faithcontour/_C/*.prehip
rm -rf build
cd src && HIP_VISIBLE_DEVICES=0 PYTORCH_ROCM_ARCH=gfx90a \
    python setup.py build_ext --inplace
HIP_VISIBLE_DEVICES=0 python3 agent_space/faithc_harness.py
```
Built cleanly on gfx90a (AMD Instinct MI250X, ROCm 7.2); the harness drives all four _C bindings on GPU against a torch CPU reference and reports all checks PASS.

PyTorch's BuildExtension on Windows adds .cu/.cuh to the MSVC compiler driver's _cpp_extensions list so the spawn wrapper can intercept those files and route them to hipcc instead of cl.exe. However, it does not add .hip. After a PyTorch update (torch 2.9.1+rocm7.14, Jun 2026), the hipify step renames kernels.cu to kernels.hip before MSVC's compile loop runs, and the MSVC driver raises "Don't know how to compile *.hip" because .hip is absent from _cpp_extensions.

Fix by subclassing BuildExtension and appending .hip to _cpp_extensions on Windows before delegating to the parent, which installs the spawn wrapper that routes .hip -> hipcc.

This fix is Windows-only (the guard checks sys.platform == "win32" and the hasattr guard is a no-op on Linux where clang is the host compiler).

Authored with the assistance of Claude, an AI assistant.

Test Plan:
```
export PATH="/c/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.44.35207/bin/HostX64/x64:$PATH"
VENV=/b/develop/TheRock/external-builds/pytorch/.venv
rm -f src/faithcontour/_C/kernels.hip && rm -rf build/
HIP_VISIBLE_DEVICES=0 PYTORCH_ROCM_ARCH=gfx1201 \
    ROCM_HOME=$VENV/Lib/site-packages/_rocm_sdk_devel \
    DISTUTILS_USE_SDK=1 \
    $VENV/Scripts/python.exe setup.py build_ext --inplace
HIP_VISIBLE_DEVICES=0 $VENV/Scripts/python.exe agent_space/faithc_harness_win.py
```
Built for gfx1201 (AMD Radeon RX 9070 XT, RDNA4, Windows 11); the harness reports 17/17 PASS on all four _C kernel bindings.
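The subclassing fix described in this commit message could look roughly like the sketch below. It is not the code from the PR. To keep the example importable without torch, the base class is passed in through a hypothetical factory, `with_hip_on_windows`; in a real setup.py the base would be `torch.utils.cpp_extension.BuildExtension`. The `platform` parameter is likewise an assumption added for testability.

```python
import sys


def with_hip_on_windows(base, platform=sys.platform):
    """Return a build_ext subclass that lets MSVC accept .hip sources."""

    class HipAwareBuildExtension(base):
        def build_extensions(self):
            exts = getattr(self.compiler, "_cpp_extensions", None)
            # Only the MSVC driver has _cpp_extensions, so this is a
            # no-op on Linux even without the platform check.
            if platform == "win32" and exts is not None and ".hip" not in exts:
                # Assign a fresh list so a shared class-level list
                # is not mutated.
                self.compiler._cpp_extensions = list(exts) + [".hip"]
            # The parent then installs its usual compiler wrappers.
            super().build_extensions()

    return HipAwareBuildExtension
```

The returned class would then be registered as `cmdclass={"build_ext": with_hip_on_windows(BuildExtension)}` in the setup() call.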

jeffdaily force-pushed the moat-port branch from da81ea7 to 9827cc8 Compare June 17, 2026 14:59

jeffdaily force-pushed the moat-port branch from 9827cc8 to 1d47e7a Compare June 17, 2026 15:53

jeffdaily changed the title ~~Add AMD GPU (ROCm) support for the faithcontour._C extension~~ Add setup.py to build the _C extension (CUDA and ROCm) Jun 17, 2026

jeffdaily wants to merge 2 commits into Luo-Yihao:main from jeffdaily:moat-port
