fix: tag tritonserver wheel with arch-specific platform tag #495
Conversation
The tritonserver wheel was shipping as
"tritonserver-0.0.0-py3-none-any.whl": version fell back to the
setuptools 0.0.0 default because `dynamic = ["version"]` had no
resolver configured, and the platform tag was "none-any" despite the
wheel containing an arch-specific CPython extension
(tritonserver/_c/triton_bindings.*.so).
Changes:
- pyproject.toml: add [tool.setuptools.dynamic] version = {file =
"TRITON_VERSION"} so the declared dynamic version resolves from the
Triton release file already staged by the wheel build.
- python/setup.py: add a bdist_wheel override that sets
root_is_pure = False so setuptools tags the wheel with the current
build platform (e.g. linux_x86_64 / linux_aarch64) via its normal
auto-detection.
- python/build_wheel.py: copy TRITON_VERSION into the wheel build
directory so the dynamic-file version lookup succeeds at build time.
Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
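A minimal sketch of the bdist_wheel override this commit describes, assuming the `wheel` package's bdist_wheel as the base class (the class name is illustrative, and a later commit in this PR replaces this pattern with a Distribution subclass because modern setuptools ignores overrides registered against wheel.bdist_wheel):

```python
# python/setup.py -- sketch only; the real file carries additional metadata.
from setuptools import setup
from wheel.bdist_wheel import bdist_wheel


class PlatformBdistWheel(bdist_wheel):
    def finalize_options(self):
        super().finalize_options()
        # Mark the wheel as non-pure so the platform tag is derived from the
        # build machine (e.g. linux_x86_64 / linux_aarch64) instead of "any".
        self.root_is_pure = False


setup(cmdclass={"bdist_wheel": PlatformBdistWheel})
```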
Raw linux_x86_64 / linux_aarch64 wheels are not accepted by canonical PyPI — the platform tag must be manylinux_2_X_<arch>. Port the pattern established for tritonclient in TRI-286: after the wheel is built, run `auditwheel repair` to auto-discover the minimum manylinux tag from the embedded triton_bindings.*.so glibc symbol dependencies, with a `python -m wheel tags --platform-tag manylinux_2_28_<arch>` fallback for the "no ELF" pure-Python case.

When auditwheel is not available on PATH (e.g. local non-container builds), keep the linux_<arch> wheel and log a warning so builds do not regress; the Poetry / pip-tools lock-file problem is already solved by the distinct filename.

Leaves a NOTE in setup.py: the embedded binding .so is CPython-ABI-specific, so the wheel will need cp<XY>-cp<XY> python+abi tags once consumers are ready to gate installs on the exact interpreter version.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983, TRI-286
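A rough sketch of the post-build repair step described above; the helper name comes from the PR description further down, while the logging and error handling are assumptions:

```python
# build_wheel.py (sketch) -- upgrade the linux_<arch> wheel to a manylinux tag.
import shutil
import subprocess


def _repair_wheel_with_auditwheel(wheel_path, dest_dir):
    if shutil.which("auditwheel") is None:
        # Local non-container build: keep the linux_<arch> wheel and warn.
        print("=== auditwheel not found; keeping linux_<arch> platform tag")
        shutil.copy(wheel_path, dest_dir)
        return
    # auditwheel inspects the glibc symbols required by triton_bindings.*.so
    # and picks the lowest manylinux_2_X tag the wheel actually satisfies.
    result = subprocess.run(
        ["auditwheel", "repair", "--wheel-dir", dest_dir, wheel_path]
    )
    if result.returncode != 0:
        raise RuntimeError("auditwheel repair failed")
```

The pure-Python "no ELF" fallback via `python -m wheel tags` is shown in the diff excerpt later on this page.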
Adopt PEP 427's optional build-tag slot so two wheels of the same version (e.g. successive reruns of a CI pipeline) can coexist in the same index without filename collision. Preferred source is GitLab's CI_PIPELINE_ID with a BUILD_NUMBER fallback for other CI systems; both are guaranteed to start with a digit as required by PEP 427. The value is forwarded through `python -m build` to the setuptools backend's `bdist_wheel --build=<N>` (alias for --build-number) via the PEP 517 `-C--build-option=` config setting.

Matches the build-tag slot already used by the RHEL .zip artifact naming convention in .gitlab-ci.yml. Build-arg handoff through build.py is a separate follow-up; this change is a no-op in local non-CI builds since neither env var is set.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
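A sketch of the build-tag handoff, using the env-var order and the `-C--build-option=` mechanism named in this commit; the surrounding build_wheel.py code is assumed:

```python
# build_wheel.py (sketch) -- forward an optional PEP 427 build tag to bdist_wheel.
import os
import subprocess

build_number = os.environ.get("CI_PIPELINE_ID") or os.environ.get("BUILD_NUMBER")
cmd = ["python3", "-m", "build", "--wheel", "."]
if build_number:
    # PEP 517 config setting forwarded to the setuptools backend's bdist_wheel;
    # --build is an alias for --build-number.
    cmd.append(f"-C--build-option=--build={build_number}")
subprocess.run(cmd, check=True)
```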
Add a _compose_version() helper in build_wheel.py that appends a PEP 440 local-version segment to the base TRITON_VERSION when the NVIDIA_UPSTREAM_VERSION and/or CUDA_VERSION env vars are set. This makes the wheel filename carry the same nv<X>.cu<Y> identifiers already used by the RHEL .zip artifact naming in .gitlab-ci.yml:

tritonserver-<TRITON_VERSION>+nv<NV>.cu<CUDA>-<CI_PIPELINE_ID>-cp<XY>-cp<XY>-manylinux_2_28_<arch>.whl

Local-version segments are informational and do not affect pip version comparison, so existing pins like `tritonserver==2.69.0` continue to match. The helper is a no-op when neither env var is set (local non-CI builds). Env-var propagation from the CI runner into the build container is handled by the paired server PR's change to build.py's docker-run invocation.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983, TRI-286
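A minimal sketch of the helper; the shape of the local segment follows the filename example above, everything else is an assumption:

```python
# build_wheel.py (sketch) -- append a PEP 440 local-version segment when set.
import os


def _compose_version(base_version):
    parts = []
    nv = os.environ.get("NVIDIA_UPSTREAM_VERSION")
    cuda = os.environ.get("CUDA_VERSION")
    if nv:
        parts.append(f"nv{nv}")
    if cuda:
        parts.append(f"cu{cuda}")
    if not parts:
        # Local non-CI build: leave the base version untouched.
        return base_version
    return f"{base_version}+{'.'.join(parts)}"
```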
Mirror the server-side refinement for the tritonserver wheel:

1. Use NVIDIA_BUILD_ID (from --build-id) as the PEP 427 build-tag source instead of a separate CI_PIPELINE_ID / BUILD_NUMBER env var, aligning the wheel filename with the existing Triton convention already used throughout the build system.
2. Detect CUDA_VERSION from the container-local env with a /usr/local/cuda/version.json fallback (canonical location for the installed toolkit), since CUDA_VERSION cannot be propagated from the host — only the container has it set.

Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
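A sketch of the CUDA detection described in point 2; the key path inside version.json ("cuda" -> "version") is an assumption about the toolkit's file layout:

```python
# build_wheel.py (sketch) -- resolve CUDA_VERSION inside the build container.
import json
import os


def _detect_cuda_version():
    cuda = os.environ.get("CUDA_VERSION")
    if cuda:
        return cuda
    try:
        with open("/usr/local/cuda/version.json") as f:
            return json.load(f)["cuda"]["version"]
    except (OSError, KeyError, ValueError):
        return None
```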
Pipeline 49141836 / job 302710609 failed with:
auditwheel.main_repair: Invalid binary wheel, found the following
shared library/libraries in purelib folder:
triton_bindings.cpython-312-x86_64-linux-gnu.so
The wheel has to be platlib compliant in order to be repaired by
auditwheel.
The wheel's filename was correctly tagged
(tritonserver-<ver>-<build>-cp312-cp312-linux_x86_64.whl), but the
WHEEL metadata inside still declared Root-Is-Purelib: true, so
auditwheel rejected the repair because finding a .so in the purelib
tree is an inconsistent state.
Root cause: the previous bdist_wheel override (via the `wheel`
package) set `self.root_is_pure = False`, but modern setuptools
(>=70) provides its own setuptools.command.bdist_wheel and ignores
overrides registered against wheel.bdist_wheel. The platform-tag
part still worked because it is derived from the `has_ext_modules`
check on the Distribution, not from root_is_pure — but the
Root-Is-Purelib metadata flag was never flipped.
Fix: subclass Distribution and override has_ext_modules() to return
True. This is the canonical way to tell setuptools the wheel is
binary without having to declare a dummy ext_module or trigger a
compilation step. setuptools then:
- sets WHEEL Root-Is-Purelib: false (required for auditwheel repair),
- auto-derives cp<XY>-cp<XY>-linux_<arch> tags from the current
interpreter and sysconfig.get_platform(), matching the filename
we already wanted.
Drop the now-unused wheel.bdist_wheel override block and register
the BinaryDistribution via setup(distclass=...).
Refs: NVBug 6098081, JIRA DLIS-8648, Linear TRI-983
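A minimal sketch of the resulting setup.py pattern (other setup() arguments omitted):

```python
# python/setup.py (sketch)
from setuptools import setup
from setuptools.dist import Distribution


class BinaryDistribution(Distribution):
    def has_ext_modules(self):
        # Report compiled content even though no Extension is declared:
        # this flips Root-Is-Purelib to false and makes bdist_wheel emit
        # cp<XY>-cp<XY>-linux_<arch> tags for the current interpreter.
        return True


setup(distclass=BinaryDistribution)
```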
Mirror the tritonfrontend setup.py fix: `_compose_version()` now consults three env sources for the +nv<X> local-version segment (NVIDIA_UPSTREAM_VERSION, NVIDIA_TRITON_SERVER_VERSION, TRITON_CONTAINER_VERSION) and prints the resolved inputs to stderr, so any future gap in the env propagation chain is self-announcing in the wheel-build log rather than silently producing a wheel without the expected local-version suffix.

NVIDIA_TRITON_SERVER_VERSION and TRITON_CONTAINER_VERSION are set as ENV in the buildbase image itself (via the TRITON_CONTAINER_VERSION ARG -> ENV wiring), so they survive even when the docker-run `-e NVIDIA_UPSTREAM_VERSION=<value>` forwarding does not reach the container (e.g. when FLAGS.upstream_container_version evaluates to empty on the host).

Refs: Linear TRI-983
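A short sketch of the lookup order and stderr diagnostic; only the env-var names and the stderr destination come from the commit message, the message format is an assumption:

```python
# Sketch: resolve the nv<X> release identifier and announce the inputs.
import os
import sys

nv_release = (
    os.environ.get("NVIDIA_UPSTREAM_VERSION")
    or os.environ.get("NVIDIA_TRITON_SERVER_VERSION")
    or os.environ.get("TRITON_CONTAINER_VERSION")
)
print(f"=== _compose_version: nv_release={nv_release!r}", file=sys.stderr)
```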
Mirror the tritonfrontend change in core/python/build_wheel.py: prefer CI_PIPELINE_ID over NVIDIA_BUILD_ID, falling back to BUILD_NUMBER. Filter the "<unknown>" default build.py emits for local builds without --build-id. Added stderr diagnostic. Refs: Linear TRI-983
pyproject.toml reads the wheel version from the TRITON_VERSION file via
`version = {file = "TRITON_VERSION"}`. The previous code copied the raw
source TRITON_VERSION (e.g. "2.69.0dev") into the wheel build root, so
the +nv…cu… local segment computed by _compose_version() was never
embedded in the wheel — it was set in a VERSION env var that
`python3 -m build` never reads.
Fix: compute the composed version before os.chdir and write it directly
to the wheel build root's TRITON_VERSION. The source-tree file is left
unchanged. Also remove the now-dead wenv["VERSION"] assignment.
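A sketch of the corrected handoff; the build-root path is a placeholder and _compose_version is the helper sketched earlier in this PR:

```python
# build_wheel.py (sketch) -- embed the composed version via the TRITON_VERSION
# file that pyproject.toml's dynamic lookup reads at build time.
import os

wheel_build_root = "/tmp/tritonserver_wheel_build"  # placeholder path
base_version = open("TRITON_VERSION").read().strip()  # e.g. "2.69.0dev"
composed = _compose_version(base_version)

os.makedirs(wheel_build_root, exist_ok=True)
with open(os.path.join(wheel_build_root, "TRITON_VERSION"), "w") as f:
    f.write(composed)  # the source-tree TRITON_VERSION stays unchanged

os.chdir(wheel_build_root)  # only change into the build root afterwards
```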
Pull request overview
This PR updates the Python packaging/build pipeline for tritonserver so produced wheels are correctly treated as platform-specific (not py3-none-any) and embed the proper Triton release version (instead of 0.0.0), with optional manylinux repair for PyPI distribution.
Changes:
- Mark the wheel as non-pure by using a custom Distribution with has_ext_modules() returning True so setuptools emits an arch-specific platform tag.
- Add wheel post-processing via auditwheel repair and introduce composed versioning with a +nv…cu… local-version segment.
- Switch pyproject.toml to resolve the wheel version dynamically from a TRITON_VERSION file at build time.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/setup.py | Forces setuptools to build a platform wheel by signaling the presence of extension modules. |
| python/build_wheel.py | Adds CUDA detection + version composition and runs auditwheel repair (with fallbacks) to produce manylinux wheels. |
| pyproject.toml | Uses setuptools dynamic versioning from TRITON_VERSION to avoid 0.0.0 wheels. |
arch = os.uname().machine
manylinux_tag = f"manylinux_2_28_{arch}"
print(
    f"=== Pure-Python wheel detected; falling back to wheel tags "
    f"({manylinux_tag})"
)
copied = os.path.join(dest_dir, os.path.basename(wheel_path))
shutil.copy(wheel_path, copied)
# `wheel tags --remove` replaces the linux_<arch> wheel in
# dest_dir with the correctly-tagged manylinux one.
r2 = subprocess.run(
    [
        "python3",
        "-m",
        "wheel",
        "tags",
        "--platform-tag",
        manylinux_tag,
        "--remove",
        copied,
    ]
)
fail_if(r2.returncode != 0, "wheel tags fallback failed")
This fallback mitigates the current behavior, where a pure-Python tag (-any) is used for wheels that contain an extension.

Current behaviour:
Processing ./python/tritonserver-0.0.0-py3-none-any.whl (from tritonserver==0.0.0)
Processing ./python/tritonfrontend-2.69.0.dev0-py3-none-any.whl (from tritonfrontend==2.69.0.dev0)

Hard to say what the right way is here, but we fall back on the hardcoded manylinux_2_28 tag for these wheels since they have a binary extension.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
whoisj
left a comment
LGTM, but I have a couple of questions before I approve.
What does the PR do?
Fix the tritonserver Python wheel so it ships with a correct architecture-specific platform tag instead of py3-none-any, and with the proper Triton release version instead of the hard-coded 0.0.0.

Changes across 8 commits:
- core/python/setup.py — add BinaryDistribution.has_ext_modules() = True so setuptools emits a platlib wheel (Root-Is-Purelib: false, cpXY-cpXY-linux_<arch>) rather than py3-none-any.
- core/python/build_wheel.py — add _repair_wheel_with_auditwheel() to upgrade the wheel to manylinux_2_X_<arch>; add _compose_version() to append a +nv{release}.cu{cudaXY} local-version segment; source the build tag from CI_PIPELINE_ID / NVIDIA_BUILD_ID / BUILD_NUMBER.
- core/pyproject.toml — dynamic version resolved from the TRITON_VERSION file at build time (fixes the 0.0.0 regression).

Resolves: TRI-983
Related: triton-inference-server/server#8761
Where should the reviewer start?
- core/python/build_wheel.py — the _repair_wheel_with_auditwheel() and _compose_version() functions
- core/python/setup.py — BinaryDistribution

Test plan:
Build inside the SDK container (requires auditwheel in the image — added via server#8743):

Caveats:
- PyPI publishing CI jobs (py3-tritonserver-publish) are a follow-up task.
- The container image continues to receive the linux_<arch> wheel from generic/wheel/dist/; the manylinux wheel is written to generic/ for future PyPI upload.
Related Issues: