feat: add TRT-LLM engine integration by zhengluo-nv · Pull Request #280 · ai-dynamo/modelexpress

zhengluo-nv · 2026-05-14T17:33:54Z

Overview

Adds the ModelExpress-owned side of the TRT-LLM integration. This pairs with NVIDIA/TensorRT-LLM#14151, where TRT-LLM keeps only the minimal checkpoint-loader delegation surface.

TRT-LLM passes its initialized model and mapping into modelexpress.engines.trtllm.loader.MXCheckpointLoader. ModelExpress owns source discovery, identity construction, metadata lookup/publication, NIXL transfer, heartbeat, fallback behavior, and TRT-LLM-specific tensor discovery.

TRT-LLM does not have separate source and target launch modes in this path. Every server starts the same way; the first compatible server loads natively and publishes metadata, and later servers can receive weights from a ready source through the shared ModelExpress strategy chain.
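
The single launch mode described above can be sketched as an ordered chain of strategies. This is an illustration only, not the ModelExpress implementation: `start_server`, `p2p_strategy`, `disk_strategy`, and the in-memory `registry` are hypothetical stand-ins for the shared strategy chain and the metadata service.

```python
from typing import Callable, Optional

# Hypothetical in-memory stand-in for the metadata service:
# maps a rank to the weights held by a ready source for that rank.
Registry = dict[int, str]


def p2p_strategy(rank: int, registry: Registry) -> Optional[str]:
    """Receive weights from a ready, rank-matched source, if one exists."""
    return registry.get(rank)


def disk_strategy(rank: int, registry: Registry) -> Optional[str]:
    """Native load from disk; always succeeds in this sketch."""
    return f"weights-for-rank-{rank}"


STRATEGIES: list[tuple[str, Callable[[int, Registry], Optional[str]]]] = [
    ("rdma", p2p_strategy),
    ("default", disk_strategy),
]


def start_server(rank: int, registry: Registry) -> str:
    """Every server runs this same path; returns the strategy that loaded the weights."""
    for name, strategy in STRATEGIES:
        weights = strategy(rank, registry)
        if weights is not None:
            # Publish after a successful load so later servers can use this one as a source.
            registry[rank] = weights
            return name
    raise RuntimeError("no strategy could load the model")


registry: Registry = {}
first = start_server(0, registry)   # no source yet, so this falls through to the disk load
second = start_server(0, registry)  # a ready source now exists for rank 0
print(first, second)
```

The first call finds no source and falls back to the native load; the second finds the published record and takes the P2P path, without either server being launched in a special mode.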

What changed

TRT-LLM adapter and loader: adds modelexpress.engines.trtllm with TrtllmAdapter and MXCheckpointLoader. The loader runs through the shared LoadStrategyChain, matching the vLLM and SGLang integration shape.

Load-format naming: the canonical TRT-LLM load format is modelexpress, avoiding mx because that conflicts with MXFP4/MXFP8 terminology. vLLM native support also uses modelexpress; vLLM plugin registration keeps mx as a backward-compatible alias for older deployments.
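
As a rough sketch of the naming rule, the alias handling could look like the following. `normalize_load_format` and its `allow_legacy_alias` flag are hypothetical names used for illustration; they are not the plugin registration API of vLLM or TRT-LLM.

```python
CANONICAL_LOAD_FORMAT = "modelexpress"
# "mx" is accepted only as a backward-compatible alias (vLLM plugin path in the description above).
LEGACY_ALIASES = {"mx": CANONICAL_LOAD_FORMAT}


def normalize_load_format(name: str, allow_legacy_alias: bool) -> str:
    """Map a user-supplied load format onto the canonical name, or raise."""
    if name == CANONICAL_LOAD_FORMAT:
        return name
    if allow_legacy_alias and name in LEGACY_ALIASES:
        return LEGACY_ALIASES[name]
    raise ValueError(f"unsupported load format: {name!r}")


print(normalize_load_format("mx", allow_legacy_alias=True))
```

With `allow_legacy_alias=False`, which is how this sketch models the TRT-LLM path, `mx` is rejected so it cannot be confused with MXFP4/MXFP8.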

Shared strategy behavior: adds small shared hooks needed by TRT-LLM, including selected-strategy tracking and a source-publish path that can run after engine post-load processing.

Post-load publication: TRT-LLM publishes only after its weight loading, post-load processing, and CUDA stream synchronization have completed. This avoids advertising tensor addresses/content before TRT-LLM has finished mutating the model, while still allowing a successful P2P receiver to become a source for later replicas.
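
The ordering constraint can be illustrated with injected callables. This is a sketch under stated assumptions: `load_then_publish` and its four parameters are hypothetical, and the real loader hooks may be split across different methods.

```python
from typing import Callable


def load_then_publish(
    load_weights: Callable[[], None],
    post_load: Callable[[], None],
    synchronize: Callable[[], None],
    publish: Callable[[], None],
) -> None:
    """Run every step that can mutate the model before advertising it as a source."""
    load_weights()
    post_load()    # the engine may still rewrite tensors here
    synchronize()  # wait for queued device work to finish
    publish()      # only now are tensor addresses and contents stable


events: list[str] = []
load_then_publish(
    load_weights=lambda: events.append("load_weights"),
    post_load=lambda: events.append("post_load"),
    synchronize=lambda: events.append("synchronize"),
    publish=lambda: events.append("publish"),
)
print(events)
```

Because the publish step is the last call, the same function serves both a disk-loaded source and a P2P receiver that becomes a source for later replicas.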

Engine-neutral config cleanup: loading config is carried through LoadContext instead of re-reading environment variables after context construction. MX_MODEL_URI remains a gate for ModelStreamer, and the resolved model URI is frozen into LoadContext by each engine adapter.
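
A minimal sketch of the "read once, then carry in context" idea follows. `ExampleLoadConfig` and `build_config` are hypothetical and much smaller than the real `LoadContext`; treating an empty `MX_MODEL_URI` as unset, and the `s3://` URIs, are assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExampleLoadConfig:
    """Hypothetical, simplified stand-in for the config carried by LoadContext."""

    model_streamer_uri: Optional[str]


def build_config(env: Mapping[str, str] = os.environ) -> ExampleLoadConfig:
    # Read MX_MODEL_URI exactly once, at context construction time.
    # Assumption: an unset or empty value leaves ModelStreamer disabled.
    return ExampleLoadConfig(model_streamer_uri=env.get("MX_MODEL_URI") or None)


def model_streamer_enabled(cfg: ExampleLoadConfig) -> bool:
    """Strategies consult the context, never the environment."""
    return cfg.model_streamer_uri is not None


env = {"MX_MODEL_URI": "s3://example-bucket/model"}
cfg = build_config(env)
env["MX_MODEL_URI"] = "s3://other-bucket/model"  # later changes are not observed
print(cfg.model_streamer_uri, model_streamer_enabled(cfg))
```

Passing the environment as a mapping also makes the behavior testable without patching `os.environ`, which matches the direction of the test changes in this PR.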

Legacy cleanup: removes the old modelexpress.trtllm_live_transfer implementation and deletes the historical trtllm_patches/v1.3.0rc5 patch scripts.

TRT-LLM example image: keeps only the minimum image-build helpers under examples/p2p_transfer_k8s/client/trtllm: install the local ModelExpress client and fix NIXL runpath. Runtime TRT-LLM patching is intentionally not committed here; local test-only patches can live outside the repo.

Compatibility notes

Requires the TRT-LLM-side delegation hooks from NVIDIA/TensorRT-LLM#14151, or an equivalent TRT-LLM image with those hooks.
The Python package owns the TRT-LLM ModelExpress behavior; TRT-LLM should import the high-level ModelExpress loader instead of low-level MX internals.
The example Dockerfile now assumes TRTLLM_IMAGE already contains the TRT-LLM ModelExpress hooks; override TRTLLM_IMAGE with a local validation image until those hooks are available in a released TRT-LLM image.
The successful e2e run used TP=4 source and target co-located on one B200 node. It validates the loader, metadata, rank matching, RDMA transfer, receiver publication, and serving readiness; cross-node fabric scheduling remains a separate cluster-capacity validation.

Testing

Unit and local validation

Focused validation after rebasing onto current origin/main:

UV_CACHE_DIR=/private/tmp/uv-cache uv run pytest \
  tests/test_model_streamer_strategy.py \
  tests/test_vllm_adapter.py \
  tests/test_trtllm_loader.py

Result: 53 passed.

Additional local checks:

git diff --check
UV_CACHE_DIR=/private/tmp/uv-cache uv run python -m py_compile \
  modelexpress/load_strategy/context.py \
  modelexpress/load_strategy/model_streamer_strategy.py \
  modelexpress/engines/vllm/adapter.py \
  modelexpress/engines/sglang/adapter.py \
  modelexpress/engines/trtllm/adapter.py \
  modelexpress/engines/trtllm/loader.py \
  tests/test_model_streamer_strategy.py \
  tests/test_vllm_adapter.py \
  tests/test_trtllm_loader.py \
  ../../examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py
bash -n examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh
pre-commit run --files <touched files>

Result: passed.

TRT-LLM e2e

Validated direct trtllm-serve with nvidia/Kimi-K2.5-NVFP4 on the nscale B200 cluster.

| Check | Result |
| --- | --- |
| Image | `nvcr.io/nvidian/dynamo-dev/model-express-dev-containers:trtllm-mx-1.3.0rc11-narrow-2992272-publishfix2-amd64` |
| Source | TP=4 source cold-loaded from disk and published 4 Ready ModelExpress metadata records |
| Target source discovery | TP=4 target found one rank-matched ready source worker per rank |
| Target weight path | RDMA P2P; no disk fallback lines observed |
| Transfer | 2188 tensors / 151.11 GB per rank in about 5.55 s |
| Throughput | about 217-218 Gbps per rank |
| Target metadata | target published 4 additional Ready metadata records after post-load publish |
| Serving | `/health` and `/v1/models` returned HTTP 200 |
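
The reported throughput follows from the transfer size and time. The check below assumes decimal gigabytes (1 GB = 10^9 bytes) and uses the rounded figures from the table, so it is approximate.

```python
GB = 10**9  # assumption: the reported size uses decimal gigabytes

bytes_per_rank = 151.11 * GB
seconds = 5.55

# bytes -> bits, then per second, then to gigabits
gbps_per_rank = bytes_per_rank * 8 / seconds / 1e9
print(f"{gbps_per_rank:.1f} Gbps per rank")
```

The result lands inside the reported range of about 217-218 Gbps per rank.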

Follow-ups

Build the example image from a TRT-LLM base that already includes the ModelExpress hooks once such an image is available.
Run a separate cross-node TRT-LLM e2e once clean B200 nodes with enough GPUs/RDMA are available.

codecov · 2026-05-14T17:41:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

coderabbitai · 2026-05-15T17:06:43Z

Walkthrough

This PR adds TensorRT-LLM engine support to ModelExpress using the shared load strategy framework. It extends LoadContext with metadata configuration fields, refactors strategies and adapters to use context instead of environment variables, implements a TrtllmAdapter and MXCheckpointLoader, and includes comprehensive tests and Docker build infrastructure.

Changes

TensorRT-LLM Integration

| Layer / File(s) | Summary |
| --- | --- |
| LoadContext and type system extensions `modelexpress_client/python/modelexpress/load_strategy/context.py` | LoadContext dataclass extended with metadata_server_url, metadata_port, model_streamer_uri, and selected_strategy fields. Engine config type aliases updated to include TrtllmModelConfig and TrtllmLoadConfig. |
| Load strategy base utilities and publication `modelexpress_client/python/modelexpress/load_strategy/base.py`, `modelexpress_client/python/modelexpress/load_strategy/__init__.py` | Adds publish_loaded_model function that registers tensors, publishes metadata, and attaches LoadContext to model for runtime retention. Updates _metadata_publication_configured to check context instead of environment. Changes NIXL port derivation to ctx.metadata_port + ctx.device_id. Records selected_strategy in LoadStrategyChain.run. |
| Load strategy implementations updated for context-driven configuration `modelexpress_client/python/modelexpress/load_strategy/rdma_strategy.py`, `modelexpress_client/python/modelexpress/load_strategy/model_streamer_strategy.py`, `modelexpress_client/python/modelexpress/load_strategy/default_strategy.py` | RDMA strategy now uses ctx.metadata_server_url for server checks and polls source instances with optional timeout via _source_query_timeout_s. ModelStreamer strategy activates based on ctx.model_streamer_uri. Default strategy conditionally registers tensors only when model_for_publish present. |
| Metadata client factory helpers `modelexpress_client/python/modelexpress/metadata/client_factory.py`, `modelexpress_client/python/tests/test_k8s_service_client.py` | Adds resolve_metadata_server_url and resolve_metadata_port helpers for normalizing metadata configuration from explicit arguments or environment variables. Tests added for URL precedence and fallback behavior. |
| vLLM and sglang adapter metadata context updates `modelexpress_client/python/modelexpress/engines/vllm/adapter.py`, `modelexpress_client/python/modelexpress/engines/sglang/adapter.py` | Both adapters now resolve server_url via metadata helpers and populate LoadContext with metadata_server_url, metadata_port, and model_streamer_uri. Precompute model_streamer_distributed flag at init time. |
| TRT-LLM adapter and load context builder `modelexpress_client/python/modelexpress/engines/trtllm/__init__.py`, `modelexpress_client/python/modelexpress/engines/trtllm/adapter.py` | Introduces TrtllmAdapter with parallelism identity derivation, dtype/quantization resolution, CUDA parameter tensor collection with alias deduplication, and native loader fallback. Defines TrtllmModelConfig and TrtllmLoadConfig. build_trtllm_load_context constructs identity, resolves worker rank from TP/CP/PP mapping, wires metadata client and LoadContext. |
| TRT-LLM checkpoint loader with strategy chain integration `modelexpress_client/python/modelexpress/engines/trtllm/loader.py` | Implements MXCheckpointLoader subclassing HfCheckpointLoader. load_weights runs LoadStrategyChain when model provided, falls back to disk on failure, treats "rdma" strategy selection as successful P2P. Implements post_load_publish and publish_as_source for source publication. Custom logging mirrors to stderr and per-rank log files. Removes old trtllm_live_transfer.py module. |
| TRT-LLM loader and adapter integration tests `modelexpress_client/python/tests/test_trtllm_loader.py` | Comprehensive 801-line test suite covering adapter inheritance, native loader integration, RDMA receiver behavior, identity construction, load context creation, worker rank calculation, checkpoint loader strategy chain outcomes (RDMA/default/default-none), weight preservation, HF mapper initialization, post-load publication, publish-as-source reuse, timeout/retry behavior, and rank logging with file/stderr mirroring. |
| Load strategy and adapter test updates `modelexpress_client/python/tests/test_model_streamer_strategy.py`, `modelexpress_client/python/tests/test_vllm_loader.py`, `modelexpress_client/python/tests/test_sglang_loader.py`, `modelexpress_client/python/tests/test_vllm_adapter.py` | ModelStreamer tests refactored to use LoadContext.model_streamer_uri. RDMA/vLLM tests updated to set metadata_server_url on context instead of patching environment. Added test for source instance polling with timeout. Reorganized distributed-flag patching to wrap adapter construction. |
| Docker image and patch scripts for TRT-LLM testing `examples/p2p_transfer_k8s/client/trtllm/Dockerfile`, `examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py`, `examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh`, `examples/p2p_transfer_k8s/client/trtllm/patch_trtllm_mx_runtime.py`, `examples/p2p_transfer_k8s/client/trtllm/README.md` | Adds Dockerfile parameterizing TRT-LLM base, patching MX checkpoint loading/runtime, installing ModelExpress client, fixing NIXL runpath, and verifying patched components. Includes fix_nixl_runpath.py for RPATH patching/validation, install_modelexpress_client.sh for gRPC/NIXL setup, patch_trtllm_mx_runtime.py for TRT-LLM runtime patching. Simplifies README. Removes old Dockerfile.ph3-gcp-gb200, example Kubernetes manifests (kimi-disagg-mx-tp8-dgd.yaml, kimi-source-decode-dgd.yaml, mx-infra-decode.yaml), and trtllm_patches scripts. |
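
Two of the context-driven behaviors summarized above, metadata URL resolution and the per-device NIXL port, can be sketched as follows. `ExampleContext`, `resolve_url`, and the `EXAMPLE_METADATA_URL` variable are hypothetical; in particular, "explicit argument wins over environment" is an assumed precedence, not one confirmed by the summary. Only the `metadata_port + device_id` rule is taken from the table.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

EXAMPLE_URL_ENV = "EXAMPLE_METADATA_URL"  # hypothetical variable name, for illustration only


@dataclass
class ExampleContext:
    """Hypothetical subset of the fields the summary lists on LoadContext."""

    metadata_server_url: Optional[str]
    metadata_port: int
    device_id: int


def resolve_url(explicit: Optional[str], env: Mapping[str, str]) -> Optional[str]:
    """Assumed precedence: explicit argument first, then the environment, else None."""
    if explicit:
        return explicit
    return env.get(EXAMPLE_URL_ENV) or None


def nixl_port(ctx: ExampleContext) -> int:
    """Each device offsets the base port by its device id, so local ranks do not collide."""
    return ctx.metadata_port + ctx.device_id


env = {EXAMPLE_URL_ENV: "http://from-env:8001"}
ctx = ExampleContext(
    metadata_server_url=resolve_url("http://explicit:8001", env),
    metadata_port=5555,  # arbitrary example value
    device_id=3,
)
print(ctx.metadata_server_url, nixl_port(ctx))
```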

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 trtllm hops in with fresh adapter grace,
load strategies flow through context's embrace,
metadata server finds its rightful place,
no more env vars to clutter the space—
rdma and streams now dance in sync,
docker and tests complete the link!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.47% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'feat: add TRT-LLM engine integration' accurately captures the main objective: introducing a comprehensive TensorRT-LLM engine adapter and loader integration into the ModelExpress framework. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

examples/p2p_transfer_k8s/client/trtllm/Dockerfile (1)

4-5: ⚡ Quick win

Pin the TRT-LLM base image to a fixed tag/digest.

Using :latest makes this image non-reproducible and can break patch assumptions as upstream changes. Prefer a fixed release tag (and ideally digest) as the default.

Suggested change

-ARG TRTLLM_IMAGE=nvcr.io/nvidia/tensorrt-llm/release:latest
+ARG TRTLLM_IMAGE=nvcr.io/nvidia/tensorrt-llm/release:<fixed-version>@sha256:<digest>

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/p2p_transfer_k8s/client/trtllm/Dockerfile` around lines 4 - 5, The
Dockerfile currently uses ARG TRTLLM_IMAGE with a :latest default which makes
builds non-reproducible; change ARG TRTLLM_IMAGE to a fixed release tag or
(preferably) an image digest (e.g., nvcr.io/...:vX.Y.Z or
nvcr.io/...@sha256:...) and keep FROM ${TRTLLM_IMAGE} so callers can still
override the ARG; update any README/build scripts to mention the pinned default
and how to override TRTLLM_IMAGE if needed.

examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh (1)

9-12: ⚡ Quick win

Pin grpcio and grpcio-tools to the same exact version.

Range-based constraints here can drift and produce non-reproducible builds or tooling/runtime skew for generated gRPC code.

Suggested change

 pip install --no-cache-dir \
-    "grpcio>=1.66.2" \
-    "grpcio-tools<=1.66.2" \
+    "grpcio==1.66.2" \
+    "grpcio-tools==1.66.2" \
     "protobuf>=5.27.0,<6.0.0"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh`
around lines 9 - 12, The pip install command currently uses range constraints
for grpcio and grpcio-tools which can drift; update the install line to pin both
packages to the same exact version (e.g., replace "grpcio>=1.66.2" and
"grpcio-tools<=1.66.2" with exact pins like "grpcio==1.66.2" and
"grpcio-tools==1.66.2") so generated gRPC code and runtime use the identical
grpc versions while leaving the protobuf constraint as-is.

examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py (1)

19-21: ⚡ Quick win

Avoid hardcoding Python 3.12 dist-packages for binding discovery.

This will fail if the base image Python/site-packages layout changes. Discover candidate site-packages dynamically, then glob under nixl_cu13.

Suggested change

 import glob
 import os
+import site
 import subprocess
@@
-    bindings = glob.glob("/usr/local/lib/python3.12/dist-packages/nixl_cu13/_bindings*.so")
+    bindings: list[str] = []
+    for pkg_dir in site.getsitepackages():
+        bindings.extend(glob.glob(f"{pkg_dir}/nixl_cu13/_bindings*.so"))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py` around lines 19
- 21, The code currently hardcodes "/usr/local/lib/python3.12/dist-packages"
when building the glob for bindings (variable bindings), which breaks on
different Python/site-packages layouts; change the discovery to iterate
site-packages locations obtained from site.getsitepackages(),
sysconfig.get_paths()["purelib"], and sys.path (filtering existing directories),
and then run glob.glob(os.path.join(site_pkg, "nixl_cu13", "_bindings*.so"))
across those locations, collecting matches into bindings; keep the existing
check on len(bindings) and nixl_lib_dir usage but replace the hardcoded path
construction with this dynamic site-packages discovery approach.
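
The discovery approach described in this comment could be sketched as below. It is an illustration, not the contents of `fix_nixl_runpath.py`: `candidate_site_dirs` and `find_bindings` are hypothetical helper names, while the `site`, `sysconfig`, `sys`, and `glob` calls are standard library.

```python
import glob
import os
import site
import sys
import sysconfig


def candidate_site_dirs() -> list[str]:
    """Collect existing package directories from several sources, without duplicates."""
    candidates: list[str] = []
    # Guard the lookup: some older virtualenv setups do not provide getsitepackages.
    getsitepackages = getattr(site, "getsitepackages", None)
    if getsitepackages is not None:
        candidates.extend(getsitepackages())
    candidates.append(sysconfig.get_paths()["purelib"])
    candidates.extend(sys.path)

    seen: set[str] = set()
    result: list[str] = []
    for path in candidates:
        # Skip empty entries (sys.path may contain ""), non-directories, and repeats.
        if path and os.path.isdir(path) and path not in seen:
            seen.add(path)
            result.append(path)
    return result


def find_bindings(package: str = "nixl_cu13", pattern: str = "_bindings*.so") -> list[str]:
    """Glob for the binding shared objects under every candidate directory."""
    matches: list[str] = []
    for site_dir in candidate_site_dirs():
        matches.extend(glob.glob(os.path.join(site_dir, package, pattern)))
    return sorted(set(matches))


print(find_bindings())
```

On a machine without the NIXL package installed this prints an empty list, which is the case the script's existing length check would need to handle.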

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelexpress_client/python/modelexpress/load_strategy/base.py`:
- Around line 188-204: publish_loaded_model currently always registers tensors,
publishes metadata, and retains source runtime even when the incoming LoadResult
is marked publishable=False; modify publish_loaded_model so after calling
_as_load_result(result_or_model) you check result.publishable and return early
when it's False, avoiding calls to register_tensors, publish_metadata, and
_retain_source_runtime for non-publishable results (while still handling cases
where a raw nn.Module is passed by relying on _as_load_result semantics). Ensure
the check uses the LoadResult.publishable attribute and that the rest of the
function (register_tensors, publish_metadata, _retain_source_runtime) only runs
when publishable is True.
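
A sketch of the requested early return, using stand-in types. `ExampleLoadResult`, `as_load_result`, and `publish_loaded_model_sketch` are hypothetical; treating a raw model as publishable by default is an assumption about `_as_load_result`, which this comment does not spell out.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExampleLoadResult:
    """Hypothetical stand-in for LoadResult."""

    model: Any
    publishable: bool = True


def as_load_result(result_or_model: Any) -> ExampleLoadResult:
    # Assumption: a raw model is wrapped as a publishable result.
    if isinstance(result_or_model, ExampleLoadResult):
        return result_or_model
    return ExampleLoadResult(model=result_or_model)


def publish_loaded_model_sketch(result_or_model: Any, published: list) -> bool:
    """Return True if the model was published, False if publication was skipped."""
    result = as_load_result(result_or_model)
    if not result.publishable:
        return False  # skip tensor registration, metadata publication, and runtime retention
    published.append(result.model)  # stands in for register / publish / retain
    return True


published: list = []
skipped = publish_loaded_model_sketch(ExampleLoadResult("m1", publishable=False), published)
done = publish_loaded_model_sketch("m2", published)
print(skipped, done, published)
```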

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ad1806e-5aac-4b8f-a1bd-3c803e3ff931

📥 Commits

Reviewing files that changed from the base of the PR and between 8820619 and 342970f.

📒 Files selected for processing (32)

examples/p2p_transfer_k8s/client/trtllm/Dockerfile
examples/p2p_transfer_k8s/client/trtllm/Dockerfile.ph3-gcp-gb200
examples/p2p_transfer_k8s/client/trtllm/README.md
examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py
examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh
examples/p2p_transfer_k8s/client/trtllm/kimi-disagg-mx-tp8-dgd.yaml
examples/p2p_transfer_k8s/client/trtllm/kimi-source-decode-dgd.yaml
examples/p2p_transfer_k8s/client/trtllm/mx-infra-decode.yaml
examples/p2p_transfer_k8s/client/trtllm/patch_trtllm_mx_runtime.py
modelexpress_client/python/modelexpress/engines/sglang/adapter.py
modelexpress_client/python/modelexpress/engines/trtllm/__init__.py
modelexpress_client/python/modelexpress/engines/trtllm/adapter.py
modelexpress_client/python/modelexpress/engines/trtllm/loader.py
modelexpress_client/python/modelexpress/engines/vllm/adapter.py
modelexpress_client/python/modelexpress/load_strategy/__init__.py
modelexpress_client/python/modelexpress/load_strategy/base.py
modelexpress_client/python/modelexpress/load_strategy/context.py
modelexpress_client/python/modelexpress/load_strategy/default_strategy.py
modelexpress_client/python/modelexpress/load_strategy/model_streamer_strategy.py
modelexpress_client/python/modelexpress/load_strategy/rdma_strategy.py
modelexpress_client/python/modelexpress/metadata/client_factory.py
modelexpress_client/python/modelexpress/trtllm_live_transfer.py
modelexpress_client/python/tests/test_k8s_service_client.py
modelexpress_client/python/tests/test_model_streamer_strategy.py
modelexpress_client/python/tests/test_sglang_loader.py
modelexpress_client/python/tests/test_trtllm_loader.py
modelexpress_client/python/tests/test_vllm_adapter.py
modelexpress_client/python/tests/test_vllm_loader.py
trtllm_patches/v1.3.0rc5/README.md
trtllm_patches/v1.3.0rc5/apply_patches.py
trtllm_patches/v1.3.0rc5/patch_model_loader.py
trtllm_patches/v1.3.0rc5/patch_tp_allgather.py

💤 Files with no reviewable changes (9)

trtllm_patches/v1.3.0rc5/README.md
examples/p2p_transfer_k8s/client/trtllm/Dockerfile.ph3-gcp-gb200
examples/p2p_transfer_k8s/client/trtllm/mx-infra-decode.yaml
trtllm_patches/v1.3.0rc5/patch_tp_allgather.py
trtllm_patches/v1.3.0rc5/apply_patches.py
examples/p2p_transfer_k8s/client/trtllm/kimi-source-decode-dgd.yaml
examples/p2p_transfer_k8s/client/trtllm/kimi-disagg-mx-tp8-dgd.yaml
modelexpress_client/python/modelexpress/trtllm_live_transfer.py
trtllm_patches/v1.3.0rc5/patch_model_loader.py

copy-pr-bot · 2026-05-15T17:13:08Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

zhengluo-nv · 2026-05-15T17:24:07Z

Addressed the remaining CodeRabbit nitpicks in ebc1971:

Pinned the temporary TRT-LLM validation image default from :latest to nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc11@sha256:d91c80ba8baf763782b1078267ed6b1e06363bebff4961094bf6e5679d371d04, while keeping TRTLLM_IMAGE override support. README now documents the pinned default and override.
Replaced hardcoded /usr/local/lib/python3.12/dist-packages NIXL binding discovery with dynamic discovery from site.getsitepackages(), sysconfig.get_paths()["purelib"], and sys.path.
Pinned grpcio==1.66.2 and grpcio-tools==1.66.2 in the temporary TRT-LLM image install helper, leaving the protobuf constraint unchanged.

Validation: tests/test_trtllm_loader.py passed with 23 tests, helper-script py_compile passed, git diff --check passed, and touched-file pre-commit passed.

Signed-off-by: Zheng Luo <zheluo@nvidia.com>

zhengluo-nv

Do not merge until we reach agreement with the TRT-LLM team on the integration interface.

pull-request-size Bot added the size/XXL label May 14, 2026

zhengluo-nv temporarily deployed to GITLAB May 14, 2026 17:34 — with GitHub Actions Inactive

github-actions Bot added the feat label May 14, 2026

zhengluo-nv marked this pull request as ready for review May 15, 2026 17:02

coderabbitai Bot reviewed May 15, 2026

Comment thread modelexpress_client/python/modelexpress/load_strategy/base.py Outdated

zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from 342970f to a747b15 Compare May 15, 2026 17:13

zhengluo-nv had a problem deploying to GITLAB May 15, 2026 17:13 — with GitHub Actions Error

zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from a747b15 to ebc1971 Compare May 15, 2026 17:17

zhengluo-nv had a problem deploying to GITLAB May 15, 2026 17:18 — with GitHub Actions Error

zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from ebc1971 to 22c4fd9 Compare May 15, 2026 17:22

zhengluo-nv temporarily deployed to GITLAB May 15, 2026 17:22 — with GitHub Actions Inactive

zhengluo-nv self-assigned this May 18, 2026

nv-hwoo reviewed May 19, 2026

Comment thread modelexpress_client/python/modelexpress/load_strategy/model_streamer_strategy.py

Comment thread modelexpress_client/python/modelexpress/engines/trtllm/loader.py Outdated

Comment thread modelexpress_client/python/modelexpress/engines/trtllm/loader.py Outdated

zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from 22c4fd9 to cd9635b Compare May 21, 2026 21:04

zhengluo-nv temporarily deployed to GITLAB May 21, 2026 21:04 — with GitHub Actions Inactive

feat: add TRT-LLM engine integration

21b91f6

Signed-off-by: Zheng Luo <zheluo@nvidia.com>

zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from cd9635b to 21b91f6 Compare May 21, 2026 21:17

zhengluo-nv temporarily deployed to GITLAB May 21, 2026 21:17 — with GitHub Actions Inactive

nv-hwoo approved these changes May 21, 2026

zhengluo-nv commented May 21, 2026

feat: add TRT-LLM engine integration #280: zhengluo-nv wants to merge 1 commit into main from zheluo/engine-integration-trtllm
