
feat: add TRT-LLM engine integration #280

Open
zhengluo-nv wants to merge 1 commit into main from zheluo/engine-integration-trtllm

Contributor

@zhengluo-nv zhengluo-nv commented May 14, 2026

Overview

Adds the ModelExpress-owned side of the TRT-LLM integration. This pairs with NVIDIA/TensorRT-LLM#14151, where TRT-LLM keeps only the minimal checkpoint-loader delegation surface.

TRT-LLM passes its initialized model and mapping into modelexpress.engines.trtllm.loader.MXCheckpointLoader. ModelExpress owns source discovery, identity construction, metadata lookup/publication, NIXL transfer, heartbeat, fallback behavior, and TRT-LLM-specific tensor discovery.

TRT-LLM does not have separate source and target launch modes in this path. Every server starts the same way; the first compatible server loads natively and publishes metadata, and later servers can receive weights from a ready source through the shared ModelExpress strategy chain.
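A minimal sketch of the "same launch, different outcome" behavior described above. `Context`, `run_chain`, `try_rdma`, and `load_native` are hypothetical stand-ins, not the ModelExpress API; only the strategy names `rdma` and `default` are taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Context:
    """Hypothetical stand-in for the per-load context."""
    ready_sources: list[str] = field(default_factory=list)
    selected_strategy: Optional[str] = None


# A strategy returns True when it loaded the weights, False to defer to the next one.
Strategy = tuple[str, Callable[[Context], bool]]


def try_rdma(ctx: Context) -> bool:
    # Usable only when some earlier server has already published as a ready source.
    return bool(ctx.ready_sources)


def load_native(ctx: Context) -> bool:
    # The native disk load always succeeds in this sketch.
    return True


def run_chain(strategies: list[Strategy], ctx: Context) -> str:
    """Try each strategy in order and record the one that succeeded."""
    for name, attempt in strategies:
        if attempt(ctx):
            ctx.selected_strategy = name
            return name
    raise RuntimeError("no load strategy succeeded")


CHAIN: list[Strategy] = [("rdma", try_rdma), ("default", load_native)]

first_server = Context()                             # nothing published yet
later_server = Context(ready_sources=["server-0"])   # a ready source exists
print(run_chain(CHAIN, first_server))   # default
print(run_chain(CHAIN, later_server))   # rdma
```

Both servers run the identical chain; only the presence of a ready source changes which strategy is selected.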

What changed

TRT-LLM adapter and loader: adds modelexpress.engines.trtllm with TrtllmAdapter and MXCheckpointLoader. The loader runs through the shared LoadStrategyChain, matching the vLLM and SGLang integration shape.

Load-format naming: the canonical TRT-LLM load format is modelexpress, avoiding mx because that conflicts with MXFP4/MXFP8 terminology. vLLM native support also uses modelexpress; vLLM plugin registration keeps mx as a backward-compatible alias for older deployments.
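As an illustration of the alias handling, a backward-compatible lookup could look roughly like this. `normalize_load_format` and the alias table are hypothetical names for this sketch, and the lower-casing step is an assumption rather than documented behavior.

```python
CANONICAL_LOAD_FORMAT = "modelexpress"

# Legacy spelling kept only so older deployments keep working.
LOAD_FORMAT_ALIASES = {"mx": CANONICAL_LOAD_FORMAT}


def normalize_load_format(name: str) -> str:
    """Map a legacy alias to the canonical name; pass other names through."""
    key = name.strip().lower()
    return LOAD_FORMAT_ALIASES.get(key, key)


print(normalize_load_format("mx"))            # modelexpress
print(normalize_load_format("modelexpress"))  # modelexpress
```

Matching on the exact string `mx` keeps quantization names such as MXFP4 from being mistaken for the alias.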

Shared strategy behavior: adds small shared hooks needed by TRT-LLM, including selected-strategy tracking and a source-publish path that can run after engine post-load processing.

Post-load publication: TRT-LLM publishes only after its weight loading, post-load processing, and CUDA stream synchronization have completed. This avoids advertising tensor addresses/content before TRT-LLM has finished mutating the model, while still allowing a successful P2P receiver to become a source for later replicas.
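The ordering constraint can be sketched as a plain sequence of steps. `load_then_publish` and its callable parameters are hypothetical; in a real process the `synchronize` step would be a device sync such as `torch.cuda.current_stream().synchronize()`.

```python
from typing import Callable


def load_then_publish(
    load_weights: Callable[[], None],
    post_load: Callable[[], None],
    synchronize: Callable[[], None],
    publish: Callable[[], None],
) -> None:
    """Run the steps in order; publish only if every earlier step returned."""
    load_weights()
    post_load()      # the engine may still mutate tensors here
    synchronize()    # wait for queued device work before exposing addresses
    publish()


events: list[str] = []
load_then_publish(
    lambda: events.append("load"),
    lambda: events.append("post_load"),
    lambda: events.append("sync"),
    lambda: events.append("publish"),
)
print(events)  # ['load', 'post_load', 'sync', 'publish']
```

Because `publish` is the last call, an exception in any earlier step leaves the server unpublished, so it is never advertised with half-processed weights.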

Engine-neutral config cleanup: loading config is carried through LoadContext instead of re-reading environment variables after context construction. MX_MODEL_URI remains a gate for ModelStreamer, and the resolved model URI is frozen into LoadContext by each engine adapter.
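A rough sketch of the "read once, then freeze" idea, using a frozen dataclass. `LoadContextSketch` and `build_context` are hypothetical and carry only an illustrative subset of fields; the default port and the example URI are placeholders, not values from this PR.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LoadContextSketch:
    """Illustrative subset of fields; not the real LoadContext definition."""
    metadata_server_url: Optional[str]
    metadata_port: int
    model_streamer_uri: Optional[str]


def build_context(
    env: Mapping[str, str],
    metadata_server_url: Optional[str] = None,
    metadata_port: int = 5555,  # placeholder default for the sketch
) -> LoadContextSketch:
    # Read MX_MODEL_URI exactly once; a missing or empty value leaves
    # ModelStreamer disabled for this load.
    uri = env.get("MX_MODEL_URI") or None
    return LoadContextSketch(metadata_server_url, metadata_port, uri)


env = {"MX_MODEL_URI": "example://models/demo"}
ctx = build_context(env)
env["MX_MODEL_URI"] = "example://models/other"  # later changes are not seen
print(ctx.model_streamer_uri)  # example://models/demo
```

Passing the environment in as a mapping (for example `os.environ`) also makes the adapter easy to test without patching process state.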

Legacy cleanup: removes the old modelexpress.trtllm_live_transfer implementation and deletes the historical trtllm_patches/v1.3.0rc5 patch scripts.

TRT-LLM example image: keeps only the minimum image-build helpers under examples/p2p_transfer_k8s/client/trtllm: install the local ModelExpress client and fix NIXL runpath. Runtime TRT-LLM patching is intentionally not committed here; local test-only patches can live outside the repo.

Compatibility notes

  • Requires the TRT-LLM-side delegation hooks from NVIDIA/TensorRT-LLM#14151, or an equivalent TRT-LLM image with those hooks.
  • The Python package owns the TRT-LLM ModelExpress behavior; TRT-LLM should import the high-level ModelExpress loader instead of low-level MX internals.
  • The example Dockerfile now assumes TRTLLM_IMAGE already contains the TRT-LLM ModelExpress hooks; override TRTLLM_IMAGE with a local validation image until those hooks are available in a released TRT-LLM image.
  • The successful e2e run used a TP=4 source and a TP=4 target co-located on one B200 node. It validates the loader, metadata, rank matching, RDMA transfer, receiver publication, and serving readiness; cross-node fabric scheduling remains a separate cluster-capacity validation.
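For readers unfamiliar with rank matching, the selection step can be pictured as below. `SourceRecord` and `pick_source` are hypothetical; the real metadata schema is not shown in this PR description, and only the `Ready` status name comes from the test results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical metadata record; field names are illustrative."""
    worker_id: str
    rank: int
    status: str  # "Ready" marks a publishable source in this sketch


def pick_source(records: list[SourceRecord], rank: int) -> Optional[SourceRecord]:
    """Return the first Ready record whose rank matches, else None."""
    for rec in records:
        if rec.rank == rank and rec.status == "Ready":
            return rec
    return None


records = [
    SourceRecord("src-0", rank=0, status="Ready"),
    SourceRecord("src-1", rank=1, status="Loading"),
]
print(pick_source(records, 0))  # the src-0 record
print(pick_source(records, 1))  # None, so this rank would fall back
```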

Testing

Unit and local validation

Focused validation after rebasing onto current origin/main:

UV_CACHE_DIR=/private/tmp/uv-cache uv run pytest \
  tests/test_model_streamer_strategy.py \
  tests/test_vllm_adapter.py \
  tests/test_trtllm_loader.py

Result: 53 passed.

Additional local checks:

git diff --check
UV_CACHE_DIR=/private/tmp/uv-cache uv run python -m py_compile \
  modelexpress/load_strategy/context.py \
  modelexpress/load_strategy/model_streamer_strategy.py \
  modelexpress/engines/vllm/adapter.py \
  modelexpress/engines/sglang/adapter.py \
  modelexpress/engines/trtllm/adapter.py \
  modelexpress/engines/trtllm/loader.py \
  tests/test_model_streamer_strategy.py \
  tests/test_vllm_adapter.py \
  tests/test_trtllm_loader.py \
  ../../examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py
bash -n examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh
pre-commit run --files <touched files>

Result: passed.

TRT-LLM e2e

Validated direct trtllm-serve with nvidia/Kimi-K2.5-NVFP4 on the nscale B200 cluster.

| Check | Result |
| --- | --- |
| Image | nvcr.io/nvidian/dynamo-dev/model-express-dev-containers:trtllm-mx-1.3.0rc11-narrow-2992272-publishfix2-amd64 |
| Source | TP=4 source cold-loaded from disk and published 4 Ready ModelExpress metadata records |
| Target source discovery | TP=4 target found one rank-matched ready source worker per rank |
| Target weight path | RDMA P2P; no disk fallback lines observed |
| Transfer | 2188 tensors / 151.11 GB per rank in about 5.55s |
| Throughput | about 217-218 Gbps per rank |
| Target metadata | target published 4 additional Ready metadata records after post-load publish |
| Serving | /health and /v1/models returned HTTP 200 |
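The throughput figure follows directly from the transfer size and duration. A quick check, assuming GB means 10^9 bytes (`throughput_gbps` is just a helper name for this sketch):

```python
def throughput_gbps(gigabytes: float, seconds: float) -> float:
    """Convert a transfer of decimal gigabytes over a duration to gigabits per second."""
    return gigabytes * 8 / seconds


per_rank = throughput_gbps(151.11, 5.55)
print(f"{per_rank:.1f} Gbps per rank")  # 217.8 Gbps per rank
```

Since the duration is quoted as "about 5.55s", rounding in the last digit is enough to account for the 217-218 Gbps range.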

Follow-ups

  • Build the example image from a TRT-LLM base that already includes the ModelExpress hooks once such an image is available.
  • Run a separate cross-node TRT-LLM e2e once clean B200 nodes with enough GPUs/RDMA are available.

@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@zhengluo-nv zhengluo-nv marked this pull request as ready for review May 15, 2026 17:02
@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

Walkthrough

This PR adds TensorRT-LLM engine support to ModelExpress using the shared load strategy framework. It extends LoadContext with metadata configuration fields, refactors strategies and adapters to use context instead of environment variables, implements a TrtllmAdapter and MXCheckpointLoader, and includes comprehensive tests and Docker build infrastructure.

Changes

TensorRT-LLM Integration

| Layer / File(s) | Summary |
| --- | --- |
| LoadContext and type system extensions<br>`modelexpress_client/python/modelexpress/load_strategy/context.py` | LoadContext dataclass extended with metadata_server_url, metadata_port, model_streamer_uri, and selected_strategy fields. Engine config type aliases updated to include TrtllmModelConfig and TrtllmLoadConfig. |
| Load strategy base utilities and publication<br>`modelexpress_client/python/modelexpress/load_strategy/base.py`, `modelexpress_client/python/modelexpress/load_strategy/__init__.py` | Adds publish_loaded_model function that registers tensors, publishes metadata, and attaches LoadContext to model for runtime retention. Updates _metadata_publication_configured to check context instead of environment. Changes NIXL port derivation to ctx.metadata_port + ctx.device_id. Records selected_strategy in LoadStrategyChain.run. |
| Load strategy implementations updated for context-driven configuration<br>`modelexpress_client/python/modelexpress/load_strategy/rdma_strategy.py`, `modelexpress_client/python/modelexpress/load_strategy/model_streamer_strategy.py`, `modelexpress_client/python/modelexpress/load_strategy/default_strategy.py` | RDMA strategy now uses ctx.metadata_server_url for server checks and polls source instances with optional timeout via _source_query_timeout_s. ModelStreamer strategy activates based on ctx.model_streamer_uri. Default strategy conditionally registers tensors only when model_for_publish present. |
| Metadata client factory helpers<br>`modelexpress_client/python/modelexpress/metadata/client_factory.py`, `modelexpress_client/python/tests/test_k8s_service_client.py` | Adds resolve_metadata_server_url and resolve_metadata_port helpers for normalizing metadata configuration from explicit arguments or environment variables. Tests added for URL precedence and fallback behavior. |
| vLLM and sglang adapter metadata context updates<br>`modelexpress_client/python/modelexpress/engines/vllm/adapter.py`, `modelexpress_client/python/modelexpress/engines/sglang/adapter.py` | Both adapters now resolve server_url via metadata helpers and populate LoadContext with metadata_server_url, metadata_port, and model_streamer_uri. Precompute model_streamer_distributed flag at init time. |
| TRT-LLM adapter and load context builder<br>`modelexpress_client/python/modelexpress/engines/trtllm/__init__.py`, `modelexpress_client/python/modelexpress/engines/trtllm/adapter.py` | Introduces TrtllmAdapter with parallelism identity derivation, dtype/quantization resolution, CUDA parameter tensor collection with alias deduplication, and native loader fallback. Defines TrtllmModelConfig and TrtllmLoadConfig. build_trtllm_load_context constructs identity, resolves worker rank from TP/CP/PP mapping, wires metadata client and LoadContext. |
| TRT-LLM checkpoint loader with strategy chain integration<br>`modelexpress_client/python/modelexpress/engines/trtllm/loader.py` | Implements MXCheckpointLoader subclassing HfCheckpointLoader. load_weights runs LoadStrategyChain when model provided, falls back to disk on failure, treats "rdma" strategy selection as successful P2P. Implements post_load_publish and publish_as_source for source publication. Custom logging mirrors to stderr and per-rank log files. Removes old trtllm_live_transfer.py module. |
| TRT-LLM loader and adapter integration tests<br>`modelexpress_client/python/tests/test_trtllm_loader.py` | Comprehensive 801-line test suite covering adapter inheritance, native loader integration, RDMA receiver behavior, identity construction, load context creation, worker rank calculation, checkpoint loader strategy chain outcomes (RDMA/default/default-none), weight preservation, HF mapper initialization, post-load publication, publish-as-source reuse, timeout/retry behavior, and rank logging with file/stderr mirroring. |
| Load strategy and adapter test updates<br>`modelexpress_client/python/tests/test_model_streamer_strategy.py`, `modelexpress_client/python/tests/test_vllm_loader.py`, `modelexpress_client/python/tests/test_sglang_loader.py`, `modelexpress_client/python/tests/test_vllm_adapter.py` | ModelStreamer tests refactored to use LoadContext.model_streamer_uri. RDMA/vLLM tests updated to set metadata_server_url on context instead of patching environment. Added test for source instance polling with timeout. Reorganized distributed-flag patching to wrap adapter construction. |
| Docker image and patch scripts for TRT-LLM testing<br>`examples/p2p_transfer_k8s/client/trtllm/Dockerfile`, `examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py`, `examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh`, `examples/p2p_transfer_k8s/client/trtllm/patch_trtllm_mx_runtime.py`, `examples/p2p_transfer_k8s/client/trtllm/README.md` | Adds Dockerfile parameterizing TRT-LLM base, patching MX checkpoint loading/runtime, installing ModelExpress client, fixing NIXL runpath, and verifying patched components. Includes fix_nixl_runpath.py for RPATH patching/validation, install_modelexpress_client.sh for gRPC/NIXL setup, patch_trtllm_mx_runtime.py for TRT-LLM runtime patching. Simplifies README. Removes old Dockerfile.ph3-gcp-gb200, example Kubernetes manifests (kimi-disagg-mx-tp8-dgd.yaml, kimi-source-decode-dgd.yaml, mx-infra-decode.yaml), and trtllm_patches scripts. |
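To illustrate what "tensor collection with alias deduplication" can mean, here is a minimal sketch that keys on the storage address. This is an assumption about the approach, not the adapter's actual code; `dedupe_aliases` and `FakeTensor` are hypothetical, and the only real API relied on is that `torch.Tensor` exposes `data_ptr()`.

```python
from typing import Iterable, Protocol


class HasDataPtr(Protocol):
    def data_ptr(self) -> int: ...


def dedupe_aliases(named: Iterable[tuple[str, HasDataPtr]]) -> dict[str, HasDataPtr]:
    """Keep the first name seen for each distinct storage address."""
    seen: set[int] = set()
    unique: dict[str, HasDataPtr] = {}
    for name, tensor in named:
        ptr = tensor.data_ptr()
        if ptr in seen:
            continue  # tied or aliased parameter already kept under another name
        seen.add(ptr)
        unique[name] = tensor
    return unique


class FakeTensor:
    """Test double standing in for a tensor with a fixed address."""

    def __init__(self, ptr: int) -> None:
        self._ptr = ptr

    def data_ptr(self) -> int:
        return self._ptr


shared = FakeTensor(100)
result = dedupe_aliases(
    [("embed.weight", shared), ("lm_head.weight", shared), ("norm.weight", FakeTensor(200))]
)
print(list(result))  # ['embed.weight', 'norm.weight']
```

Registering each storage once avoids transferring the same bytes twice when two parameter names share memory.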

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

🐰 trtllm hops in with fresh adapter grace,
load strategies flow through context's embrace,
metadata server finds its rightful place,
no more env vars to clutter the space—
rdma and streams now dance in sync,
docker and tests complete the link!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 14.47% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'feat: add TRT-LLM engine integration' accurately captures the main objective: introducing a comprehensive TensorRT-LLM engine adapter and loader integration into the ModelExpress framework. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
examples/p2p_transfer_k8s/client/trtllm/Dockerfile (1)

4-5: ⚡ Quick win

Pin the TRT-LLM base image to a fixed tag/digest.

Using :latest makes this image non-reproducible and can break patch assumptions as upstream changes. Prefer a fixed release tag (and ideally digest) as the default.

Suggested change
-ARG TRTLLM_IMAGE=nvcr.io/nvidia/tensorrt-llm/release:latest
+ARG TRTLLM_IMAGE=nvcr.io/nvidia/tensorrt-llm/release:<fixed-version>@sha256:<digest>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/p2p_transfer_k8s/client/trtllm/Dockerfile` around lines 4 - 5, The
Dockerfile currently uses ARG TRTLLM_IMAGE with a :latest default which makes
builds non-reproducible; change ARG TRTLLM_IMAGE to a fixed release tag or
(preferably) an image digest (e.g., nvcr.io/...:vX.Y.Z or
nvcr.io/...@sha256:...) and keep FROM ${TRTLLM_IMAGE} so callers can still
override the ARG; update any README/build scripts to mention the pinned default
and how to override TRTLLM_IMAGE if needed.
examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh (1)

9-12: ⚡ Quick win

Pin grpcio and grpcio-tools to the same exact version.

Range-based constraints here can drift and produce non-reproducible builds or tooling/runtime skew for generated gRPC code.

Suggested change
 pip install --no-cache-dir \
-    "grpcio>=1.66.2" \
-    "grpcio-tools<=1.66.2" \
+    "grpcio==1.66.2" \
+    "grpcio-tools==1.66.2" \
     "protobuf>=5.27.0,<6.0.0"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh`
around lines 9 - 12, The pip install command currently uses range constraints
for grpcio and grpcio-tools which can drift; update the install line to pin both
packages to the same exact version (e.g., replace "grpcio>=1.66.2" and
"grpcio-tools<=1.66.2" with exact pins like "grpcio==1.66.2" and
"grpcio-tools==1.66.2") so generated gRPC code and runtime use the identical
grpc versions while leaving the protobuf constraint as-is.
examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py (1)

19-21: ⚡ Quick win

Avoid hardcoding Python 3.12 dist-packages for binding discovery.

This will fail if the base image Python/site-packages layout changes. Discover candidate site-packages dynamically, then glob under nixl_cu13.

Suggested change
 import glob
 import os
+import site
 import subprocess
@@
-    bindings = glob.glob("/usr/local/lib/python3.12/dist-packages/nixl_cu13/_bindings*.so")
+    bindings: list[str] = []
+    for pkg_dir in site.getsitepackages():
+        bindings.extend(glob.glob(f"{pkg_dir}/nixl_cu13/_bindings*.so"))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py` around lines 19
- 21, The code currently hardcodes "/usr/local/lib/python3.12/dist-packages"
when building the glob for bindings (variable bindings), which breaks on
different Python/site-packages layouts; change the discovery to iterate
site-packages locations obtained from site.getsitepackages(),
sysconfig.get_paths()["purelib"], and sys.path (filtering existing directories),
and then run glob.glob(os.path.join(site_pkg, "nixl_cu13", "_bindings*.so"))
across those locations, collecting matches into bindings; keep the existing
check on len(bindings) and nixl_lib_dir usage but replace the hardcoded path
construction with this dynamic site-packages discovery approach.
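A self-contained sketch of the discovery approach described in this comment, for reference. `candidate_site_dirs` and `find_bindings` are hypothetical helper names; the package and pattern defaults are the ones quoted above.

```python
import glob
import os
import site
import sys
import sysconfig


def candidate_site_dirs() -> list[str]:
    """Collect existing package directories, in order, without duplicates."""
    raw: list[str] = []
    # Guard the call: some older virtualenv layouts lack getsitepackages().
    if hasattr(site, "getsitepackages"):
        raw.extend(site.getsitepackages())
    raw.append(sysconfig.get_paths()["purelib"])
    raw.extend(sys.path)

    seen: set[str] = set()
    dirs: list[str] = []
    for path in raw:
        # Skip empty entries, zip archives, and paths that do not exist.
        if path and os.path.isdir(path) and path not in seen:
            seen.add(path)
            dirs.append(path)
    return dirs


def find_bindings(package: str = "nixl_cu13", pattern: str = "_bindings*.so") -> list[str]:
    """Glob for the binding shared objects under every candidate directory."""
    matches: list[str] = []
    for site_dir in candidate_site_dirs():
        matches.extend(glob.glob(os.path.join(site_dir, package, pattern)))
    return sorted(set(matches))


if __name__ == "__main__":
    print(find_bindings())
```

On a machine without the package installed the result is simply an empty list, which the caller can treat the same way as the existing `len(bindings)` check.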
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelexpress_client/python/modelexpress/load_strategy/base.py`:
- Around line 188-204: publish_loaded_model currently always registers tensors,
publishes metadata, and retains source runtime even when the incoming LoadResult
is marked publishable=False; modify publish_loaded_model so after calling
_as_load_result(result_or_model) you check result.publishable and return early
when it's False, avoiding calls to register_tensors, publish_metadata, and
_retain_source_runtime for non-publishable results (while still handling cases
where a raw nn.Module is passed by relying on _as_load_result semantics). Ensure
the check uses the LoadResult.publishable attribute and that the rest of the
function (register_tensors, publish_metadata, _retain_source_runtime) only runs
when publishable is True.
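The requested guard can be sketched as follows. The function and attribute names mirror the ones in this comment, but the bodies are stand-ins: side effects are recorded in a `calls` list instead of touching tensors or metadata, and wrapping a raw model as publishable is an assumption about `_as_load_result`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LoadResult:
    """Stand-in with only the fields this sketch needs."""
    model: Any
    publishable: bool = True


def _as_load_result(result_or_model: Any) -> LoadResult:
    # Assumption: a raw model is wrapped as a publishable result.
    if isinstance(result_or_model, LoadResult):
        return result_or_model
    return LoadResult(model=result_or_model)


def publish_loaded_model(result_or_model: Any, calls: list[str]) -> LoadResult:
    result = _as_load_result(result_or_model)
    if not result.publishable:
        # Early return: no registration, no metadata, no runtime retention.
        return result
    calls.append("register_tensors")
    calls.append("publish_metadata")
    calls.append("_retain_source_runtime")
    return result


calls: list[str] = []
publish_loaded_model(LoadResult(model=object(), publishable=False), calls)
print(calls)  # []
publish_loaded_model(object(), calls)
print(calls)  # ['register_tensors', 'publish_metadata', '_retain_source_runtime']
```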


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7ad1806e-5aac-4b8f-a1bd-3c803e3ff931

📥 Commits

Reviewing files that changed from the base of the PR and between 8820619 and 342970f.

📒 Files selected for processing (32)
  • examples/p2p_transfer_k8s/client/trtllm/Dockerfile
  • examples/p2p_transfer_k8s/client/trtllm/Dockerfile.ph3-gcp-gb200
  • examples/p2p_transfer_k8s/client/trtllm/README.md
  • examples/p2p_transfer_k8s/client/trtllm/fix_nixl_runpath.py
  • examples/p2p_transfer_k8s/client/trtllm/install_modelexpress_client.sh
  • examples/p2p_transfer_k8s/client/trtllm/kimi-disagg-mx-tp8-dgd.yaml
  • examples/p2p_transfer_k8s/client/trtllm/kimi-source-decode-dgd.yaml
  • examples/p2p_transfer_k8s/client/trtllm/mx-infra-decode.yaml
  • examples/p2p_transfer_k8s/client/trtllm/patch_trtllm_mx_runtime.py
  • modelexpress_client/python/modelexpress/engines/sglang/adapter.py
  • modelexpress_client/python/modelexpress/engines/trtllm/__init__.py
  • modelexpress_client/python/modelexpress/engines/trtllm/adapter.py
  • modelexpress_client/python/modelexpress/engines/trtllm/loader.py
  • modelexpress_client/python/modelexpress/engines/vllm/adapter.py
  • modelexpress_client/python/modelexpress/load_strategy/__init__.py
  • modelexpress_client/python/modelexpress/load_strategy/base.py
  • modelexpress_client/python/modelexpress/load_strategy/context.py
  • modelexpress_client/python/modelexpress/load_strategy/default_strategy.py
  • modelexpress_client/python/modelexpress/load_strategy/model_streamer_strategy.py
  • modelexpress_client/python/modelexpress/load_strategy/rdma_strategy.py
  • modelexpress_client/python/modelexpress/metadata/client_factory.py
  • modelexpress_client/python/modelexpress/trtllm_live_transfer.py
  • modelexpress_client/python/tests/test_k8s_service_client.py
  • modelexpress_client/python/tests/test_model_streamer_strategy.py
  • modelexpress_client/python/tests/test_sglang_loader.py
  • modelexpress_client/python/tests/test_trtllm_loader.py
  • modelexpress_client/python/tests/test_vllm_adapter.py
  • modelexpress_client/python/tests/test_vllm_loader.py
  • trtllm_patches/v1.3.0rc5/README.md
  • trtllm_patches/v1.3.0rc5/apply_patches.py
  • trtllm_patches/v1.3.0rc5/patch_model_loader.py
  • trtllm_patches/v1.3.0rc5/patch_tp_allgather.py
💤 Files with no reviewable changes (9)
  • trtllm_patches/v1.3.0rc5/README.md
  • examples/p2p_transfer_k8s/client/trtllm/Dockerfile.ph3-gcp-gb200
  • examples/p2p_transfer_k8s/client/trtllm/mx-infra-decode.yaml
  • trtllm_patches/v1.3.0rc5/patch_tp_allgather.py
  • trtllm_patches/v1.3.0rc5/apply_patches.py
  • examples/p2p_transfer_k8s/client/trtllm/kimi-source-decode-dgd.yaml
  • examples/p2p_transfer_k8s/client/trtllm/kimi-disagg-mx-tp8-dgd.yaml
  • modelexpress_client/python/modelexpress/trtllm_live_transfer.py
  • trtllm_patches/v1.3.0rc5/patch_model_loader.py

Comment thread modelexpress_client/python/modelexpress/load_strategy/base.py Outdated
@zhengluo-nv zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from 342970f to a747b15 Compare May 15, 2026 17:13
@copy-pr-bot

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@zhengluo-nv
Contributor Author

Addressed the remaining CodeRabbit nitpicks in ebc1971:

  • Pinned the temporary TRT-LLM validation image default from :latest to nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc11@sha256:d91c80ba8baf763782b1078267ed6b1e06363bebff4961094bf6e5679d371d04, while keeping TRTLLM_IMAGE override support. README now documents the pinned default and override.
  • Replaced hardcoded /usr/local/lib/python3.12/dist-packages NIXL binding discovery with dynamic discovery from site.getsitepackages(), sysconfig.get_paths()["purelib"], and sys.path.
  • Pinned grpcio==1.66.2 and grpcio-tools==1.66.2 in the temporary TRT-LLM image install helper, leaving the protobuf constraint unchanged.

Validation: tests/test_trtllm_loader.py passed with 23 tests, helper-script py_compile passed, git diff --check passed, and touched-file pre-commit passed.

@zhengluo-nv zhengluo-nv self-assigned this May 18, 2026
Comment thread modelexpress_client/python/modelexpress/engines/trtllm/loader.py Outdated
Comment thread modelexpress_client/python/modelexpress/engines/trtllm/loader.py Outdated
@zhengluo-nv zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from 22c4fd9 to cd9635b Compare May 21, 2026 21:04
Signed-off-by: Zheng Luo <zheluo@nvidia.com>
@zhengluo-nv zhengluo-nv force-pushed the zheluo/engine-integration-trtllm branch from cd9635b to 21b91f6 Compare May 21, 2026 21:17
Contributor Author

@zhengluo-nv zhengluo-nv left a comment


Do not merge until we reach agreement with TRT-LLM on the integration interface.
