[TRI-978][fix] Pass explicit streaming=False in base_metrics_verification_tests by mc-nv · Pull Request #6 · triton-inference-server/TensorRT-LLM

mc-nv · 2026-04-20T18:37:58Z

Summary

Fixes streaming=None crash in L0_backend_trtllm--base introduced by TRT-LLM executor pybind change.

Changes

Commit 1 — fix(test): pass explicit streaming=False in base_metrics_verification_tests
Original fix: explicitly pass streaming=False to avoid None being passed.

Commit 2 — fix(trtllm): coerce streaming=None to False; fix ensemble input name in test

model.py: coerce streaming=None → False at the model layer (mirrors [TRI-966] [fix] Fix L0_backend_trtllm NVIDIA/TensorRT-LLM#13276)
base_metrics_verification_tests.py: rename tensor "streaming" → "stream" — the ensemble model exposes stream externally and maps it to streaming internally; passing streaming directly caused [400] unexpected inference input

Fixes

Resolves: TRI-978

^{Related PRs: triton-inference-server/tensorrtllm_backend#855}

…_tests tensorrt_llm.bindings.executor.Request.__init__() no longer accepts streaming=None; the test was omitting the streaming tensor, causing get_input_scalar_by_name() to return None and the executor to crash. Relates-to: TRI-978

…in test - model.py: guard against streaming=None from get_input_scalar_by_name by coercing to bool (None → False). Mirrors NVIDIA#13276. - base_metrics_verification_tests.py: rename tensor "streaming" → "stream" to match the ensemble model's declared external input (ensemble maps "stream" → "streaming" internally; passing "streaming" directly caused a 400 rejection). Relates-to: TRI-978

…ics_verification_tests The assertion was split by a comma, making the upper-bound check (difference <= 1s) a message arg rather than part of the condition. Only the lower bound (-1s <= difference) was actually verified, causing failures on B200 SBSA where the log timestamp precedes the metrics timestamp by more than 1s. Relates-to: TRI-978

…trics_verification_tests The server writes log timestamps in local time, but dt_curl was computed via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp() so both dt_log and dt_curl are in the same local timezone. Relates-to: TRI-978

Relates-to: TRI-978

…x DST timezone offset std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized). When mktime() is called on a runner in a DST-observing timezone, it treats the parsed local time as non-DST time, producing a UTC timestamp that is off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically, matching the behavior of localtime() used in getCurrentTimestamp().

TRT-LLM executor now requires streaming to be an explicit bool. Without it, model.py receives streaming=None causing a TypeError crash. Use FLAGS.decoupled to determine the streaming value (True for decoupled, False for standard synchronous inference).

… on aarch64

…tion_tests (#6) * fix(test): pass explicit streaming=False in base_metrics_verification_tests tensorrt_llm.bindings.executor.Request.__init__() no longer accepts streaming=None; the test was omitting the streaming tensor, causing get_input_scalar_by_name() to return None and the executor to crash. Relates-to: TRI-978 * fix(trtllm): coerce streaming=None to False; fix ensemble input name in test - model.py: guard against streaming=None from get_input_scalar_by_name by coercing to bool (None → False). Mirrors NVIDIA#13276. - base_metrics_verification_tests.py: rename tensor "streaming" → "stream" to match the ensemble model's declared external input (ensemble maps "stream" → "streaming" internally; passing "streaming" directly caused a 400 rejection). Relates-to: TRI-978 * fix(test): fix malformed assertTrue chained comparison in custom_metrics_verification_tests The assertion was split by a comma, making the upper-bound check (difference <= 1s) a message arg rather than part of the condition. Only the lower bound (-1s <= difference) was actually verified, causing failures on B200 SBSA where the log timestamp precedes the metrics timestamp by more than 1s. Relates-to: TRI-978 * fix(test): use fromtimestamp instead of utcfromtimestamp in custom_metrics_verification_tests The server writes log timestamps in local time, but dt_curl was computed via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp() so both dt_log and dt_curl are in the same local timezone. Relates-to: TRI-978 * chore: update copyright year to 2024-2026 in modified test files Relates-to: TRI-978 * fix(metrics): set tm_isdst=-1 in convertTimestampToMicroseconds to fix DST timezone offset std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized). When mktime() is called on a runner in a DST-observing timezone, it treats the parsed local time as non-DST time, producing a UTC timestamp that is off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically, matching the behavior of localtime() used in getCurrentTimestamp(). * fix(bench): pass explicit streaming tensor in benchmark_core_model.py TRT-LLM executor now requires streaming to be an explicit bool. Without it, model.py receives streaming=None causing a TypeError crash. Use FLAGS.decoupled to determine the streaming value (True for decoupled, False for standard synchronous inference). * fix(ci): switch benchmark_core_model to grpc to avoid gevent segfault on aarch64

github-actions Bot assigned mc-nv Apr 20, 2026

mc-nv mentioned this pull request Apr 20, 2026

[TRI-978][fix] Bump tensorrt_llm submodule to fix streaming=None executor crash triton-inference-server/tensorrtllm_backend#855

Merged

mc-nv requested review from mudit-eng, pskiran1, whoisj and yinggeh April 21, 2026 17:51

mc-nv marked this pull request as ready for review April 21, 2026 17:51

yinggeh approved these changes Apr 21, 2026

View reviewed changes

yinggeh reviewed Apr 21, 2026

View reviewed changes

Comment thread triton_backend/ci/L0_backend_trtllm/base_metrics_verification_tests.py Outdated

mc-nv added 6 commits April 21, 2026 18:31

chore: update copyright year to 2024-2026 in modified test files

e2c2daf

Relates-to: TRI-978

fix(ci): switch benchmark_core_model to grpc to avoid gevent segfault…

a373e77

… on aarch64

mc-nv merged commit 2fdb9e6 into r26.04 Apr 22, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[TRI-978][fix] Pass explicit streaming=False in base_metrics_verification_tests#6

[TRI-978][fix] Pass explicit streaming=False in base_metrics_verification_tests#6
mc-nv merged 8 commits into
r26.04from
mchornyi/TRI-978/fix-test

mc-nv commented Apr 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mc-nv commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Fixes

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mc-nv commented Apr 20, 2026 •

edited

Loading