Skip to content

[TRI-978][fix] Pass explicit streaming=False in base_metrics_verification_tests#6

Merged
mc-nv merged 8 commits into
r26.04from
mchornyi/TRI-978/fix-test
Apr 22, 2026
Merged

[TRI-978][fix] Pass explicit streaming=False in base_metrics_verification_tests#6
mc-nv merged 8 commits into
r26.04from
mchornyi/TRI-978/fix-test

Conversation

@mc-nv
Copy link
Copy Markdown
Collaborator

@mc-nv mc-nv commented Apr 20, 2026

Summary

Fixes streaming=None crash in L0_backend_trtllm--base introduced by TRT-LLM executor pybind change.

Changes

Commit 1fix(test): pass explicit streaming=False in base_metrics_verification_tests
Original fix: explicitly pass streaming=False to avoid None being passed.

Commit 2fix(trtllm): coerce streaming=None to False; fix ensemble input name in test

  • model.py: coerce streaming=None → False at the model layer (mirrors [TRI-966] [fix] Fix L0_backend_trtllm NVIDIA/TensorRT-LLM#13276)
  • base_metrics_verification_tests.py: rename tensor "streaming""stream" — the ensemble model exposes stream externally and maps it to streaming internally; passing streaming directly caused [400] unexpected inference input

Fixes

Resolves: TRI-978

Related PRs: triton-inference-server/tensorrtllm_backend#855

…_tests

tensorrt_llm.bindings.executor.Request.__init__() no longer accepts
streaming=None; the test was omitting the streaming tensor, causing
get_input_scalar_by_name() to return None and the executor to crash.

Relates-to: TRI-978
…in test

- model.py: guard against streaming=None from get_input_scalar_by_name
  by coercing to bool (None → False). Mirrors NVIDIA#13276.
- base_metrics_verification_tests.py: rename tensor "streaming" → "stream"
  to match the ensemble model's declared external input (ensemble maps
  "stream" → "streaming" internally; passing "streaming" directly caused
  a 400 rejection).

Relates-to: TRI-978
@mc-nv mc-nv marked this pull request as ready for review April 21, 2026 17:51
Comment thread triton_backend/ci/L0_backend_trtllm/base_metrics_verification_tests.py Outdated
mc-nv added 6 commits April 21, 2026 18:31
…ics_verification_tests

The assertion was split by a comma, making the upper-bound check
(difference <= 1s) a message arg rather than part of the condition.
Only the lower bound (-1s <= difference) was actually verified, causing
failures on B200 SBSA where the log timestamp precedes the metrics
timestamp by more than 1s.

Relates-to: TRI-978
…trics_verification_tests

The server writes log timestamps in local time, but dt_curl was computed
via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes
an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp()
so both dt_log and dt_curl are in the same local timezone.

Relates-to: TRI-978
…x DST timezone offset

std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized).
When mktime() is called on a runner in a DST-observing timezone, it treats
the parsed local time as non-DST time, producing a UTC timestamp that is
off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically,
matching the behavior of localtime() used in getCurrentTimestamp().
TRT-LLM executor now requires streaming to be an explicit bool.
Without it, model.py receives streaming=None causing a TypeError crash.
Use FLAGS.decoupled to determine the streaming value (True for decoupled,
False for standard synchronous inference).
@mc-nv mc-nv merged commit 2fdb9e6 into r26.04 Apr 22, 2026
3 checks passed
mc-nv added a commit that referenced this pull request May 1, 2026
…tion_tests (#6)

* fix(test): pass explicit streaming=False in base_metrics_verification_tests

tensorrt_llm.bindings.executor.Request.__init__() no longer accepts
streaming=None; the test was omitting the streaming tensor, causing
get_input_scalar_by_name() to return None and the executor to crash.

Relates-to: TRI-978

* fix(trtllm): coerce streaming=None to False; fix ensemble input name in test

- model.py: guard against streaming=None from get_input_scalar_by_name
  by coercing to bool (None → False). Mirrors NVIDIA#13276.
- base_metrics_verification_tests.py: rename tensor "streaming" → "stream"
  to match the ensemble model's declared external input (ensemble maps
  "stream" → "streaming" internally; passing "streaming" directly caused
  a 400 rejection).

Relates-to: TRI-978

* fix(test): fix malformed assertTrue chained comparison in custom_metrics_verification_tests

The assertion was split by a comma, making the upper-bound check
(difference <= 1s) a message arg rather than part of the condition.
Only the lower bound (-1s <= difference) was actually verified, causing
failures on B200 SBSA where the log timestamp precedes the metrics
timestamp by more than 1s.

Relates-to: TRI-978

* fix(test): use fromtimestamp instead of utcfromtimestamp in custom_metrics_verification_tests

The server writes log timestamps in local time, but dt_curl was computed
via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes
an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp()
so both dt_log and dt_curl are in the same local timezone.

Relates-to: TRI-978

* chore: update copyright year to 2024-2026 in modified test files

Relates-to: TRI-978

* fix(metrics): set tm_isdst=-1 in convertTimestampToMicroseconds to fix DST timezone offset

std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized).
When mktime() is called on a runner in a DST-observing timezone, it treats
the parsed local time as non-DST time, producing a UTC timestamp that is
off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically,
matching the behavior of localtime() used in getCurrentTimestamp().

* fix(bench): pass explicit streaming tensor in benchmark_core_model.py

TRT-LLM executor now requires streaming to be an explicit bool.
Without it, model.py receives streaming=None causing a TypeError crash.
Use FLAGS.decoupled to determine the streaming value (True for decoupled,
False for standard synchronous inference).

* fix(ci): switch benchmark_core_model to grpc to avoid gevent segfault on aarch64
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants