[TRI-978][fix] Pass explicit streaming=False in base_metrics_verification_tests #6
Merged
…_tests tensorrt_llm.bindings.executor.Request.__init__() no longer accepts streaming=None; the test was omitting the streaming tensor, causing get_input_scalar_by_name() to return None and the executor to crash. Relates-to: TRI-978
…in test - model.py: guard against streaming=None from get_input_scalar_by_name by coercing to bool (None → False). Mirrors NVIDIA#13276. - base_metrics_verification_tests.py: rename tensor "streaming" → "stream" to match the ensemble model's declared external input (ensemble maps "stream" → "streaming" internally; passing "streaming" directly caused a 400 rejection). Relates-to: TRI-978
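The coercion described in this commit can be sketched as follows. This is a minimal illustration, not the actual model.py change: `coerce_streaming` is a hypothetical name, and the only behavior taken from the commit message is that a missing streaming tensor arrives as None and must become False before it reaches the executor.

```python
from typing import Optional


def coerce_streaming(value: Optional[bool]) -> bool:
    """Turn an optional streaming flag into a real bool.

    Hypothetical helper: None (tensor omitted from the request) is
    treated as non-streaming, mirroring the None -> False rule above.
    """
    return bool(value)


# A request without the tensor no longer propagates None downstream.
missing = coerce_streaming(None)
explicit = coerce_streaming(True)
```

Coercing at the model layer means clients that omit the tensor keep working even though the executor now insists on an explicit bool.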
yinggeh
approved these changes
Apr 21, 2026
yinggeh
reviewed
Apr 21, 2026
…ics_verification_tests The assertion was split by a comma, making the upper-bound check (difference <= 1s) a message arg rather than part of the condition. Only the lower bound (-1s <= difference) was actually verified, causing failures on B200 SBSA where the log timestamp precedes the metrics timestamp by more than 1s. Relates-to: TRI-978
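The pitfall described in this commit is easy to reproduce in isolation. The sketch below is not the test's code; the variable names and the 5-second difference are made up for illustration. It shows that when the comparison is split by a comma, `assertTrue` receives the second half as its `msg` argument and never evaluates it as a condition.

```python
import unittest
from datetime import timedelta

tc = unittest.TestCase()
tolerance = timedelta(seconds=1)
difference = timedelta(seconds=5)  # illustrative value, outside the tolerance


def passes(check) -> bool:
    """Run an assertion callable and report whether it passed."""
    try:
        check()
        return True
    except AssertionError:
        return False


# Malformed: the comma makes `difference <= tolerance` the msg argument,
# so only the lower bound is checked.
malformed_passes = passes(
    lambda: tc.assertTrue(-tolerance <= difference, difference <= tolerance)
)

# Fixed: a single chained comparison checks both bounds.
fixed_passes = passes(
    lambda: tc.assertTrue(-tolerance <= difference <= tolerance)
)
```

Here the malformed form passes even though the difference is well outside the tolerance, while the chained form fails as intended.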
…trics_verification_tests The server writes log timestamps in local time, but dt_curl was computed via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp() so both dt_log and dt_curl are in the same local timezone. Relates-to: TRI-978
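The 8-hour gap can be reproduced with a short sketch. It assumes a Unix-like system (`time.tzset()` is not available on Windows) and pins the process to a fixed UTC-8 zone through a POSIX TZ string; the epoch value is arbitrary. The naive UTC value is built with `fromtimestamp(..., tz=timezone.utc)`, which yields the same wall-clock fields that `utcfromtimestamp()` returns.

```python
import os
import time
from datetime import datetime, timezone

# Assumption for the demo: force a fixed UTC-8 local zone with no DST.
os.environ["TZ"] = "PST8"
time.tzset()

epoch = 1_700_000_000  # arbitrary timestamp

# Local wall-clock time, the convention the server log uses per the commit.
dt_local = datetime.fromtimestamp(epoch)

# Naive UTC wall-clock time, equivalent to utcfromtimestamp(epoch).
dt_utc = datetime.fromtimestamp(epoch, tz=timezone.utc).replace(tzinfo=None)

gap_hours = (dt_utc - dt_local).total_seconds() / 3600
```

Comparing a local-time value against the naive UTC value produces a gap equal to the zone offset, which is far outside a ±1s tolerance.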
Relates-to: TRI-978
…x DST timezone offset std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized). When mktime() is called on a runner in a DST-observing timezone, it treats the parsed local time as non-DST time, producing a UTC timestamp that is off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically, matching the behavior of localtime() used in getCurrentTimestamp().
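The fix itself is in C++, but the same `mktime()` behavior is visible from Python, whose `time.mktime()` passes the `tm_isdst` field through to the C library. The sketch below is an analogue under stated assumptions, not the backend's code: it assumes a Unix-like system with `time.tzset()`, uses a POSIX TZ string for a DST-observing zone, and picks an arbitrary summer date.

```python
import os
import time

# Assumption for the demo: a DST-observing zone given as a POSIX TZ rule.
os.environ["TZ"] = "PST8PDT,M3.2.0,M11.1.0"
time.tzset()

# 2026-07-01 12:00:00 local (a Wednesday, day 182 of the year); DST is in effect.
fields = (2026, 7, 1, 12, 0, 0, 2, 182)

# tm_isdst=0: the local time is treated as standard (non-DST) time.
as_standard = time.mktime(fields + (0,))

# tm_isdst=-1: the C library decides whether DST applies.
auto = time.mktime(fields + (-1,))

skew_seconds = as_standard - auto
round_trip_hour = time.localtime(auto).tm_hour
```

With `tm_isdst=-1` the value round-trips through `localtime()` to the same hour, whereas the zero-initialized flag shifts the result by one hour during DST.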
TRT-LLM executor now requires streaming to be an explicit bool. Without it, model.py receives streaming=None causing a TypeError crash. Use FLAGS.decoupled to determine the streaming value (True for decoupled, False for standard synchronous inference).
mc-nv
added a commit
that referenced
this pull request
May 1, 2026
…tion_tests (#6)

* fix(test): pass explicit streaming=False in base_metrics_verification_tests
  tensorrt_llm.bindings.executor.Request.__init__() no longer accepts streaming=None; the test was omitting the streaming tensor, causing get_input_scalar_by_name() to return None and the executor to crash.
  Relates-to: TRI-978
* fix(trtllm): coerce streaming=None to False; fix ensemble input name in test
  - model.py: guard against streaming=None from get_input_scalar_by_name by coercing to bool (None → False). Mirrors NVIDIA#13276.
  - base_metrics_verification_tests.py: rename tensor "streaming" → "stream" to match the ensemble model's declared external input (ensemble maps "stream" → "streaming" internally; passing "streaming" directly caused a 400 rejection).
  Relates-to: TRI-978
* fix(test): fix malformed assertTrue chained comparison in custom_metrics_verification_tests
  The assertion was split by a comma, making the upper-bound check (difference <= 1s) a message arg rather than part of the condition. Only the lower bound (-1s <= difference) was actually verified, causing failures on B200 SBSA where the log timestamp precedes the metrics timestamp by more than 1s.
  Relates-to: TRI-978
* fix(test): use fromtimestamp instead of utcfromtimestamp in custom_metrics_verification_tests
  The server writes log timestamps in local time, but dt_curl was computed via utcfromtimestamp() (UTC). On B200 SBSA runners in UTC-8 this causes an 8-hour difference, failing the ±1s tolerance check. Use fromtimestamp() so both dt_log and dt_curl are in the same local timezone.
  Relates-to: TRI-978
* chore: update copyright year to 2024-2026 in modified test files
  Relates-to: TRI-978
* fix(metrics): set tm_isdst=-1 in convertTimestampToMicroseconds to fix DST timezone offset
  std::get_time does not set tm_isdst, leaving it at 0 (zero-initialized). When mktime() is called on a runner in a DST-observing timezone, it treats the parsed local time as non-DST time, producing a UTC timestamp that is off by 1 hour. Setting tm_isdst=-1 lets mktime determine DST automatically, matching the behavior of localtime() used in getCurrentTimestamp().
* fix(bench): pass explicit streaming tensor in benchmark_core_model.py
  TRT-LLM executor now requires streaming to be an explicit bool. Without it, model.py receives streaming=None causing a TypeError crash. Use FLAGS.decoupled to determine the streaming value (True for decoupled, False for standard synchronous inference).
* fix(ci): switch benchmark_core_model to grpc to avoid gevent segfault on aarch64
Summary

Fixes `streaming=None` crash in `L0_backend_trtllm--base` introduced by TRT-LLM executor pybind change.

Changes

Commit 1: fix(test): pass explicit streaming=False in base_metrics_verification_tests

Original fix: explicitly pass `streaming=False` to avoid `None` being passed.

Commit 2: fix(trtllm): coerce streaming=None to False; fix ensemble input name in test

- `model.py`: coerce `streaming=None → False` at the model layer (mirrors [TRI-966] [fix] Fix L0_backend_trtllm NVIDIA/TensorRT-LLM#13276)
- `base_metrics_verification_tests.py`: rename tensor `"streaming"` → `"stream"`. The ensemble model exposes `stream` externally and maps it to `streaming` internally; passing `streaming` directly caused `[400] unexpected inference input`

Fixes

Resolves: TRI-978
Related PRs: triton-inference-server/tensorrtllm_backend#855