test: Add C++ gRPC cancellation tests to L0_request_cancellation#8775
test: Add C++ gRPC cancellation tests to L0_request_cancellation#8775yinggeh wants to merge 1 commit into
Conversation
0a43ae8 to
308cad3
Compare
308cad3 to
0c063cf
Compare
|
|
||
| SERVER=/opt/tritonserver/bin/tritonserver | ||
| source ../common/util.sh | ||
| CANCEL_LOG_LINE="Cancellation notification received for" |
There was a problem hiding this comment.
probably needs a trailing space.
There was a problem hiding this comment.
This sentence looks odd. There should be something after 'for'.
Do you have an example the actual log line looks like?
There was a problem hiding this comment.
Two locations
server/src/grpc/infer_handler.cc
Lines 718 to 722 in 0c063cf
server/src/grpc/stream_infer_handler.cc
Lines 152 to 156 in 0c063cf
There was a problem hiding this comment.
As J said, there is a trailing space after 'for'.
| requests whose results are no longer required can significantly impact server | ||
| resources. | ||
| Triton supports handling request cancellation received from the gRPC Python | ||
| client or a C API user (since r23.10), and C++ client (since r26.05). |
There was a problem hiding this comment.
This change is not going in r26.05.
There was a problem hiding this comment.
Try to cherry-pick since internal team is waiting for this feature.
There was a problem hiding this comment.
kind of a big overhaul this late into the release, no?
what is the risk assessment? (discuss offline)
|
|
||
| SERVER=/opt/tritonserver/bin/tritonserver | ||
| source ../common/util.sh | ||
| CANCEL_LOG_LINE="Cancellation notification received for" |
There was a problem hiding this comment.
This sentence looks odd. There should be something after 'for'.
Do you have an example the actual log line looks like?
| TEST_LOG="./grpc_cancellation_test_cpp.$TEST_CASE.log" | ||
| SERVER_LOG="./grpc_cancellation_test_cpp.$TEST_CASE.server.log" | ||
|
|
||
| # AsyncInferMulti fans out N concurrent requests; bump to 3 CPU |
There was a problem hiding this comment.
is there a check for N >= 3?
There was a problem hiding this comment.
Can you elaborate? Check N requests, instances or cancellation?
There was a problem hiding this comment.
For N concurrent requests to fan out to 3 CPU, shouldn't we have N > 3?
There was a problem hiding this comment.
In this test, each request execution takes 10 seconds. To avoid backlog in the request queue (reduce overall test time), the model configuration is increased to 3 instances. If N > 3, requests after 3rd will wait in the queue until the first 3 requests have completed execution, which will take 10 seconds.
There was a problem hiding this comment.
I see what you mean. Here we are testing requests that are cancelled during execution. I can also add a test for in-queue request cancellation.
There was a problem hiding this comment.
Thanks. Yes, let's test for in-queue cancellation also.
What does the PR do?
Wires the new C++
grpc_cancellation_testgtest (added in the client-side companion PR) intoqa/L0_request_cancellation/test.shas a sibling of the existing Python cancellation suite. Each gtest case is run against a freshtritonserverand the count ofCancellation notification received forlog lines is asserted to match the expected count for that case.Also temporarily bumps the model's
instance_groupcount to 3 around theTestGrpcAsyncInferMulticase (reverted after) so the three fanned-out requests can execute concurrently; the test cancels two of them while letting the middle one complete naturally and would otherwise serialize on a single CPU instance.Checklist
<commit_type>: <Title>Commit Type:
Related PRs:
triton-inference-server/client#896
Where should the reviewer start?
Test plan:
L0_request_cancellation--base
Caveats:
Background
Triton has had Python-side request cancellation tests since r23.10 but no C++ counterpart, despite the C++ gRPC client gaining matching APIs in the companion client PR. This PR adds the missing test wiring so the two clients' cancellation behavior stays in lock-step on every CI run.