feat: Add request cancellation to C++ gRPC client#896
Open
yinggeh wants to merge 1 commit into
Conversation
whoisj approved these changes on May 14, 2026
mudit-eng reviewed on May 14, 2026
```diff
     const std::vector<std::vector<const InferRequestedOutput*>>& outputs,
-    const Headers& headers, grpc_compression_algorithm compression_algorithm)
+    const Headers& headers, grpc_compression_algorithm compression_algorithm,
+    std::vector<CallContext*>* ctxs_out)
```
mudit-eng: Pointer to pointer is tricky and error-prone. Why not pass it as a reference?

yinggeh (author): Because the argument is optional, and a reference cannot be null.

mudit-eng: Help me understand: is it `ctxs_out` that would be null, or the vector entries inside it? Asking because we could use an empty vector instead.

yinggeh (author): It is the pointer itself that can be null; its initial pointed-to value doesn't matter. See the example in the document:
https://github.com/triton-inference-server/client/pull/896/changes#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R603-R616
Summary
Adds per-request cancellation to the C++ gRPC client, mirroring the existing Python `tritonclient.grpc` cancellation interfaces.

- `AsyncInfer` gains an optional trailing `CallContext** ctx_out`. On `Error::Success`, the caller receives a heap-allocated `CallContext` whose `Cancel()` method calls `grpc::ClientContext::TryCancel()`. Calling `Cancel()` after natural completion is a safe no-op.
- `AsyncInferMulti` gains an optional trailing `std::vector<CallContext*>* ctxs_out` with one handle per fanned-out request. Cancellation is per-request; the multi callback still fires exactly once after every leaf produces a result.
- `StopStream` gains a `bool cancel_requests = false`. When true, the streaming RPC is `TryCancel`'d and the stream callback receives one final `InferResult` whose status contains `Locally cancelled by application!`. `StartStream` can then be called again; `grpc_context_` is rebuilt in place because `grpc::ClientContext` is non-movable and a cancelled context cannot be reused.
- Cancelled requests report the status message `Locally cancelled by application!`, matching `tritonclient.grpc._utils.get_cancelled_error()` in the Python client.

Backwards compatibility
Every new parameter defaults to `nullptr`/`false`. Existing call sites compile and behave exactly as before; cancellation is strictly opt-in.

Testing
New `src/c++/tests/grpc_cancellation_test.cc` (gtest) with 7 cases:

- `TestGrpcAsyncInfer` (`test_grpc_async_infer` parity, 1:1)
- `TestGrpcAsyncInferCancelAfterCompletionIsNoOp`
- `TestGrpcAsyncInferWithoutContextStillCompletes`
- `TestGrpcAsyncInferMulti`
- `TestGrpcStreamInfer` (`test_grpc_stream_infer` parity, 1:1)
- `TestGrpcStreamCancelWithoutInfer`
- `TestGrpcStreamCancelThenRestart`

Related
triton-inference-server/server#8775