feat: Support for allocating GPU memory based on the selected profile by anmolgupt · Pull Request #108 · triton-inference-server/tensorrt_backend

anmolgupt · 2025-04-02T04:29:02Z

This PR makes two main changes:

GPU memory is allocated based on the selected TensorRT profile, rather than on the profile that consumes the most memory even when that profile is not selected.
The profile 0 execution context is no longer created when it is not required.

Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>

dongfengy · 2025-04-02T20:51:39Z

For kUSER_MANAGED, the user (in this case, the Triton server) would need to allocate a block of device memory and pass it to the execution context. Do you support this behavior? If not, I would suggest adding only kSTATIC and kON_PROFILE_CHANGE.

removed user_managed option
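
The outcome of this thread could be sketched roughly as below. This is a minimal illustration, not the code merged in the PR: `AllocationStrategy` is a local stand-in for TensorRT's `nvinfer1::ExecutionContextAllocationStrategy`, `ParseAllocationStrategy` is a hypothetical helper, and the accepted strings ("STATIC", "ON_PROFILE_CHANGE") are assumed spellings for the `execution_context_allocation_strategy` parameter.

```cpp
#include <string>

// Local stand-in for nvinfer1::ExecutionContextAllocationStrategy so the
// sketch compiles without TensorRT headers. Only the two strategies kept
// after review are modeled.
enum class AllocationStrategy { kSTATIC, kON_PROFILE_CHANGE };

// Hypothetical helper: maps a config string to a strategy. Returns false and
// leaves *out untouched when the value is not supported.
// The accepted spellings are assumptions made for this sketch.
bool
ParseAllocationStrategy(const std::string& value, AllocationStrategy* out)
{
  if (value == "STATIC") {
    *out = AllocationStrategy::kSTATIC;
    return true;
  }
  if (value == "ON_PROFILE_CHANGE") {
    *out = AllocationStrategy::kON_PROFILE_CHANGE;
    return true;
  }
  // "USER_MANAGED" is rejected on purpose: the caller would have to allocate
  // device memory itself and hand it to the execution context.
  return false;
}
```

Rejecting unknown values instead of silently falling back keeps a misconfigured model from loading with a strategy the user did not ask for; whether the backend actually behaves this way is not stated in this thread.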

yinggeh · 2025-04-17T18:39:41Z

  // the first context creation. As currently triton supports one
  // context per engine, in order to set the specified profile_index,
  // another context is created and the previous context is destroyed.


Is the comment still valid? From the code, each profile_index holds a context.

  if (profile_index == 0) {
    res.first->second.context_ = std::move(default_trt_context);
  } else {
    res.first->second.context_.reset(engine_->createExecutionContext());
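
To illustrate the point that each profile_index holds its own context, here is a self-contained sketch modeled on the excerpt above. `FakeContext`, `FakeEngine`, and `BuildContexts` are hypothetical stand-ins introduced only for this example; the real code uses TensorRT's engine and execution context types and may differ in detail.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Stand-ins for the TensorRT execution context and engine so the sketch
// builds without TensorRT. Both types are hypothetical.
struct FakeContext {
  int id;
};

struct FakeEngine {
  int created = 0;  // number of contexts created so far
  FakeContext* createExecutionContext() { return new FakeContext{created++}; }
};

// Gives every requested profile index its own context. Profile 0 takes
// ownership of the already-created default context when one is supplied;
// every other profile creates a fresh context from the engine.
std::map<int, std::unique_ptr<FakeContext>>
BuildContexts(
    FakeEngine& engine, std::unique_ptr<FakeContext> default_context,
    const std::vector<int>& profile_indices)
{
  std::map<int, std::unique_ptr<FakeContext>> contexts;
  for (int profile_index : profile_indices) {
    assert(profile_index >= 0);
    std::unique_ptr<FakeContext>& slot = contexts[profile_index];
    if (slot) {
      continue;  // this profile already has a context
    }
    if (profile_index == 0 && default_context) {
      slot = std::move(default_context);
    } else {
      slot.reset(engine.createExecutionContext());
    }
  }
  return contexts;
}
```

With profiles 0 and 2 requested, the map ends up with two entries: profile 0 reuses the default context and profile 2 gets a newly created one, so no context is destroyed and recreated.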

I tested on my end, and the changes work for me.

Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>

Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>

anmolgupt · 2025-04-17T22:48:48Z

@yinggeh: New changes look good to me; I got the expected results on the models with these updates.

yinggeh · 2025-04-18T01:02:39Z

Updated README.md

yinggeh · 2025-04-18T19:38:44Z

LGTM. Thanks for your contribution.

anmolgupt added 2 commits February 19, 2025 11:12

Added support for ExecutionContextAllocationStrategy

5cf4ee1

Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>

remove creation of profile 0 execution context if not needed

a13dee2

Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>

Update model_state.cc

5741267

removed user_managed option

pskiran1 requested review from pskiran1 and tanmayv25 April 3, 2025 06:06

yinggeh self-requested a review April 17, 2025 17:24

This was referenced Apr 17, 2025

remove profile 0 context anmolgupt/tensorrt_backend#2

Closed

Added support for ExecutionContextAllocationStrategy anmolgupt/tensorrt_backend#1

Closed

Suggested changes

dccb9a1

yinggeh requested a review from rmccorm4 April 17, 2025 18:35

yinggeh reviewed Apr 17, 2025

Comment thread src/model_state.cc Outdated

changed the name to execution_context_allocation_strategy

a0e8408

Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>

anmolgupt marked this pull request as ready for review April 17, 2025 21:19

anmolgupt changed the title ~~Support for allocating GPU memory based on the selected profile~~ feat: Support for allocating GPU memory based on the selected profile Apr 17, 2025

yinggeh reviewed Apr 17, 2025

Comment thread src/model_state.cc Outdated

yinggeh reviewed Apr 17, 2025

Comment thread src/model_state.cc Outdated

anmolgupt and others added 2 commits April 17, 2025 15:26

Update src/model_state.cc

092dfd9

Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>

Update src/model_state.cc

f75cf6b

Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>

Update doc

25e7b08

yinggeh added 2 commits April 18, 2025 00:43

Remove comment that seems no longer valid

606488c

Fix pre-commit

748adf2

yinggeh mentioned this pull request Apr 18, 2025

feat: Parameterize TensorRT allocation strategy #109

Closed

11 tasks

Description on parameter values

63633e2

rmccorm4 mentioned this pull request Apr 18, 2025

test: Add config parameter "execution_context_allocation_strategy" to TensorRT backend triton-inference-server/server#8150

Merged

11 tasks

yinggeh approved these changes Apr 18, 2025

yinggeh merged commit 9d7ba1d into triton-inference-server:main Apr 18, 2025
1 check passed

yinggeh merged 11 commits into triton-inference-server:main from anmolgupt:anmolgupt/remove_profile_0_context

3 participants
