feat: Support for allocating GPU memory based on the selected profile #108
Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>
For kUSER_MANAGED, the user (in this case the Triton server) would need to allocate a block of device memory itself and pass it to the execution context. Do you support this behavior? If not, I would suggest adding only kSTATIC and kON_PROFILE_CHANGE.
Removed the user_managed option.
// the first context creation. As currently triton supports one
// context per engine, in order to set the specified profile_index,
// another context is created and the previous context is destroyed.
As currently triton supports one context per engine, in order to set the specified profile_index, another context is created and the previous context is destroyed.
Is the comment still valid? From the code, each profile_index holds a context.
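if (profile_index == 0) {
  // Profile 0 reuses the default context created at load time.
  res.first->second.context_ = std::move(default_trt_context);
} else {
  // Every other profile gets its own newly created context.
  res.first->second.context_.reset(engine_->createExecutionContext());
}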
I tested this on my end; the changes work for me.
Signed-off-by: Anmol Gupta <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
@yinggeh: The new changes look good to me; I got the expected results on the models with these updates.
Updated README.md
LGTM. Thanks for your contribution.
The changes in the PR support 2 main items: