Adding nvtx memory regions to pool MR #1952
Signed-off-by: niranda perera <niranda.perera@gmail.com>
  rmm::detail::format_bytes(size) + ")",
  rmm::out_of_memory);
- auto const block = this->underlying().get_block(size, stream_event);
+ auto const block = get_block(size, stream_event);
❓ question: Why drop the CRTP indirection here? This doesn't seem related to this PR.
@harrism That's right. But from reading the code, I gathered that get_block is not implemented by the derived class. It's not mentioned here either: https://github.com/nirandaperera/rmm/blob/adding_nvtx_pool/cpp/include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp#L70-L76
So, IINM, we can simply call the method without the indirection.
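To make the question concrete, here is a minimal CRTP sketch. All class names (`base_resource`, `plain_resource`, `rounding_resource`) are hypothetical and are not RMM's actual classes. It shows that the two call styles agree when the derived class does not define `get_block`, and differ as soon as a derived class provides its own version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical CRTP base; names are illustrative, not RMM's real classes.
template <typename Derived>
class base_resource {
 public:
  // Goes through the CRTP indirection: picks Derived's get_block if it has one.
  std::size_t via_underlying(std::size_t size) { return underlying().get_block(size); }

  // Calls the base member directly, ignoring any version declared in Derived.
  std::size_t direct(std::size_t size) { return get_block(size); }

  // Base implementation: returns the requested size unchanged.
  std::size_t get_block(std::size_t size) { return size; }

 private:
  Derived& underlying() { return static_cast<Derived&>(*this); }
};

// Does not define get_block, so both call paths reach the base version.
class plain_resource : public base_resource<plain_resource> {};

// Shadows get_block, so only the CRTP path sees the derived version.
class rounding_resource : public base_resource<rounding_resource> {
 public:
  // Rounds the size up to a multiple of 256.
  std::size_t get_block(std::size_t size) { return (size + 255) / 256 * 256; }
};
```

Under these assumptions, dropping the indirection is behavior-preserving only while no derived class defines the method, which may be why a reviewer would want the change justified separately.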
#endif

#ifdef RMM_NVTX
void* heap_key;
So this adds some overhead on every suballocation. And the insertion into the nvtx_heaps map is a small overhead on upstream allocations.
Can you please benchmark this cost with the random allocations benchmark with NVTX on and off and report it in the PR? Is NVTX enabled by default? Depending on these costs, we may want it off by default.
@harrism Yes, there is an overhead here. In particular:
1. Inserting into and querying the nvtx_heaps_ unordered map.
2. Calling lower_bound on the upstream_blocks_ set (which is logarithmic).
I think we can alleviate 2 if we add a void* upstream_ member to the block class, rather than the bool head. Then, IINM, is_head() will be upstream_ == ptr_. But then we are adding an additional 3 bytes to the block class.
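A rough sketch of the two options being weighed, under stated assumptions: the `block` layout and the `find_upstream` helper below are hypothetical and do not reproduce RMM's actual code. The helper uses `upper_bound` followed by a step back, which is one common way to find the containing range in an ordered set; the PR itself is described as using `lower_bound`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical block layout; not RMM's actual block class.
struct block {
  char* ptr_{};        // start of this block
  std::size_t size_{}; // size of this block in bytes
  char* upstream_{};   // start of the upstream allocation this block came from

  // A block heads its upstream allocation when both pointers match,
  // so no separate bool flag is needed.
  bool is_head() const { return upstream_ == ptr_; }
};

// Alternative without the extra member: look up the upstream allocation
// containing `ptr` in an ordered set of upstream start addresses.
// This costs O(log n) per lookup.
inline char* find_upstream(std::set<char*> const& upstream_starts, char* ptr) {
  auto it = upstream_starts.upper_bound(ptr);  // first start strictly greater than ptr
  if (it == upstream_starts.begin()) { return nullptr; }  // ptr precedes every start
  return *std::prev(it);
}
```

Storing the pointer makes the lookup a single member read at the cost of a possibly larger block, while the set lookup keeps the block small but pays a logarithmic search on each query. Which is cheaper in practice is what the requested benchmarks would show.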
Do you think it's a worthwhile change?
I would like to see benchmarks, if you don't mind. :)
@nirandaperera Have you had a chance to run the benchmarks Mark was looking for to see any perf differences?