Add OrderScheme.get_boundaries API #1039

Merged
rapids-bot[bot] merged 10 commits into rapidsai:main from rjzamora:get_boundaries
May 22, 2026

Conversation

@rjzamora
Member

@rjzamora rjzamora commented May 15, 2026

While implementing a prototype to use OrderScheme to sort a table in cudf_polars, I realized we need a Python method to extract the boundaries table.

Note: I decided it was better to return tuple[Table, Slice] than TableChunk, because this data is not uniquely owned, and we don't really have a Python API for shared_ptr<TableChunk>.
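To illustrate the ownership point in the note above, here is a minimal pure-Python sketch of returning a borrowed, zero-copy view plus a slice instead of a new owning object. `BoundaryStore` and `BorrowedView` are hypothetical stand-ins invented for this illustration; they are not rapidsmpf or pylibcudf classes, and the real method may differ.

```python
import weakref


class BorrowedView:
    """Hypothetical zero-copy view that records who owns the data."""

    def __init__(self, data, owner):
        self.data = data
        # A strong reference keeps the owner alive while the view exists.
        self._owner = owner


class BoundaryStore:
    """Hypothetical stand-in for metadata that owns the boundary rows."""

    def __init__(self, rows):
        self._rows = bytearray(rows)

    def get_boundaries(self):
        # Hand out a view of the shared data and the slice it covers,
        # rather than an object that claims exclusive ownership.
        view = BorrowedView(memoryview(self._rows), owner=self)
        return view, slice(0, len(self._rows))


store = BoundaryStore(b"\x01\x02\x03")
probe = weakref.ref(store)  # lets us observe whether the store is still alive
view, span = store.get_boundaries()
del store  # the view's owner reference keeps the store alive
assert probe() is not None
assert bytes(view.data[span]) == b"\x01\x02\x03"
```

The view never copies the rows, and dropping the caller's handle to the store does not invalidate it, because the view holds its owner.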

Comment thread python/rapidsmpf/rapidsmpf/streaming/cudf/channel_metadata.pyx Outdated
Comment on lines +189 to +197
cdef const cpp_TableChunk* chunk = self._handle.boundaries.get()
cdef Stream stream = Stream._from_cudaStream_t(chunk.stream().value())
tbl = Table.from_table_view_of_arbitrary(
chunk.table_view(), owner=self, stream=stream
)
return TableChunk.from_pylibcudf_table(
tbl, stream, exclusive_view=False, br=br
)

Contributor

Looking at this: does the OrderScheme need to keep alive the BufferResource corresponding to the TableChunk we created it from? I think yes.

Member Author

Isn't the buffer resource attached to the context? The context should outlive this metadata I think. Am I misunderstanding the question?

Member

You are right, I think @wence- was asking what happens if users do not live up to that contract.

I am okay with this PR as-is, since the current contract is that the BufferResource outlives this metadata.

That said, I think this highlights a broader design issue we should address separately. We still do not have a clean ownership/lifetime story around BufferResource in Python.

I think it is time for me to start working on #641 :)
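The lifetime contract discussed in this thread can be sketched in plain Python. This is only an illustration under stated assumptions: `Resource` and `Metadata` are hypothetical stand-ins, not the rapidsmpf `BufferResource` or metadata types, and the weak reference models a borrow that does not extend the resource's lifetime.

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for a memory resource owned elsewhere."""


class Metadata:
    """Hypothetical metadata that borrows a resource without owning it."""

    def __init__(self, resource):
        # Borrow only: a weak reference does not keep the resource alive.
        self._resource = weakref.ref(resource)

    def resource(self):
        res = self._resource()
        if res is None:
            # The caller broke the contract: the resource died first.
            raise RuntimeError("resource was destroyed before the metadata")
        return res


res = Resource()
meta = Metadata(res)
assert meta.resource() is res  # contract honored: resource still alive

del res       # the user drops the resource too early
gc.collect()  # make collection explicit regardless of interpreter

try:
    meta.resource()
    violated = False
except RuntimeError:
    violated = True
assert violated
```

In native code the same mistake would be a dangling pointer rather than a clean exception, which is why a clearer ownership story matters.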

return self._handle.boundaries.get().shape().first

def get_boundaries(self, BufferResource br not None) -> TableChunk:
"""Return the boundary rows as a zero-copy TableChunk view."""
Contributor

Please write the docstring correctly.
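As a rough sketch of what a fuller docstring could look like, assuming the project follows the numpydoc layout (an assumption; the thread does not say which style is expected). The function below is a plain-Python stand-in rather than the actual Cython method, and the parameter and return descriptions are inferred from the snippets quoted in this thread, not copied from the merged code.

```python
def get_boundaries(br):
    """
    Return the boundary rows as a zero-copy view.

    Parameters
    ----------
    br
        Buffer resource passed through to the returned chunk. Under the
        current contract it must outlive this metadata object.

    Returns
    -------
    A view of the boundary rows that does not own the underlying data.
    """
    # Stand-in body: this sketch only demonstrates the docstring layout.
    raise NotImplementedError


# The sections are available at runtime through the __doc__ attribute.
doc = get_boundaries.__doc__
assert "Parameters" in doc and "Returns" in doc
```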

@rjzamora
Member Author

/merge

@rapids-bot rapids-bot Bot merged commit 8d94191 into rapidsai:main May 22, 2026
58 checks passed
@rjzamora rjzamora deleted the get_boundaries branch May 22, 2026 18:26

Labels

improvement: Improves an existing functionality
non-breaking: Introduces a non-breaking change

3 participants