Add OrderScheme.get_boundaries API#1039
Conversation
| cdef const cpp_TableChunk* chunk = self._handle.boundaries.get() | ||
| cdef Stream stream = Stream._from_cudaStream_t(chunk.stream().value()) | ||
| tbl = Table.from_table_view_of_arbitrary( | ||
| chunk.table_view(), owner=self, stream=stream | ||
| ) | ||
| return TableChunk.from_pylibcudf_table( | ||
| tbl, stream, exclusive_view=False, br=br | ||
| ) | ||
|
|
There was a problem hiding this comment.
As I am looking at this, did the OrderScheme need to keep the BufferResource corresponding to the TableChunk we created it from alive, I think yes?
There was a problem hiding this comment.
Isn't the buffer resource attached to the context? The context should outlive this metadata I think. Am I misunderstanding the question?
There was a problem hiding this comment.
You are right, I think @wence- was asking what happens if users do not live up to that contract.
I am okay with this PR as-is, since the current contract is that the BufferResource outlives this metadata.
That said, I think this highlights a broader design issue we should address separately. We still do not have a clean ownership/lifetime story around BufferResource in Python.
I think it is time for me to start working on #641 :)
| return self._handle.boundaries.get().shape().first | ||
|
|
||
| def get_boundaries(self, BufferResource br not None) -> TableChunk: | ||
| """Return the boundary rows as a zero-copy TableChunk view.""" |
There was a problem hiding this comment.
Please to docstring correctly.
|
/merge |
While implementing a prototype to use
OrderSchemeto sort a table in cudf_polars, I realized we need a Python method to extract the boundaries table.Note: I decided it was better to returntuple[Table, Slice]thanTableChunk, because this data is not uniquely owned, and we don't really have a python API forshared_ptr<TableChunk>.