feat: register external buffers #1115
base: main
nathanielsimard
left a comment
The PR is good overall, but I need to think about whether it would be better to have a user-managed memory pool instead. That way, all memory handles/bindings would be the same, but the underlying storage would be user-managed. We already have different memory pools for different usages, so that could also work. Do you have an opinion about this?
I think either approach works, but the pool approach has the benefit that bindings and handles are simple and don't need branching whenever they're used. Let me give it a shot and see how it looks.
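To make the trade-off above concrete, here is a minimal sketch contrasting the two designs. The variant names of `BindingMemory`, the `Binding` struct, and both `resolve_*` functions are hypothetical illustrations, and `Vec<u8>` stands in for a GPU buffer; none of this is taken from the CubeCL code base.

```rust
use std::collections::HashMap;

/// Design A (hypothetical variants): the binding itself records where the
/// memory lives, so every use site has to branch on the variant.
pub enum BindingMemory {
    Managed { slice_id: u64 },
    External { buffer: Vec<u8> },
}

pub fn resolve_enum<'a>(
    managed: &'a HashMap<u64, Vec<u8>>,
    binding: &'a BindingMemory,
) -> Option<&'a [u8]> {
    match binding {
        BindingMemory::Managed { slice_id } => managed.get(slice_id).map(Vec::as_slice),
        BindingMemory::External { buffer } => Some(buffer.as_slice()),
    }
}

/// Design B: one binding type for everything. External buffers are simply
/// entries in a pool, so resolving a binding needs no branching.
#[derive(Clone, Copy)]
pub struct Binding {
    pub slice_id: u64,
}

pub fn resolve_pool(pool: &HashMap<u64, Vec<u8>>, binding: Binding) -> Option<&[u8]> {
    pool.get(&binding.slice_id).map(Vec::as_slice)
}
```

In design B the cost moves into the pool, which has to own (or at least track) the external buffer.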
Yup, I think the pool version ends up being cleaner! One thing to note: I made it take ownership of the buffer since that's what the pool kind of leans toward with its |
pub struct UserManagedPool {
    slices: HashMap<SliceId, Slice>,
}
I think I would also allow manually deallocating a buffer on the user-managed pool. That could be a lot faster than calling cleanup on all memory pools. It would be another way to deallocate a buffer, with fine-grained control instead of relying on a "GC"-like cleanup. Cleanup is not called often, since most of our pools don't need to deallocate; it is pretty much only called when switching models, to reset the memory pools' allocations.
True. I added an unregister API for this, which lets the caller either use or drop the underlying buffer.
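The two deallocation paths discussed above could look roughly like the following sketch. Only the `UserManagedPool` name and its `slices` map come from the diff; the `Buffer` stand-in, the `SliceHandle` type, the `Arc`-based reference counting, and all method names are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a GPU buffer such as `wgpu::Buffer`.
#[derive(Debug, PartialEq)]
pub struct Buffer(pub Vec<u8>);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SliceId(u64);

/// Handle given to callers; cloning it bumps the shared reference count.
#[derive(Debug, Clone)]
pub struct SliceHandle {
    id: SliceId,
    refs: Arc<()>,
}

struct Slice {
    buffer: Buffer,
    refs: Arc<()>,
}

#[derive(Default)]
pub struct UserManagedPool {
    slices: HashMap<SliceId, Slice>,
    next_id: u64,
}

impl UserManagedPool {
    /// Takes ownership of the buffer and returns a handle to it.
    pub fn register(&mut self, buffer: Buffer) -> SliceHandle {
        let id = SliceId(self.next_id);
        self.next_id += 1;
        let refs = Arc::new(());
        self.slices.insert(id, Slice { buffer, refs: refs.clone() });
        SliceHandle { id, refs }
    }

    /// Fine-grained path: remove one buffer immediately and hand it back.
    pub fn unregister(&mut self, handle: &SliceHandle) -> Option<Buffer> {
        self.slices.remove(&handle.id).map(|slice| slice.buffer)
    }

    /// "GC"-like path: drop every buffer whose handles are all gone.
    pub fn cleanup(&mut self) {
        self.slices.retain(|_, slice| Arc::strong_count(&slice.refs) > 1);
    }

    pub fn len(&self) -> usize {
        self.slices.len()
    }
}
```

`unregister` touches a single map entry, whereas `cleanup` has to scan the whole pool, which is the speed difference the comment above refers to.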
f3721db to c7c14b0
let storage_handle = self.memory_pool.storage().register_external(buffer);
let slice_handle = self.memory_pool.register_external(storage_handle);
Registering the buffer in the storage should also be done by the memory pool. Only a single call to register_external should be necessary.
let storage_handle = self.memory_pool.unregister_external(&handle.memory)?;
self.memory_pool.storage().take(&storage_handle)
Same here, unregister_external should return the storage resource rather than the storage handle.
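One way to read the two review comments above is sketched below: the pool owns its storage, so a single `register_external` call covers both steps, and `unregister_external` returns the resource itself. All types here (`Buffer`, `Storage`, `MemoryPool`, the id types) are simplified, hypothetical stand-ins rather than the actual CubeCL types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the storage resource (e.g. a `wgpu::Buffer`).
#[derive(Debug, PartialEq)]
pub struct Buffer(pub u32);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct StorageId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SliceId(u64);

#[derive(Default)]
struct Storage {
    buffers: HashMap<StorageId, Buffer>,
    next_id: u64,
}

impl Storage {
    fn register_external(&mut self, buffer: Buffer) -> StorageId {
        let id = StorageId(self.next_id);
        self.next_id += 1;
        self.buffers.insert(id, buffer);
        id
    }

    fn take(&mut self, id: &StorageId) -> Option<Buffer> {
        self.buffers.remove(id)
    }
}

#[derive(Default)]
pub struct MemoryPool {
    storage: Storage,
    slices: HashMap<SliceId, StorageId>,
    next_id: u64,
}

impl MemoryPool {
    /// Single entry point: registers the buffer in the storage and
    /// creates the slice that refers to it.
    pub fn register_external(&mut self, buffer: Buffer) -> SliceId {
        let storage_id = self.storage.register_external(buffer);
        let slice_id = SliceId(self.next_id);
        self.next_id += 1;
        self.slices.insert(slice_id, storage_id);
        slice_id
    }

    /// Returns the storage resource itself, not a storage handle.
    pub fn unregister_external(&mut self, slice_id: &SliceId) -> Option<Buffer> {
        let storage_id = self.slices.remove(slice_id)?;
        self.storage.take(&storage_id)
    }
}
```

With this shape the caller never sees a storage handle, so the two-step sequences quoted above collapse into one call each.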
/// Register an external wgpu buffer for use in kernel execution.
///
/// Ownership of the buffer is transferred to CubeCL. The buffer will be dropped
/// when released or when all references are dropped and cleanup runs.
pub fn register_external(&mut self, buffer: wgpu::Buffer, stream_id: StreamId) -> Handle {
    let stream = self.scheduler.stream(&stream_id);
    stream.mem_manage.register_external(buffer, stream_id)
}

/// Immediately unregister an external buffer.
///
/// The caller must ensure all GPU operations using this buffer have completed before this call.
///
/// Returns the buffer if found, allowing the caller to use or drop it.
pub fn unregister_external(
    &mut self,
    handle: &Handle,
    stream_id: StreamId,
) -> Option<wgpu::Buffer> {
    let stream = self.scheduler.stream(&stream_id);
    stream.mem_manage.unregister_external(handle)
}
For completeness, those functions could be included in the ComputeServer trait and take/return ComputeStorage::Resource instead. That would give us a unified API for registering external buffers across all CubeCL runtimes.
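A rough sketch of what that suggestion might look like follows. The two traits below reuse the names `ComputeServer` and `ComputeStorage` but are heavily simplified stand-ins, not the real CubeCL definitions; the `stream_id` parameter is omitted for brevity, and `Handle`, `ToyStorage`, `ToyServer`, and `round_trip` are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified stand-in: only the associated resource type is modelled.
pub trait ComputeStorage {
    type Resource;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Handle(u64);

/// Simplified stand-in showing only the two proposed methods.
pub trait ComputeServer {
    type Storage: ComputeStorage;

    fn register_external(
        &mut self,
        resource: <Self::Storage as ComputeStorage>::Resource,
    ) -> Handle;

    fn unregister_external(
        &mut self,
        handle: &Handle,
    ) -> Option<<Self::Storage as ComputeStorage>::Resource>;
}

/// Toy runtime whose "resource" is just a byte vector.
pub struct ToyStorage;

impl ComputeStorage for ToyStorage {
    type Resource = Vec<u8>;
}

#[derive(Default)]
pub struct ToyServer {
    external: HashMap<Handle, Vec<u8>>,
    next_id: u64,
}

impl ComputeServer for ToyServer {
    type Storage = ToyStorage;

    fn register_external(&mut self, resource: Vec<u8>) -> Handle {
        let handle = Handle(self.next_id);
        self.next_id += 1;
        self.external.insert(handle, resource);
        handle
    }

    fn unregister_external(&mut self, handle: &Handle) -> Option<Vec<u8>> {
        self.external.remove(handle)
    }
}

/// Runtime-agnostic caller: works for any server implementing the trait.
pub fn round_trip<S: ComputeServer>(
    server: &mut S,
    resource: <S::Storage as ComputeStorage>::Resource,
) -> Option<<S::Storage as ComputeStorage>::Resource> {
    let handle = server.register_external(resource);
    server.unregister_external(&handle)
}
```

Because `round_trip` is generic over the server, the same calling code would work for a wgpu runtime (where the resource would be a `wgpu::Buffer`) and for any other runtime that implements the trait.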
This PR allows registering external buffers with CubeCL, which is very useful because it enables zero-copy transfer. Internally, it uses a BindingMemory enum to distinguish external from managed buffers, as suggested in #291. I tried to keep this PR as minimal and non-invasive as possible; DLPack integration and similar features are left for future work.
Fixes #291
Validate your PR with burn.
It is important that you make sure that you don't introduce any bugs in burn.
Instructions