@mertalev mertalev commented Dec 19, 2025

This PR allows registering external buffers with CubeCL, which enables zero-copy transfer. Internally, it uses a BindingMemory enum to distinguish external from managed buffers, as suggested in #291. I tried to keep this PR as minimal and non-invasive as possible - DLPack integration and similar features are left for future work.
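
For readers following along, the enum-based design could look roughly like the sketch below. Only the name BindingMemory comes from the description; the variants, the ManagedSlice and ExternalBuffer types, and the size_in_bytes and is_external helpers are hypothetical stand-ins, not code from this PR.

```rust
/// Hypothetical stand-in for a slice handed out by a CubeCL memory pool.
#[derive(Debug, Clone, PartialEq)]
pub struct ManagedSlice {
    pub offset: u64,
    pub size: u64,
}

/// Hypothetical stand-in for a buffer allocated outside CubeCL.
#[derive(Debug, Clone, PartialEq)]
pub struct ExternalBuffer {
    pub id: u64,
    pub size: u64,
}

/// Assumed shape of the enum: one variant per memory origin.
#[derive(Debug, Clone, PartialEq)]
pub enum BindingMemory {
    Managed(ManagedSlice),
    External(ExternalBuffer),
}

impl BindingMemory {
    /// Every consumer has to branch on the variant, even for simple queries.
    pub fn size_in_bytes(&self) -> u64 {
        match self {
            BindingMemory::Managed(slice) => slice.size,
            BindingMemory::External(buffer) => buffer.size,
        }
    }

    /// True when the memory was registered by the user rather than pooled.
    pub fn is_external(&self) -> bool {
        matches!(self, BindingMemory::External(_))
    }
}
```

The match in size_in_bytes illustrates the cost of this design: any code that touches a binding needs a branch for each variant.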

Fixes #291

Validate your PR with burn.

It is important that you make sure that you don't introduce any bugs in burn.

Instructions

  • Create a new branch or fork of the burn repo
  • Update the main Cargo.toml with this PR hash.
  • Fix any broken tests or compilation errors in burn.
  • Submit a PR in burn with your fixes and link it here.

Member

@nathanielsimard nathanielsimard left a comment

The PR is good overall, but I need to think about whether it would be better to have a user-managed memory pool instead. That way, all memory handles/bindings would be the same, but the underlying storage would be user-managed. We already have different memory pools for different usages, so that could also work. Do you have an opinion about this?

@mertalev
Author

I think either approach works, but the pool approach has the benefit that bindings and handles are simple and don't need branching whenever they're used. Let me give it a shot and see how it looks.

@mertalev
Author

mertalev commented Dec 21, 2025

Yup, I think the pool version ends up being cleaner! One thing to note: I made it take ownership of the buffer, since that's what the pool leans toward with its cleanup etc., but I suppose DLPack integration would want borrow semantics for more flexibility. What do you think?

Comment on lines +16 to +17
pub struct UserManagedPool {
    slices: HashMap<SliceId, Slice>,
}
Member

I think I would also allow manually deallocating a buffer on the user-managed pool. That could be a lot faster than calling cleanup on all memory pools. It would be another way to deallocate a buffer, with fine-grained control instead of relying on a "GC"-like cleanup. Cleanup is not called often, since most of our pools don't need to deallocate: pretty much only when switching models, to reset the memory pools' allocations.

Author

True, I added an unregister API for this. It returns the underlying buffer, so the caller can either reuse or drop it.
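
To make the two release paths in this thread concrete, here is a minimal sketch of a user-managed pool with both a targeted unregister and a GC-like cleanup. It is an illustration under assumptions, not the PR's implementation: the generic buffer type B, the Arc-based liveness tracking, and the register, unregister, cleanup and len methods are all hypothetical. Only the UserManagedPool name and the HashMap keyed by SliceId mirror the diff.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SliceId(u64);

/// Handle given to callers; cloning it bumps the shared reference count.
#[derive(Debug, Clone)]
pub struct SliceHandle {
    id: SliceId,
    _alive: Arc<()>,
}

/// Pool-side record: the user's buffer plus the liveness token.
struct Slice<B> {
    buffer: B,
    alive: Arc<()>,
}

pub struct UserManagedPool<B> {
    slices: HashMap<SliceId, Slice<B>>,
    next_id: u64,
}

impl<B> UserManagedPool<B> {
    pub fn new() -> Self {
        Self { slices: HashMap::new(), next_id: 0 }
    }

    /// Take ownership of a user buffer and hand back a handle to it.
    pub fn register(&mut self, buffer: B) -> SliceHandle {
        let id = SliceId(self.next_id);
        self.next_id += 1;
        let alive = Arc::new(());
        self.slices.insert(id, Slice { buffer, alive: alive.clone() });
        SliceHandle { id, _alive: alive }
    }

    /// Targeted release: remove one buffer and give it back to the caller.
    pub fn unregister(&mut self, handle: &SliceHandle) -> Option<B> {
        self.slices.remove(&handle.id).map(|slice| slice.buffer)
    }

    /// GC-like sweep: drop every buffer whose handles have all been dropped.
    pub fn cleanup(&mut self) {
        self.slices.retain(|_, slice| Arc::strong_count(&slice.alive) > 1);
    }

    /// Number of buffers currently held by the pool.
    pub fn len(&self) -> usize {
        self.slices.len()
    }
}
```

In this sketch, unregister touches a single map entry, while cleanup has to visit every slice, which is the performance difference the review comment alludes to.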

@mertalev mertalev force-pushed the feat/buffer-sharing branch from f3721db to c7c14b0 Compare January 1, 2026 04:28
Comment on lines +169 to +170
let storage_handle = self.memory_pool.storage().register_external(buffer);
let slice_handle = self.memory_pool.register_external(storage_handle);
Member

Registering the buffer in the storage should also be done by the memory pool. Only a single call to register_external should be necessary.

Comment on lines +180 to +181
let storage_handle = self.memory_pool.unregister_external(&handle.memory)?;
self.memory_pool.storage().take(&storage_handle)
Member

Same here, unregister_external should return the storage resource rather than the storage handle.
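
The two review comments on register_external and unregister_external point toward a pool that hides its storage layer entirely. A rough sketch of that shape follows. The Storage and MemoryPool types here are simplified, hypothetical stand-ins (generic over a resource type R) and do not reproduce CubeCL's actual memory management types.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct StorageId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SliceId(u64);

/// Hypothetical storage layer: maps storage ids to raw resources.
pub struct Storage<R> {
    resources: HashMap<StorageId, R>,
    next_id: u64,
}

impl<R> Storage<R> {
    fn register_external(&mut self, resource: R) -> StorageId {
        let id = StorageId(self.next_id);
        self.next_id += 1;
        self.resources.insert(id, resource);
        id
    }

    fn take(&mut self, id: &StorageId) -> Option<R> {
        self.resources.remove(id)
    }
}

/// Hypothetical pool that owns its storage, so callers never touch it.
pub struct MemoryPool<R> {
    storage: Storage<R>,
    slices: HashMap<SliceId, StorageId>,
    next_slice: u64,
}

impl<R> MemoryPool<R> {
    pub fn new() -> Self {
        Self {
            storage: Storage { resources: HashMap::new(), next_id: 0 },
            slices: HashMap::new(),
            next_slice: 0,
        }
    }

    /// One call: the pool registers the resource in storage and tracks the slice.
    pub fn register_external(&mut self, resource: R) -> SliceId {
        let storage_id = self.storage.register_external(resource);
        let slice_id = SliceId(self.next_slice);
        self.next_slice += 1;
        self.slices.insert(slice_id, storage_id);
        slice_id
    }

    /// One call: returns the resource itself rather than a storage handle.
    pub fn unregister_external(&mut self, slice: &SliceId) -> Option<R> {
        let storage_id = self.slices.remove(slice)?;
        self.storage.take(&storage_id)
    }
}
```

With this arrangement the caller never sees a storage handle, so the two-step sequences in the commented diff lines collapse into one pool call each.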

Comment on lines +159 to +180
/// Register an external wgpu buffer for use in kernel execution.
///
/// Ownership of the buffer is transferred to CubeCL. The buffer will be dropped
/// when released or when all references are dropped and cleanup runs.
pub fn register_external(&mut self, buffer: wgpu::Buffer, stream_id: StreamId) -> Handle {
    let stream = self.scheduler.stream(&stream_id);
    stream.mem_manage.register_external(buffer, stream_id)
}

/// Immediately unregister an external buffer.
///
/// The caller must ensure all GPU operations using this buffer have completed before this call.
///
/// Returns the buffer if found, allowing the caller to use or drop it.
pub fn unregister_external(
    &mut self,
    handle: &Handle,
    stream_id: StreamId,
) -> Option<wgpu::Buffer> {
    let stream = self.scheduler.stream(&stream_id);
    stream.mem_manage.unregister_external(handle)
}
Member

For completeness, those functions could be included in the ComputeServer trait and take/return ComputeStorage::Resource instead. That would give us a unified API for registering external buffers across all cubecl runtimes.
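
A rough sketch of what such a unified API might look like is given below. The traits here are heavily simplified stand-ins that only borrow the ComputeServer and ComputeStorage names; the real CubeCL traits have many more items, and the stream_id parameter from the diff is omitted. ToyServer, BytesStorage, Handle and round_trip are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified stand-in: each storage backend names its raw resource type.
pub trait ComputeStorage {
    type Resource;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Handle(u64);

/// Simplified stand-in: registration is expressed in terms of the resource type.
pub trait ComputeServer {
    type Storage: ComputeStorage;

    fn register_external(
        &mut self,
        resource: <Self::Storage as ComputeStorage>::Resource,
    ) -> Handle;

    fn unregister_external(
        &mut self,
        handle: &Handle,
    ) -> Option<<Self::Storage as ComputeStorage>::Resource>;
}

/// Toy backend whose "GPU buffer" is just a byte vector.
pub struct BytesStorage;

impl ComputeStorage for BytesStorage {
    type Resource = Vec<u8>;
}

pub struct ToyServer {
    buffers: HashMap<Handle, Vec<u8>>,
    next: u64,
}

impl ToyServer {
    pub fn new() -> Self {
        Self { buffers: HashMap::new(), next: 0 }
    }
}

impl ComputeServer for ToyServer {
    type Storage = BytesStorage;

    fn register_external(&mut self, resource: Vec<u8>) -> Handle {
        let handle = Handle(self.next);
        self.next += 1;
        self.buffers.insert(handle, resource);
        handle
    }

    fn unregister_external(&mut self, handle: &Handle) -> Option<Vec<u8>> {
        self.buffers.remove(handle)
    }
}

/// Runtime-agnostic caller: works with any server, whatever its resource type.
pub fn round_trip<S: ComputeServer>(
    server: &mut S,
    resource: <S::Storage as ComputeStorage>::Resource,
) -> Option<<S::Storage as ComputeStorage>::Resource> {
    let handle = server.register_external(resource);
    server.unregister_external(&handle)
}
```

The point of the associated type is that a wgpu-backed server could use wgpu::Buffer as its resource while another runtime uses its own buffer type, with callers such as round_trip written once against the trait.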

Development

Successfully merging this pull request may close these issues.

Direct buffer passing