Added Plugin system in cubecl-runtime with an example in cubecl cuda… #1057

Q-Lucca · 2025-11-19T22:41:01Z

Hello, Lucca from Use-Ai.rs here.

I have decided on branding, so this is now the final account for my project.

To make it possible to add features to ComputeServer that can be accessed through the ComputeClient, I propose the Plugin system implemented here. For now I have used it to bring nccl to cubecl-cuda. The system allows a plugin to introduce and initialize additional types and to execute additional functions on the ComputeServer. This avoids losing threads when building a worker-based Runtime for integrated production deployments. A library to be released next week will need an implementation of this kind.

This is a draft, with initial docs inside the files. Before writing full docs for cubecl, I would like to know whether you would generally accept a full proposal.

…which implements nccl

nathanielsimard · 2025-11-24T20:36:46Z

crates/cubecl-runtime/src/plugin.rs

+    type ClientHandle: Send;
+    /// This is a type which needs to be build by a `ComputeServer`.
+    type ServerHandle: Send;


It is strange to have two handles here, I think it would be cleaner to have maybe 3 associated types:

Input (ServerHandle) the input of the function.

Output (ReturnVal) the output of the function.

State (InitType) the state of the plugin that needs an init.

I would ditch the client handle.

First of all, thanks for your time and feedback. :)

Fair enough, I'm not particularly creative or consistent with my choice of words :D So yes, I fully agree and will definitely improve on that in future iterations.

nathanielsimard · 2025-11-24T20:38:30Z

crates/cubecl-runtime/src/plugin.rs

+    /// Since we know how the `ServerHandle` will look the `Plugin` will
+    /// be able to define additional functions which can be executed over the
+    /// `ServerHandle` by a `ComputeServer`
+    type Fns: FnOnce(Self::ServerHandle) -> Result<Self::ReturnVal, PluginError>;


Why use an associated type for a function rather than.. a function:

fn execute(input: Self::Input) -> Result<Self::Output, PluginError>;

I think it would be cleaner.
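A minimal sketch of the shape suggested in the two comments above, assuming `Input` and `Output` associated types and a plain `execute` function in place of the `Fns` associated type. The `PluginError` struct and the `SumPlugin` type are hypothetical stand-ins for illustration, not the actual cubecl-runtime definitions.

```rust
/// Hypothetical stand-in for the PR's `PluginError`; the real type may differ.
#[derive(Debug, PartialEq)]
pub struct PluginError(pub String);

/// Sketch of the suggested trait shape, not the actual cubecl-runtime API.
pub trait Plugin {
    /// Input of the function (called `ServerHandle` in the draft).
    type Input: Send;
    /// Output of the function (called `ReturnVal` in the draft).
    type Output: Send;

    /// A plain associated function replaces the `Fns: FnOnce(..)` associated type.
    fn execute(input: Self::Input) -> Result<Self::Output, PluginError>;
}

/// Toy plugin used only for illustration: sums a buffer of floats.
pub struct SumPlugin;

impl Plugin for SumPlugin {
    type Input = Vec<f32>;
    type Output = f32;

    fn execute(input: Self::Input) -> Result<Self::Output, PluginError> {
        if input.is_empty() {
            return Err(PluginError("empty input".to_string()));
        }
        Ok(input.iter().sum())
    }
}
```

With this shape a caller only names the plugin type, for example `SumPlugin::execute(vec![1.0, 2.0, 3.0])`, and never has to construct or pass a closure.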

In my target architecture, the goal is to decouple cluster function orchestration from the overall cluster lifecycle, so that a deployment can benefit from AI research while serving production without complex redeployment iterations. There is a whole line of reasoning behind this decision, but ... (Conv0)

nathanielsimard · 2025-11-24T20:40:03Z

crates/cubecl-runtime/src/plugin.rs

+/// Type we want for initialisation.
+pub trait PluginType {
+    type Insert: Send + Sync;
+
+    fn init(self) -> Self::Insert;
+}


I would put that function inside the Plugin trait:

pub trait Plugin { ... fn init(state: Self::State); }

I don't think we need a returned type here.
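A rough sketch of moving `init` into the plugin trait with a `State` associated type and no return value. The review comment does not say where the state ends up, so this sketch assumes a `&mut self` receiver so that the plugin instance owns it; `RankPlugin` and its `rank` field are hypothetical names used only for illustration.

```rust
/// Sketch only: `init` lives on the plugin trait instead of a separate `PluginType` trait.
pub trait Plugin {
    /// State the plugin needs before it can run (called `InitType` in the draft).
    type State: Send;

    /// No return type, following the review comment.
    /// Assumption: a `&mut self` receiver is used here so the plugin can store the state.
    fn init(&mut self, state: Self::State);
}

/// Hypothetical plugin that remembers which rank it was initialized with.
#[derive(Default)]
pub struct RankPlugin {
    pub rank: Option<usize>,
}

impl Plugin for RankPlugin {
    type State = usize;

    fn init(&mut self, state: Self::State) {
        // Store the state directly; nothing is handed back to the caller.
        self.rank = Some(state);
    }
}
```

Compared with the draft's `PluginType::init(self) -> Self::Insert`, the state here flows one way into the plugin, which is one way to read "I don't think we need a returned type".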

That's true xD I made the type function while designing my fn orchestrator. The function is not necessary even with my architecture xD It is unnecessary and, as you said, something to clean up in future iterations.

nathanielsimard · 2025-11-24T20:42:07Z

crates/cubecl-runtime/src/plugin.rs

+        &mut self,
+        client_handle: SP::ClientHandle,
+        stream_id: StreamId,
+        op: SP::Fns,


I don't think we need to pass the function as input here, or maybe there's something I'm not understanding.

(Conv0) ... fair enough, I would say too much of my architecture is leaking in here. These concepts serve my specific case rather than general use. Your suggestion is definitely a better match for CubeCL's general API.

nathanielsimard · 2025-11-24T20:49:35Z

crates/cubecl-cuda/src/compute/nccl.rs

+                client0
+                    .plugin_fn::<NcclExtension>(handle0, op)


I think it's probably a bit complex for the user API. I like the idea of plugins, but the function call can use an enum instead of function definitions. If, for instance, Plugin::Input is an enum with variants like AllReduce, Init, Gather, etc., we don't have to support many functions per extension. It would be cleaner to call:

client.execute::<NcclExtension>(NcclArg::AllReduce(NcclReduce::Avg, inputs, output));

Do you think that would be better?
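To make the enum idea concrete, here is a small CPU-only sketch, assuming a `Plugin` trait with `Input`/`Output` types and a client `execute` method that is generic over the plugin. Everything in it is hypothetical: `Client`, `PluginError`, the `Sum` and `Gather` variants, and the use of plain `Vec<f32>` buffers are illustration only, and the output is returned rather than passed in as in the call above. No real nccl or CubeCL API is used.

```rust
/// Hypothetical stand-in for the PR's `PluginError`.
#[derive(Debug, PartialEq)]
pub struct PluginError(pub String);

pub trait Plugin {
    type Input;
    type Output;
    fn execute(input: Self::Input) -> Result<Self::Output, PluginError>;
}

/// Hypothetical reduction kinds; `Avg` appears in the review, `Sum` is added here.
#[derive(Debug, Clone, Copy)]
pub enum NcclReduce { Sum, Avg }

/// One enum variant per supported operation instead of one function per operation.
pub enum NcclArg {
    AllReduce(NcclReduce, Vec<Vec<f32>>),
    Gather(Vec<Vec<f32>>),
}

/// CPU stand-in for the extension; it only mimics the semantics on host vectors.
pub struct NcclExtension;

impl Plugin for NcclExtension {
    type Input = NcclArg;
    type Output = Vec<f32>;

    fn execute(input: NcclArg) -> Result<Vec<f32>, PluginError> {
        match input {
            NcclArg::AllReduce(op, inputs) => {
                let len = inputs.first().ok_or_else(|| PluginError("no inputs".into()))?.len();
                if inputs.iter().any(|b| b.len() != len) {
                    return Err(PluginError("length mismatch".into()));
                }
                // Element-wise sum across all input buffers.
                let mut out = vec![0.0f32; len];
                for buf in &inputs {
                    for (o, x) in out.iter_mut().zip(buf) { *o += *x; }
                }
                if let NcclReduce::Avg = op {
                    let n = inputs.len() as f32;
                    for o in out.iter_mut() { *o /= n; }
                }
                Ok(out)
            }
            // Concatenate all buffers in order.
            NcclArg::Gather(inputs) => Ok(inputs.into_iter().flatten().collect()),
        }
    }
}

/// Hypothetical client: a single generic entry point dispatches to any plugin.
pub struct Client;

impl Client {
    pub fn execute<P: Plugin>(&self, input: P::Input) -> Result<P::Output, PluginError> {
        P::execute(input)
    }
}
```

A call then reads `client.execute::<NcclExtension>(NcclArg::AllReduce(NcclReduce::Avg, inputs))`, and adding a new operation means adding an enum variant and a match arm rather than a new function signature on the client.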

With a deeper understanding of the distinctions you introduced me to, this makes sense. I also had the idea of improving the execution method of ComputeClient, but I'm careful about reworking existing implementations out of respect for the author. Coming from a corporate employee background, my additions already felt illegal. :D I will adapt and build a more integrated solution that abstracts complexity in line with CubeCL's design. I have already taken a closer look, have an idea, and will iterate on this. I just didn't want to do that if you had no interest in the general approach.

No worries, I think the plugin system makes sense. It's normal to iterate over a design :)

Q-Lucca · 2025-11-25T08:39:28Z

OK, big thanks for the input. I now have a deeper understanding of how our project philosophies differ. The general goal for my next iteration is more straightforward usage within the client for general use, while still letting me build plugins with the optimizations needed in production-grade multi-service network deployments. The second iteration will propose an integration with the client's execution methods, abstracting complexity in sync with CubeCL while maintaining compatibility.

Q-Lucca added 4 commits November 19, 2025 22:19

Added Plugin system in cubecl-runtime with an example in cubecl cuda …

4d36610

…which implements nccl

Merge branch 'main' into feat/ext+cunccl

d203d52

Merge branch 'main' into feat/ext+cunccl

131a9a1

Merge branch 'main' into feat/ext+cunccl

21f5087

nathanielsimard reviewed Nov 24, 2025
