-
Notifications
You must be signed in to change notification settings - Fork 126
Added Plugin system in cubecl-runtime with an example in cubecl cuda… #1057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| type ClientHandle: Send; | ||
| /// This is a type which needs to be build by a `ComputeServer`. | ||
| type ServerHandle: Send; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is strange to have two handles here, I think it would be cleaner to have maybe 3 associated types:
- Input (ServerHandle) the input of the function.
- Output (ReturnVal) the output of the function.
- State (InitType) the state of the plugin that needs an init.
I would ditch the client handle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First of all thanks for your time and feedback. :)
Fair enough, I'm not particularly creative and consistent with my choice of words :D So yeah fully with you and will definitely improve on that in future iterations.
| /// Since we know how the `ServerHandle` will look the `Plugin` will | ||
| /// be able to define additional functions which can be executed over the | ||
| /// `ServerHandle` by a `ComputeServer` | ||
| type Fns: FnOnce(Self::ServerHandle) -> Result<Self::ReturnVal, PluginError>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use an associated type for a function rather than.. a function:
fn execute(input: Self::Input) -> Result<Self::Output, PluginError>;I think it would be cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my targeted architecture the goal is to decouple cluster function orchestration from the whole cluster lifecycle. So a deployment could profit from ai research while handling production without the need of komplex re-deploynment iterations. There is an hole reseaoning for this decision, but ... (Conv0)
| /// Type we want for initialisation. | ||
| pub trait PluginType { | ||
| type Insert: Send + Sync; | ||
|
|
||
| fn init(self) -> Self::Insert; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put that function inside the Plugin trait:
pub trait Plugin {
...
fn init(state: Self::State);
}I don't think we need a returned type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats true xD Made the type function while designing my fn orchestrator. The function use case is not necessary even with my architecture xD Unnecessary and like you said in future iterations.
| &mut self, | ||
| client_handle: SP::ClientHandle, | ||
| stream_id: StreamId, | ||
| op: SP::Fns, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to pass the function as input here, or maybe there's something I'm not understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Conv0) ... fair enough, would say there is too much of my architecture leaking in here. These concepts are more for my specific case not for general use. Your suggestion is definitely a better match for CubeCL's general API.
| client0 | ||
| .plugin_fn::<NcclExtension>(handle0, op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably a bit complex for the user API. I like the idea of plugin, but the function call can use enum instead of function definitions. If for instance the Plugin::Input is an enum, like AllReduce, Init, Gather, etc. We don't have to support many functions per extension. It would be cleaner to call:
client.execute::<NcclExtension>(NcclArg::AllReduce(NcclReduce::Avg, inputs, output));Do you think that would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a deeper understanding of the differentiations you introduced me to this makes sense. Also had the idea to improve on the execution method of ComputeClient, but i'm carefull with reworking implementations due to the respect of the author. Coming from a corporate employe background my additions already felt illegal. :D Will adapt and make a more integrated solution to abstract complexities inline with CubeCl's design. Already hade a more profound look into it and got an idea will iterate on this. But didn't wanted to do that if you have had no intressed in the general approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I think the plugin system makes sense. It's normal to iterate over a design :)
|
Ok, big thanks for the input. I have a deeper understanding of the differentiation in our project philosophies. The general goal for my next iteration will be a more straightforward usage within the client for general use while also allowing me to approach plugins with capabilities for optimizations needed in production grade multi service network deployments. The second iteration will propose a integration with the client's execution methods, to abstract complexities in sync with CubeCL while maintaining compabilitie. |
Hello Lucca from Use-Ai.rs here,
Decided on branding therefore know my final account for my project.
So, to enable the possibility to add additional features to ComputeServer which can be accessed through the ComputeClient i propose the Plugin system implemented here. For now i used it to get nccl to cubecl-cuda. The system allows to introduce and initialize additional types and execute additional functions over ComputeServer. This is used to not lose threads when building worker based Runtime for integrated production deployments. A library released next week will be in need for a implementation of that kind.
This is a draft with the first docs within the files. Before implementing full docs for cubecl i would need to know if you guys would generally accept a full proposal?