Hello! Thank you for your efforts, I learned a lot from your code.
I still have a question, though: have you compared the performance of your code against the latest TensorRT MoE kernels?
By the way, I know that the MoE kernels in the inference framework vLLM are based on a Triton implementation, and I plan to compare the performance of your code against them in the near future.