add fatrelu#3
Conversation
There was a problem hiding this comment.
Code Review
This pull request implements the fatrelu_and_mul activation function for XPU, including the SYCL kernel, C++ bindings, and comprehensive Python tests. The feedback highlights significant code duplication between the newly added act_and_mul_with_param_vec_kernel and the existing act_and_mul_vec_kernel, as well as their associated macros. It is recommended to refactor these components into a unified implementation that can handle activation functions both with and without additional parameters to improve maintainability.
| template < | ||
| typename scalar_t, | ||
| scalar_t (*ACT_FN)(const scalar_t&, const float), | ||
| int VEC_SIZE> | ||
| class act_and_mul_with_param_vec_kernel { | ||
| public: | ||
| act_and_mul_with_param_vec_kernel( | ||
| scalar_t* __restrict__ out, | ||
| const scalar_t* __restrict__ input, | ||
| const int d, | ||
| const float param) | ||
| : out_(out), input_(input), d_(d), param_(param) {} | ||
|
|
||
| void operator()(sycl::nd_item<1> item) const { | ||
| using vec_t = vllm::xpu::aligned_vec<scalar_t, VEC_SIZE>; | ||
| const int64_t token_idx = item.get_group(0); | ||
| const int64_t offset = item.get_local_linear_id(); | ||
| const int64_t step = item.get_local_range(0); | ||
| const int64_t bound = d_ / VEC_SIZE; | ||
|
|
||
| for (int64_t i = offset; i < bound; i += step) { | ||
| auto x_vec = | ||
| reinterpret_cast<const vec_t*>(input_)[token_idx * bound * 2 + i]; | ||
| auto y_vec = reinterpret_cast<const vec_t*>( | ||
| input_)[token_idx * bound * 2 + i + bound]; | ||
| vec_t out_vec; | ||
| #pragma unroll | ||
| for (int j = 0; j < VEC_SIZE; ++j) { | ||
| out_vec[j] = ACT_FN(x_vec[j], param_) * y_vec[j]; | ||
| } | ||
| reinterpret_cast<vec_t*>(out_)[token_idx * bound + i] = out_vec; | ||
| } | ||
| } | ||
|
|
||
| private: | ||
| scalar_t* __restrict__ out_; | ||
| const scalar_t* __restrict__ input_; | ||
| const int d_; | ||
| const float param_; | ||
| }; |
There was a problem hiding this comment.
This new kernel act_and_mul_with_param_vec_kernel is almost a complete duplicate of act_and_mul_vec_kernel. Similarly, the new macros VEC_LAUNCH_ACT_AND_MUL_WITH_PARAM and LAUNCH_ACTIVATION_GATE_KERNEL_WITH_PARAM_VEC are duplicates of existing macros. This significant code duplication makes maintenance harder because changes may need to be applied in multiple places.
Consider refactoring to unify these kernels and macros. For example, you could use a single kernel that can handle activation functions both with and without an extra parameter. This could be achieved using if constexpr and std::is_invocable_v to check the signature of the activation function pointer. This would reduce code duplication and improve maintainability.
…om numeric error (vllm-project#261) Signed-off-by: yangqun <qun.yang@intel.com>
…llm-project#264) Signed-off-by: yangqun <qun.yang@intel.com>
…#257) * tune num_kv_splits for page decode kernel Signed-off-by: baodii <di.bao@intel.com> * change min blocks in kernel Signed-off-by: baodii <di.bao@intel.com> * Redesign get_num_splits() heuristic for better split decisions Key changes to the heuristic: - Early-exit when cur_parallel >= num_xe_cores (was 4x cores) - kv_heads >= 4: fixed target of num_xe_cores * 64 / block_size (~20 at p64) - kv_heads <= 2: floor at num_xe_cores splits, scale with blocks/10 for long sequences Benchmark results (page_size=64, BMG 20 XE cores, unitrace, L2 flush): 10 configs faster (avg 1.05x), 10 neutral, 2 slower (kernel-level) Signed-off-by: baodii <di.bao@intel.com> --------- Signed-off-by: baodii <di.bao@intel.com>
* apply_scale for 4 bits Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * apply_scale only for bf16 Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * prefetch Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * prefetch Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * finetune moe gemm policy Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * remove useless code and format Signed-off-by: mayuyuace <qiming1.zhang@intel.com> --------- Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
* fix overflow Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * add UT for overflow Signed-off-by: mayuyuace <qiming1.zhang@intel.com> * file mode Signed-off-by: mayuyuace <qiming1.zhang@intel.com> --------- Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
* [Test] refine test socpe definition Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * add scope Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * add scope Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * minor Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * onednn version Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * llama3 Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> * fix Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> --------- Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com> Co-authored-by: Harish Subramony <harish.subramony@intel.com>
…d_qk_norm_rope kernel (vllm-project#267) * Add fuse_norm_quant, fuse_act_quant and fused_qk_norm_rope kernel Signed-off-by: Lai, Yejing <yejing.lai@intel.com> * fix format Signed-off-by: Lai, Yejing <yejing.lai@intel.com> * fix format Signed-off-by: Lai, Yejing <yejing.lai@intel.com> * add fused_qk_norm_rope head_dim=512 case and update vec_size Signed-off-by: Lai, Yejing <yejing.lai@intel.com> --------- Signed-off-by: Lai, Yejing <yejing.lai@intel.com> Signed-off-by: Yejing Lai <yejing.lai@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
27897c7 to
15b12eb
Compare
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.
Purpose
Test Plan
Test Result
(Optional) Documentation Update
BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)