Code location:
xpu-perf-main/projects/micro_perf/op_defs/llm_ops/store_kv_cache.py, lines 273 ~ 288
Problem description:
- Given that the input packed qkv has shape [num_tokens, total_dim], line 274: src_k_data = packed_qkv[token_start:token_end, self.k_dim_start:self.k_dim_end] yields a src_k_data that is also 2-D, [token_num, k_dim], where k_dim = self.k_dim_end - self.k_dim_start is the K portion of total_dim;
- Line 275: src_k_data = src_k_data.contiguous().transpose(0, 1). After this transpose, src_k_data has shape [k_dim, token_num];
- Line 276: dst_k_cache = k_cache[kv_slot_id, :, cache_start:cache_end, :]. The memory backing dst_k_cache is laid out as [slot_id, head_num, token_num, head_dim];
- Line 280: dst_k_cache.copy_(src_k_data). This copy cannot map a src_k_data of shape [k_dim, token_num] onto dst_k_cache, whose memory is laid out as [slot_id, head_num, token_num, head_dim], because the 2-D source has no separate head_num and head_dim axes.
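The shape mismatch described in the list above can be reproduced with a minimal sketch. It is not the repository code: the sizes are hypothetical, local variables stand in for the self.* attributes, kv_slot_id is assumed to be an integer index, and k_cache is assumed to be a 4-D tensor.

```python
import torch

# Hypothetical sizes, pairwise distinct so no accidental broadcast can succeed.
num_tokens, head_num, head_dim = 3, 2, 4
k_dim = head_num * head_dim              # 8 columns for K
total_dim = 3 * k_dim                    # assumed layout: Q | K | V, equal widths
k_dim_start, k_dim_end = k_dim, 2 * k_dim

packed_qkv = torch.randn(num_tokens, total_dim)        # [num_tokens, total_dim]
k_cache = torch.zeros(1, head_num, 16, head_dim)       # [slot, head, token, head_dim]

# The three statements quoted from lines 274-276, with stand-in indices.
src_k_data = packed_qkv[0:num_tokens, k_dim_start:k_dim_end]   # [3, 8]
src_k_data = src_k_data.contiguous().transpose(0, 1)           # [8, 3]
dst_k_cache = k_cache[0, :, 0:num_tokens, :]                   # [2, 3, 4]

# copy_ broadcasts the source to the destination shape; [8, 3] cannot be
# broadcast to [2, 3, 4], so PyTorch raises a RuntimeError.
try:
    dst_k_cache.copy_(src_k_data)
    copy_error = None
except RuntimeError as exc:
    copy_error = exc

print(tuple(src_k_data.shape), tuple(dst_k_cache.shape), copy_error is not None)
```

With these sizes the copy fails; whether the real operator hits the same path depends on how the benchmark actually constructs packed_qkv, which this sketch does not know.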
Should this be changed as follows?
Change the input packed qkv to a 3-D shape [num_tokens, head_num, head_dim], and change line 274 to src_k_data = packed_qkv[token_start:token_end, k_head_start:k_head_end, :]
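A minimal sketch of the proposed change, under assumptions: the head counts, k_head_start / k_head_end, and the cache size are hypothetical, and the existing transpose(0, 1) from line 275 is assumed to stay. It also shows a possible alternative that keeps the 2-D input and reshapes the K slice instead.

```python
import torch

# Hypothetical sizes; packed heads are assumed to be ordered Q | K | V.
num_tokens, q_head_num, kv_head_num, head_dim = 3, 2, 2, 4
total_head_num = q_head_num + 2 * kv_head_num
k_head_start, k_head_end = q_head_num, q_head_num + kv_head_num

packed_qkv = torch.randn(num_tokens, total_head_num, head_dim)  # 3-D input
k_cache = torch.zeros(1, kv_head_num, 16, head_dim)             # [slot, head, token, head_dim]

# Proposed line 274: slice heads instead of flat columns.
src_k_data = packed_qkv[0:num_tokens, k_head_start:k_head_end, :]  # [3, 2, 4]
src_k_data = src_k_data.transpose(0, 1)                            # [2, 3, 4]
dst_k_cache = k_cache[0, :, 0:num_tokens, :]                       # [2, 3, 4]
dst_k_cache.copy_(src_k_data)                                      # shapes now match

# Alternative: keep the 2-D input and split the K columns into heads.
packed_2d = packed_qkv.reshape(num_tokens, total_head_num * head_dim)
k_2d = packed_2d[:, k_head_start * head_dim:k_head_end * head_dim]       # [3, 8]
k_3d = k_2d.reshape(num_tokens, kv_head_num, head_dim).transpose(0, 1)   # [2, 3, 4]
```

Both variants place token t of K head h at k_cache[slot, h, t, :]. The reshape variant relies on each token's K columns being stored head by head, which is an assumption about the packed layout.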