
micro_perf: latency is computed incorrectly in some scenarios after manually modifying the code to enable the profiler #189

@testman0001

Description


Steps to reproduce: run the micro_perf test for moe_scatter_dynamic_quant
Relevant code: https://github.com/bytedance/xpu-perf/blob/main/micro_perf/backends/GPU/backend_gpu.py#L178

                take_iters = prefer_iterations // 2
                iters_offset = prefer_iterations - take_iters

                removed_keys = []
                for kernel in kernel_latency_list:
                    if len(kernel_latency_list[kernel]) != prefer_iterations:
                        removed_keys.append(kernel)
                    average_latency += sum(kernel_latency_list[kernel][iters_offset:])
                for kernel in removed_keys:
                    kernel_latency_list.pop(kernel)

                average_latency /= take_iters

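Issue 1:
The intent here is probably to skip part of the recorded data, but when a single iteration launches the same kernel more than once, the latency is computed incorrectly. The slice skips only iters_offset entries, whereas by the code's own logic it should skip half of them. I am also unsure whether, even with a correct implementation, the decision to discard samples should instead depend on warmup.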
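The miscount can be reproduced with toy numbers. The sketch below is not the repository's code: `average_latency_as_written` mirrors only the averaging part of the quoted excerpt, and `average_latency_scaled` is a hypothetical variant that scales the skip offset by the number of launches per iteration. It assumes each kernel's latencies are recorded in iteration order, and both function names are made up for illustration.

```python
def average_latency_as_written(kernel_latency_list, prefer_iterations):
    """Mirror of the averaging in the quoted excerpt (removal step omitted)."""
    take_iters = prefer_iterations // 2
    iters_offset = prefer_iterations - take_iters
    total = 0.0
    for latencies in kernel_latency_list.values():
        # Skips a fixed number of entries, regardless of launches per iteration.
        total += sum(latencies[iters_offset:])
    return total / take_iters


def average_latency_scaled(kernel_latency_list, prefer_iterations):
    """Hypothetical variant: skip iters_offset whole iterations per kernel."""
    take_iters = prefer_iterations // 2
    iters_offset = prefer_iterations - take_iters
    total = 0.0
    for latencies in kernel_latency_list.values():
        calls_per_iter, remainder = divmod(len(latencies), prefer_iterations)
        if calls_per_iter == 0 or remainder != 0:
            continue  # trace does not line up with the iteration count
        # Assumption: entries are ordered by iteration, so this drops
        # exactly the first iters_offset iterations of this kernel.
        total += sum(latencies[iters_offset * calls_per_iter:])
    return total / take_iters


# Toy data: 4 iterations, one kernel launched twice per iteration, 1.0 each.
# The per-iteration latency should therefore be 2.0.
toy = {"kernel_a": [1.0] * 8}
as_written = average_latency_as_written(toy, 4)  # sums 6 entries, divides by 2
scaled = average_latency_scaled(toy, 4)          # sums 4 entries, divides by 2
print(as_written, scaled)
```

With this toy input the as-written logic reports 3.0 while the scaled variant reports 2.0. Whether the first half should be discarded at all, rather than relying on warmup, is a separate question that this sketch does not answer.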

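Issue 2:

                    if len(kernel_latency_list[kernel]) != prefer_iterations:
                        removed_keys.append(kernel)

When a single iteration launches the same kernel more than once, should that kernel really be added to removed_keys? It seems this should be treated as normal behavior; otherwise the kernels printout ends up with no content.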
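One possible relaxation of this check, sketched under assumptions: keep any kernel whose sample count is a positive multiple of `prefer_iterations` and remove only the rest. The helper name `split_kernels` and the kernel names in the toy data are hypothetical, not taken from the repository. Note also that in the quoted excerpt a kernel placed in `removed_keys` still contributes to `average_latency`, because the sum runs before the kernel is popped.

```python
def split_kernels(kernel_latency_list, prefer_iterations):
    """Separate kernels whose sample count matches a whole number of launches
    per iteration from those whose count does not line up."""
    kept, removed = {}, []
    for kernel, latencies in kernel_latency_list.items():
        calls_per_iter, remainder = divmod(len(latencies), prefer_iterations)
        if calls_per_iter >= 1 and remainder == 0:
            kept[kernel] = latencies   # 1, 2, 3, ... launches per iteration
        else:
            removed.append(kernel)     # partial or stray trace
    return kept, removed


# Toy data for 4 iterations (kernel names are made up).
toy = {
    "quant_kernel": [1.0] * 8,    # 2 launches per iteration
    "scatter_kernel": [2.0] * 4,  # 1 launch per iteration
    "stray_kernel": [0.5] * 3,    # count is not a multiple of 4
}
kept, removed = split_kernels(toy, 4)
print(sorted(kept), removed)
```

A multiple-of-iterations test is only a heuristic: a kernel launched a varying number of times per iteration could still pass or fail it by coincidence, so the maintainers may prefer a different criterion.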
