Skip to content

[frontend]: automatically inject profiling instructions in DynamoComiler#665

Draft
trdthg wants to merge 1 commit intobuddy-compiler:mainfrom
trdthg:auto-time-profile
Draft

[frontend]: automatically inject profiling instructions in DynamoComiler#665
trdthg wants to merge 1 commit intobuddy-compiler:mainfrom
trdthg:auto-time-profile

Conversation

@trdthg
Copy link
Collaborator

@trdthg trdthg commented Jan 14, 2026

Summary

This commit introduces the enable_profile option to automatically measure the execution time of each graph node. When enabled, the DynamoCompiler performs the following instrumentations after lower_to_top_level_ir:

  • Inject rtclock for timestamp acquire and record_timing for data record
  • Injects global strings for each node in format "op_name_{node_index}_{node_name}"
  • Injects timing probes around each node to calculate and record the elapsed execution time.

BuddyTransformer example output

Results:
========
Total time: 6924.74 ms
Average time per iteration: 69.2474 ms

========================================
     Operator Timing Report
========================================
Operator                          Avg (ms)    Min (ms)    Max (ms)     % Total
--------------------------------------------------------------------------------
op_name_000_arg0_1                  0.0001      0.0000      0.0005     0.0001%
op_name_001_arg0_1                  0.0001      0.0000      0.0007     0.0001%
op_name_002_arg1_1                  0.0001      0.0000      0.0005     0.0001%
op_name_003_arg2_1                  0.0001      0.0000      0.0007     0.0001%
op_name_004_arg3_1                  0.0001      0.0000      0.0005     0.0001%
op_name_005_arg4_1                  0.0001      0.0000      0.0007     0.0002%
op_name_006_arg5_1                  0.0000      0.0000      0.0005     0.0001%
op_name_007_arg6_1                  0.0001      0.0000      0.0005     0.0001%
op_name_008_arg0_1                  0.0001      0.0000      0.0005     0.0001%
op_name_009_arg7_1                  0.0001      0.0000      0.0005     0.0001%
op_name_010_arg8_1                  0.0001      0.0000      0.0005     0.0001%
op_name_011_arg9_1                  0.0000      0.0000      0.0002     0.0001%
op_name_012_arg10_1                 0.0000      0.0000      0.0005     0.0001%
op_name_013_pow_1                   0.4193      0.0482     13.4954     0.7483%
op_name_014_mean                    0.3940      0.1776     18.8553     0.7032%
op_name_015_add                     0.0783      0.0076      6.9997     0.1397%
op_name_016_rsqrt                   0.0885      0.0076      7.9994     0.1579%
op_name_017_mul                     0.1312      0.0467      8.0030     0.2341%
op_name_018_mul_1                   0.0554      0.0427      1.0221     0.0988%
op_name_019_permute                 0.7298      0.7007      0.8152     1.3026%
op_name_020_view                    0.0001      0.0000      0.0005     0.0001%
op_name_021_mm                      1.8358      1.3628      9.6781     3.2768%
op_name_022_view_1                  0.0001      0.0000      0.0005     0.0001%
op_name_023_permute_1               0.0962      0.0839      0.6294     0.1716%
op_name_024_view_2                  0.0001      0.0000      0.0005     0.0001%
op_name_025_mm_1                    1.1115      0.8516      1.6091     1.9839%
op_name_026_view_3                  0.0001      0.0000      0.0005     0.0001%
op_name_027_permute_2               0.0925      0.0818      0.3469     0.1651%
op_name_028_view_4                  0.0001      0.0000      0.0005     0.0002%
op_name_029_mm_2                    0.5695      0.4818      1.5318     1.0166%
op_name_030_view_5                  0.0000      0.0000      0.0005     0.0001%
op_name_031_view_6                  0.0001      0.0000      0.0005     0.0001%
op_name_032_permute_3               0.1871      0.1359      3.1655     0.3339%
op_name_033_view_7                  0.0001      0.0000      0.0005     0.0002%
op_name_034_permute_4               0.0280      0.0196      0.0393     0.0500%
op_name_035_view_8                  0.0001      0.0000      0.0007     0.0001%
op_name_036_permute_5               0.0301      0.0188      0.3262     0.0537%
op_name_037_unsqueeze               0.0001      0.0000      0.0005     0.0001%
op_name_038_expand                  0.1038      0.0947      0.3011     0.1853%
op_name_039_clone                   0.0001      0.0000      0.0005     0.0002%
op_name_040_view_9                  0.0001      0.0000      0.0005     0.0001%
op_name_041_unsqueeze_1             0.0001      0.0000      0.0005     0.0001%
op_name_042_expand_1                0.1007      0.0925      0.2542     0.1798%
op_name_043_clone_1                 0.0001      0.0000      0.0005     0.0001%
op_name_044_view_10                 0.0001      0.0000      0.0005     0.0001%
op_name_045_permute_6               0.2353      0.2193      0.2565     0.4200%
op_name_046_expand_2                0.0001      0.0000      0.0005     0.0001%
op_name_047_view_11                 0.0001      0.0000      0.0007     0.0001%
op_name_048_expand_3                0.0000      0.0000      0.0005     0.0001%
op_name_049_view_12                 0.0001      0.0000      0.0007     0.0002%
op_name_050_bmm                     0.0580      0.0410      0.0846     0.1035%
op_name_051_view_13                 0.0001      0.0000      0.0002     0.0001%
op_name_052_mul_2                   0.0736      0.0687      0.0889     0.1313%
op_name_053_view_14                 0.0001      0.0000      0.0007     0.0001%
op_name_054_convert_element_type      0.0044      0.0036      0.0050     0.0079%
op_name_055_sub                     0.0039      0.0033      0.0048     0.0070%
op_name_056_mul_3                   0.0037      0.0031      0.0095     0.0067%
op_name_057_add_1                   0.0740      0.0694      0.1118     0.1321%
op_name_058_amax                    0.0725      0.0668      0.1240     0.1294%
op_name_059_sub_1                   0.0747      0.0715      0.0889     0.1333%
op_name_060_exp                     0.1347      0.1295      0.1948     0.2404%
op_name_061_sum_1                   0.0427      0.0412      0.0629     0.0763%
op_name_062_div                     0.1398      0.1352      0.1698     0.2495%
op_name_063_expand_4                0.0001      0.0000      0.0005     0.0001%
op_name_064_view_15                 0.0001      0.0000      0.0005     0.0001%
op_name_065_expand_5                0.0001      0.0000      0.0005     0.0001%
op_name_066_view_16                 0.0001      0.0000      0.0005     0.0001%
op_name_067_bmm_1                   0.0513      0.0427      0.0792     0.0915%
op_name_068_view_17                 0.0001      0.0000      0.0005     0.0002%
op_name_069_permute_7               0.1028      0.0985      0.1254     0.1835%
op_name_070_clone_2                 0.0001      0.0000      0.0005     0.0001%
op_name_071_view_18                 0.0001      0.0000      0.0005     0.0002%
op_name_072_permute_8               0.7876      0.7510      0.8631     1.4059%
op_name_073_view_19                 0.0001      0.0000      0.0007     0.0002%
op_name_074_mm_3                    1.9218      1.4813      4.9264     3.4303%
op_name_075_view_20                 0.0001      0.0000      0.0005     0.0001%
op_name_076_add_2                   0.0666      0.0615      0.0808     0.1190%
op_name_077_pow_2                   0.0403      0.0367      0.0527     0.0719%
op_name_078_mean_1                  0.1859      0.1736      0.3536     0.3318%
op_name_079_add_3                   0.0085      0.0074      0.0141     0.0151%
op_name_080_rsqrt_1                 0.0088      0.0076      0.0212     0.0156%
op_name_081_mul_4                   0.0434      0.0401      0.0563     0.0775%
op_name_082_mul_5                   0.0496      0.0453      0.1013     0.0885%
op_name_083_permute_9               6.9690      6.0620     23.2682    12.4393%
op_name_084_view_21                 0.0000      0.0000      0.0005     0.0001%
op_name_085_mm_4                    4.7123      4.2460     13.0131     8.4112%
op_name_086_view_22                 0.0001      0.0000      0.0005     0.0001%
op_name_087_sigmoid                 1.4877      1.3990      6.9935     2.6555%
op_name_088_mul_6                   0.3101      0.1793     11.0002     0.5536%
op_name_089_permute_10              6.0568      5.5680     14.1487    10.8111%
op_name_090_view_23                 0.0000      0.0000      0.0007     0.0001%
op_name_091_mm_5                    4.2920      3.9907     10.9949     7.6610%
op_name_092_view_24                 0.0001      0.0000      0.0007     0.0001%
op_name_093_mul_7                   0.3308      0.2422      5.9965     0.5904%
op_name_094_permute_11              7.3766      4.7812     23.3753    13.1669%
op_name_095_view_25                 0.0001      0.0000      0.0005     0.0001%
op_name_096_mm_6                   13.3981     11.8101     21.9383    23.9149%
op_name_097_view_26                 0.0001      0.0000      0.0005     0.0001%
op_name_098_add_4                   0.8518      0.1216     16.9990     1.5204%
op_name_099_output                  0.0001      0.0000      0.0005     0.0002%
op_name_100_output                  0.0001      0.0000      0.0005     0.0002%
--------------------------------------------------------------------------------
TOTAL                              56.0239                             100.0%
========================================


Output validation:
First few output values: 0.4959 -0.1718 0.8967 0.8511 -0.9225 

There were duplicate entries for arg0_1 and output in the nodes list, I don't know why, so I prepended the strings with an ID as a workaround. They also might be useful for sorting.

Checklist

  • The code builds successfully
  • Existing tests pass
  • New tests are added where appropriate
  • Code follows the project coding style
  • Documentation is updated if needed

…iler

This commit introduces the `enable_profile` option to automatically measure
the execution time of each graph node. When enabled, the DynamoCompiler performs
the following instrumentations after `lower_to_top_level_ir`:

- Inject `rtclock` for timestamp acquire and `record_timing` for data record
- Injects global strings for each node in format "op_name_{node_index}_{node_name}"
- Injects timing probes around each node to
   calculate and record the elapsed execution time.
@trdthg trdthg requested a review from zhanghb97 as a code owner January 14, 2026 09:19
@trdthg trdthg marked this pull request as draft January 14, 2026 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant