[Examples] add Python runner for BuddyDeepSeekR1 E2E inference #699

Open
GuoningHuang wants to merge 2 commits into buddy-compiler:main from GuoningHuang:py

@GuoningHuang (Contributor) commented Feb 25, 2026

Summary

Add a Python AOT end-to-end runner for examples/BuddyDeepSeekR1 to execute exported prefill/decode subgraphs.

Why

  • Simpler than C++ for runtime orchestration (prefill/decode loop, stop conditions, sampling).
  • Faster iteration: change runtime behavior without rebuilding C++.
  • More flexible runtime features (chat template, sampling params, eos/length controls).

This PR also updates examples/BuddyDeepSeekR1/README.md with usage instructions.
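
The orchestration that the runner handles (one prefill call, then a decode loop with stop conditions and sampling) can be sketched as below. This is a minimal illustration and not the code in this PR: `generate`, `argmax`, and the toy `toy_prefill`/`toy_decode` callables are hypothetical stand-ins for the exported subgraphs, and greedy sampling is assumed for simplicity.

```python
from typing import Any, Callable, List, Sequence, Tuple

Logits = Sequence[float]
PrefillFn = Callable[[List[int]], Tuple[Logits, Any]]
DecodeFn = Callable[[int, Any], Tuple[Logits, Any]]


def argmax(logits: Logits) -> int:
    """Greedy sampling: return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def generate(prefill: PrefillFn, decode: DecodeFn, prompt_ids: List[int],
             eos_id: int, max_new_tokens: int) -> List[int]:
    """Run prefill once, then decode until EOS or the length limit."""
    logits, cache = prefill(prompt_ids)
    generated: List[int] = []
    for _ in range(max_new_tokens):
        token = argmax(logits)
        if token == eos_id:  # stop condition: end-of-sequence token
            break
        generated.append(token)
        logits, cache = decode(token, cache)
    return generated


# Toy stand-ins (hypothetical): the "model" always predicts last token + 1.
VOCAB_SIZE = 5


def _peaked(next_id: int) -> List[float]:
    return [1.0 if i == next_id else 0.0 for i in range(VOCAB_SIZE)]


def toy_prefill(prompt_ids: List[int]) -> Tuple[Logits, Any]:
    return _peaked((prompt_ids[-1] + 1) % VOCAB_SIZE), list(prompt_ids)


def toy_decode(token: int, cache: Any) -> Tuple[Logits, Any]:
    return _peaked((token + 1) % VOCAB_SIZE), cache + [token]


if __name__ == "__main__":
    print(generate(toy_prefill, toy_decode, [0], eos_id=4, max_new_tokens=16))
```

Keeping the loop in Python means that changing the stop condition or swapping `argmax` for another sampler does not require rebuilding the compiled subgraphs.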

Performance

The performance is comparable to the C++ version (see the image attached to the PR).

GuoningHuang marked this pull request as draft February 25, 2026 16:06
GuoningHuang marked this pull request as ready for review February 25, 2026 16:07
@zhanghb97 (Member)

(torch2.8) zhb@s306-gpu-p1-102:~/buddy-mlir$ python3 examples/BuddyDeepSeekR1/run-subgraphs-python.py \
  --prompt "Hello, who are you?" \
  --export-subgraphs
[Python] runtime: aot
Traceback (most recent call last):
  File "/home/zhb/buddy-mlir/examples/BuddyDeepSeekR1/run-subgraphs-python.py", line 391, in <module>
    main()
  File "/home/zhb/buddy-mlir/examples/BuddyDeepSeekR1/run-subgraphs-python.py", line 377, in main
    _run_aot(
  File "/home/zhb/buddy-mlir/examples/BuddyDeepSeekR1/run-subgraphs-python.py", line 163, in _run_aot
    runtime_lib = ctypes.CDLL(str(runtime_so), mode=rtld_global)
  File "/home/zhb/miniconda3/envs/torch2.8/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/zhb/buddy-mlir/build/examples/BuddyDeepSeekR1/libdeepseek_forward_runtime.so: undefined symbol: _mlir_ciface_rtclock

Does the process use rtclock? It looks like the C interface wrapper (_mlir_ciface_rtclock) was generated, but the library that defines it was not linked.
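
One common way to resolve this kind of `undefined symbol` error at load time is to load the libraries that define the missing symbols first, with `RTLD_GLOBAL`, so that their symbols are visible when the runtime library is opened. The sketch below assumes that approach; `load_runtime` is a hypothetical helper, not the function in this PR, and the idea that `_mlir_ciface_rtclock` comes from MLIR's C runner utils library in the LLVM build tree is an assumption rather than something confirmed in this thread.

```python
import ctypes
from typing import Any, Callable, Iterable, List


def load_runtime(runtime_so: str, dependency_sos: Iterable[str],
                 loader: Callable[..., Any] = ctypes.CDLL) -> List[Any]:
    """Load dependencies globally, then the runtime library.

    Returns all handles (dependencies first, runtime last) so the caller
    can keep them referenced for as long as the runtime is in use.
    """
    handles: List[Any] = []
    for dep in dependency_sos:
        # RTLD_GLOBAL exposes the dependency's symbols to later loads.
        handles.append(loader(str(dep), mode=ctypes.RTLD_GLOBAL))
    handles.append(loader(str(runtime_so), mode=ctypes.RTLD_GLOBAL))
    return handles
```

The `loader` parameter exists only so the load order can be checked without real shared objects; in normal use the default `ctypes.CDLL` applies, and a missing file or symbol still surfaces as `OSError`.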

@GuoningHuang (Contributor, Author)

I have updated the README accordingly. From the buddy-mlir/examples/BuddyDeepSeekR1 directory, try running:

python3 run-subgraphs-python.py \
  --prompt "hello" \
  --artifact-dir ../../build/examples/BuddyDeepSeekR1 \
  --llvm-build-dir ../../llvm/build \
  --omp-num-threads 48 \
  --omp-proc-bind close
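
Flags such as `--omp-num-threads` and `--omp-proc-bind` plausibly correspond to the standard OpenMP environment variables `OMP_NUM_THREADS` and `OMP_PROC_BIND`. The sketch below assumes that mapping; `apply_omp_settings` is a hypothetical helper, and how the script in this PR actually handles these flags is not shown in the thread. OpenMP runtimes read these variables at initialization, so they need to be set before the compiled library is loaded.

```python
import os
from typing import MutableMapping, Optional


def apply_omp_settings(num_threads: int, proc_bind: str,
                       env: Optional[MutableMapping[str, str]] = None) -> None:
    """Write OpenMP settings into the environment (os.environ by default)."""
    if num_threads < 1:
        raise ValueError("num_threads must be a positive integer")
    target = os.environ if env is None else env
    target["OMP_NUM_THREADS"] = str(num_threads)
    # Typical values include "close", "spread", "true", and "false".
    target["OMP_PROC_BIND"] = proc_bind


if __name__ == "__main__":
    settings: dict = {}
    apply_omp_settings(48, "close", env=settings)
    print(settings)
```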
