[frontend] Add graph partitioning, distributed TP execution, and RVV deployment for DeepSeek#690
asdf1113 wants to merge 22 commits into `buddy-compiler:main`
Conversation
- …nalg.py and tosa.py
- Updated README.md to reflect changes in example build instructions and removed deprecated commands.
- …ing in construct_main_graph
Pull request overview
This PR adds frontend graph partitioning support (splitting a compiled graph into multiple subgraphs) and introduces a DeepSeek-R1 TP=2 example that runs the resulting subgraphs sequentially, along with several shape-inference updates needed to keep partitioned graphs correct.
Changes:
- Add a `SplitStrategy` plus extensive `GraphDriver` updates to split graphs into subgraphs and construct "main graphs" that call subgraphs.
- Fix/adjust shape-inference behavior for reshape/expand and attention-related ops under partitioning.
- Add a new BuddyTensorParallel DeepSeek-R1 example (Python importer + C++ runner + CMake wiring) and extend `MemRef` utilities to support split/concat/add.
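The first bullet's core idea, splitting one compiled graph into subgraphs and recording which values each subgraph must receive from outside, can be sketched as follows. This is a minimal illustration with a toy `Node` model; the names `Node` and `split_into_subgraphs` are hypothetical and do not reflect the actual `GraphDriver`/`SplitStrategy` API.

```python
# Toy sketch of compile-time graph partitioning: split a topologically
# ordered op list into contiguous chunks and collect each chunk's
# external inputs (values produced outside that chunk).
# Hypothetical names; not the real GraphDriver API.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def split_into_subgraphs(nodes, num_subgraphs):
    chunk = max(1, (len(nodes) + num_subgraphs - 1) // num_subgraphs)
    subgraphs, sub_inputs = [], []
    for i in range(0, len(nodes), chunk):
        part = nodes[i:i + chunk]
        produced = {n.name for n in part}
        external = []
        for n in part:
            for src in n.inputs:
                # Anything not produced inside this chunk becomes a
                # subgraph argument the main graph must supply.
                if src not in produced and src not in external:
                    external.append(src)
        subgraphs.append(part)
        sub_inputs.append(external)
    return subgraphs, sub_inputs
```

The recorded external inputs are exactly what the main graph later passes as `CallOp` arguments when it invokes each subgraph in order.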
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 26 comments.
| File | Description |
|---|---|
| frontend/Python/ops/tosa.py | Adjust reshape/expand and attention shape inference to work with partitioned graphs. |
| frontend/Python/ops/linalg.py | Adjust shape inference for several ops (pow/matmul/neg/cat). |
| frontend/Python/graph/type.py | Change TensorMeta to a dict-like container with property accessors. |
| frontend/Python/graph/transform/fuse_ops.py | Ensure fused attention op names are unique. |
| frontend/Python/graph/operation.py | Add _newshape support and new shape-splitting helper(s). |
| frontend/Python/graph/graph_driver.py | Major rewrite to implement subgraph splitting/strategy and main-graph construction. |
| frontend/Python/graph/graph.py | Track lowered output nodes and propagate _newshape into placeholder shapes during import. |
| frontend/Python/graph/__init__.py | Export SplitStrategy. |
| frontend/Interfaces/buddy/Core/Container.h | Add MemRef split/concat/add helpers and a constructor taking std::vector<size_t>. |
| examples/CMakeLists.txt | Add a build option/entry for BuddyTensorParallel examples. |
| examples/BuddyTensorParallel/import-deepseek-r1.py | New importer that partitions DeepSeek-R1 into subgraphs and emits MLIR + weight shards. |
| examples/BuddyTensorParallel/dis-main.cpp | New sequential TP example runner using split/concat/add MemRef helpers. |
| examples/BuddyTensorParallel/README.md | New documentation for building/running the example. |
| examples/BuddyTensorParallel/CMakeLists.txt | Build pipeline to generate MLIR and compile/link the example. |
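The `Container.h` row above adds `MemRef` split/concat/add helpers used by the runner. A hedged Python analogue of their semantics (the C++ API surface is not reproduced here; this only illustrates the tensor-parallel data movement):

```python
# Illustrative Python analogue of the MemRef split/concat/add helpers
# described for Container.h. Operates on flat 1-D buffers; the real
# helpers work on MemRef containers in C++.
def split(buf, num_parts):
    """Split a buffer into num_parts equal shards (TP weight sharding)."""
    assert len(buf) % num_parts == 0, "buffer must divide evenly"
    step = len(buf) // num_parts
    return [buf[i * step:(i + 1) * step] for i in range(num_parts)]


def concat(shards):
    """Concatenate shards back into one buffer (column-parallel outputs)."""
    return [x for shard in shards for x in shard]


def add(a, b):
    """Elementwise add of two partial results (row-parallel reduction)."""
    return [x + y for x, y in zip(a, b)]
```

In a TP run, `split` shards the weights before compilation, while `concat` and `add` merge the per-shard outputs after each subgraph executes.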
```python
for node in self._subgraphs_inputs[subgraph_name]:
    if node.name in self._graph.node_table:
        call_node.add_argument(node.name)
        continue
```
In construct_main_graph, the dependency wiring for CallOp inputs is broken: the check if node.name in self._graph.node_table is always true for any op in the original graph, so intermediate values produced by earlier subgraphs are incorrectly treated as external inputs. This should check whether the value is already available in main_graph.node_table (placeholders) and otherwise be sourced from an upstream CallOp result via _call_table.
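A minimal sketch of the fix described above, with hypothetical names (`wire_call_inputs` and its parameters are illustrative, not the actual `GraphDriver` code): inputs already present in `main_graph.node_table` (placeholders/weights) pass through by name; anything else must resolve to an upstream `CallOp` result via `_call_table`. Checking `self._graph.node_table` instead is the bug, since that lookup succeeds for every op in the original graph.

```python
# Sketch of the corrected dependency wiring for CallOp inputs.
# Hypothetical helper; the real logic lives in construct_main_graph.
def wire_call_inputs(subgraph_inputs, main_graph_node_table, call_table):
    args = []
    for name in subgraph_inputs:
        if name in main_graph_node_table:
            # External placeholder/weight: available directly in the
            # main graph, pass it through by name.
            args.append(name)
        elif name in call_table:
            # Produced by an earlier subgraph: take the recorded
            # CallOp result (e.g. "call0:0") instead of the raw name.
            args.append(call_table[name])
        else:
            raise KeyError(f"unresolved subgraph input: {name}")
    return args
```

With this check, intermediate values flow from one subgraph's `CallOp` result into the next, rather than being mistaken for external inputs.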
Summary
This PR extends the frontend graph partitioning support and introduces distributed parallel execution for the DeepSeek tensor-parallel example.
On the frontend side, it enhances `graph_driver` so that a computation graph can be partitioned into multiple subgraphs at compile time. It also fixes and improves shape inference for several operators under graph partitioning.

On the example side, this PR adds a runnable DeepSeek Tensor Parallel example under `examples/BuddyTensorParallel` with distributed parallel execution support. In addition, this PR includes packaging and deployment support for RISC-V RVV targets, together with updated build and run documentation for both x86 and RVV workflows.

What's included

- Extended `graph_driver` to support splitting a graph into multiple subgraphs
- Fixed shape inference for `ReshapeOp` and `ExpandOp` under graph partitioning
- Updated `FlashAttention` and `GQAAttention` related operators
- New runnable example under `examples/BuddyTensorParallel`

Notes
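As a note on how the example's TP=2 flow fits together, here is a hedged sketch of sequential tensor-parallel execution for a single column-parallel linear layer: shard the weights, run each shard's computation in turn, then concatenate the partial outputs. This mirrors the dis-main.cpp flow in spirit but is not the example's actual code.

```python
# Toy TP=2 sequential execution for one linear layer with
# column-parallel weight sharding. Illustrative only.
def matvec(weights, x):
    """weights: list of output-column vectors; returns W @ x."""
    return [sum(w_i * x_i for w_i, x_i in zip(col, x)) for col in weights]


def tp2_forward(weights, x):
    half = len(weights) // 2
    shard0, shard1 = weights[:half], weights[half:]  # split output dim
    out0 = matvec(shard0, x)  # "subgraph 0", run first
    out1 = matvec(shard1, x)  # "subgraph 1", run next
    return out0 + out1        # concat partial outputs
```

Running the two shards sequentially on one host produces the same result as the unsplit layer, which is what makes a single-process runner a valid harness for the partitioned graphs.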