[frontend] Add graph partitioning, distributed TP execution, and RVV deployment for DeepSeek#690
asdf1113 wants to merge 22 commits into `buddy-compiler:main`
Conversation
- …nalg.py and tosa.py
- Updated README.md to reflect changes in example build instructions and removed deprecated commands.
- …ing in construct_main_graph
Pull request overview
This PR adds frontend graph partitioning support (splitting a compiled graph into multiple subgraphs) and introduces a DeepSeek-R1 TP=2 example that runs the resulting subgraphs sequentially, along with several shape-inference updates needed to keep partitioned graphs correct.
Changes:
- Add a `SplitStrategy` plus extensive `GraphDriver` updates to split graphs into subgraphs and construct "main graphs" that call subgraphs.
- Fix/adjust shape-inference behavior for reshape/expand and attention-related ops under partitioning.
- Add a new BuddyTensorParallel DeepSeek-R1 example (Python importer + C++ runner + CMake wiring) and extend `MemRef` utilities to support split/concat/add.
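The first bullet's core idea, splitting one compiled graph into subgraphs and recording which values each subgraph must receive from outside, can be sketched as follows. This is a minimal illustration with a toy `Node` model; the names `Node` and `split_into_subgraphs` are hypothetical and do not reflect the actual `GraphDriver`/`SplitStrategy` API.

```python
# Toy sketch of compile-time graph partitioning: split a topologically
# ordered op list into contiguous chunks and collect each chunk's
# external inputs (values produced outside that chunk).
# Hypothetical names; not the real GraphDriver API.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def split_into_subgraphs(nodes, num_subgraphs):
    chunk = max(1, (len(nodes) + num_subgraphs - 1) // num_subgraphs)
    subgraphs, sub_inputs = [], []
    for i in range(0, len(nodes), chunk):
        part = nodes[i:i + chunk]
        produced = {n.name for n in part}
        external = []
        for n in part:
            for src in n.inputs:
                # Anything not produced inside this chunk becomes a
                # subgraph argument the main graph must supply.
                if src not in produced and src not in external:
                    external.append(src)
        subgraphs.append(part)
        sub_inputs.append(external)
    return subgraphs, sub_inputs
```

The recorded external inputs are exactly what the main graph later passes as `CallOp` arguments when it invokes each subgraph in order.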
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 26 comments.
| File | Description |
|---|---|
| frontend/Python/ops/tosa.py | Adjust reshape/expand and attention shape inference to work with partitioned graphs. |
| frontend/Python/ops/linalg.py | Adjust shape inference for several ops (pow/matmul/neg/cat). |
| frontend/Python/graph/type.py | Change TensorMeta to a dict-like container with property accessors. |
| frontend/Python/graph/transform/fuse_ops.py | Ensure fused attention op names are unique. |
| frontend/Python/graph/operation.py | Add _newshape support and new shape-splitting helper(s). |
| frontend/Python/graph/graph_driver.py | Major rewrite to implement subgraph splitting/strategy and main-graph construction. |
| frontend/Python/graph/graph.py | Track lowered output nodes and propagate _newshape into placeholder shapes during import. |
| frontend/Python/graph/__init__.py | Export SplitStrategy. |
| frontend/Interfaces/buddy/Core/Container.h | Add MemRef split/concat/add helpers and a constructor taking std::vector<size_t>. |
| examples/CMakeLists.txt | Add a build option/entry for BuddyTensorParallel examples. |
| examples/BuddyTensorParallel/import-deepseek-r1.py | New importer that partitions DeepSeek-R1 into subgraphs and emits MLIR + weight shards. |
| examples/BuddyTensorParallel/dis-main.cpp | New sequential TP example runner using split/concat/add MemRef helpers. |
| examples/BuddyTensorParallel/README.md | New documentation for building/running the example. |
| examples/BuddyTensorParallel/CMakeLists.txt | Build pipeline to generate MLIR and compile/link the example. |
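The `Container.h` row above adds `MemRef` split/concat/add helpers used by the runner. A hedged Python analogue of their semantics (the C++ API surface is not reproduced here; this only illustrates the tensor-parallel data movement):

```python
# Illustrative Python analogue of the MemRef split/concat/add helpers
# described for Container.h. Operates on flat 1-D buffers; the real
# helpers work on MemRef containers in C++.
def split(buf, num_parts):
    """Split a buffer into num_parts equal shards (TP weight sharding)."""
    assert len(buf) % num_parts == 0, "buffer must divide evenly"
    step = len(buf) // num_parts
    return [buf[i * step:(i + 1) * step] for i in range(num_parts)]


def concat(shards):
    """Concatenate shards back into one buffer (column-parallel outputs)."""
    return [x for shard in shards for x in shard]


def add(a, b):
    """Elementwise add of two partial results (row-parallel reduction)."""
    return [x + y for x, y in zip(a, b)]
```

In a TP run, `split` shards the weights before compilation, while `concat` and `add` merge the per-shard outputs after each subgraph executes.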
```python
for node in self._subgraphs_inputs[subgraph_name]:
    if node.name in self._graph.node_table:
        call_node.add_argument(node.name)
        continue
```
In construct_main_graph, the dependency wiring for CallOp inputs is broken: the check if node.name in self._graph.node_table is always true for any op in the original graph, so intermediate values produced by earlier subgraphs are incorrectly treated as external inputs. This should check whether the value is already available in main_graph.node_table (placeholders) and otherwise be sourced from an upstream CallOp result via _call_table.
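A minimal sketch of the fix described above, with hypothetical names (`wire_call_inputs` and its parameters are illustrative, not the actual `GraphDriver` code): inputs already present in `main_graph.node_table` (placeholders/weights) pass through by name; anything else must resolve to an upstream `CallOp` result via `_call_table`. Checking `self._graph.node_table` instead is the bug, since that lookup succeeds for every op in the original graph.

```python
# Sketch of the corrected dependency wiring for CallOp inputs.
# Hypothetical helper; the real logic lives in construct_main_graph.
def wire_call_inputs(subgraph_inputs, main_graph_node_table, call_table):
    args = []
    for name in subgraph_inputs:
        if name in main_graph_node_table:
            # External placeholder/weight: available directly in the
            # main graph, pass it through by name.
            args.append(name)
        elif name in call_table:
            # Produced by an earlier subgraph: take the recorded
            # CallOp result (e.g. "call0:0") instead of the raw name.
            args.append(call_table[name])
        else:
            raise KeyError(f"unresolved subgraph input: {name}")
    return args
```

With this check, intermediate values flow from one subgraph's `CallOp` result into the next, rather than being mistaken for external inputs.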
Summary
This PR extends the frontend graph partitioning support and introduces distributed parallel execution for the DeepSeek tensor-parallel example.
On the frontend side, it enhances `graph_driver` so that a computation graph can be partitioned into multiple subgraphs at compile time. It also fixes and improves shape inference for several operators under graph partitioning.

On the example side, this PR adds a runnable DeepSeek Tensor Parallel example under `examples/BuddyTensorParallel` with distributed parallel execution support. In addition, this PR includes packaging and deployment support for RISC-V RVV targets, together with updated build and run documentation for both x86 and RVV workflows.

What's included

- Extended `graph_driver` to support splitting a graph into multiple subgraphs
- Fixed shape inference for `ReshapeOp` and `ExpandOp` under graph partitioning
- Updated `FlashAttention` and `GQAAttention` related operators
- New runnable example under `examples/BuddyTensorParallel`

Notes
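As a note on how the example's TP=2 flow fits together, here is a hedged sketch of sequential tensor-parallel execution for a single column-parallel linear layer: shard the weights, run each shard's computation in turn, then concatenate the partial outputs. This mirrors the dis-main.cpp flow in spirit but is not the example's actual code.

```python
# Toy TP=2 sequential execution for one linear layer with
# column-parallel weight sharding. Illustrative only.
def matvec(weights, x):
    """weights: list of output-column vectors; returns W @ x."""
    return [sum(w_i * x_i for w_i, x_i in zip(col, x)) for col in weights]


def tp2_forward(weights, x):
    half = len(weights) // 2
    shard0, shard1 = weights[:half], weights[half:]  # split output dim
    out0 = matvec(shard0, x)  # "subgraph 0", run first
    out1 = matvec(shard1, x)  # "subgraph 1", run next
    return out0 + out1        # concat partial outputs
```

Running the two shards sequentially on one host produces the same result as the unsplit layer, which is what makes a single-process runner a valid harness for the partitioned graphs.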