
[frontend] Add graph partitioning, distributed TP execution, and RVV deployment for DeepSeek#690

Open
asdf1113 wants to merge 22 commits into buddy-compiler:main from asdf1113:split-ds

Conversation


@asdf1113 asdf1113 commented Feb 9, 2026

Summary

This PR extends the frontend graph partitioning support and introduces distributed parallel execution for the DeepSeek tensor-parallel example.

On the frontend side, it enhances graph_driver so that a computation graph can be partitioned into multiple subgraphs at compile time. It also fixes and improves shape inference for several operators under graph partitioning.

On the example side, this PR adds a runnable DeepSeek Tensor Parallel example under examples/BuddyTensorParallel with distributed parallel execution support. In addition, this PR includes packaging and deployment support for RISC-V RVV targets, together with updated build and run documentation for both x86 and RVV workflows.

What’s included

  • Extend graph_driver to support splitting a graph into multiple subgraphs
  • Fix shape inference for ReshapeOp and ExpandOp under graph partitioning
  • Improve output shape inference for FlashAttention and GQAAttention related operators
  • Add a runnable DeepSeek tensor-parallel example under examples/BuddyTensorParallel
  • Add runtime-level distributed execution support for the DeepSeek tensor-parallel example
  • Add an MPICH-based multi-rank launch workflow for distributed tensor-parallel execution across multiple nodes
  • Add RVV cross-compilation, packaging, and deployment support
  • Update README and build/run instructions for both x86 and RVV usage
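For readers unfamiliar with compile-time graph partitioning, the core idea of the `graph_driver` extension can be sketched in a framework-agnostic way: walk the ops in topological order, cut them into contiguous subgraphs, and record which values each subgraph must receive from outside. This is illustrative only and does not reflect the actual buddy-compiler `GraphDriver` API.

```python
# Illustrative sketch of subgraph splitting; NOT the graph_driver API.
def partition(graph, order, num_subgraphs):
    """Split a topologically ordered op list into contiguous subgraphs.

    graph: dict mapping each node name to the names of its inputs
    order: topological order of all nodes
    Returns a list of (nodes, external_inputs) pairs, where
    external_inputs are values produced outside that subgraph.
    """
    chunk = -(-len(order) // num_subgraphs)  # ceiling division
    subgraphs = []
    for i in range(0, len(order), chunk):
        nodes = order[i:i + chunk]
        node_set = set(nodes)
        external = []
        for n in nodes:
            for dep in graph[n]:
                # A dependency defined outside this chunk becomes a
                # subgraph input that the main graph must wire in.
                if dep not in node_set and dep not in external:
                    external.append(dep)
        subgraphs.append((nodes, external))
    return subgraphs
```

A "main graph" then only needs to invoke each subgraph in order, feeding each one its recorded external inputs.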

Notes

  • This PR includes both frontend graph partitioning support and distributed parallel execution for the DeepSeek tensor-parallel example.
  • The RVV workflow packages the executable, runtime dependencies, and rank-specific parameter files for deployment across multiple RISC-V machines.
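The split/concat/add recombination pattern that tensor-parallel execution relies on can be sketched in plain NumPy. This is a conceptual illustration of why the example needs split, concat, and add helpers; the function names here are invented for the sketch and are not the Container.h MemRef API.

```python
# Illustrative tensor-parallel recombination; names are NOT the MemRef API.
import numpy as np

def column_parallel_matmul(x, w, ranks=2):
    shards = np.split(w, ranks, axis=1)      # each rank holds a column shard
    partials = [x @ s for s in shards]       # local matmul per rank
    return np.concatenate(partials, axis=1)  # concat recombines the outputs

def row_parallel_matmul(x, w, ranks=2):
    x_shards = np.split(x, ranks, axis=1)    # activations split to match
    w_shards = np.split(w, ranks, axis=0)    # each rank holds a row shard
    partials = [xs @ ws for xs, ws in zip(x_shards, w_shards)]
    out = partials[0]
    for p in partials[1:]:                   # elementwise add reduces partials
        out = out + p
    return out
```

Both variants reproduce the full `x @ w`, which is the invariant the runner's split/concat/add plumbing has to preserve across ranks.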

@asdf1113 asdf1113 requested a review from zhanghb97 as a code owner February 9, 2026 07:38
Copilot AI review requested due to automatic review settings February 9, 2026 07:38

Copilot AI left a comment


Pull request overview

This PR adds frontend graph partitioning support (splitting a compiled graph into multiple subgraphs) and introduces a DeepSeek-R1 TP=2 example that runs the resulting subgraphs sequentially, along with several shape-inference updates needed to keep partitioned graphs correct.

Changes:

  • Add a SplitStrategy + extensive GraphDriver updates to split graphs into subgraphs and construct “main graphs” that call subgraphs.
  • Fix/adjust shape inference behavior for reshape/expand and attention-related ops under partitioning.
  • Add a new BuddyTensorParallel DeepSeek-R1 example (Python importer + C++ runner + CMake wiring) and extend MemRef utilities to support split/concat/add.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 26 comments.

| File | Description |
| --- | --- |
| `frontend/Python/ops/tosa.py` | Adjust reshape/expand and attention shape inference to work with partitioned graphs. |
| `frontend/Python/ops/linalg.py` | Adjust shape inference for several ops (pow/matmul/neg/cat). |
| `frontend/Python/graph/type.py` | Change `TensorMeta` to a dict-like container with property accessors. |
| `frontend/Python/graph/transform/fuse_ops.py` | Ensure fused attention op names are unique. |
| `frontend/Python/graph/operation.py` | Add `_newshape` support and new shape-splitting helper(s). |
| `frontend/Python/graph/graph_driver.py` | Major rewrite to implement subgraph splitting/strategy and main-graph construction. |
| `frontend/Python/graph/graph.py` | Track lowered output nodes and propagate `_newshape` into placeholder shapes during import. |
| `frontend/Python/graph/__init__.py` | Export `SplitStrategy`. |
| `frontend/Interfaces/buddy/Core/Container.h` | Add MemRef split/concat/add helpers and a constructor taking `std::vector<size_t>`. |
| `examples/CMakeLists.txt` | Add a build option/entry for BuddyTensorParallel examples. |
| `examples/BuddyTensorParallel/import-deepseek-r1.py` | New importer that partitions DeepSeek-R1 into subgraphs and emits MLIR + weight shards. |
| `examples/BuddyTensorParallel/dis-main.cpp` | New sequential TP example runner using split/concat/add MemRef helpers. |
| `examples/BuddyTensorParallel/README.md` | New documentation for building/running the example. |
| `examples/BuddyTensorParallel/CMakeLists.txt` | Build pipeline to generate MLIR and compile/link the example. |
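As background for the reshape/expand shape-inference fixes listed above, here is an illustrative sketch of the standard output-shape rules these ops follow (mirroring the usual `expand`/`reshape` semantics, where `-1` means "keep this dimension" for expand and "infer this dimension" for reshape). This is not the PR's code, just the semantics the fixes must preserve under partitioning.

```python
# Illustrative expand/reshape output-shape rules; NOT the PR's implementation.
import math

def infer_expand_shape(in_shape, sizes):
    """-1 keeps the input dim; only size-1 dims may be expanded."""
    out = []
    for dim, size in zip(in_shape, sizes):
        if size == -1:
            out.append(dim)              # keep the existing extent
        elif dim == 1 or dim == size:
            out.append(size)             # broadcast a size-1 dim (or no-op)
        else:
            raise ValueError(f"cannot expand dim {dim} to {size}")
    return out

def infer_reshape_shape(in_shape, target):
    """A single -1 in target is inferred from the total element count."""
    numel = math.prod(in_shape)
    known = math.prod(d for d in target if d != -1)
    return [numel // known if d == -1 else d for d in target]
```

Getting these rules right matters more under partitioning, since a wrong inferred shape at a subgraph boundary propagates into the main graph's call signatures.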


Comment on lines +872 to 875
```python
for node in self._subgraphs_inputs[subgraph_name]:
    if node.name in self._graph.node_table:
        call_node.add_argument(node.name)
        continue
```

Copilot AI Feb 9, 2026


In construct_main_graph, the dependency wiring for CallOp inputs is broken: the check if node.name in self._graph.node_table is always true for any op in the original graph, so intermediate values produced by earlier subgraphs are incorrectly treated as external inputs. This should check whether the value is already available in main_graph.node_table (placeholders) and otherwise be sourced from an upstream CallOp result via _call_table.
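A hypothetical sketch of the resolution order this comment suggests (all names here are illustrative, not the actual `GraphDriver` code): check whether a subgraph input is a placeholder already present in the main graph, and otherwise route it from the result of the upstream subgraph's CallOp.

```python
# Hypothetical wiring per the review suggestion; names are illustrative.
def resolve_call_arguments(inputs, main_placeholders, call_table):
    """Resolve each subgraph input to a main-graph value.

    main_placeholders: names of placeholders defined in the main graph
    call_table: maps an intermediate value name to the upstream
                CallOp result that produces it
    """
    args = []
    for name in inputs:
        if name in main_placeholders:
            args.append(name)              # external input to the whole model
        elif name in call_table:
            args.append(call_table[name])  # produced by an earlier subgraph
        else:
            raise KeyError(f"unresolved subgraph input: {name}")
    return args
```

The key point is that membership in the original graph's `node_table` is not the right test, since every op in the unsplit graph passes it; availability in the main graph is what matters.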

asdf1113 changed the title from "[frontend] Add graph partitioning support and sequential TP example for DeepSeek" to "[frontend] Add graph partitioning, distributed TP execution, and RVV deployment for DeepSeek" on Mar 10, 2026
