Add pd-disaggregation release support #1

Open
fwyc0573 wants to merge 3 commits into main from pr/pd-disaggregation-v0.2

@fwyc0573 (Collaborator)

Summary

This PR adds the public pd-disaggregation path prepared for the v0.2 release surface.

  • Add KV-cache and M2N transfer models, transfer events, scheduler wiring, metrics plumbing, and CLI release guard handling for sequential pd-disaggregation runs.
  • Add offline and online PDD examples under examples/architecture/pdd/, plus smoke tests and script cross-checks.
  • Rename the included release surface from pd-only to pd-disaggregation, with a naming guard to keep the old term out of frontier/, examples/, and tests/.

Scope

In scope:

  • frontier/
  • examples/
  • tests/

Out of scope:

  • root docs
  • generated outputs, logs, profiling data, or task memory
  • dependency or submodule updates
  • release/tag changes

Validation

Environment: conda run -n frontier, Python 3.10.20.

  • Core transfer commit targeted tests: 30 passed in 2.03s.
  • Example and smoke commit targeted tests: 46 passed in 17.82s.
  • Naming commit targeted tests: 9 passed in 0.17s.
  • Final PDD preparation gate on the PR branch: 72 passed in 18.98s.
  • CLI release guard probe: python -m frontier.main --sys_arch pd-disaggregation exits with code 1, prints the --no-enable_parallel_clusters guidance, and does not print a Python traceback.
  • Python compile check for changed files: 47 files, pass.
  • PDD shell syntax check: 12 scripts, pass.
  • git diff --check: pass.
  • Old-name scan under frontier/, examples/, and tests/: 358 tracked files scanned, 0 path hits, 0 content hits.
  • Path audit for main..HEAD: 61 changed files, 0 files outside the allowed PR scope.
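
For reviewers who want to reproduce the old-name scan locally, the following is a minimal sketch of a case-insensitive path and content scan. It is not the guard test shipped in this PR: the function name `scan_for_retired_token` is hypothetical, and the sketch walks the filesystem rather than the tracked-file list that the real scan uses. The retired token is assembled from parts so that the scanner itself never contains the literal spelling, in line with the string-splitting directive in the naming commit.

```python
from pathlib import Path

# Assemble the retired token from parts so this file does not contain it
# literally and therefore cannot trip its own scan.
RETIRED_TOKEN = "-".join(["pd", "only"])


def scan_for_retired_token(roots, token=RETIRED_TOKEN):
    """Return (path_hits, content_hits) for a case-insensitive token scan.

    Hypothetical helper: walks every entry under each root. The real scan
    covers tracked files only; this sketch simplifies that to a directory walk.
    """
    needle = token.lower()
    path_hits: list[Path] = []
    content_hits: list[Path] = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            # Path hit: the token appears anywhere in the path string.
            if needle in str(path).lower():
                path_hits.append(path)
            # Content hit: the token appears in the file body.
            if path.is_file():
                text = path.read_text(encoding="utf-8", errors="ignore")
                if needle in text.lower():
                    content_hits.append(path)
    return path_hits, content_hits
```

Calling `scan_for_retired_token(["frontier", "examples", "tests"])` from the repository root and asserting that both returned lists are empty would approximate the "0 path hits, 0 content hits" check above.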

Release surface

The public pd-disaggregation path is limited to sequential cluster execution in this PR. Runs that omit --no-enable_parallel_clusters fail fast with a clean CLI error because parallel cluster processing for pd-disaggregation is outside this release surface.
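
The fail-fast behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in frontier/main.py: the parser layout, the default values, the error wording, the use of stderr, and the helper names `build_parser`, `release_guard_error`, and `main` are all hypothetical. Only the flag names `--sys_arch` and `--no-enable_parallel_clusters`, the `pd-disaggregation` value, and the exit code 1 come from this PR.

```python
import argparse
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="frontier-sketch")
    # Assumed default; the real CLI may use a different one.
    parser.add_argument("--sys_arch", default="co-location")
    # BooleanOptionalAction also registers --no-enable_parallel_clusters.
    parser.add_argument(
        "--enable_parallel_clusters",
        action=argparse.BooleanOptionalAction,
        default=True,  # assumed default, implied by the guard firing when the flag is omitted
    )
    return parser


def release_guard_error(args: argparse.Namespace) -> Optional[str]:
    """Return a guidance message if the run is outside the release surface."""
    if args.sys_arch == "pd-disaggregation" and args.enable_parallel_clusters:
        return (
            "error: pd-disaggregation supports sequential cluster execution "
            "only in this release; pass --no-enable_parallel_clusters"
        )
    return None


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    error = release_guard_error(args)
    if error is not None:
        # Print guidance and return a non-zero code instead of raising,
        # so the user sees a clean message rather than a traceback.
        print(error, file=sys.stderr)
        return 1
    return 0
```

Wiring this up as `sys.exit(main())` would give the probed behavior: exit code 1 with guidance when the flag is omitted, and exit code 0 once `--no-enable_parallel_clusters` is passed.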

Dependencies

No dependency, submodule, or release tag changes are included.

fwyc0573 added 3 commits June 14, 2026 02:29
Constraint: PR preparation is local-only for worktrees/Frontier and scoped to frontier/ plus tests/unit core transfer paths; no push, release, or PR publication.

Rejected: Blindly applying the full pre-release-v0.2 patch queue | mixed commits contain examples, old pd-only names, and unrelated release hardening outside this commit boundary.

Confidence: high

Scope-risk: moderate

Directive: Keep follow-up example scripts and final naming cleanup in separate commits so review can isolate core simulator behavior from docs/examples wording.

Tested: PYTHONPATH=/local/ycfeng/frontier/worktrees/Frontier conda run -n frontier python -m pytest tests/unit/test_pd_transfer_entities.py tests/unit/test_pd_transfer_predictors.py tests/unit/test_pd_transfer_types_and_configs.py tests/unit/test_kv_transfer_completion_contract.py tests/unit/test_prefix_cache_cluster_validation.py tests/unit/test_request_generator_decode_bound_count.py -q -> 25 passed in 1.23s; changed_python_files=40 py_compile PASS; git diff --check PASS; staged_unexpected_files=0.

Not-tested: Full final preparation gate is reserved for the completed three-commit branch.

Constraint: Commit is limited to examples/architecture/pdd, examples indexes, transfer config boundary checks, and PDD smoke/cross-validation tests for the local worktrees/Frontier PR branch.

Rejected: Applying the full examples patch | target main already contains co-location offline/online layout, and broad co-location rewrites are outside the pd-disaggregation PR boundary.

Confidence: high

Scope-risk: moderate

Directive: Keep generated outputs, analysis/performance harnesses, root docs, and profiling docs out of this PR unless a future scope expansion explicitly includes them.

Tested: PYTHONPATH=/local/ycfeng/frontier/worktrees/Frontier conda run -n frontier python -m pytest tests/unit/test_examples_pdd_scripts.py tests/unit/test_pdd_scripts_cross_validate.py tests/e2e/test_pd_disaggregation_example_script_smoke.py tests/e2e/test_pdd_example_scripts_smoke.py -q -> 22 passed in 16.70s; changed_python_files=8 py_compile PASS; pdd_shell_scripts=12 bash -n PASS; git diff --check PASS; staged_unexpected_files=0.

Not-tested: Full final preparation gate is reserved for the completed three-commit branch.

Constraint: Commit is limited to final naming cleanup and the naming guard for tracked frontier/examples/tests content on the local worktrees/Frontier PR branch.

Rejected: Broad rename of historical analysis, generated outputs, profiling docs, or performance artifacts | those paths are outside patch_queue_audit.md allowlist for this PR.

Confidence: high

Scope-risk: narrow

Directive: Keep retired pd-only spellings out of source-facing frontier/examples/tests paths; use string splitting in guard tests when a negative assertion must mention a retired token.

Tested: PYTHONPATH=/local/ycfeng/frontier/worktrees/Frontier conda run -n frontier python -m pytest tests/unit/test_pd_disaggregation_naming_guard.py -q -> 1 passed in 0.10s; PYTHONPATH=/local/ycfeng/frontier/worktrees/Frontier conda run -n frontier python -m pytest tests/unit/test_examples_pdd_scripts.py -q -> 8 passed in 0.08s; changed_python_files=4 py_compile PASS; git diff --check PASS; staged_unexpected_files=0; case-insensitive old-name rg returned 0 hits.

Not-tested: Full final preparation gate is reserved for the completed three-commit branch.