Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
94 changes: 68 additions & 26 deletions examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

| Date | Summary of Changes |
|------------|--------------------|
| 2026-06-14 | Added PDD pd-disaggregation examples, script index, and release-scope guidance for local PR preparation. |
| 2026-06-08 | Clarified that dummy analytical co-location smoke runs validate runtime plumbing, not profiling fidelity. |
| 2026-06-08 | Split co-location examples into `offline/` and `online/`, added suite runner and cross-validation guidance. |
| 2026-06-07 | Added optimized co-location advanced MoE recipes, top-level profiling examples, and corrected metrics behavior for Thinking Mode. |
Expand All @@ -16,36 +17,43 @@ This directory contains runnable examples for the release-supported Frontier sim

## Release Scope

`pre-release-v0.1` supports only the `co-location` architecture. Historical `pd-disaggregation` and `pd-af-disaggregation` examples are intentionally not included in this branch. If those architectures are requested through CLI/config, Frontier exits with the release error documented in the top-level `README.md`.
`pre-release-v0.2` foregrounds **PDD / `pd-disaggregation`** examples: prefill runs in the `PREFILL` cluster, decode runs in the unified `DECODE` cluster, and KV cache is transferred between them. The public PDD example path uses the sequential simulator mode through `--no-enable_parallel_clusters`.

## Quick Start
The `pd-af-disaggregation` architecture and split `DECODE_ATTN` / `DECODE_FFN` release surface remain intentionally outside this examples release scope. Co-location examples are still kept as baseline comparison recipes and historical v0.1-compatible references.

The co-location examples are split by simulation mode:
## Quick Start

- `examples/architecture/co-location/offline/`: offline batch simulations. Existing offline examples were moved here unchanged in scenario intent.
- `examples/architecture/co-location/online/`: online serving simulations that mirror the offline scenarios while preserving generated request arrivals.
- `examples/architecture/co-location/run_all.sh`: one-click suite runner for all 10 co-location cases.
The PDD dense example uses dummy execution-time prediction and the analytical communication backend, so it does not require profiling data or the collective-sim binary for the first smoke run.

```bash
export PYTHONPATH=$PWD
export WANDB_DISABLED=true
export VIDUR_DISABLE_WANDB=1

# Run all five offline cases and all five online cases.
bash examples/architecture/co-location/run_all.sh
bash examples/architecture/pdd/offline/dense_model_basic.sh
```

# Run one case directly.
bash examples/architecture/co-location/offline/dense_model_basic.sh
bash examples/architecture/co-location/online/dense_model_basic_online.sh
For the complete PDD architecture suite, run:

# Thinking Mode examples are available in both modes.
bash examples/architecture/co-location/offline/thinking_mode_basic.sh
bash examples/architecture/co-location/online/thinking_mode_basic_online.sh
```bash
bash examples/architecture/pdd/run_all.sh
```

All co-location examples default to `--cc_backend_config_type analytical` so the suite is one-click runnable on a fresh checkout without building the collective-sim binary. To exercise the topology-aware backend, set `CC_BACKEND=collective_sim` and build `frontier/cc_backend/backends/collective-sim/sim/datacenter/htsim_ndp` first.
Co-location baseline and advanced recipes remain available for comparison. The current co-location layout is split into offline and online entrypoints:

Profiling commands can be validated without launching GPU kernels by using `--dry-run`:
```bash
bash examples/architecture/co-location/run_all.sh
bash examples/architecture/co-location/offline/dense_model_basic.sh
bash examples/architecture/co-location/offline/moe_model_basic.sh
bash examples/architecture/co-location/offline/thinking_mode_basic.sh
bash examples/architecture/co-location/offline/moe_spec_dec.sh
bash examples/architecture/co-location/offline/moe_prefix_caching.sh
bash examples/architecture/co-location/online/dense_model_basic_online.sh
bash examples/architecture/co-location/online/moe_model_basic_online.sh
bash examples/architecture/co-location/online/thinking_mode_basic_online.sh
bash examples/architecture/co-location/online/moe_spec_dec_online.sh
bash examples/architecture/co-location/online/moe_prefix_caching_online.sh
```

```bash
bash examples/profiling/profile_linear_op.sh --dry-run
Expand All @@ -62,6 +70,21 @@ examples/
│ └── prefix_cache_shared_session_trace.csv
├── architecture/
│ ├── README.md
│ ├── pdd/
│ │ ├── run_all.sh
│ │ ├── dense_model_basic.sh
│ │ ├── offline/
│ │ │ ├── dense_model_basic.sh
│ │ │ ├── moe_model_basic.sh
│ │ │ ├── thinking_mode_basic.sh
│ │ │ ├── moe_spec_dec.sh
│ │ │ └── moe_prefix_caching.sh
│ │ └── online/
│ │ ├── dense_model_basic_online.sh
│ │ ├── moe_model_basic_online.sh
│ │ ├── thinking_mode_basic_online.sh
│ │ ├── moe_spec_dec_online.sh
│ │ └── moe_prefix_caching_online.sh
│ └── co-location/
│ ├── run_all.sh
│ ├── offline/
Expand All @@ -88,9 +111,19 @@ examples/

## Architecture Mode

### PDD / pd-disaggregation

Separate prefill and decode clusters model prefill/decode disaggregation without splitting decode attention and decode FFN into separate public release clusters.

- `--sys_arch pd-disaggregation`
- Uses `PREFILL` and unified `DECODE` clusters.
- Supports Dense, MoE, Thinking Mode, Speculative Decoding / MTP, and Prefix Caching examples in offline and online modes.
- Uses `--no-enable_parallel_clusters` because the pre-release-v0.2 public PDD path is the sequential simulator path; parallel cluster processing is still guarded.
- Keeps `pd-af-disaggregation` and global `--use_cuda_graph` outside the v0.2 examples release surface.

### Co-location

Single monolithic cluster handles all prefill and decode work.
Single monolithic cluster handles all prefill and decode work. These examples are retained as baseline comparison recipes.

- `--sys_arch co-location`
- Supports dense and MoE model configs.
Expand All @@ -99,13 +132,22 @@ Single monolithic cluster handles all prefill and decode work.

## Key Configuration Options

### PDD Cluster Layout

- `--cluster_config_prefill_cluster_num_replicas`: Number of `PREFILL` cluster replicas.
- `--cluster_config_decode_cluster_num_replicas`: Number of unified `DECODE` cluster replicas.
- `--cluster_config_prefill_replica_config_*`: `PREFILL` replica parallelism and device fields.
- `--cluster_config_decode_replica_config_*`: `DECODE` replica parallelism and device fields.
- `--analytical_kv_cache_transfer_config_network_bandwidth_gbps`: Analytical KV transfer bandwidth.
- `--analytical_kv_cache_transfer_config_network_latency_ms`: Analytical KV transfer latency.

### Parallelism

- `--replica_config_attn_tensor_parallel_size`: Attention tensor parallelism.
- `--replica_config_moe_tensor_parallel_size`: MoE tensor parallelism.
- `--replica_config_moe_expert_parallel_size`: Expert parallelism.
- `--replica_config_num_pipeline_stages`: Pipeline parallelism.
- `--cluster_config_num_replicas`: Number of monolithic cluster replicas.
- `--replica_config_attn_tensor_parallel_size`: Attention tensor parallelism for co-location examples.
- `--replica_config_moe_tensor_parallel_size`: MoE tensor parallelism for co-location examples.
- `--replica_config_moe_expert_parallel_size`: Expert parallelism for co-location examples.
- `--replica_config_num_pipeline_stages`: Pipeline parallelism for co-location examples.
- `--cluster_config_num_replicas`: Number of monolithic cluster replicas for co-location examples.

### Request Generation

Expand All @@ -129,11 +171,11 @@ Single monolithic cluster handles all prefill and decode work.

## Running Examples

The checked-in co-location simulation examples use dummy mode (`--random_forrest_execution_time_predictor_config_enable_dummy_mode`) for quick testing without profiling data. Dummy mode skips ML predictor training and profiling metadata loading, so missing profiling CSVs do not affect smoke-test correctness.
The checked-in PDD examples use dummy mode (`--random_forrest_execution_time_predictor_config_enable_dummy_mode`), analytical communication cost modeling, and `--no-enable_parallel_clusters` for quick testing without profiling data. The expected minimal dense smoke behavior is one completed request, one KV cache transfer, and no release-guard crash. Metrics are written under `outputs/examples/pdd` by default.

These examples validate CLI/runtime plumbing and metrics artifact generation, not profiling fidelity. Use non-dummy profiling data before drawing hardware accuracy conclusions.
PDD Thinking Mode can produce multiple prefill-to-decode handoffs for one user request. The default small smoke configuration completes one request and records two KV transfers.

Offline cases write under `outputs/examples/co-location/offline/<model_type>/offline_batch/<run_id>/` by default. Online cases write under `outputs/examples/co-location/online/<model_type>/online_serving/<run_id>/` by default. The mode-specific `offline_batch` / `online_serving` path segment is added by Frontier's metrics taxonomy.
Co-location examples also use dummy mode for quick testing without profiling data. These examples validate CLI/runtime plumbing and metrics artifact generation, not profiling fidelity. Use non-dummy profiling data before drawing hardware accuracy conclusions.

Baseline co-location scripts default to `decode_cuda_graph_mode=full_decode_only` and Chunked Prefill. The Speculative Decoding / MTP recipes use `decode_cuda_graph_mode=none` because speculative decoding currently conflicts with decode CUDA Graph modeling. The Prefix Caching recipes replay `examples/fixtures/prefix_cache_shared_session_trace.csv` to exercise cache-hit behavior.

Expand All @@ -154,7 +196,7 @@ When comparing offline and online pairs, validate the following for each scenari

## Thinking Mode Example

The Thinking Mode scripts use:
The PDD and co-location Thinking Mode scripts use:

- `--enable_thinking_mode`
- `--thinking_depth 2`
Expand Down
75 changes: 71 additions & 4 deletions examples/architecture/README.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,18 @@
# Architecture Examples

This directory contains one-click architecture entrypoints for Frontier's release-supported runtime layout.
## Modification History

| Date | Summary of Changes |
|------------|--------------------|
| 2026-06-14 | Added PDD pd-disaggregation script list, configuration contract, and validation criteria for local PR preparation. |

This directory contains one-click architecture entrypoints for Frontier's release-supported runtime layouts.

## Release Scope

`pre-release-v0.1` supports only `co-location`. Disaggregated architecture examples are intentionally absent from this branch because the runtime guard rejects `pd-disaggregation` and `pd-af-disaggregation`.
`pre-release-v0.2` foregrounds **PDD / `pd-disaggregation`** examples. Prefill runs in the `PREFILL` cluster, decode runs in the unified `DECODE` cluster, and KV cache is transferred between them. The public PDD example path uses the sequential simulator mode through `--no-enable_parallel_clusters`.

`co-location` examples remain available as baseline comparison recipes and v0.1-compatible architecture references. `pd-af-disaggregation` and split `DECODE_ATTN` / `DECODE_FFN` public examples remain outside this release scope.

## Scripts

Expand All @@ -21,6 +29,38 @@ This directory contains one-click architecture entrypoints for Frontier's releas
| `co-location/online/thinking_mode_basic_online.sh` | Online Thinking Mode v1 co-location | Mirrors Thinking Mode offline settings with `--simulation_mode online` |
| `co-location/online/moe_spec_dec_online.sh` | Online MoE Speculative Decoding / MTP | Mirrors Speculative Decoding offline settings with `--simulation_mode online` |
| `co-location/online/moe_prefix_caching_online.sh` | Online MoE Prefix Caching | Replays the same prefix-cache fixture with `--simulation_mode online` |
| `pdd/run_all.sh` | Full PDD suite | Runs all five offline PDD cases and all five online PDD cases; pass extra Frontier CLI flags after `--` |
| `pdd/offline/dense_model_basic.sh` | Offline dense PDD baseline | Sequential `pd-disaggregation`, analytical backend, dummy execution time, Chunked Prefill, CSV/JSON metrics |
| `pdd/offline/moe_model_basic.sh` | Offline MoE PDD baseline | Sequential `pd-disaggregation`, reference-runnable shared-domain MoE topology, Chunked Prefill, CSV/JSON metrics |
| `pdd/offline/thinking_mode_basic.sh` | Offline Thinking Mode v1 PDD | Thinking Mode with two KV transfer handoffs for the one-request smoke configuration |
| `pdd/offline/moe_spec_dec.sh` | Offline MoE PDD Speculative Decoding / MTP | Speculative Decoding enabled; Prefix Caching intentionally disabled; `DECODE_CUDA_GRAPH_MODE=none` |
| `pdd/offline/moe_prefix_caching.sh` | Offline MoE PDD Prefix Caching | Sticky scheduler with `examples/fixtures/prefix_cache_shared_session_trace.csv` |
| `pdd/online/dense_model_basic_online.sh` | Online dense PDD baseline | Mirrors dense offline settings with `--simulation_mode online` |
| `pdd/online/moe_model_basic_online.sh` | Online MoE PDD baseline | Mirrors MoE offline settings with `--simulation_mode online` |
| `pdd/online/thinking_mode_basic_online.sh` | Online Thinking Mode v1 PDD | Mirrors Thinking Mode offline settings with `--simulation_mode online` |
| `pdd/online/moe_spec_dec_online.sh` | Online MoE PDD Speculative Decoding / MTP | Mirrors Speculative Decoding offline settings with `--simulation_mode online` |
| `pdd/online/moe_prefix_caching_online.sh` | Online MoE PDD Prefix Caching | Replays the same prefix-cache fixture with `--simulation_mode online` |

## PDD Configuration Contract

All PDD scripts use these release-supported defaults unless overridden from the shell:

- `--sys_arch pd-disaggregation`
- `--no-enable_parallel_clusters`
- explicit `PREFILL` and unified `DECODE` cluster settings
- `--cc_backend_config_type analytical`
- dummy execution-time prediction enabled by default
- CSV/JSON metrics enabled by default through `--metrics_config_write_metrics` and `--metrics_config_store_request_metrics`
- plots, Chrome trace, and JSON event trace disabled for lightweight one-click artifacts

MoE PDD scripts also enforce the shared-domain invariant before launching Frontier:

```text
PREFILL_ATTN_TP * PREFILL_ATTN_DP == PREFILL_MOE_TP * PREFILL_MOE_EP
DECODE_ATTN_TP * DECODE_ATTN_DP == DECODE_MOE_TP * DECODE_MOE_EP
```

This fail-fast check prevents known non-runnable MoE topology combinations from entering the simulator.

## Thinking Mode v1

Expand All @@ -34,10 +74,29 @@ The Thinking Mode examples use:
- `--cc_backend_config_type analytical` so the one-click smoke run works on a minimal single-replica layout
- CSV/JSON metrics enabled by default, with plots, Chrome trace, and JSON event trace disabled for lightweight artifacts

Under PDD, one user request can produce multiple prefill-to-decode KV handoffs. The default Thinking Mode smoke case completes one request and records two KV transfers.

## Recommended Start Order

```bash
# Full suite.
# Full PDD suite for pre-release-v0.2.
bash examples/architecture/pdd/run_all.sh

# PDD offline cases.
bash examples/architecture/pdd/offline/dense_model_basic.sh
bash examples/architecture/pdd/offline/moe_model_basic.sh
bash examples/architecture/pdd/offline/thinking_mode_basic.sh
bash examples/architecture/pdd/offline/moe_spec_dec.sh
bash examples/architecture/pdd/offline/moe_prefix_caching.sh

# PDD online cases.
bash examples/architecture/pdd/online/dense_model_basic_online.sh
bash examples/architecture/pdd/online/moe_model_basic_online.sh
bash examples/architecture/pdd/online/thinking_mode_basic_online.sh
bash examples/architecture/pdd/online/moe_spec_dec_online.sh
bash examples/architecture/pdd/online/moe_prefix_caching_online.sh

# Full co-location comparison suite.
bash examples/architecture/co-location/run_all.sh

# Offline cases.
Expand All @@ -55,7 +114,7 @@ bash examples/architecture/co-location/online/moe_spec_dec_online.sh
bash examples/architecture/co-location/online/moe_prefix_caching_online.sh
```

Use the baseline scripts first, then use the Speculative Decoding / MTP and Prefix Caching recipes as advanced cases.
Use the dense baseline scripts first, then use the Thinking Mode, Speculative Decoding / MTP, and Prefix Caching recipes as advanced cases.

## Cross-validation Criteria

Expand All @@ -66,3 +125,11 @@ For each offline/online pair:
3. Record expected request count, actual request rows, completed request rows, total input tokens, total output tokens, mean TTFT, mean latency, and request throughput when present.
4. Confirm offline outputs include the `offline_batch` taxonomy segment and online outputs include `online_serving`.
5. Treat latency differences as expected when online mode preserves request arrival times; investigate only if counts, token totals, output files, or finite numeric metrics diverge unexpectedly.

For every PDD script, the release gate should additionally record:

1. The script exits with code `0`.
2. `request_metrics.csv` and `system_metrics.json` exist in the metrics output directory.
3. Request row count, `total_requests`, and `completed_requests` match the expected case size.
4. KV transfer count, total KV bytes, and KV transfer time are present and positive.
5. Request-level `ttft`, `tpot`, `request_e2e_time`, and `transfer_kv_cache` are finite and positive.
16 changes: 16 additions & 0 deletions examples/architecture/pdd/dense_model_basic.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
#!/bin/bash
# =============================================================================
# Compatibility entrypoint for the PDD dense offline example
# =============================================================================
# The canonical pre-release-v0.2 PDD example layout is:
# examples/architecture/pdd/offline/dense_model_basic.sh
# examples/architecture/pdd/online/dense_model_basic_online.sh
#
# Keep this top-level script as a backward-compatible alias for users who ran
# the early PDD dense smoke entrypoint before the offline/online split.
# =============================================================================

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
exec bash "$SCRIPT_DIR/offline/dense_model_basic.sh" "$@"
Loading