mohossam01 · May 16, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,32 @@ Versioning: [SemVer](https://semver.org/spec/v2.0.0.html).
 
 ## [Unreleased]
 
+### Changed
+
+- **Bundled template catalog refreshed.** `plotsim.list_templates()`
+  now returns exactly six domain templates: `banking`, `health`,
+  `hr`, `marketing`, `retail`, `saas`. Each is schema-realistic
+  (real column topology and FK shapes for the domain),
+  output-realistic (pool, range, distribution, correlation, and
+  seasonality choices match the domain's real data shape), and feature-deep:
+  every template exercises SCD2, lifecycle stages, 3+ correlations,
+  causal lag, seasonality, and 2 event tables; CDC on the relevant
+  fact for each domain; per-metric treatment cohorts on `marketing`
+  and `health`; bridge tables on `hr`, `retail`, `banking`, and
+  `health`; parent/child fact grain on `retail`, `banking`, and
+  `health`; cross-fact FK on `retail` and `health`; geo bundle on
+  `retail`, `banking`, and `health`; narrative columns on `hr`,
+  `retail`, `banking`, and `health`; heteroscedastic noise on
+  `saas` and `health`; student-t noise on `banking`; holdout splits
+  on `banking` and `health`; sub-entity dim on `saas`; multi-locale
+  on `retail`. The previous catalog of fourteen mixed-purpose
+  templates — `ab_trial`, `bare_minimum`, `cdc_demo`,
+  `crm_billing_overlap`, `education`, `geo_retail`, `lakehouse`,
+  `latency_skew`, `narrative_reviews`, `orders` — has been demoted
+  from the public surface: the feature-vehicle YAMLs and `.py`
+  companions for each now live under `tests/configs/` and continue
+  to power the existing feature-coverage test files unchanged.
+
 ### Added
 
 - **Manifest decomposition + regression sections.** The manifest sidecar

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -41,7 +41,7 @@ your config shape:
 
 Steps:
 
-1. Copy an existing template (e.g. `saas_template.yaml` +
+1. Copy an existing template (e.g. `saas.yaml` +
    `saas_template.py`) as a starting point.
 2. Edit metrics, segments / archetypes, dimensions, facts, events, and
    any feature-specific blocks for the new use case.

diff --git a/docs/site/api-reference.md b/docs/site/api-reference.md
@@ -131,12 +131,10 @@ Return the names of bundled builder templates.
 def list_templates() -> list[str]
 ```
 
-Names round-trip through [`load_template`](#load_template). Templates
-whose filename ends in `_template` strip that suffix; `bare_minimum`
-and the single-feature templates keep their full stems. Sorted
-alphabetically.
+Names round-trip through [`load_template`](#load_template). The bundled
+catalog covers six domains, sorted alphabetically.
 
-**Returns** — e.g. `["ab_trial", "bare_minimum", "cdc_demo", "crm_billing_overlap", "education", "geo_retail", "hr", "lakehouse", "latency_skew", "marketing", "narrative_reviews", "retail", "saas"]`.
+**Returns** — `["banking", "health", "hr", "marketing", "retail", "saas"]`.
 
 **Example**
 

diff --git a/docs/site/column-types.md b/docs/site/column-types.md
@@ -144,7 +144,9 @@ Output dtype is `float` for `latitude` / `longitude` and `string`
 for everything else. `geo.<field>` is dim-only; on facts and
 events the engine raises `unsupported generated provider`. See
 [Geo hierarchy](./user-guide/geo-hierarchy.md) for the underlying
-dataset, determinism, and the bundled `geo_retail` template.
+dataset, determinism, and the `tests/configs/geo_retail.yaml`
+worked example; the bundled `retail`, `banking`, and `health`
+domain templates each put a geo bundle on their customer/patient dim.
 
 ---
 
@@ -182,8 +184,10 @@ builder API). `narrative` is fact-only and per_entity_per_period;
 the cell builder forces the scalar fact path because it consumes one
 RNG draw per slot per row. See
 [Narrative text source](./user-guide/narrative-source.md) for the
-lexicon-design playbook, validation gates, and the bundled
-`narrative_reviews` template.
+lexicon-design playbook, validation gates, and the
+`tests/configs/narrative_reviews.yaml` worked example; narrative
+columns also ship on the bundled `hr`, `retail`, `banking`, and
+`health` domain templates.
 
 ---
 

diff --git a/docs/site/config-reference.md b/docs/site/config-reference.md
@@ -725,7 +725,7 @@ output:
 |---|---|---|---|
 | `format` | `"csv"` / `"parquet"` / `"jsonl"` / `"sql"` | `"csv"` | `parquet` requires `pip install plotsim[parquet]` (pyarrow) and produces typed binary files ~5–10× smaller than CSV. `jsonl` writes newline-delimited JSON (one self-contained object per row) for streaming-ingestion / schema-on-read consumers. `sql` writes a single `data.sql` file with dialect-aware DDL + batched INSERTs instead of per-table files |
 | `directory` | `str` | `"output"` | Where `write_tables` writes. Override at call time with `write_tables(..., output_dir=...)` |
-| `cell_budget` | `int ≥ 0` / `null` | `null` | Soft cell-count cap consumed by the load-time scale estimator. `null` falls through to `PLOTSIM_CELL_BUDGET` env var, then to the 2,000,000 default. `0` disables the soft cap entirely. See [Cell-count budget](#cell-count-budget) for precedence and the bundled `lakehouse` template for a worked example |
+| `cell_budget` | `int ≥ 0` / `null` | `null` | Soft cell-count cap consumed by the load-time scale estimator. `null` falls through to `PLOTSIM_CELL_BUDGET` env var, then to the 2,000,000 default. `0` disables the soft cap entirely. See [Cell-count budget](#cell-count-budget) for precedence and `tests/configs/lakehouse.yaml` for a worked example |
 | `denormalized` | `bool` | `false` | Opt-in wide-table companion writer. When `true`, every fact table is left-joined with its FK'd dims (SCD2 dims filtered to current state) and emits `<fct>_wide.<ext>` alongside the normalized output. Under `format: sql` the wide tables emit as trailing blocks inside `data.sql` instead of separate files |
 | `partition_by` | `str` / `null` | `null` | Column name to partition Parquet output on. When set, every table that carries the column is written as a Hive-style directory (`<output_dir>/<table>/<col>=<value>/...`) via `pyarrow.parquet.write_to_dataset`. Tables without the column fall back to single files. Requires `format: parquet`; cross-validated at config load |
 | `sql_dialect` | `"postgresql"` / `"mysql"` / `"sqlite"` | `"postgresql"` | Dialect for the SQL dump writer — selects identifier quoting (`"col"` for PG/SQLite, `` `col` `` for MySQL), type words (PG `NUMERIC` / MySQL `DOUBLE` + `VARCHAR(255)` for string PKs / SQLite `REAL`), and boolean encoding. The default round-trips under any format; explicit `mysql` / `sqlite` requires `format: sql` (cross-validated at config load) |
@@ -894,8 +894,8 @@ precedence order (the first one that resolves wins):
 1. **Config field (recommended)** — set `output.cell_budget: N` in
    the YAML (or pass `output={"cell_budget": N}` to `create()`).
    Reproducible from the config alone — no env vars or flags
-   required, which is the contract the bundled `lakehouse`
-   template relies on.
+   required, which is the contract the `tests/configs/lakehouse.yaml`
+   worked example relies on.
 2. **Environment variable** — `PLOTSIM_CELL_BUDGET=N` sets the
    soft cap to `N` cells when no config field is set.
 3. **Default** — `2,000,000` cells.
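
Review note: the precedence ladder in this hunk can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not plotsim's actual implementation: `resolve_cell_budget` and its `environ` parameter are hypothetical names. Only the `PLOTSIM_CELL_BUDGET` variable, the 2,000,000 default, and the rule that `0` disables the soft cap come from the docs being edited.

```python
import os

DEFAULT_CELL_BUDGET = 2_000_000  # documented default soft cap


def resolve_cell_budget(config_value, environ=None):
    """Return the effective soft cap in cells, or None when the cap is disabled.

    Hypothetical helper: mirrors the documented precedence
    (config field > PLOTSIM_CELL_BUDGET env var > default).
    """
    environ = os.environ if environ is None else environ
    if config_value is not None:
        # 1. output.cell_budget in the config wins when it is set.
        budget = config_value
    elif "PLOTSIM_CELL_BUDGET" in environ:
        # 2. The environment variable applies only when the field is null.
        budget = int(environ["PLOTSIM_CELL_BUDGET"])
    else:
        # 3. Otherwise fall back to the default.
        budget = DEFAULT_CELL_BUDGET
    # A value of 0 from either source disables the soft cap entirely.
    return None if budget == 0 else budget
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment. The hard ceiling mentioned elsewhere in the docs is deliberately left out of this sketch.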

diff --git a/docs/site/cookbook/data-engineering.md b/docs/site/cookbook/data-engineering.md
@@ -56,8 +56,9 @@ whichever fits your workflow.
 
     Or skip the YAML round-trip entirely — the
     [`saas_template.py`](https://github.com/mohossam01/plotsim/blob/main/plotsim/configs/templates/saas_template.py)
-    bundled with plotsim shows the same template authored as
-    `create(**kwargs)` directly.
+    bundled with plotsim shows the same SaaS template authored as
+    `create(**kwargs)` directly, paired with `saas.yaml` in the
+    same directory.
 
 Pin `seed:` in the YAML (or pass `seed=42` to `create`) and the fixture
 is byte-stable across CI runs.
@@ -351,8 +352,8 @@ in the config (recommended; reproducible from YAML alone),
 `output.cell_budget: 0` (or `PLOTSIM_CELL_BUDGET=0`) disables the soft
 cap entirely; only the `50,000,000`-cell hard ceiling still applies.
 See [Limits](../config-reference.md#limits-and-performance-gates) for
-the full ladder and the bundled `lakehouse` template for a worked
-example of a 1.5M-cell config.
+the full ladder; `tests/configs/lakehouse.yaml` in the repo is a
+worked example of a config near the 1.5M-cell range.
 
 ---
 

diff --git a/docs/site/cookbook/data-science.md b/docs/site/cookbook/data-science.md
@@ -50,8 +50,9 @@ multi-metric dataset with archetype ground truth.
     ```
 
 The [`saas_template.py`](https://github.com/mohossam01/plotsim/blob/main/plotsim/configs/templates/saas_template.py)
-companion shows the same template authored as a `create(**kwargs)`
-call — every YAML field maps 1-1 to a Python keyword.
+companion (paired with `saas.yaml` in the same directory) shows the
+same SaaS template authored as a `create(**kwargs)` call — every YAML
+field maps 1-1 to a Python keyword.
 
 ---
 
@@ -228,9 +229,9 @@ time, not just larger ones.
 
 All six builder distribution families (`lognorm`, `gamma`,
 `weibull`, `beta`, `normal`, `poisson`) are pinnable the same way
-via `MetricInput.distribution` + `distribution_params`. The bundled
-`latency_skew` template (`plotsim template latency_skew`) exercises
-all six on a single config. Full mechanics:
+via `MetricInput.distribution` + `distribution_params`. The
+`tests/configs/latency_skew.yaml` worked example exercises all six
+on a single config. Full mechanics:
 [`metrics-and-connections.md` §pinning the distribution explicitly](../user-guide/metrics-and-connections.md#pinning-the-distribution-explicitly).
 
 ---

diff --git a/docs/site/feature-reference.md b/docs/site/feature-reference.md
@@ -22,7 +22,7 @@ Three surfaces today:
 |---|---|---|
 | Library | `plotsim.create`, `create_from_yaml`, `generate_tables`, `write_tables` | Python users in an IDE or notebook |
 | CLI | `plotsim run`, `validate`, `info`, `template`, `schema` | Terminal, CI, scripts |
-| YAML | bundled templates: `ab_trial`, `bare_minimum`, `cdc_demo`, `crm_billing_overlap`, `education`, `geo_retail`, `hr`, `lakehouse`, `latency_skew`, `marketing`, `narrative_reviews`, `retail`, `saas` | Anyone who wants to hand-edit a config |
+| YAML | bundled domain templates: `banking`, `health`, `hr`, `marketing`, `retail`, `saas` | Anyone who wants to hand-edit a config |
 
 ---
 
@@ -39,7 +39,7 @@ integrity / provenance tooling.
 |---|---|---|
 | Trajectory-first metric generation | Every metric for an entity at time *t* is derived from one archetype-curve position | `generate_tables(cfg)` |
 | Determinism | Single seeded `numpy.random.Generator` flows through every random draw | YAML `seed:` (integer) |
-| Cell-budget scale gate | Soft pre-flight guard that aborts runs above the configured cell ceiling. Precedence: `output.cell_budget` field > `PLOTSIM_CELL_BUDGET` env > 2M default; `0` disables. Bundled template `lakehouse` exercises a 1.5M-cell config. | YAML `output.cell_budget: <int>`; env override `PLOTSIM_CELL_BUDGET` / `PLOTSIM_ALLOW_LARGE_DATASET` |
+| Cell-budget scale gate | Soft pre-flight guard that aborts runs above the configured cell ceiling. Precedence: `output.cell_budget` field > `PLOTSIM_CELL_BUDGET` env > 2M default; `0` disables. `tests/configs/lakehouse.yaml` is a worked example near the 1.5M-cell range. | YAML `output.cell_budget: <int>`; env override `PLOTSIM_CELL_BUDGET` / `PLOTSIM_ALLOW_LARGE_DATASET` |
 
 #### Tables emitted
 
@@ -114,7 +114,7 @@ is no longer byte-identical to a pre-flag run of the same file.
 |---|---|---|
 | Lifecycle stages | Per-entity stage sequence with stage-specific archetype overrides | YAML `lifecycle:` |
 | Cohort arrival distribution | Per-segment entity arrival shape — `uniform` / `linear` / `step` / `explicit` — driving `Entity.start_period`, so the entity body grows or contracts across the window. Cold-start cells are NaN-filled and dropped pre-write. Validator enforces every entity has ≥2 active periods. | builder kwarg `arrival:` on segments (4-shape discriminated union); YAML `Entity.start_period` directly |
-| Treatment / control cohorts | Per-entity treatment assignment with a logit-shift on trajectory position from `treatment_start_period` onward (`treatment_lift_log_odds`). Known effect → A/B test analysis, uplift modeling, causal inference. Manifest carries `TreatmentAssignment` per entity + `TreatmentCohort` per segment. Bundled template `ab_trial`. | YAML `Entity.treatment_group` / `treatment_lift_log_odds` / `treatment_start_period` |
+| Treatment / control cohorts | Per-entity treatment assignment with a logit-shift on trajectory position from `treatment_start_period` onward (`treatment_lift_log_odds`). Known effect → A/B test analysis, uplift modeling, causal inference. Manifest carries `TreatmentAssignment` per entity + `TreatmentCohort` per segment. Demonstrated on bundled `marketing` and `health` (per-metric lifts) and `banking` (whole-trajectory lift); `tests/configs/ab_trial.yaml` is the dedicated worked example. | YAML `Entity.treatment_group` / `treatment_lift_log_odds` / `treatment_start_period` |
 
 ### 6. Dim columns + fact-grain text — fill non-metric cells with realistic content
 
@@ -124,20 +124,20 @@ is no longer byte-identical to a pre-flag run of the same file.
 | Faker-backed text + identifiers | PII-shape providers wired into the engine: `name`, `email`, `phone_number`, `company`, `address`, `postcode`, `country`, `city`, `latitude`, `longitude`, `sentence`. Deterministic under the run seed. Useful for masking exercises and regex-validation scenarios; **does not read entity, archetype, or trajectory** (each call is an independent draw). |
 | Range source | `type: range` with `range: [min, max]` on fact / event columns produces a per-row uniform draw between the bounds. Integer bounds → `dtype: int` and inclusive upper bound; float bounds → `dtype: float` and exclusive upper bound (numpy conventions). Use it for `quantity ∈ [1, 5]`, `unit_price ∈ [10.0, 500.0]`, and similar shape constraints that `faker.random_int` / `faker.pyfloat` express less precisely. Deterministic under seed. |
 | Pool source on facts and events | `type: pool.<attribute>` lifts the per-entity value pool (previously dim-only) onto per_entity_per_period facts, variable-grain facts, per_parent_row child facts, and event tables. Every row resolves to its entity's segment, then draws uniformly from `attributes[<attr>]` — so a `loyal` cohort customer's `channel` always lands in `[app, web]` while a `casual` customer's lands in `[sms, email]`. Per_period facts (the `dim_date`-style grain) remain out of scope — those rows have no per-row entity binding. |
-| Narrative text source (trajectory-aware) | Per-archetype lexicons + a sentence template rendered into a `narrative` column on a fact table. Output vocabulary tracks the entity's trajectory position (a high-position `growth` entity produces systematically different text than a low-position `decline` entity); a simple bag-of-words classifier hits ≥0.55 accuracy on archetype prediction. Deterministic under seed; preserves the trajectory-first invariant. **Fact-only** (rejected on dim / event tables at config load). **Performance:** forces the scalar fact builder path (~3-10× slower than vectorized metric-only facts), so keep narrative on tables that genuinely need text. Bundled template `narrative_reviews`. See [Narrative source](./user-guide/narrative-source.md). |
+| Narrative text source (trajectory-aware) | Per-archetype lexicons + a sentence template rendered into a `narrative` column on a fact table. Output vocabulary tracks the entity's trajectory position (a high-position `growth` entity produces systematically different text than a low-position `decline` entity); a simple bag-of-words classifier hits ≥0.55 accuracy on archetype prediction. Deterministic under seed; preserves the trajectory-first invariant. **Fact-only** (rejected on dim / event tables at config load). **Performance:** forces the scalar fact builder path (~3-10× slower than vectorized metric-only facts), so keep narrative on tables that genuinely need text. Demonstrated on bundled `hr`, `retail`, `banking`, `health`; `tests/configs/narrative_reviews.yaml` is the dedicated lexicon-design walkthrough. See [Narrative source](./user-guide/narrative-source.md). |
 
 ### 7. Audit + downstream-pipeline outputs
 
 | Feature | Behavior |
 |---|---|
 | SCD Type 2 | `dim_<entity>` expanded to N×versions with `valid_from_period` and band-crossing events surfaced in the manifest |
 | SCD Type 1 | default (no-op) |
-| Fact-side CDC | `facts[].cdc: true` emits `_inserted_at` / `_updated_at` / `_op` audit columns; column-level quality issues flip `_op` to `"U"` on affected rows. Demonstrated in `cdc_demo` (dedicated) and `retail` (realistic POS purchase ledger). |
+| Fact-side CDC | `facts[].cdc: true` emits `_inserted_at` / `_updated_at` / `_op` audit columns; column-level quality issues flip `_op` to `"U"` on affected rows. Demonstrated on bundled `saas` (revenue restatement), `marketing` (spend attribution), `retail` (purchase ledger), `banking` (loan disbursement), `health` (encounter chart amendment); `tests/configs/cdc_demo.yaml` is the dedicated minimal walkthrough. |
 | Holdout splits | `output.holdout: {fraction\|periods}` writes `{table}_train.<csv\|parquet>` + `{table}_holdout.<csv\|parquet>` instead of one file per fact, split by period index |
 | Denormalization | `output.denormalized: true` joins each fact with its FK'd dims (SCD2 current-only, audit columns excluded, dim columns prefixed `<dim>__<col>`); emits `<fct>_wide.{csv\|parquet}` alongside normalized output for 1NF–3NF decomposition exercises. Demonstrated in `saas`. |
-| Log-file writer | Event tables with `log_format: "{ts} ... "` + `log_filename: "..."` emit a structured `.log` file alongside the CSV/Parquet event table. Format string is `template.format(**row.to_dict())` per row; unknown placeholders raise. Demonstrated in `saas` (`evt_login` as syslog-flavoured lines). |
-| Multi-source / overlap | `multi_source:` block emits per-source dim copies with controlled drift (casing / abbreviation / swap) and per-source key schemes; `source_entity_mappings` ground truth in the manifest. Demonstrated in `crm_billing_overlap` (CRM + billing dual-source, 40 mapping records). |
-| Nested / JSON columns | `dtype: struct` (with `nested_schema`) or `dtype: array` (with `array_element_type`) paired with `source: nested` on dim columns. Parquet preserves native nested schema (`pa.struct(...)`); CSV serializes as JSON string. Dim-only, one level of nesting, primitive leaves in V1. Demonstrated in `retail` (`dim_product_category.catalog_metadata`). |
+| Log-file writer | Event tables with `log_format: "{ts} ... "` + `log_filename: "..."` emit a structured `.log` file alongside the CSV/Parquet event table. Format string is `template.format(**row.to_dict())` per row; unknown placeholders raise. `tests/configs/saas_template.yaml` (`evt_login` as syslog-flavoured lines) is the worked example. |
+| Multi-source / overlap | `multi_source:` block emits per-source dim copies with controlled drift (casing / abbreviation / swap) and per-source key schemes; `source_entity_mappings` ground truth in the manifest. `tests/configs/crm_billing_overlap.yaml` is the worked example (CRM + billing dual-source, 40 mapping records). |
+| Nested / JSON columns | `dtype: struct` (with `nested_schema`) or `dtype: array` (with `array_element_type`) paired with `source: nested` on dim columns. Parquet preserves native nested schema (`pa.struct(...)`); CSV serializes as JSON string. Dim-only, one level of nesting, primitive leaves in V1. `tests/configs/retail_template.yaml` (`dim_product_category.catalog_metadata`) is the worked example. |
 
 ### 8. Validation, manifest, and provenance (advanced)
 
@@ -223,7 +223,7 @@ convenience shapes:
 - `window=("2024-01", "2024-12", "monthly")` shorthand.
 
 Templates: `plotsim.list_templates()` →
-`["ab_trial", "bare_minimum", "cdc_demo", "crm_billing_overlap", "education", "geo_retail", "hr", "lakehouse", "latency_skew", "marketing", "narrative_reviews", "retail", "saas"]`.
+`["banking", "health", "hr", "marketing", "retail", "saas"]`.
 `plotsim.load_template("saas")` returns a `PlotsimConfig` ready to mutate
 or pass to `generate_tables`.
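
Review note: the Log-file writer row in this file describes each log line as `template.format(**row.to_dict())`, with unknown placeholders raising. The sketch below illustrates that behaviour using plain dicts in place of DataFrame rows. The format string, column names, and `render_log_lines` helper are hypothetical and are not taken from any bundled template.

```python
# Hypothetical format string; real configs set this via `log_format`.
LOG_FORMAT = "{ts} {host} login user={user_id}"

# Plain dicts stand in for `row.to_dict()` on an event table.
rows = [
    {"ts": "2024-01-05T09:30:00", "host": "app-01", "user_id": "u_0007"},
    {"ts": "2024-01-05T09:31:12", "host": "app-02", "user_id": "u_0042"},
]


def render_log_lines(rows, template):
    """Render one log line per row via str.format keyword expansion."""
    return [template.format(**row) for row in rows]


lines = render_log_lines(rows, LOG_FORMAT)
print(lines[0])  # 2024-01-05T09:30:00 app-01 login user=u_0007

# A placeholder with no matching column raises KeyError in str.format.
try:
    render_log_lines(rows, "{ts} {missing_column}")
except KeyError as exc:
    print(f"unknown placeholder: {exc}")
```

Because `str.format` looks up each placeholder by keyword, a typo in the format string surfaces as an exception on the first row rather than as silently malformed log lines.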