Merged
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,24 @@ Versioning: [SemVer](https://semver.org/spec/v2.0.0.html).

### Added

- **Per-metric treatment effects.** New optional `target_metric` field
on the treatment surface (set on `SegmentInput.treatment` in the
builder; mirrored as `Entity.treatment_target_metric` in the engine).
When set, the configured `treatment_lift_log_odds` applies only to
the named metric's evaluation for treatment-arm entities — every
other metric in the same period is drawn identically to its
control-arm counterpart. Lets a single intervention be modelled as
shifting one outcome (e.g. revenue) while leaving the rest of the
metric set as a placebo. Default `null` preserves the prior
trajectory-wide behaviour byte-for-byte. Config-time validation
rejects target names that don't match any declared metric.
Correlated metrics do not inherit the lift via the copula: the
residual transform is centred on each metric's own (un-shifted)
centre, so the targeted metric shifts and the correlated metric
stays at its control distribution. Manifest schema bumps 1.8 → 1.9
with the additive `target_metric` field on the per-entity
`treatment` and per-cohort `treatment_cohorts` records.

- **Heteroscedastic gaussian noise.** Optional `scale_with_trajectory`
flag on `NoiseConfig` (mirror on the builder's `NoiseInput`). When
`true`, each cell's gaussian standard deviation becomes
2 changes: 1 addition & 1 deletion docs/site/manifest-reference.md
@@ -70,7 +70,7 @@ produces a byte-identical `manifest.json`. Encoding: UTF-8,

| Field | Type | Description |
|---|---|---|
-| `schema_version` | `str` | Wire-shape version. Currently `"1.8"` (bumped over time as new additive sections — `causal_graph`, `correlations`, `outlier_injections`, multi-source mappings, `parent_child_relations`, `noise_config` — landed; 1.7 → 1.8 extended `noise_config` with `noise_family` / `degrees_of_freedom`) |
+| `schema_version` | `str` | Wire-shape version. Currently `"1.9"` (bumped over time as new additive sections — `causal_graph`, `correlations`, `outlier_injections`, multi-source mappings, `parent_child_relations`, `noise_config` — landed; 1.7 → 1.8 extended `noise_config` with `noise_family` / `degrees_of_freedom`; 1.8 → 1.9 added the optional `target_metric` field on the per-entity `treatment` and per-cohort `treatment_cohorts` records) |
| `seed` | `int` | The seed used for generation — `config.seed` |
| `config_sha256` | `str` | Full SHA-256 hex of the JSON-serialized config. Detects config drift between generation and consumption |
| `archetype_assignments` | array | One entry per entity; see below |
48 changes: 45 additions & 3 deletions docs/site/user-guide/experiments-and-cohorts.md
@@ -171,6 +171,7 @@ segments:
start_period: 6 # rollout date
treatment_label: new_onboarding
control_label: original_onboarding
target_metric: mrr # optional — see below
```

### What the lift does
@@ -188,6 +189,36 @@ behaviour: a `+0.5` lift moves `p=0.5` to ~0.62, but only moves
`p=0.9` to ~0.94. Same intervention, less impact when the metric is
already near saturation.
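
The saturation arithmetic above can be checked with a short sketch. It assumes the lift is added on the logit (log-odds) scale and mapped back through the logistic function; the helper name `apply_lift` is illustrative and not part of plotsim.

```python
import math


def apply_lift(p: float, lift_log_odds: float) -> float:
    """Shift probability ``p`` by ``lift_log_odds`` on the logit scale."""
    logit = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-(logit + lift_log_odds)))


# Same +0.5 lift, two different starting points.
for p in (0.5, 0.9):
    print(f"p={p} -> {apply_lift(p, 0.5):.2f}")
```

The two printed values round to 0.62 and 0.94, matching the figures quoted in the text.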

### Targeting a single metric (`target_metric`)

By default the lift applies to **every** metric for the treatment arm —
useful when modelling an intervention that shifts overall trajectory
position (a global engagement boost, a churn-reduction programme).
Add `target_metric: <metric_name>` to restrict the lift to a single
named metric. Every other metric in the same period is drawn
identically to the control arm, even for entities in the treatment
cohort.

```yaml
treatment:
  fraction: 0.5
  lift_log_odds: 0.6
  start_period: 6
  target_metric: mrr   # only mrr shifts; engagement, churn_risk, etc. stay flat
```

Use the targeted form when the experimental hypothesis names one
outcome metric ("the pricing experiment lifts revenue, not
engagement"), or when you want a placebo metric in the dataset whose
mean must be statistically identical across arms. Omit it for a
trajectory-wide intervention.

Correlated metrics: if `target_metric` names a metric that participates
in a `connections` correlation, the copula still operates on residuals
around each metric's own (un-shifted) centre — so the lift does **not**
propagate to the correlated metric's mean. The targeted metric shifts,
the correlated metric stays at its control distribution.
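
The gating rule described in this section can be sketched in a few lines. This is a standalone illustration, not plotsim's engine code; `shift_for` and the metric names are placeholders chosen for the example.

```python
from typing import Optional


def shift_for(metric: str, lift: float, target_metric: Optional[str]) -> float:
    """Return the logit shift one metric receives in a treated period."""
    if target_metric is None or metric == target_metric:
        return lift
    return 0.0


metrics = ["mrr", "engagement", "churn_risk"]
targeted = {m: shift_for(m, 0.6, "mrr") for m in metrics}     # only mrr shifts
trajectory_wide = {m: shift_for(m, 0.6, None) for m in metrics}  # every metric shifts
print(targeted)
print(trajectory_wide)
```

Under this rule a name that matches no metric yields `0.0` for every metric, which is why a mistyped `target_metric` is rejected at config load rather than left to fail silently.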

### Pre-treatment baseline

At `period_index < treatment_start_period`, the shift is `0.0` for
@@ -218,22 +249,33 @@ changing one feature's shape doesn't shift another feature's outputs.

### Manifest

-Two new manifest fields land at schema version `1.5`:
+Two manifest fields surface treatment ground-truth:

- `EntityArchetypeAssignment.treatment` — per-entity assignment
record. Carries the entity's group label, lift (or `None` for
-control), and `start_period`. `null` for entities with no treatment
+control), `start_period`, and `target_metric` (`null` for the
+trajectory-wide default). `null` for entities with no treatment
fields set.
- `ManifestSchema.treatment_cohorts` — aggregate per-cohort records.
One entry per distinct `treatment_group` label. Reports the cohort
-size, mean lift, and modal `start_period`.
+size, mean lift, modal `start_period`, and modal `target_metric`
+(`null` when every entity in the cohort uses the trajectory-wide
+default).

The `target_metric` field on both records is additive; manifests
emitted for configs that do not set `target_metric` keep the field
`null`, so older readers continue to parse cleanly.
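
A consumer that has to handle both 1.8 and 1.9 manifests could read the field defensively. The sketch below assumes each `archetype_assignments` entry exposes its record under a `treatment` key, as the field names above suggest. The `entity_id` key and the inline JSON are made-up sample data, not real plotsim output.

```python
import json

# Made-up sample: a 1.8-style record (no target_metric key), a 1.9-style
# record, and an entity with no treatment fields set.
sample = json.loads("""
{
  "archetype_assignments": [
    {"entity_id": "e1", "treatment": {"group": "control", "lift_log_odds": null, "start_period": 6}},
    {"entity_id": "e2", "treatment": {"group": "treatment", "lift_log_odds": 0.6, "start_period": 6, "target_metric": "mrr"}},
    {"entity_id": "e3", "treatment": null}
  ]
}
""")


def target_metric_of(assignment: dict):
    """Return the targeted metric, or None for trajectory-wide / no treatment."""
    treatment = assignment.get("treatment")
    if treatment is None:
        return None
    # .get() tolerates manifests written before the field existed.
    return treatment.get("target_metric")


targets = {a["entity_id"]: target_metric_of(a) for a in sample["archetype_assignments"]}
print(targets)
```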

### Validator

Rejected at config load:

- `treatment_start_period >= n_periods` (the lift would never apply).
- `treatment_lift_log_odds = ±inf` or `nan` (would propagate NaN cells).
- `target_metric` set to a name that doesn't match any declared metric.
Without this check a typo would silently fall through the per-metric
gate (no metric matches → the lift is never applied) and the
treatment would be invisible in the generated data.
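
The `target_metric` check can be pictured roughly as follows. This is a hypothetical stand-in: the diff does not show the body of `validate_treatment_assignments`, so the function name, signature, and error message here are assumptions.

```python
from typing import Iterable, Optional


def check_target_metric(target_metric: Optional[str], declared: Iterable[str]) -> None:
    """Raise ValueError when ``target_metric`` names no declared metric."""
    if target_metric is None:
        return  # trajectory-wide lift, nothing to check
    names = set(declared)
    if target_metric not in names:
        raise ValueError(
            f"target_metric {target_metric!r} does not match any declared metric: "
            f"{sorted(names)}"
        )


check_target_metric("mrr", ["mrr", "engagement"])  # passes silently
try:
    check_target_metric("mmr", ["mrr", "engagement"])  # typo is caught
except ValueError as exc:
    message = str(exc)
    print(message)
```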

**NOT** rejected (intentionally):

12 changes: 12 additions & 0 deletions plotsim-schema.json
@@ -560,6 +560,18 @@
"minimum": 0,
"title": "Treatment Start Period",
"type": "integer"
},
"treatment_target_metric": {
"anyOf": [
{
"type": "string"
},
{
"type": "null"
}
],
"default": null,
"title": "Treatment Target Metric"
}
},
"required": [
10 changes: 10 additions & 0 deletions plotsim/builder/input.py
@@ -388,6 +388,15 @@ class TreatmentConfig(BaseModel):
positions — the AC for "pre-treatment baseline is identical".
* ``treatment_label`` / ``control_label`` — cohort labels for the
manifest. Defaults match the conventional A/B labelling.
* ``target_metric`` — optional name of a single metric. When set,
the lift only affects that metric's effective-position
evaluation; every other metric is byte-identical to its
control-arm draw. ``None`` (default) = trajectory-wide
application (every metric sees the lift, the pre-M24
behaviour). The interpreter copies this value onto every
expanded entity in the segment (treatment AND control arms)
for ground-truth symmetry; control entities have no lift to
gate, so the field is harmless there.

RNG isolation: the interpreter draws treatment assignments from a
distinct ``np.random.default_rng(seed ^ TREATMENT_SALT)`` stream,
@@ -404,6 +413,7 @@ class TreatmentConfig(BaseModel):
start_period: int = Field(default=0, ge=0)
treatment_label: str = Field(default="treatment", min_length=1)
control_label: str = Field(default="control", min_length=1)
target_metric: Optional[str] = None

@field_validator("lift_log_odds")
@classmethod
3 changes: 3 additions & 0 deletions plotsim/builder/interpreter.py
@@ -487,6 +487,9 @@ def _build_archetypes_and_entities(
treatment_start_period=(
s.treatment.start_period if s.treatment is not None else 0
),
treatment_target_metric=(
s.treatment.target_metric if s.treatment is not None else None
),
)
)

9 changes: 9 additions & 0 deletions plotsim/config.py
@@ -1332,6 +1332,14 @@ class Entity(_Frozen):
# out a baseline window where treatment and control entities share
# identical metric distributions (the AC for "pre-treatment baseline
# is identical across groups").
# * ``treatment_target_metric`` — optional name of a single metric.
# When set, the logit shift only applies to that metric's
# effective-position evaluation; every other metric in the same
# period sees ``treatment_shift=0.0`` and is byte-identical to its
# control-arm draw. ``None`` (default) = trajectory-wide application
# (every metric sees the lift, the pre-M24 behaviour). The named
# metric must exist in ``config.metrics``; the validator
# ``validate_treatment_assignments`` enforces this at load time.
#
# The label is decoupled from the lift so a "control" entity can carry
# a label without applying a shift, AND so the user can opt out of
@@ -1342,6 +1350,7 @@ class Entity(_Frozen):
treatment_group: Optional[str] = None
treatment_lift_log_odds: Optional[float] = None
treatment_start_period: int = Field(default=0, ge=0)
treatment_target_metric: Optional[str] = None


class FKDistribution(_Frozen):
48 changes: 43 additions & 5 deletions plotsim/manifest.py
@@ -117,7 +117,15 @@
# records the realized noise family whenever it diverges from the
# historical lane, not only when heteroscedastic amplitude is on.
# Default-family default-amplitude runs still emit ``noise_config=None``.
-MANIFEST_SCHEMA_VERSION = "1.8"
+# 0.6-M24: bumped 1.8 → 1.9 for the additive ``target_metric`` field on
+# both ``TreatmentAssignment`` (per-entity) and ``TreatmentCohort``
+# (per-cohort). Defaults to ``None`` (trajectory-wide lift, the pre-M24
+# behaviour), so configs without per-metric targeting emit a 1.9
+# manifest byte-equivalent to 1.8 modulo the schema version string and
+# the new field's null default. Populated when the entity's
+# ``treatment_target_metric`` names a metric — the lift then applies
+# only to that metric's effective-position evaluation.
+MANIFEST_SCHEMA_VERSION = "1.9"


class _ManifestBase(BaseModel):
@@ -151,7 +159,7 @@ class ActiveWindow(_ManifestBase):
class TreatmentAssignment(_ManifestBase):
"""0.6-M8c: an entity's treatment / control assignment.

-Three fields, all sourced from the matching ``Entity`` fields:
+Four fields, all sourced from the matching ``Entity`` fields:

* ``group`` — the cohort label (e.g. ``"treatment"`` /
``"control"``). Plotsim treats it as opaque metadata.
@@ -160,16 +168,23 @@ class TreatmentAssignment(_ManifestBase):
* ``start_period`` — the absolute period index at which the lift
kicks in. Pre-treatment periods (``period_index < start_period``)
see the same trajectory as the control arm.
* ``target_metric`` — M24 per-metric targeting. ``None`` means
the lift applies trajectory-wide (every metric sees it, the
pre-M24 default); a metric name restricts the lift to that
metric only. Carried on both treatment and control entities
for ground-truth symmetry — control arms have no lift to gate.

Emitted only for entities with at least one treatment field set.
-Default-only entities (no group label, no lift, no start period) get
-``treatment=None`` on their ``EntityArchetypeAssignment`` so the
-M8c manifest field is invisible to non-A/B test datasets.
+Default-only entities (no group label, no lift, no start period,
+no target metric) get ``treatment=None`` on their
+``EntityArchetypeAssignment`` so the M8c manifest field is
+invisible to non-A/B test datasets.
"""

group: Optional[str]
lift_log_odds: Optional[float]
start_period: int
target_metric: Optional[str] = None


class EntityArchetypeAssignment(_ManifestBase):
@@ -211,12 +226,22 @@ class TreatmentCohort(_ManifestBase):
for the cohort. Most A/B tests use one start period per cohort,
so this is the headline value; if the cohort has heterogeneous
starts (rare, but supported), pick the most common.
* ``target_metric`` — M24 per-metric targeting. ``None`` when no
entity in the cohort declares a target metric, i.e. every member
applies the lift trajectory-wide (the pre-M24 default). Otherwise
the most common target among the members that declare one.
Heterogeneous cohorts are rare (segments normally map 1:1 to
cohort labels and carry one ``TreatmentConfig.target_metric``);
downstream consumers can cross-reference the per-entity records
for outliers.
"""

label: str
n_entities: int
mean_lift_log_odds: Optional[float]
start_period: int
target_metric: Optional[str] = None


class TrajectorySample(_ManifestBase):
@@ -918,12 +943,14 @@ def _treatment_assignment_for(entity: Any) -> Optional[TreatmentAssignment]:
entity.treatment_group is None
and entity.treatment_lift_log_odds is None
and entity.treatment_start_period == 0
and entity.treatment_target_metric is None
):
return None
return TreatmentAssignment(
group=entity.treatment_group,
lift_log_odds=entity.treatment_lift_log_odds,
start_period=entity.treatment_start_period,
target_metric=entity.treatment_target_metric,
)


@@ -970,12 +997,23 @@ def _build_treatment_cohorts(entities: list) -> list[TreatmentCohort]:

starts = Counter(m.treatment_start_period for m in members)
modal_start = starts.most_common(1)[0][0]
# M24: modal target_metric across the cohort. ``None`` when no
# member declares one (the pre-M24 default — trajectory-wide
# lift). Counted across non-None values only; if every member
# has ``treatment_target_metric=None`` the cohort reports
# ``None`` (trajectory-wide), matching the pre-M24 manifest
# shape for that cohort.
targets = Counter(
m.treatment_target_metric for m in members if m.treatment_target_metric is not None
)
modal_target: Optional[str] = targets.most_common(1)[0][0] if targets else None
cohorts.append(
TreatmentCohort(
label=label,
n_entities=len(members),
mean_lift_log_odds=mean_lift,
start_period=modal_start,
target_metric=modal_target,
)
)
return cohorts
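
The modal-target rule in the hunk above has one consequence worth spelling out: because `None` values are excluded before counting, a single member that declares a target outvotes any number of trajectory-wide members. A minimal standalone sketch follows; `modal_target` is an illustrative name, not a function in `plotsim/manifest.py`.

```python
from collections import Counter
from typing import Optional, Sequence


def modal_target(targets: Sequence[Optional[str]]) -> Optional[str]:
    """Most common non-None target, or None if no member declares one."""
    counts = Counter(t for t in targets if t is not None)
    return counts.most_common(1)[0][0] if counts else None


print(modal_target([None, None, None]))      # no member declares a target
print(modal_target([None, None, "mrr"]))     # one declaring member wins
print(modal_target(["mrr", "mrr", "arpu"]))  # plain majority
```

The three calls return `None`, `"mrr"`, and `"mrr"` respectively. Consumers that need the exact split can fall back to the per-entity `treatment` records.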
22 changes: 21 additions & 1 deletion plotsim/metrics.py
@@ -1268,6 +1268,7 @@ def generate_metrics_for_period(
seasonal_global: float = 0.0,
entity_seasonal_sensitivity: float = 1.0,
treatment_shift: float = 0.0,
treatment_target_metric: Optional[str] = None,
) -> dict[str, Optional[float]]:
"""Generate every metric for one entity at one time step.

@@ -1285,19 +1286,36 @@ def generate_metrics_for_period(
6. apply Cholesky correlation on residuals (if correlations given)
7. apply noise (if noise config given): gaussian → outlier → MCAR
8. clamp to value_range, round poisson to int

M24: ``treatment_target_metric`` gates the per-metric shift. When
``None`` (default) every metric in the loop sees the caller's
``treatment_shift`` — the pre-M24 trajectory-wide behaviour, byte-
identical to before. When set to a metric name, only that metric's
``_compute_effective_position`` call receives the shift; every other
metric in the same period sees ``0.0`` and is byte-identical to its
control-arm draw. The validator
``validate_treatment_assignments`` guarantees the named metric
exists, so a non-matching name here is silent dead-weight only if
the caller bypassed validation (e.g. constructed ``PlotsimConfig``
programmatically and pushed it into the engine directly).
"""
effective = [_apply_archetype_overrides(m, archetype) for m in metrics]
centers: dict[str, float] = {}
independent: dict[str, Optional[float]] = {}
correlations_active = bool(correlations)

for em in effective:
em_shift = (
treatment_shift
if (treatment_target_metric is None or em.name == treatment_target_metric)
else 0.0
)
eff_pos = _compute_effective_position(
trajectory_position,
em,
lag_buffer,
period_index,
-treatment_shift=treatment_shift,
+treatment_shift=em_shift,
)
if lag_buffer is not None:
# Append this metric's effective position BEFORE moving on to
@@ -1374,6 +1392,7 @@ def generate_entity_metrics(
entity_seasonal_sensitivity: float = 1.0,
treatment_lift_log_odds: Optional[float] = None,
treatment_start_period: int = 0,
treatment_target_metric: Optional[str] = None,
) -> dict[str, np.ndarray]:
"""Generate every metric's full time series for one entity.

@@ -1471,6 +1490,7 @@ def generate_entity_metrics(
seasonal_global=seasonal_global_t,
entity_seasonal_sensitivity=entity_seasonal_sensitivity,
treatment_shift=shift_t,
treatment_target_metric=treatment_target_metric,
)
for m in sorted_metrics:
collected[m.name].append(period_out[m.name])
2 changes: 2 additions & 0 deletions plotsim/tables.py
@@ -367,6 +367,7 @@ def _compute_entity_metrics(
entity_seasonal_sensitivity=entity.seasonal_sensitivity,
treatment_lift_log_odds=entity.treatment_lift_log_odds,
treatment_start_period=entity.treatment_start_period,
treatment_target_metric=entity.treatment_target_metric,
)
return entity_metrics

@@ -468,6 +469,7 @@ def _compute_entity_metrics(
entity_seasonal_sensitivity=entity.seasonal_sensitivity,
treatment_lift_log_odds=entity.treatment_lift_log_odds,
treatment_start_period=entity.treatment_start_period,
treatment_target_metric=entity.treatment_target_metric,
)
return entity_metrics_v
