Problem
When the extraction spec phase produces identical composite scores across multiple iterations, simmer-sdk runs all configured iterations anyway. In a real run, 6 iterations all scored 6.7/10 with identical composites — the generator made changes, judges evaluated them differently on individual criteria, but the composite always averaged to the same value.
Related to #1 — if regression detection compared per-criterion instead of just composite, some of these iterations would have been flagged as regressions and the best candidate preserved.
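A per-criterion comparison could be as simple as the sketch below. This is a hypothetical helper, not simmer-sdk's API: `prev` and `curr` stand in for whatever per-criterion score mapping the judges produce, and `tolerance` is an assumed knob.

```python
def has_regression(prev: dict[str, float], curr: dict[str, float],
                   tolerance: float = 0.0) -> bool:
    """Flag a regression if any shared criterion dropped, even when the
    composite (mean) is unchanged because other criteria rose to offset it."""
    return any(curr[c] < prev[c] - tolerance for c in prev.keys() & curr.keys())
```

For example, `{"precision": 7.5, "coverage": 6.0, "format_compliance": 6.6}` followed by `{"precision": 5.0, "coverage": 8.5, "format_compliance": 6.6}` averages to 6.7 both times, yet the precision drop would be flagged.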
Observed behavior
extraction_spec iter 0: composite=6.7 (seed)
extraction_spec iter 1: composite=6.7
extraction_spec iter 2: composite=6.7
extraction_spec iter 3: composite=6.7
extraction_spec iter 4: composite=6.7
extraction_spec iter 5: composite=6.7
Individual criterion scores varied (precision went 6 → 6.5 → 7.5 → 5 → 6), but the composite was flat because coverage and format_compliance offset the changes.
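The offsetting effect is easy to reproduce. Only the precision values below come from the run log; the coverage and format_compliance numbers are invented to illustrate how the mean stays flat.

```python
from statistics import mean

# Illustrative iterations: precision moves, the other criteria offset it.
iters = [
    {"precision": 6.0, "coverage": 7.5, "format_compliance": 6.6},
    {"precision": 6.5, "coverage": 7.0, "format_compliance": 6.6},
    {"precision": 7.5, "coverage": 6.0, "format_compliance": 6.6},
]
composites = [round(mean(s.values()), 1) for s in iters]
# Every composite comes out to 6.7 even though precision moved 6 -> 6.5 -> 7.5.
```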
Expected behavior
After 3 consecutive iterations with identical composite scores (configurable), trigger on_plateau callback and stop early. The current run wasted 3 iterations of compute (~15 minutes of Claude CLI time) producing no improvement.
Suggested fix
```python
# In refine loop
if len(trajectory) >= 3:
    last_3 = [t.composite for t in trajectory[-3:]]
    if len(set(last_3)) == 1:  # all identical
        if on_plateau:
            await on_plateau(trajectory)
        break  # stop early
```
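Factored out with the configurable window, the check might look like this. `Step` is a stand-in for whatever per-iteration record simmer-sdk keeps in the trajectory, and `plateau_window` is the proposed configurable threshold (default 3); neither name is from the SDK.

```python
from dataclasses import dataclass

@dataclass
class Step:
    composite: float

def is_plateau(trajectory: list[Step], plateau_window: int = 3) -> bool:
    """True once the last `plateau_window` composites are all identical."""
    if len(trajectory) < plateau_window:
        return False
    recent = [s.composite for s in trajectory[-plateau_window:]]
    return len(set(recent)) == 1
```

The refine loop would then call `is_plateau(trajectory)` after each scoring pass, fire `on_plateau` if configured, and break; exact-equality on composites is deliberate here, since the observed plateau was bit-identical scores rather than near-identical ones.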