35 changes: 31 additions & 4 deletions experiments/signum_evolve/README.md
@@ -1,6 +1,6 @@
# signum-evolve v0
# signum-evolve v1

`signum-evolve v0` is a deterministic offline experiment harness for policy scanner rule catalogs.
`signum-evolve` is a deterministic offline experiment harness for policy scanner rule catalogs.

It does not change scanner behavior, mutate source catalogs, call LLMs, or apply candidate rules. It generates candidate `policy-rules.json` catalogs under `experiments/signum_evolve/out/`, evaluates them with the existing policy scanner eval harness, compares them against the frozen baseline, and exports adoption bundles for human review.

@@ -21,6 +21,23 @@ When `--historical-root` is provided, `generate` discovers historical contract d

Historical replay is only a review signal. It is not treated as labeled ground truth and does not auto-apply or rewrite candidate catalogs. Missing historical roots are skipped gracefully.

## What v1 Adds

v1 keeps the same safe mutation boundary and adds adoption-grade review evidence:

- Bounded multi-prefix candidates for one non-critical rule.
- Per-candidate `catalog_diff.json`.
- Leaderboard `rank`, `score`, `mutationCount`, and compact catalog diff metadata.
- Adoption bundle catalog diff copy and report section.

The default v1 config is:

```text
experiments/signum_evolve/configs/evolve.v1.json
```

It sets `maxMutationDepth` to `2`, which means a candidate may add one or two excluded path prefixes to the same non-critical rule. It still cannot mutate CRITICAL rules, regexes, severities, rule IDs, or source catalogs.
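
As a rough illustration of what a depth of `2` implies, the sketch below enumerates the prefix sets that could be tried for a single rule. It mirrors the `itertools.combinations` loop in `candidate.py` from this change, but `prefix_sets` is a hypothetical helper written for this note, not a function in the package.

```python
import itertools

# Prefixes copied from evolve.v1.json in this change.
ALLOWED_PREFIXES = ["docs/", "examples/", "fixtures/", "tests/", "test/", "generated/"]


def prefix_sets(existing, allowed=ALLOWED_PREFIXES, max_depth=2):
    """Hypothetical helper: list the prefix sets that could be added to one rule."""
    already_excluded = set(existing)
    # Only prefixes the rule does not already exclude can produce a change.
    missing = [p for p in allowed if p not in already_excluded]
    depth_limit = min(max_depth, len(missing))
    sets = []
    for depth in range(1, depth_limit + 1):
        sets.extend(itertools.combinations(missing, depth))
    return sets


no_exclusions = prefix_sets(existing=[])
with_docs = prefix_sets(existing=["docs/"])
print(len(no_exclusions))  # 6 singles + 15 pairs = 21
print(len(with_docs))      # 5 singles + 10 pairs = 15
```

In the diffed `generate_candidates`, enumeration appears to stop as soon as `--max-candidates` is reached, so a small cap may be used up by single-prefix candidates before any pair is generated.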

## What v0 Does Not Do

- No OpenEvolve.
@@ -40,6 +57,8 @@ Allowed mutation operator:

- `add_excluded_path_prefix` on non-CRITICAL rules only.

v1 may group multiple `add_excluded_path_prefix` mutations for the same rule into one candidate, bounded by `maxMutationDepth`.
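
The grouping can be pictured with a small sketch of the metadata a candidate carries. The field names follow `build_candidate` in this change (`mutation`, `mutations`, `mutationCount`, and the `add_excluded_path_prefix_set` summary operator); `mutation_record` and the rule ID `EXAMPLE-RULE` are hypothetical and exist only for this illustration.

```python
def mutation_record(rule_id, added_prefixes):
    """Hypothetical helper: build the mutation metadata for one candidate."""
    if not added_prefixes:
        raise ValueError(f"mutation produced no catalog change: {rule_id}")
    # One entry per prefix that was actually added to the rule.
    mutations = [
        {"operator": "add_excluded_path_prefix", "prefix": prefix, "ruleId": rule_id}
        for prefix in added_prefixes
    ]
    if len(mutations) == 1:
        # A single-prefix candidate keeps the original v0 shape.
        summary = dict(mutations[0])
    else:
        # A grouped candidate is summarized under the set operator.
        summary = {
            "operator": "add_excluded_path_prefix_set",
            "prefixes": list(added_prefixes),
            "ruleId": rule_id,
        }
    return {"mutation": summary, "mutationCount": len(mutations), "mutations": mutations}


single = mutation_record("EXAMPLE-RULE", ["docs/"])
grouped = mutation_record("EXAMPLE-RULE", ["docs/", "tests/"])
print(single["mutation"]["operator"])   # add_excluded_path_prefix
print(grouped["mutation"]["operator"])  # add_excluded_path_prefix_set
```

Keeping the per-prefix `mutations` list alongside the summary means a reviewer can still see each individual edit even when a candidate groups two of them.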

Allowed prefixes:

- `docs/`
@@ -64,7 +83,7 @@ Immutable fields:
```bash
python3 -m experiments.signum_evolve.cli generate \
--repo-root . \
--config experiments/signum_evolve/configs/evolve.v0.json \
--config experiments/signum_evolve/configs/evolve.v1.json \
--run-id smoke \
--max-candidates 5 \
--seed 42
@@ -83,7 +102,7 @@ To add optional historical replay:
```bash
python3 -m experiments.signum_evolve.cli generate \
--repo-root . \
--config experiments/signum_evolve/configs/evolve.v0.json \
--config experiments/signum_evolve/configs/evolve.v1.json \
--run-id replay-smoke \
--max-candidates 5 \
--seed 42 \
@@ -134,6 +153,13 @@ python3 -m experiments.signum_evolve.cli leaderboard \

The leaderboard reports candidate decision, status, hard gate result, improvements, regressions, and mutation metadata. A candidate does not need to beat the current baseline to be useful; the current baseline is intentionally strong.

v1 leaderboards also include:

- `rank`: deterministic review order
- `score`: compact ranking inputs
- `mutationCount`: number of scoped catalog edits in the candidate
- `catalogDiff`: changed rule and critical-rule change counts
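
To make the `catalogDiff` counts concrete, here is a simplified sketch of how changed-rule and critical-rule counts could be derived from two catalogs. It follows the prefix comparison in `catalog_diff.py` from this change but ignores added or removed rules and immutable-field checks. `summarize_diff`, the rule IDs, and the `HIGH` severity value are assumptions made for the example, and the exact shape of the leaderboard's compact `catalogDiff` entry is not shown in this diff.

```python
def summarize_diff(base_rules, candidate_rules):
    """Hypothetical helper: both arguments map ruleId -> rule dict."""
    changes = []
    critical = []
    for rule_id in sorted(set(base_rules) & set(candidate_rules)):
        base = set(base_rules[rule_id].get("excludedPathPrefixes", []))
        cand = set(candidate_rules[rule_id].get("excludedPathPrefixes", []))
        added = sorted(cand - base)
        removed = sorted(base - cand)
        if not added and not removed:
            continue  # Unchanged rules do not appear in the diff.
        changes.append({
            "addedExcludedPathPrefixes": added,
            "removedExcludedPathPrefixes": removed,
            "ruleId": rule_id,
        })
        # Any change to a CRITICAL rule is counted separately for review.
        if base_rules[rule_id].get("severity") == "CRITICAL":
            critical.append(rule_id)
    return {
        "changedRuleCount": len(changes),
        "changes": changes,
        "criticalRuleChangesCount": len(critical),
    }


base = {
    "RULE-A": {"severity": "HIGH", "excludedPathPrefixes": []},
    "RULE-B": {"severity": "CRITICAL", "excludedPathPrefixes": []},
}
candidate = {
    "RULE-A": {"severity": "HIGH", "excludedPathPrefixes": ["docs/", "tests/"]},
    "RULE-B": {"severity": "CRITICAL", "excludedPathPrefixes": []},
}
summary = summarize_diff(base, candidate)
print(summary["changedRuleCount"], summary["criticalRuleChangesCount"])  # 1 0
```

For a candidate that stays inside the allowed mutation boundary, the critical-rule count should stay at zero, which is what makes it a quick sanity check during review.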

When replay is enabled, each leaderboard candidate also includes compact historical replay data:

```json
@@ -163,6 +189,7 @@ The bundle contains:

- `candidate.json`
- `policy-rules.candidate.json`
- `catalog_diff.json`
- `eval.json`
- `compare.json`
- `historical_replay.json`, when replay was enabled
4 changes: 3 additions & 1 deletion experiments/signum_evolve/archive.py
@@ -1,4 +1,4 @@
"""Run archive helpers for signum-evolve v0."""
"""Run archive helpers for signum-evolve."""
from __future__ import annotations

import shutil
@@ -31,6 +31,7 @@ def write_run_manifest(
config_path: Path,
seed: int,
max_candidates: int,
max_mutation_depth: int,
candidate_count: int,
baseline_summary: Dict[str, Any],
) -> Dict[str, Any]:
@@ -40,6 +41,7 @@ def write_run_manifest(
"config": config_path.as_posix(),
"createdAt": None,
"maxCandidates": max_candidates,
"maxMutationDepth": max_mutation_depth,
"runId": run_id,
"schemaVersion": "1.0",
"seed": seed,
87 changes: 52 additions & 35 deletions experiments/signum_evolve/candidate.py
@@ -1,7 +1,8 @@
"""Candidate catalog construction for signum-evolve v0."""
"""Candidate catalog construction for signum-evolve."""
from __future__ import annotations

import copy
import itertools
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List, Sequence
@@ -52,38 +53,53 @@ def build_candidate(
base_catalog: Dict[str, Any],
index: int,
seed: int,
operator: str,
rule_id: str,
prefix: str,
prefixes: Sequence[str],
) -> Dict[str, Any]:
catalog = copy.deepcopy(base_catalog)
changed = False
prefixes_to_add = list(prefixes)
changed_prefixes: List[str] = []
for rule in catalog["rules"]:
if rule.get("ruleId") != rule_id:
continue
if rule.get("severity") == "CRITICAL":
raise ValueError(f"critical rule cannot be mutated: {rule_id}")
prefixes = list(rule.get("excludedPathPrefixes", []))
if operator == "add_excluded_path_prefix":
if prefix not in prefixes:
prefixes.append(prefix)
rule["excludedPathPrefixes"] = prefixes
changed = True
else:
raise ValueError(f"unsupported mutation operator: {operator}")
rule_prefixes = list(rule.get("excludedPathPrefixes", []))
for prefix in prefixes_to_add:
if prefix not in rule_prefixes:
rule_prefixes.append(prefix)
changed_prefixes.append(prefix)
if changed_prefixes:
rule["excludedPathPrefixes"] = rule_prefixes
break
if not changed:
raise ValueError(f"mutation produced no catalog change: {rule_id} {prefix}")
if not changed_prefixes:
raise ValueError(f"mutation produced no catalog change: {rule_id} {list(prefixes_to_add)}")

mutations = [
{
"operator": "add_excluded_path_prefix",
"prefix": prefix,
"ruleId": rule_id,
}
for prefix in changed_prefixes
]
mutation: Dict[str, Any]
if len(mutations) == 1:
mutation = dict(mutations[0])
else:
mutation = {
"operator": "add_excluded_path_prefix_set",
"prefixes": changed_prefixes,
"ruleId": rule_id,
}

return {
"candidateId": candidate_id(index),
"catalog": catalog,
"createdAt": None,
"mutation": {
"operator": operator,
"prefix": prefix,
"ruleId": rule_id,
},
"mutation": mutation,
"mutationCount": len(mutations),
"mutations": mutations,
"parentId": "baseline",
"schemaVersion": "1.0",
"seed": seed,
@@ -96,28 +112,29 @@ def generate_candidates(
max_candidates: int,
seed: int,
allowed_prefixes: Sequence[str] = DEFAULT_ALLOWED_PREFIXES,
max_mutation_depth: int = 1,
) -> List[Dict[str, Any]]:
candidates: List[Dict[str, Any]] = []
if max_candidates <= 0:
if max_candidates <= 0 or max_mutation_depth <= 0:
return candidates
next_index = 1
for rule in noncritical_rules(catalog):
rule_id = str(rule.get("ruleId"))
existing_prefixes = set(rule.get("excludedPathPrefixes", []))
for prefix in allowed_prefixes:
if prefix in existing_prefixes:
continue
candidates.append(
build_candidate(
base_catalog=catalog,
index=next_index,
seed=seed,
operator="add_excluded_path_prefix",
rule_id=rule_id,
prefix=prefix,
missing_prefixes = [prefix for prefix in allowed_prefixes if prefix not in existing_prefixes]
max_depth = min(max_mutation_depth, len(missing_prefixes))
for depth in range(1, max_depth + 1):
for prefix_set in itertools.combinations(missing_prefixes, depth):
candidates.append(
build_candidate(
base_catalog=catalog,
index=next_index,
seed=seed,
rule_id=rule_id,
prefixes=prefix_set,
)
)
)
next_index += 1
if len(candidates) >= max_candidates:
return candidates
next_index += 1
if len(candidates) >= max_candidates:
return candidates
return candidates
79 changes: 79 additions & 0 deletions experiments/signum_evolve/catalog_diff.py
@@ -0,0 +1,79 @@
"""Catalog diff helpers for signum-evolve candidate review."""
from __future__ import annotations

from pathlib import Path
from typing import Any, Dict, List

from .candidate import write_json
from .mutate import IMMUTABLE_RULE_FIELDS, rule_by_id


def _as_string_list(value: Any) -> List[str]:
    if not isinstance(value, list):
        return []
    return sorted(item for item in value if isinstance(item, str))


def diff_catalogs(base_catalog: Dict[str, Any], candidate_catalog: Dict[str, Any]) -> Dict[str, Any]:
    base_rules = rule_by_id(base_catalog)
    candidate_rules = rule_by_id(candidate_catalog)
    rule_ids = sorted(set(base_rules) | set(candidate_rules))
    changes: List[Dict[str, Any]] = []
    critical_rule_changes: List[str] = []

    for rule_id in rule_ids:
        base_rule = base_rules.get(rule_id)
        candidate_rule = candidate_rules.get(rule_id)
        if base_rule is None:
            changes.append({"changeType": "rule_added", "ruleId": rule_id})
            critical_rule_changes.append(rule_id)
            continue
        if candidate_rule is None:
            changes.append({"changeType": "rule_removed", "ruleId": rule_id})
            if base_rule.get("severity") == "CRITICAL":
                critical_rule_changes.append(rule_id)
            continue

        base_prefixes = set(_as_string_list(base_rule.get("excludedPathPrefixes")))
        candidate_prefixes = set(_as_string_list(candidate_rule.get("excludedPathPrefixes")))
        added_prefixes = sorted(candidate_prefixes - base_prefixes)
        removed_prefixes = sorted(base_prefixes - candidate_prefixes)
        immutable_changes = sorted(
            field
            for field in IMMUTABLE_RULE_FIELDS
            if base_rule.get(field) != candidate_rule.get(field)
        )

        if not added_prefixes and not removed_prefixes and not immutable_changes:
            continue

        change = {
            "addedExcludedPathPrefixes": added_prefixes,
            "autoBlock": candidate_rule.get("autoBlock"),
            "immutableFieldChanges": immutable_changes,
            "pattern": candidate_rule.get("pattern"),
            "removedExcludedPathPrefixes": removed_prefixes,
            "ruleId": rule_id,
            "severity": candidate_rule.get("severity"),
            "type": candidate_rule.get("type"),
        }
        changes.append(change)

        if base_rule.get("severity") == "CRITICAL" and (
            added_prefixes or removed_prefixes or immutable_changes
        ):
            critical_rule_changes.append(rule_id)

    return {
        "changedRuleCount": len(changes),
        "changes": changes,
        "criticalRuleChanges": sorted(set(critical_rule_changes)),
        "criticalRuleChangesCount": len(set(critical_rule_changes)),
        "schemaVersion": "1.0",
    }


def write_catalog_diff(candidate_dir: Path, diff: Dict[str, Any]) -> Path:
    path = candidate_dir / "catalog_diff.json"
    write_json(path, diff)
    return path
9 changes: 7 additions & 2 deletions experiments/signum_evolve/cli.py
@@ -1,4 +1,4 @@
"""CLI for signum-evolve v0."""
"""CLI for signum-evolve."""
from __future__ import annotations

import argparse
@@ -8,6 +8,7 @@

from .archive import archive_candidate, prepare_run_dir, write_run_manifest
from .candidate import DEFAULT_ALLOWED_PREFIXES, canonical_json, generate_candidates, load_catalog, load_json
from .catalog_diff import diff_catalogs, write_catalog_diff
from .export import export_bundle
from .mutate import validate_scope_only_mutation
from .report import baseline_summary_from_scorecard, write_leaderboard
@@ -46,6 +47,7 @@ def command_generate(args: argparse.Namespace) -> int:
).resolve()
historical_root = repo_path(repo_root, args.historical_root).resolve() if args.historical_root else None
allowed_prefixes = tuple(config.get("allowedPrefixes", DEFAULT_ALLOWED_PREFIXES))
max_mutation_depth = int(config.get("maxMutationDepth", 1))

catalog = load_catalog(catalog_path)
baseline_scorecard = load_json(baseline_path)
@@ -54,6 +56,7 @@ def command_generate(args: argparse.Namespace) -> int:
max_candidates=args.max_candidates,
seed=args.seed,
allowed_prefixes=allowed_prefixes,
max_mutation_depth=max_mutation_depth,
)
if not candidates:
raise RuntimeError("no candidates generated")
@@ -64,6 +67,7 @@ def command_generate(args: argparse.Namespace) -> int:
if errors:
raise RuntimeError(f"{candidate['candidateId']} failed mutation validation: {errors}")
candidate_dir = archive_candidate(run_dir, candidate)
write_catalog_diff(candidate_dir, diff_catalogs(catalog, candidate["catalog"]))
evaluate_candidate(repo_root, candidate_dir, baseline_path)
if historical_root is not None:
replay = run_historical_replay(
@@ -81,6 +85,7 @@ def command_generate(args: argparse.Namespace) -> int:
config_path=manifest_path_ref(repo_root, config_path),
seed=args.seed,
max_candidates=args.max_candidates,
max_mutation_depth=max_mutation_depth,
candidate_count=len(candidates),
baseline_summary=baseline_summary_from_scorecard(baseline_scorecard),
)
@@ -102,7 +107,7 @@ def command_export(args: argparse.Namespace) -> int:


def main(argv: Optional[Sequence[str]] = None) -> int:
parser = argparse.ArgumentParser(description="Offline Signum evolve v0 candidate generator.")
parser = argparse.ArgumentParser(description="Offline Signum evolve candidate generator.")
subcommands = parser.add_subparsers(dest="command", required=True)

generate = subcommands.add_parser("generate", help="Generate and evaluate candidate catalogs.")
17 changes: 17 additions & 0 deletions experiments/signum_evolve/configs/evolve.v1.json
@@ -0,0 +1,17 @@
{
  "allowedPrefixes": [
    "docs/",
    "examples/",
    "fixtures/",
    "tests/",
    "test/",
    "generated/"
  ],
  "baselineCatalog": "lib/policy-rules.json",
  "baselineScorecard": "evals/policy_scanner/baselines/current.json",
  "maxMutationDepth": 2,
  "operators": [
    "add_excluded_path_prefix"
  ],
  "schemaVersion": "1.0"
}