GEPA optimization program: plugin v0.12.3 (over-explain + viz), viewer fixes, full harness + skill by ivanmkc · Pull Request #218 · ivanmkc/termchart

ivanmkc · 2026-07-03T22:07:13Z

What this ships

This branch is the full GEPA prompt-optimization program for termchart diagram
skills, plus the validated wins ported into the shipped plugin and viewer. It is
~82 commits on top of master (clean fast-forward, no divergence).

Plugin (user-facing) — v0.11.4 → v0.12.3

Over-explain-for-non-experts default (v0.12.2/0.12.3): define jargon, explain WHY,
and apply a space-budget "don't crowd the visuals" guard. GEPA-validated to lift junior
comprehension by roughly +0.19 to +0.21 on unseen (OOD) journeys, with 8 of 10 improving.
Visualization-usage guidance (show-don't-tell: prose/diagram mix, products get
images, links checked).
New UX/app-screen mockup recipe family (v0.12.1); diagram-recipes reconciled
with the real API surface (dedup + self-consistency audit).

Viewer

Experimental features (chat console, board history) hidden behind an
?experimental=1 / localStorage flag.
Fixes: inbox nudge fires once per new message (not a heartbeat); don't snap-to-top
when the viewed board re-renders.

GEPA harness (`scripts/experiments/gepa-flowchart/`)

Topology-skill hierarchy (journey → shared topology skills → schema atoms) + joint
multi-journey GEPA over shared skills.
Unified scorer: comprehension (text+vision VQA) · geometry (heuristic + rendered
DOM) · visual-quality · junior rubric · viz-usage, harmonic-mean weighted.
Anti-Goodhart machinery: PoLL multi-judge panel (median + disagreement + abstain),
K-sample generation, optimize-judge ≠ validation-judge, and an OOD holdout kept in
a separate file so optimization can't train on it.
Resilient LLM calls (429/transient backoff + graceful degrade); regression,
judge-agreement, and cross-eval harnesses; two-gate validated promotion.
Methodology codified as a skill (SKILL.md, symlinked into .claude/skills/gepa-optimization/): runbook + metric design + gotchas.
Curated run artifacts: best_* skills, report.md, SUMMARY*.md kept; regenerable
scratch git-ignored.
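
To make the scoring bullets above concrete, here is a minimal sketch of how a judge panel and a weighted harmonic mean could fit together. It is not the harness's code: the function names, the `WEIGHTS` values, and the `min_votes` threshold are all hypothetical, and only the dimension names come from the list above. It assumes each judge returns a score in [0, 1] or `None` to abstain.

```python
from statistics import median

# Hypothetical weights for illustration only; the real weights are not stated here.
WEIGHTS = {
    "comprehension": 3.0,
    "geometry": 1.0,
    "visual_quality": 1.0,
    "junior_rubric": 2.0,
    "viz_usage": 1.0,
}


def panel_score(votes, min_votes=2):
    """Aggregate one dimension's judge votes (None = that judge abstained).

    Returns (median, spread), or (None, None) when too few judges voted.
    The spread (max - min) is a simple disagreement signal.
    """
    cast = [v for v in votes if v is not None]
    if len(cast) < min_votes:
        return None, None
    return median(cast), max(cast) - min(cast)


def weighted_harmonic_mean(scores, weights, eps=1e-6):
    """Combine per-dimension scores in [0, 1], skipping dimensions scored None."""
    used = {k: s for k, s in scores.items() if s is not None}
    if not used:
        raise ValueError("every dimension abstained")
    total_weight = sum(weights[k] for k in used)
    # eps keeps a zero score from dividing by zero while still dominating the result.
    return total_weight / sum(weights[k] / max(s, eps) for k, s in used.items())


if __name__ == "__main__":
    # Made-up votes from a three-judge panel, one dimension shown abstaining.
    votes = {
        "comprehension": [0.8, 0.6, 0.7],
        "geometry": [0.9, None, None],
        "visual_quality": [0.7, 0.7, 0.8],
        "junior_rubric": [0.6, 0.5, 0.7],
        "viz_usage": [0.4, 0.5, None],
    }
    scores = {dim: panel_score(v)[0] for dim, v in votes.items()}
    print(scores, weighted_harmonic_mean(scores, WEIGHTS))
```

The harmonic mean is a natural fit for an anti-Goodhart scorer because one weak dimension pulls the total down much harder than it would under an arithmetic mean, so a candidate cannot buy a high score by maxing out a single metric.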

Key finding (documented, not over-applied)

The junior-comprehension gain generalizes OOD; it mildly regresses viz on data-dense
boards. A viz-protective re-weight proved the tension is fundamental for one universal
board_layout — so the shipped default keeps the space-budget guard, and the clean
follow-up (split board_layout per artifact class) is left staged, not forced.

Notes for review

Draft: the plugin bump (0.12.3) only reaches users once this is on master — today
master is at 0.11.4 with no experiments dir, so none of this has shipped via git yet.
No scratch/binaries in the diff (gepa_state.bin, run logs, generated boards ignored).

…e scratch Curates ~130 overnight run dirs down to their durable outputs (best_topology_skills/best_prompts JSON, report.md, SUMMARY*.md) and adds a .gitignore for regenerable scratch (gepa_state.bin, run_log*, candidate_tree.html, candidates.json, generated_best_outputs_valset/, frozen_*.json, logs, generated corpus).

…ntal flag) Reverts the gating from 29877fd — the agent console (chat) and board-history toggles are wired up for all non-readonly views again, and the EXPERIMENTAL flag is removed. Restores the pre-hiding behavior byte-for-byte.

ivanmkc-google added 3 commits July 2, 2026 22:24

gepa: codify the optimization methodology as a discoverable skill

c9c9e4c

Runbook (entry points, env vars, auth) + metric design + OOD-holdout discipline + the hard-won gotchas, authored in the experiment dir and symlinked into .claude/skills/ so any session in the repo finds it.

ivanmkc wants to merge 3 commits into master from worktree-gepa-flowchart
