GEPA optimization program: plugin v0.12.3 (over-explain + viz), viewer fixes, full harness + skill#218
Draft
ivanmkc wants to merge 3 commits into
Runbook (entry points, env vars, auth) + metric design + OOD-holdout discipline + the hard-won gotchas, authored in the experiment dir and symlinked into .claude/skills/ so any session in the repo finds it.
…e scratch Curates ~130 overnight run dirs down to their durable outputs (best_topology_skills/best_prompts JSON, report.md, SUMMARY*.md) and adds a .gitignore for regenerable scratch (gepa_state.bin, run_log*, candidate_tree.html, candidates.json, generated_best_outputs_valset/, frozen_*.json, logs, generated corpus).
…ntal flag) Reverts the gating from 29877fd — the agent console (chat) and board-history toggles are wired up for all non-readonly views again, and the EXPERIMENTAL flag is removed. Restores the pre-hiding behavior byte-for-byte.
What this ships
This branch is the full GEPA prompt-optimization program for termchart diagram
skills, plus the validated wins ported into the shipped plugin and viewer. It is
~82 commits on top of master (clean fast-forward, no divergence).
Plugin (user-facing): v0.11.4 → v0.12.3
space-budget "don't crowd the visuals" guard. GEPA-validated to lift junior
comprehension ~+0.19–0.21 on unseen (OOD) journeys, 8/10 improving.
images, links checked).
with the real API surface (dedup + self-consistency audit).
Viewer
?experimental=1 / localStorage flag.
when the viewed board re-renders.
GEPA harness (scripts/experiments/gepa-flowchart/)
multi-journey GEPA over shared skills.
DOM) · visual-quality · junior rubric · viz-usage, harmonic-mean weighted.
K-sample generation, optimize-judge ≠ validation-judge, and an OOD holdout kept in
a separate file so optimization can't train on it.
agreement + cross-eval harnesses; two-gate validated promotion.
SKILL.md, symlinked into .claude/skills/gepa-optimization/): runbook + metric design + gotchas.
best_* skills, report.md, SUMMARY*.md kept; regenerable scratch git-ignored.
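As a rough illustration of the "harmonic-mean weighted" metric mentioned in the harness list, here is a minimal sketch of a weighted harmonic mean over per-component scores. It is not the harness's actual code: the function name, the `eps` floor, the weights, and the example scores are all assumptions made for illustration, and only the component names visual-quality, junior rubric, and viz-usage come from the description above.

```python
from typing import Mapping


def weighted_harmonic_mean(
    scores: Mapping[str, float],
    weights: Mapping[str, float],
    eps: float = 1e-9,
) -> float:
    """Combine per-component scores in [0, 1] into a single score.

    Hypothetical helper: the real harness may normalize or clamp differently.
    A harmonic mean is dominated by the weakest component, so one very low
    score pulls the combined score down far more than an arithmetic mean would.
    """
    if not scores:
        raise ValueError("at least one component score is required")
    missing = [name for name in scores if name not in weights]
    if missing:
        raise KeyError(f"no weight given for components: {missing}")

    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Floor each score at eps so a zero score yields a near-zero result
    # instead of a division-by-zero error.
    denominator = sum(weights[name] / max(scores[name], eps) for name in scores)
    return total_weight / denominator


# Illustrative values only; these are not results from the PR.
example_scores = {"visual_quality": 0.8, "junior_rubric": 0.6, "viz_usage": 0.4}
example_weights = {"visual_quality": 1.0, "junior_rubric": 1.0, "viz_usage": 1.0}
print(round(weighted_harmonic_mean(example_scores, example_weights), 4))
```

The design point this shows is that a candidate cannot buy a high combined score by maximizing one component while another collapses, which fits the tension between junior comprehension and viz quality described in this PR.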
Key finding (documented, not over-applied)
The junior-comprehension gain generalizes OOD; it mildly regresses viz on data-dense
boards. A viz-protective re-weight proved the tension is fundamental for one universal
board_layout, so the shipped default keeps the space-budget guard, and the clean follow-up (split
board_layout per artifact class) is left staged, not forced.
Notes for review
master - today master is at 0.11.4 with no experiments dir, so none of this has shipped via git yet.
gepa_state.bin, run logs, generated boards ignored).