Refactor input control with shared _common library and add EPR, CPO, GEPA, PRewrite methods #18

Open
emiehling wants to merge 9 commits into IBM:main from emiehling:input-control-refactor

emiehling commented on Jun 25, 2026

Collaborator

Input control

  • Add shared input_control/_common library: formatters, selectors, proposers, scorers, memory, pareto, and budget primitives reused across methods.
  • Rework FewShot onto _common: add pluggable selectors, replace the template arg with a formatter arg, and add the EPR learned dense retriever (Rubin et al. 2021).
  • Add CPO (causal prompt optimization).
  • Add GEPA (reflective prompt evolution).
  • Add PRewrite (prompt rewriter with optional GRPO-trained rewriter).

Other

  • Polymorphic generate in SteeringPipeline (+ types.py).
  • Rename state_control/common/ to _common/ for consistency with input control.
  • Replace the SPPO trainer wrapper with GRPO and PPO trainer wrappers.
  • Add short_answer_match generic metric; refactor base_judge.
  • New control notebooks (CPO, GEPA, PRewrite) and re-run of existing notebooks.
  • Documentation for new methods and the _common libraries.
  • Test coverage for all new controls, common libraries, wrappers, and the polymorphic generate path.

Rename the shared state_control library from common/ to _common/ to mark it
private, and update all importing controls (act_add, caa, cast, iti) and tests
to the new path. Pure rename plus import-path updates; no behavior change.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
Introduce the input_control _common library (memory, formatters, proposers,
scorers, selectors, pareto/budget utilities) and harden the InputControl base
(adapt / adapt_messages application). Add supporting evaluation-metric helpers
(judge base, short-answer match) that the scorers build on, plus core type and
pipeline updates. Add econml/einops dependencies. Includes _common and core
tests.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
Rework FewShot to route adapt/adapt_messages through the shared
FewShotBlockFormatter and pluggable selectors, replace the template arg with a
formatter arg, and add the EPR learned dense retriever (Rubin et al. 2021) as a
BaseSelector. Includes FewShot and EPR tests.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
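
As a rough illustration of the selector/formatter split described in this commit, the sketch below separates "which demonstrations to use" from "how to render them". All names here (`Example`, `Selector`, `TopKOverlapSelector`, `format_block`, `build_prompt`) are hypothetical and are not the `_common` API; simple word overlap stands in for EPR's learned dense retriever.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Example:
    question: str
    answer: str


class Selector(Protocol):
    """Hypothetical selector interface: pick k demonstrations for a query."""

    def select(self, query: str, pool: list[Example], k: int) -> list[Example]: ...


class TopKOverlapSelector:
    """Toy selector that ranks examples by word overlap with the query.

    A stand-in for a learned retriever such as EPR, not the real method.
    """

    def select(self, query: str, pool: list[Example], k: int) -> list[Example]:
        query_words = set(query.lower().split())

        def overlap(example: Example) -> int:
            return len(query_words & set(example.question.lower().split()))

        # sorted() is stable, so tied examples keep their pool order.
        return sorted(pool, key=overlap, reverse=True)[:k]


def format_block(examples: list[Example]) -> str:
    """Hypothetical formatter: render demonstrations as a Q/A block."""
    return "\n\n".join(f"Q: {e.question}\nA: {e.answer}" for e in examples)


def build_prompt(query: str, pool: list[Example], selector: Selector, k: int = 2) -> str:
    """Select demonstrations, format them, and append the live query."""
    shots = selector.select(query, pool, k)
    return f"{format_block(shots)}\n\nQ: {query}\nA:"
```

Because `build_prompt` depends only on the `Selector` protocol, swapping in a different retrieval strategy would not require touching the formatting code.
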
Add PRewrite (Kong et al. 2024): an LLM instruction rewriter with greedy
inference and best-of-K search strategies, plus an optional GRPO-trained
rewriter using a metric-in-the-loop reward. Add PPO and GRPO TRL wrappers (and
remove the unused SPPO wrapper) to support rewriter training. Includes PRewrite,
PPO, and GRPO wrapper tests.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
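
The best-of-K search strategy mentioned in this commit can be sketched roughly as follows. The `rewrite` and `score` callables are hypothetical stand-ins for the rewriter LLM and the task metric; keeping the original instruction as a candidate is a choice made for this sketch, not something the commit states.

```python
from typing import Callable


def best_of_k(
    instruction: str,
    rewrite: Callable[[str, int], str],
    score: Callable[[str], float],
    k: int = 4,
) -> tuple[str, float]:
    """Sample k candidate rewrites and return the highest-scoring one.

    `rewrite(instruction, i)` is assumed to return the i-th sampled rewrite,
    and `score(candidate)` is assumed to return a task metric where higher
    is better. The original instruction is included as a candidate so the
    search never returns something that scores worse than the input.
    """
    candidates = [instruction] + [rewrite(instruction, i) for i in range(k)]
    scored = [(score(c), c) for c in candidates]
    # max() returns the first maximum, so the original wins exact ties.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

Greedy inference corresponds to taking a single deterministic rewrite; best-of-K spends K rewriter calls plus K metric evaluations to pick among sampled alternatives.
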
Add CPO (Chen et al. 2026): offline causal reward training (Double ML over
PCA-reduced embeddings, GBR fallback) with per-query tree search, routed through
the shared TaskEvaluationScorer. Includes CPO tests.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
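
For readers unfamiliar with Double ML, the partialling-out idea can be sketched in a few lines. This is a simplified illustration under loud assumptions: a single scalar covariate stands in for the PCA-reduced embeddings, plain least-squares fits stand in for the nuisance models, and none of the function names correspond to CPO or to econml's API.

```python
def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def double_ml_effect(x, t, y):
    """Estimate theta in y = theta*t + g(x) + noise via 2-fold cross-fitting.

    Nuisance models (t given x, y given x) are fit on one fold and used to
    residualize the other fold; theta is then the slope of the outcome
    residuals on the treatment residuals.
    """
    n = len(x)
    fold_a = list(range(0, n, 2))
    fold_b = list(range(1, n, 2))
    res_t, res_y = [], []
    for held_out, train in ((fold_a, fold_b), (fold_b, fold_a)):
        x_train = [x[i] for i in train]
        a_t, b_t = ols_fit(x_train, [t[i] for i in train])
        a_y, b_y = ols_fit(x_train, [y[i] for i in train])
        for i in held_out:
            res_t.append(t[i] - (a_t + b_t * x[i]))
            res_y.append(y[i] - (a_y + b_y * x[i]))
    numerator = sum(rt * ry for rt, ry in zip(res_t, res_y))
    denominator = sum(rt * rt for rt in res_t)
    return numerator / denominator
```

Residualizing both treatment and outcome on the covariates removes the part of their association that the covariates explain, which is what lets the final regression target the treatment effect rather than a confounded correlation.
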
Add GEPA (Agrawal et al. 2025): reflective genetic prompt optimization
(single-system-prompt variant) with Pareto-based parent selection over a
held-out set and a budgeted reflective-mutation loop. Includes GEPA tests.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
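
One way to picture Pareto-based parent selection over a held-out set is the sketch below. It assumes each candidate prompt has one score per held-out instance, treats a candidate as eligible if it is best on at least one instance, and samples parents in proportion to how many instances they win. The function names are hypothetical, and this simplified version skips the pruning of dominated candidates that a fuller implementation might perform.

```python
import random


def pareto_candidates(scores: dict[str, list[float]]) -> dict[str, int]:
    """Count, per candidate, the held-out instances on which it is best.

    Candidates that are never best on any instance are omitted. Ties count
    as a win for every tied candidate.
    """
    num_instances = len(next(iter(scores.values())))
    wins: dict[str, int] = {}
    for j in range(num_instances):
        best = max(per_instance[j] for per_instance in scores.values())
        for name, per_instance in scores.items():
            if per_instance[j] == best:
                wins[name] = wins.get(name, 0) + 1
    return wins


def select_parent(scores: dict[str, list[float]], rng: random.Random) -> str:
    """Sample a parent with probability proportional to its win count."""
    wins = pareto_candidates(scores)
    names = list(wins)
    return rng.choices(names, weights=[wins[n] for n in names], k=1)[0]
```

Selecting per instance rather than by mean score keeps candidates that are specialists on a few examples in the pool, so the mutation loop can still build on their strengths.
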
Add reference pages and example notebooks for FewShot/PRewrite/CPO/GEPA, wire
all four into the nav and examples landing page, and rewrite the
structural/state/output sections of controls.md to the per-method bullet style
with reference links and citations. Fix the STEERING_METHOD registry name and
field in the add-a-method tutorials and the _common building-blocks path.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
…oint

The self-built attention mask used a square (seq_len, seq_len) shape taken from
the current hidden states, which is wrong during decoding, where the single
query token must attend to the full KV cache. SDPA's fused CUDA kernel rejected
the mismatched bias (reported as a contiguity error), failing all PASTA CUDA
tests. The mask now spans the cached key positions via cache_position.
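
The shape issue can be illustrated with a minimal sketch that uses plain Python lists instead of tensors. It assumes `cache_position[i]` holds the absolute position of the i-th query token; the function name is hypothetical and this is not how PASTA builds its bias.

```python
def causal_mask(cache_position: list[int], kv_len: int) -> list[list[bool]]:
    """Return a (q_len, kv_len) boolean mask where True means 'may attend'.

    Query i sits at absolute position cache_position[i], so it may see every
    key position j with j <= cache_position[i]. The number of rows follows
    the number of query tokens; the number of columns follows the cache.
    """
    return [[j <= pos for j in range(kv_len)] for pos in cache_position]


# Prefill: three query tokens, three keys -> square lower-triangular mask.
prefill = causal_mask([0, 1, 2], kv_len=3)

# Decoding: one query token at position 3, four cached keys -> shape (1, 4).
decode = causal_mask([3], kv_len=4)
```

A mask derived only from the current hidden states would be 1 x 1 in the decoding case, which cannot describe attention over four cached keys.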

Also replace the bare 'except: breakpoint()' in get_hooks with a chained
ValueError so substring re-tokenization failures surface clearly.

Signed-off-by: Erik Miehling <emiehling@gmail.com>
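
The pattern of replacing a bare `except: breakpoint()` with a chained `ValueError` looks roughly like this. The helper `find_substring_span` is hypothetical and works on characters rather than tokens; the actual logic in get_hooks is not shown in this PR description.

```python
def find_substring_span(text: str, substring: str) -> tuple[int, int]:
    """Hypothetical helper: return the (start, end) character span of substring."""
    try:
        start = text.index(substring)
    except ValueError as exc:
        # 'from exc' chains the original error, so the traceback shows both
        # the low-level failure and the higher-level explanation.
        raise ValueError(
            f"Could not locate substring {substring!r} in the prompt"
        ) from exc
    return start, start + len(substring)
```

Unlike a debugger breakpoint, the raised error fails loudly in CI and non-interactive runs, and the chained cause stays available on `__cause__`.
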
emiehling requested a review from ingelise on June 25, 2026, 14:37