Sealed twin predictions, decision-case context, and eval export (roadmap Phase 0+1) #84
Merged
Phase 0 of TWIN_ACCURACY_ROADMAP: twin context is now budgeted (4000 tokens, greedy fill) and case-first - past decision episodes are retrieved as verbatim behavioral cases, approved records are relevance-gated (cap 12, confidence fallback), and every assembly is tagged with a context_version for external attribution.

Phase 1: decision tiles with 2+ options fire one hidden non-streaming LLM call that predicts the user's choice. The prediction is sealed - redacted from IPC and exports until the outcome is recorded - then revealed with computed agreement. Misses capture an optional correction note that future case retrieval surfaces as a lesson. Prediction lifecycle status (requested/sealed/failed/outcome_recorded_first) exports so the external harness can model missingness.

New: OllamaService::chat() non-streaming variant, decision_episodes.jsonl and feedback_events.jsonl in the export bundle (schema v2, privacy filtered), reveal UX with chosen-option select and correction input in Twin Workspace. Prediction prompts are first-person immersed when Twin Identity is configured; no meta-framing in model-facing prompts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
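For readers unfamiliar with budgeted greedy fill, here is a minimal sketch of the idea, not the code from this PR. The `Candidate` type, the `relevance` field, and the 4-characters-per-token estimate are all assumptions made for illustration; only the 4000-token budget and the "greedy fill" wording come from the commit message.

```rust
/// Hypothetical context candidate; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub text: String,
    pub relevance: f32,
}

/// Rough token estimate (about 4 characters per token, rounded up).
/// This heuristic is an assumption; the PR does not say how tokens are counted.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Greedy fill: visit candidates from most to least relevant and keep each
/// one that still fits in the remaining budget, skipping those that do not.
pub fn greedy_fill(mut candidates: Vec<Candidate>, budget_tokens: usize) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
    let mut remaining = budget_tokens;
    let mut selected = Vec::new();
    for candidate in candidates {
        let cost = estimate_tokens(&candidate.text);
        if cost <= remaining {
            remaining -= cost;
            selected.push(candidate);
        }
    }
    selected
}
```

Under these assumptions, a call such as `greedy_fill(candidates, 4000)` returns the most relevant items that fit in the budget. Skipping an oversized item instead of stopping lets smaller, lower-ranked items use the leftover space; whether the PR skips or stops is not stated.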
WKJBryan added a commit that referenced this pull request Jun 17, 2026
Summary
Implements Phase 0 + Phase 1 of `TWIN_ACCURACY_ROADMAP.md`.

Phase 0: context assembly hygiene
- Past decision episodes retrieved as verbatim behavioral cases (`## Past Decision Cases` section, lexical kNN for now)
- Every assembly tagged with a `context_version` (`ctx-v2-cases-lexical`) in trace events and on the episode, so accuracy deltas are externally attributable

Phase 1: sealed predictions (capture + export only, no in-app scoring)
- Prediction lifecycle status (`requested`/`sealed`/`failed`/`outcome_recorded_first`) exports so the external harness can model missingness instead of overcounting hits
- `OllamaService::chat()` non-streaming variant (visible responses keep streaming)
- `decision_episodes.jsonl` (pre-outcome predictions redacted to a sealed stub) and `feedback_events.jsonl` (privacy-filtered against Rejected/Private/NoTrain records)

Test plan
- `cargo test`: 200 passed (22 new: serde migration, redaction, selection/budget, adversarial parser suite, agreement, export redaction + privacy)
- `npm run test:run`: 432 passed (5 new in `TwinReviewDecisions.spec.js`)
- `cargo clippy --all-targets -- -D warnings`: no new warnings introduced (36 pre-existing, CI step is continue-on-error)

🤖 Generated with Claude Code
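The sealing rule described under Phase 1 can be illustrated with a small sketch. This is a hypothetical model, not the PR's code: the `Episode` and `PredictionView` types, the `view` function, and the exact-match agreement check are assumptions. Only the four status names and the idea of a sealed stub before the outcome is recorded come from the description.

```rust
/// Lifecycle statuses named in the PR summary; the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PredictionStatus {
    Requested,
    Sealed,
    Failed,
    OutcomeRecordedFirst,
}

/// Hypothetical episode record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Episode {
    pub status: PredictionStatus,
    pub predicted_option: Option<String>,
    pub chosen_option: Option<String>,
}

/// What is allowed to leave the process (IPC or export) in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PredictionView {
    /// No usable prediction; the status is kept so missingness can be modeled.
    Unavailable(PredictionStatus),
    /// A prediction exists but the outcome is not recorded yet: content withheld.
    SealedStub,
    /// Outcome recorded: the prediction is revealed together with agreement.
    Revealed { predicted: String, agreed: bool },
}

/// Redacts a sealed prediction until the user's choice is recorded.
/// Agreement is assumed to be exact equality of option labels.
pub fn view(episode: &Episode) -> PredictionView {
    match (episode.status, &episode.predicted_option, &episode.chosen_option) {
        (PredictionStatus::Sealed, Some(predicted), Some(chosen)) => PredictionView::Revealed {
            predicted: predicted.clone(),
            agreed: predicted == chosen,
        },
        (PredictionStatus::Sealed, Some(_), None) => PredictionView::SealedStub,
        (status, _, _) => PredictionView::Unavailable(status),
    }
}
```

In this sketch, `failed` and `outcome_recorded_first` episodes never produce a hit or a miss. They surface only as `Unavailable` with their status, which is one way an external harness could avoid overcounting hits.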