Sealed twin predictions, decision-case context, and eval export (roadmap Phase 0+1)#84

Merged
WKJBryan merged 1 commit into main from feature/twin-sealed-predictions on Jun 12, 2026

@WKJBryan

Owner

Summary

Implements Phase 0 + Phase 1 of TWIN_ACCURACY_ROADMAP.md.

Phase 0 — context assembly hygiene

  • Twin context is budgeted (4000-token greedy fill) and case-first: past decision episodes are retrieved into the prompt as verbatim behavioral cases (new ## Past Decision Cases section, lexical kNN for now)
  • Approved records are relevance-gated (cap 12), with a top-confidence fallback, instead of being included unconditionally
  • Every assembled context is tagged with a context_version (ctx-v2-cases-lexical) in trace events and on the episode, so accuracy deltas are externally attributable
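
For reviewers who want a concrete picture of the Phase 0 selection, here is a minimal sketch of lexical ranking followed by a greedy fill under a token budget. It is not the code in this PR: the `Case` struct, the Jaccard term-overlap score, and the four-characters-per-token estimate are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical shape of a retrievable past decision case.
#[derive(Debug, Clone)]
pub struct Case {
    pub id: u32,
    pub text: String,
}

/// Crude token estimate (assumption: roughly 4 characters per token).
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Lowercased alphanumeric terms of a text.
fn terms(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Jaccard overlap between query terms and case terms, in [0, 1].
pub fn lexical_score(query: &str, case_text: &str) -> f64 {
    let q = terms(query);
    let c = terms(case_text);
    if q.is_empty() || c.is_empty() {
        return 0.0;
    }
    let shared = q.intersection(&c).count() as f64;
    let union = q.union(&c).count() as f64;
    shared / union
}

/// Rank cases by lexical similarity, then greedily add the ones that fit.
pub fn select_cases(query: &str, cases: &[Case], budget_tokens: usize) -> Vec<Case> {
    let mut scored: Vec<(f64, &Case)> = cases
        .iter()
        .map(|c| (lexical_score(query, &c.text), c))
        .filter(|(score, _)| *score > 0.0)
        .collect();
    // Highest score first; ties broken by id so the result is deterministic.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.1.id.cmp(&b.1.id)));

    let mut used = 0;
    let mut picked = Vec::new();
    for (_, case) in scored {
        let cost = estimate_tokens(&case.text);
        if used + cost <= budget_tokens {
            used += cost;
            picked.push(case.clone());
        }
        // A case that does not fit is skipped; a smaller one later may still fit.
    }
    picked
}
```

Calling `select_cases(query, &cases, 4000)` mirrors the 4000-token budget described above. Cases with no term overlap are dropped before the fill, which is one simple way to keep irrelevant episodes out of the prompt.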

Phase 1 — sealed predictions (capture + export only, no in-app scoring)

  • Decision tiles with 2+ options fire one hidden non-streaming LLM call predicting the user's choice; fire-and-forget, never blocks the visible streaming flow
  • The prediction is sealed: redacted from IPC responses, trace payloads, and exports until the user records their choice; then revealed with computed agreement
  • Misses capture an optional "why was the twin wrong?" correction note — surfaced as a lesson by future case retrieval
  • Prediction lifecycle status (requested / sealed / failed / outcome_recorded_first) is exported so the external harness can model missingness instead of overcounting hits
  • Parsing fallback chain: strict JSON → normalized option/label match → raw (manual adjudication); 1-based index convention with text-wins disambiguation
  • Prediction prompts are first-person immersed when Twin Identity is configured ("I am {name}… which option do I choose?") — no meta-framing anywhere in model-facing prompts
  • New OllamaService::chat() non-streaming variant (visible responses keep streaming)
  • Export bundle schema v2: adds decision_episodes.jsonl (pre-outcome predictions redacted to a sealed stub) and feedback_events.jsonl (privacy-filtered against Rejected/Private/NoTrain records)
  • Twin Workspace reveal UX: sealed badge, chosen-option select with free-text fallback, agreement badge, correction input on miss — no accuracy aggregates or dashboards (external-eval policy)
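
The parsing fallback chain in the list above can be illustrated with a small sketch. It is an assumption-laden stand-in, not the parser in this PR: the `{"choice": n}` response shape, the `Parsed` enum, and the hand-rolled `parse_strict` (used here in place of a real JSON parser to stay dependency-free) are all hypothetical. It does follow the stated conventions: indices are 1-based, and when the response contains both an option's text and a conflicting number, the text wins.

```rust
/// Outcome of parsing a hidden prediction response.
#[derive(Debug, PartialEq)]
pub enum Parsed {
    /// 1-based index into the options list.
    Choice(usize),
    /// Nothing matched unambiguously; keep the raw text for manual adjudication.
    Raw(String),
}

/// Lowercase, replace punctuation with spaces, collapse whitespace.
fn normalize(text: &str) -> String {
    let spaced: String = text
        .chars()
        .map(|c| if c.is_alphanumeric() { c.to_ascii_lowercase() } else { ' ' })
        .collect();
    spaced.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Stage 1 stand-in: accepts only the assumed shape {"choice": <integer>}.
fn parse_strict(raw: &str) -> Option<usize> {
    let inner = raw.trim().strip_prefix('{')?.strip_suffix('}')?;
    let (key, value) = inner.split_once(':')?;
    if key.trim() != "\"choice\"" {
        return None;
    }
    value.trim().parse::<usize>().ok()
}

pub fn parse_prediction(raw: &str, options: &[&str]) -> Parsed {
    // Stage 1: strict structured output with an in-range 1-based index.
    if let Some(i) = parse_strict(raw) {
        if (1..=options.len()).contains(&i) {
            return Parsed::Choice(i);
        }
    }

    // Stage 2a: normalized option text found as whole words in the response.
    let norm = normalize(raw);
    let padded = format!(" {} ", norm);
    let mut text_hits = Vec::new();
    for (i, option) in options.iter().enumerate() {
        let needle = normalize(option);
        if !needle.is_empty() && padded.contains(&format!(" {} ", needle)) {
            text_hits.push(i + 1);
        }
    }
    if text_hits.len() == 1 {
        // Text wins: any number elsewhere in the response is ignored.
        return Parsed::Choice(text_hits[0]);
    }

    // Stage 2b: a single in-range number, only when no option text matched.
    if text_hits.is_empty() {
        let nums: Vec<usize> = norm
            .split(' ')
            .filter_map(|w| w.parse::<usize>().ok())
            .filter(|i| (1..=options.len()).contains(i))
            .collect();
        if nums.len() == 1 {
            return Parsed::Choice(nums[0]);
        }
    }

    // Stage 3: give up and hand the raw text to a human.
    Parsed::Raw(raw.to_string())
}
```

With options `["Stay at current job", "Accept the offer"]`, the response `2 - Stay at current job` resolves to option 1 because the text match overrides the number, while a response naming both options falls through to `Raw`.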

Test plan

  • cargo test — 200 passed (22 new: serde migration, redaction, selection/budget, adversarial parser suite, agreement, export redaction + privacy)
  • npm run test:run — 432 passed (5 new in TwinReviewDecisions.spec.js)
  • cargo clippy --all-targets -- -D warnings — no new warnings introduced (36 pre-existing, CI step is continue-on-error)

🤖 Generated with Claude Code

Phase 0 of TWIN_ACCURACY_ROADMAP: twin context is now budgeted (4000
tokens, greedy fill) and case-first - past decision episodes are
retrieved as verbatim behavioral cases, approved records are
relevance-gated (cap 12, confidence fallback), and every assembly is
tagged with a context_version for external attribution.

Phase 1: decision tiles with 2+ options fire one hidden non-streaming
LLM call that predicts the user's choice. The prediction is sealed -
redacted from IPC and exports until the outcome is recorded - then
revealed with computed agreement. Misses capture an optional correction
note that future case retrieval surfaces as a lesson. Prediction
lifecycle status (requested/sealed/failed/outcome_recorded_first)
exports so the external harness can model missingness.
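
As a rough illustration of the sealing rule described above, the sketch below maps an episode to the view that IPC or export code would be allowed to see. All type and field names are hypothetical, and the reading of `outcome_recorded_first` (the user recorded a choice before a prediction was sealed, so the episode is not scored) is an assumption rather than something this PR states.

```rust
/// Lifecycle states named in this PR; the semantics here are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PredictionStatus {
    Requested,
    Sealed,
    Failed,
    OutcomeRecordedFirst,
}

/// Hypothetical internal episode record.
#[derive(Debug, Clone)]
pub struct Episode {
    pub status: PredictionStatus,
    /// 1-based predicted option, if a prediction was parsed.
    pub predicted: Option<usize>,
    /// 1-based option the user actually chose, once recorded.
    pub chosen: Option<usize>,
}

/// What a caller outside the sealing boundary gets to see.
#[derive(Debug, PartialEq)]
pub enum PredictionView {
    /// A prediction exists but the outcome is not recorded yet: content hidden.
    SealedStub,
    /// Outcome recorded after sealing: prediction revealed with agreement.
    Revealed { predicted: usize, agreement: bool },
    /// No scorable prediction; the status lets a harness model missingness.
    Unscored(PredictionStatus),
}

pub fn export_view(ep: &Episode) -> PredictionView {
    match (ep.predicted, ep.chosen) {
        // Never leak the predicted option before the user has chosen.
        (Some(_), None) => PredictionView::SealedStub,
        (Some(p), Some(c)) if ep.status == PredictionStatus::Sealed => {
            PredictionView::Revealed { predicted: p, agreement: p == c }
        }
        // Requested, failed, or outcome-first episodes are reported, not scored.
        _ => PredictionView::Unscored(ep.status),
    }
}
```

Keeping the unscored cases as an explicit variant, instead of dropping them, is what lets an external harness distinguish a miss from a prediction that never existed.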

New: OllamaService::chat() non-streaming variant, decision_episodes.jsonl
and feedback_events.jsonl in the export bundle (schema v2, privacy
filtered), reveal UX with chosen-option select and correction input in
Twin Workspace. Prediction prompts are first-person immersed when Twin
Identity is configured; no meta-framing in model-facing prompts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@WKJBryan WKJBryan merged commit 79aa9f4 into main Jun 12, 2026
10 checks passed
@WKJBryan WKJBryan deleted the feature/twin-sealed-predictions branch June 12, 2026 05:43
@WKJBryan WKJBryan mentioned this pull request Jun 17, 2026
WKJBryan added a commit that referenced this pull request Jun 17, 2026
Version bump to 0.2.0 plus vite 7->8 upgrade to clear high-severity npm advisories. Ships Twin Workspace redesign (#85), sealed twin predictions + eval export (#84), Twin Identity setup (#83), jsconfig fix (#86).