Sealed twin predictions, decision-case context, and eval export (roadmap Phase 0+1) by WKJBryan · Pull Request #84 · WKJBryan/Grafyn

WKJBryan · 2026-06-12T05:31:16Z

Summary

Implements Phase 0 + Phase 1 of TWIN_ACCURACY_ROADMAP.md.

Phase 0 — context assembly hygiene

Twin context is budgeted (4000-token greedy fill) and case-first: past decision episodes are retrieved into the prompt as verbatim behavioral cases (new ## Past Decision Cases section, lexical kNN for now)
Approved records are relevance-gated (cap 12) with a top-confidence fallback instead of being unconditionally included
Every assembled context is tagged with a context_version (ctx-v2-cases-lexical) in trace events and on the episode, so accuracy deltas are externally attributable
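
The Phase 0 selection rules above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `Candidate`, `estimate_tokens`, `gate_records`, and `fill_budget` are hypothetical names, the 4-characters-per-token estimate stands in for whatever tokenizer the app uses, and two behaviors are assumptions: that the confidence fallback triggers only when no record passes the relevance gate, and that the greedy fill skips an item that does not fit and keeps going.

```rust
/// A candidate piece of twin context (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: String,
    pub text: String,
    pub relevance: f32,
    pub confidence: f32,
}

/// Rough token estimate: about 4 characters per token, rounded up.
/// Assumption for illustration; not a real tokenizer.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Relevance gate for approved records: keep records at or above
/// `min_relevance`, best first, at most `cap` of them. If none pass,
/// fall back to the `cap` most confident records (assumed trigger).
pub fn gate_records(mut records: Vec<Candidate>, min_relevance: f32, cap: usize) -> Vec<Candidate> {
    let any_relevant = records.iter().any(|r| r.relevance >= min_relevance);
    if any_relevant {
        records.retain(|r| r.relevance >= min_relevance);
        records.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
    } else {
        records.sort_by(|a, b| b.confidence.total_cmp(&a.confidence));
    }
    records.truncate(cap);
    records
}

/// Case-first greedy fill: decision cases are considered before gated
/// records; an item is kept only if it fits in the remaining budget.
/// Returns the ids of the selected items in prompt order.
pub fn fill_budget(cases: Vec<Candidate>, records: Vec<Candidate>, budget_tokens: usize) -> Vec<String> {
    let mut remaining = budget_tokens;
    let mut picked = Vec::new();
    for item in cases.into_iter().chain(records) {
        let cost = estimate_tokens(&item.text);
        if cost <= remaining {
            remaining -= cost;
            picked.push(item.id);
        }
    }
    picked
}
```

Under these assumptions, a call such as `fill_budget(cases, gate_records(records, 0.5, 12), 4000)` would mirror the budget and cap quoted above; the 0.5 threshold is a placeholder, since the description does not state one.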

Phase 1 — sealed predictions (capture + export only, no in-app scoring)

Decision tiles with 2+ options fire one hidden non-streaming LLM call predicting the user's choice; fire-and-forget, never blocks the visible streaming flow
The prediction is sealed: redacted from IPC responses, trace payloads, and exports until the user records their choice; then revealed with computed agreement
Misses capture an optional "why was the twin wrong?" correction note — surfaced as a lesson by future case retrieval
Prediction lifecycle status (requested / sealed / failed / outcome_recorded_first) is exported so the external harness can model missingness instead of overcounting hits
Parsing fallback chain: strict JSON → normalized option/label match → raw (manual adjudication); 1-based index convention with text-wins disambiguation
Prediction prompts are first-person immersed when Twin Identity is configured ("I am {name}… which option do I choose?") — no meta-framing anywhere in model-facing prompts
New OllamaService::chat() non-streaming variant (visible responses keep streaming)
Export bundle schema v2: adds decision_episodes.jsonl (pre-outcome predictions redacted to a sealed stub) and feedback_events.jsonl (privacy-filtered against Rejected/Private/NoTrain records)
Twin Workspace reveal UX: sealed badge, chosen-option select with free-text fallback, agreement badge, correction input on miss — no accuracy aggregates or dashboards (external-eval policy)
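
The parsing fallback chain listed above can be illustrated with a small standard-library-only sketch. Everything named here is hypothetical: `ParsedPrediction`, `parse_prediction`, and the `{"choice": N}` reply shape are assumptions, and `parse_strict` is a crude stand-in for a real JSON parser. "Text wins" is read here as: when a reply contains both a number and the text of exactly one option, the option text decides.

```rust
#[derive(Debug, PartialEq)]
pub enum ParsedPrediction {
    /// Resolved to an option; the index is 1-based.
    Choice(usize),
    /// Unresolved; kept verbatim for manual adjudication.
    Raw(String),
}

/// Lowercase, drop punctuation, and collapse runs of whitespace.
fn normalize(s: &str) -> String {
    let cleaned: String = s
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect();
    let lowered = cleaned.to_lowercase();
    let words: Vec<&str> = lowered.split_whitespace().collect();
    words.join(" ")
}

/// Step 1 (stand-in for strict JSON): accept only `{"choice": N}`
/// with N inside 1..=n_options.
fn parse_strict(reply: &str, n_options: usize) -> Option<usize> {
    let inner = reply.trim().strip_prefix('{')?.strip_suffix('}')?;
    let (key, value) = inner.split_once(':')?;
    if key.trim() != "\"choice\"" {
        return None;
    }
    let idx: usize = value.trim().parse().ok()?;
    (1..=n_options).contains(&idx).then_some(idx)
}

/// Step 2: normalized option text first (text wins), then a bare
/// 1-based label such as "2" or "option 2".
fn match_normalized(reply: &str, options: &[&str]) -> Option<usize> {
    let norm = normalize(reply);
    let mut text_hits = Vec::new();
    for (i, option) in options.iter().enumerate() {
        let needle = normalize(option);
        if !needle.is_empty() && norm.contains(&needle) {
            text_hits.push(i + 1);
        }
    }
    match text_hits.as_slice() {
        [only] => return Some(*only),
        [] => {}
        _ => return None, // several options mentioned: ambiguous
    }
    let label = norm.strip_prefix("option ").unwrap_or(&norm);
    let idx: usize = label.parse().ok()?;
    (1..=options.len()).contains(&idx).then_some(idx)
}

/// Strict parse, then normalized match, then raw (step 3).
pub fn parse_prediction(reply: &str, options: &[&str]) -> ParsedPrediction {
    parse_strict(reply, options.len())
        .or_else(|| match_normalized(reply, options))
        .map(ParsedPrediction::Choice)
        .unwrap_or_else(|| ParsedPrediction::Raw(reply.to_string()))
}
```

With options `["Stay at current job", "Take the offer"]`, a reply of `2 - Stay at current job` resolves to option 1 under this reading of text-wins, while an out-of-range `{"choice": 5}` falls through to `Raw` rather than being counted as a prediction.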

Test plan

cargo test — 200 passed (22 new: serde migration, redaction, selection/budget, adversarial parser suite, agreement, export redaction + privacy)
npm run test:run — 432 passed (5 new in TwinReviewDecisions.spec.js)
cargo clippy --all-targets -- -D warnings — no new warnings introduced (36 pre-existing, CI step is continue-on-error)

🤖 Generated with Claude Code

Phase 0 of TWIN_ACCURACY_ROADMAP: twin context is now budgeted (4000 tokens, greedy fill) and case-first - past decision episodes are retrieved as verbatim behavioral cases, approved records are relevance-gated (cap 12, confidence fallback), and every assembly is tagged with a context_version for external attribution.

Phase 1: decision tiles with 2+ options fire one hidden non-streaming LLM call that predicts the user's choice. The prediction is sealed - redacted from IPC and exports until the outcome is recorded - then revealed with computed agreement. Misses capture an optional correction note that future case retrieval surfaces as a lesson. Prediction lifecycle status (requested/sealed/failed/outcome_recorded_first) exports so the external harness can model missingness.

New: OllamaService::chat() non-streaming variant, decision_episodes.jsonl and feedback_events.jsonl in the export bundle (schema v2, privacy filtered), reveal UX with chosen-option select and correction input in Twin Workspace. Prediction prompts are first-person immersed when Twin Identity is configured; no meta-framing in model-facing prompts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
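
The sealing rule in the commit message above (redacted until the outcome is recorded, then revealed with computed agreement) could look something like the sketch below. The type and method names (`DecisionEpisode`, `PredictionView`, `view`, `record_outcome`) are hypothetical and not taken from the PR; only the four lifecycle states come from the description. The handling of `outcome_recorded_first` is one reading of the name: the user recorded a choice before the prediction landed, so the episode is reported as missing rather than scored.

```rust
/// Lifecycle states named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PredictionStatus {
    Requested,
    Sealed,
    Failed,
    OutcomeRecordedFirst,
}

/// Hypothetical backend-side episode record; option indexes are 1-based.
#[derive(Debug, Clone, PartialEq)]
pub struct DecisionEpisode {
    pub status: PredictionStatus,
    pub predicted_option: Option<usize>,
    pub chosen_option: Option<usize>,
}

/// What is allowed to leave the backend (IPC, traces, exports).
#[derive(Debug, PartialEq)]
pub enum PredictionView {
    /// Outcome not recorded yet: only the lifecycle status is exposed.
    SealedStub { status: PredictionStatus },
    /// Outcome recorded and a sealed prediction existed.
    Revealed { predicted_option: usize, agreement: bool },
    /// Outcome recorded but no usable blind prediction.
    Missing { status: PredictionStatus },
}

impl DecisionEpisode {
    /// Record the user's choice. If the prediction had not landed yet,
    /// mark the episode so it cannot later be counted as a hit (assumed
    /// meaning of `outcome_recorded_first`).
    pub fn record_outcome(&mut self, chosen: usize) {
        if self.status == PredictionStatus::Requested {
            self.status = PredictionStatus::OutcomeRecordedFirst;
        }
        self.chosen_option = Some(chosen);
    }

    /// Redacted view: the predicted option is exposed only after the
    /// outcome exists and the prediction was sealed beforehand.
    pub fn view(&self) -> PredictionView {
        match (self.chosen_option, self.predicted_option) {
            (None, _) => PredictionView::SealedStub { status: self.status },
            (Some(chosen), Some(predicted)) if self.status == PredictionStatus::Sealed => {
                PredictionView::Revealed {
                    predicted_option: predicted,
                    agreement: chosen == predicted,
                }
            }
            (Some(_), _) => PredictionView::Missing { status: self.status },
        }
    }
}
```

Routing every outbound path through a single redacting `view` keeps the raw prediction from leaking by accident, and the `Missing` variant gives an external harness an explicit signal for episodes that should be excluded from agreement counts.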

Version bump to 0.2.0 plus vite 7->8 upgrade to clear high-severity npm advisories. Ships Twin Workspace redesign (#85), sealed twin predictions + eval export (#84), Twin Identity setup (#83), jsconfig fix (#86).

WKJBryan merged commit 79aa9f4 into main Jun 12, 2026
10 checks passed

WKJBryan deleted the feature/twin-sealed-predictions branch June 12, 2026 05:43

WKJBryan mentioned this pull request Jun 17, 2026

chore: release v0.2.0 #87

Merged

