[DO NOT MERGE — experimental] Ask chat for Shrek demo (audri's green-light required)#160

Draft: audriB wants to merge 188 commits into main from feat/experimental-ask-chat

Conversation

@audriB

@audriB audriB commented May 12, 2026

Summary

Experimental public-facing chatbot at /ask over the published NDI Commons catalog. Built for the Shrek upsell demo (he's already buying LabChat; pitch is "you can also chat over your experiment data on NDI Cloud").

Scope is deliberately tight:

  • Anonymous-only, public-data-only (5 tools backed by existing FastAPI public endpoints)
  • Ephemeral conversation (no DB)
  • Two-flag gate: ANTHROPIC_API_KEY (route) + NEXT_PUBLIC_ASK_ENABLED (nav)
  • Edge-runtime streaming via Vercel AI SDK v5 + Anthropic Claude Sonnet 4.5

Production impact while this PR sits in draft: ZERO. Both env flags must be set for the feature to activate, and the PR will not be merged to main without explicit Audri review.

Spec: apps/web/docs/specs/2026-05-11-experimental-ask-chat-design.md
Impl plan: apps/web/docs/plans/2026-05-11-experimental-ask-chat-impl.md

What's new

  • /ask page (route-group: marketing) — 'use client' chat shell using @ai-sdk/react v5 useChat
  • POST /api/ask edge route (streaming) with feature flag + per-IP rate limit + zod-validated tools
  • lib/ai/ modules: tools (5 tools backed by existing public catalog endpoints), system-prompt, rate-limit, feature-flag, anthropic-client
  • components/ai/ chat primitives (Markdown, Message, Thread, Input, Chips, ToolCallIndicator)
  • Nav tab "Ask" inserted between Platform and About (env-gated)
  • 31 new unit tests, 5-test E2E smoke

Test plan

Local (all done):

  • Unit tests pass: 1005 tests, 94 files, all green
  • Lint clean (pnpm lint --max-warnings=0)
  • Typecheck clean (tsc -b --noEmit)
  • Production build succeeds; /api/ask shows as edge function, /ask as static (gates render-time)
  • Bundle ratchet under +0.22 KB delta — heavy deps are route-scoped, not added to shared chunk
  • E2E smoke ready (mocks AI SDK v5 UI message stream; flag-off tests run unconditionally; flag-on tests skip cleanly without API key)

Preview (Audri to verify on Vercel preview URL):

  • Set ANTHROPIC_API_KEY + NEXT_PUBLIC_ASK_ENABLED=1 on the preview deployment env
  • Visit preview URL /ask — Ask tab visible in nav, chat loads
  • Click each of 4 suggested prompts — get factual cited responses
  • Type a custom prompt about a specific dataset (e.g. tree shrew Bhar) — verify response is correct
  • Confirm no console errors during a 5-message conversation
  • Mobile: open preview on phone, confirm no horizontal scroll

Cost / risk

  • Expected demo cost: under $5 even with Shrek's whole team playing for an hour
  • Rate limit: 10 messages / 10 min per IP (in-memory, per-edge-instance)
  • No DB changes, no FastAPI changes, no auth changes
  • Branch deletes cleanly if Shrek doesn't bite

🤖 Generated with Claude Code

audriB and others added 13 commits May 11, 2026 23:56
Design for an anonymous public chatbot demo over the published NDI
Commons catalog. Showcase target: Shrek (existing LabChat customer,
prospect for data services). Lives behind a feature branch + dual env
gate so the demo can be reviewed on a Vercel preview without ever
touching production.

Scope is intentionally tight to keep the demo throwaway-safe:
anonymous-only, public-data-only, ephemeral conversation, 5 tools
backed by existing FastAPI public endpoints, no MongoDB schema
changes, no auth changes.

Companion impl plan generated next via superpowers:writing-plans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13-task TDD-style plan covering the full build: deps + env + flag,
rate-limiter, system prompt, tool handlers, route handler, chat
components, page assembly, nav integration, e2e smoke, build + PR.

Companion to 2026-05-11-experimental-ask-chat-design.md. Will be
executed inline next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the dependency set for the experimental Ask chat (Vercel AI SDK
v5 + Anthropic provider + react-markdown + @ai-sdk/react for the
hooks), extends the zod env schema with two new optional vars
(ANTHROPIC_API_KEY for the route gate, NEXT_PUBLIC_ASK_ENABLED for
nav visibility), and lands the feature-flag helpers + unit tests.
No runtime surface changes yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simple in-memory sliding-window limiter: 10 requests / 10 min per IP.
Documented edge-runtime caveat (per-instance memory) and
swap path to Vercel KV if this ever escapes prototype scope.
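
A minimal sketch of a sliding-window limiter with the 10 requests / 10 min budget described above. The function name, return shape, and the `now` parameter are assumptions for illustration; the actual module is not shown in this PR text.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not the PR's module.
const WINDOW_MS = 10 * 60 * 1000; // 10 minutes
const MAX_REQUESTS = 10;

// Per-instance memory: every runtime instance keeps its own map, so the
// budget is enforced per instance rather than globally.
const hitsByIp = new Map<string, number[]>();

function checkRateLimit(
  ip: string,
  now: number = Date.now(),
): { allowed: boolean; remaining: number } {
  const cutoff = now - WINDOW_MS;
  // Keep only the timestamps that are still inside the sliding window.
  const recent = (hitsByIp.get(ip) ?? []).filter((t) => t > cutoff);
  if (recent.length >= MAX_REQUESTS) {
    hitsByIp.set(ip, recent);
    return { allowed: false, remaining: 0 };
  }
  recent.push(now);
  hitsByIp.set(ip, recent);
  return { allowed: true, remaining: MAX_REQUESTS - recent.length };
}
```

Passing `now` explicitly keeps the sketch testable without fake timers. A shared store (the commit mentions Vercel KV as the swap path) would replace the `Map` if the limit had to hold across instances.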

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hand-tuned for scope-locking + anti-fabrication + identity-anchoring.
Tests pin the critical clauses so a future edit can't accidentally
strip a safety guarantee.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each tool proxies to an existing FastAPI public endpoint with
zod-validated input, 8s timeout, anonymous fetch, and { error }
fallback on failure. Tools are also exported as AI SDK tool()
definitions for direct binding to streamText.
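
The timeout-plus-fallback pattern above could look roughly like the sketch below. `fetchPublic`, `FetchLike`, and the error strings are hypothetical; the fetch implementation is injected so the sketch runs without network access (in the route, the global `fetch` would be passed in).

```typescript
// Hypothetical helper; the real handlers' names and error shapes are not shown.
type ToolResult<T> = T | { error: string };
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const TOOL_TIMEOUT_MS = 8_000;

async function fetchPublic<T>(
  url: string,
  fetchImpl: FetchLike,
  timeoutMs: number = TOOL_TIMEOUT_MS,
): Promise<ToolResult<T>> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Anonymous request: no cookies or Authorization header are attached.
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) return { error: `upstream responded ${res.status}` };
    return (await res.json()) as T;
  } catch {
    // Never throw into the model loop; report a structured error instead.
    return { error: controller.signal.aborted ? "request timed out" : "request failed" };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning `{ error }` instead of throwing lets the model see the failure as a tool result and decide how to respond.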

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Streams Claude Sonnet completions via the AI SDK with 5 tools bound.
Fails closed on missing API key (503), rate-limited per IP (429),
and validates body shape (400). Uses AI SDK v5's stopWhen +
stepCountIs (replaces v4's maxSteps) and convertToModelMessages
to bridge UIMessage<->ModelMessage at the boundary.
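
The three guard responses above can be sketched as a pure pre-flight check. The check order (503, then 429, then 400) and all names here are assumptions; the real handler is not shown, and the streaming call itself is left out.

```typescript
// Hypothetical pre-flight for the route handler; names are illustrative.
type Preflight =
  | { ok: true; messages: unknown[] }
  | { ok: false; status: 400 | 429 | 503; reason: string };

function preflight(input: {
  apiKey: string | undefined;
  rateLimited: boolean;
  body: unknown;
}): Preflight {
  // Fail closed: without a key the feature is treated as switched off.
  if (!input.apiKey) return { ok: false, status: 503, reason: "ask is disabled" };
  if (input.rateLimited) return { ok: false, status: 429, reason: "rate limit exceeded" };
  const body = input.body as { messages?: unknown } | null;
  if (typeof body !== "object" || body === null || !Array.isArray(body.messages)) {
    return { ok: false, status: 400, reason: "expected { messages: [...] }" };
  }
  return { ok: true, messages: body.messages };
}
```

Only a request that passes all three checks would reach the model call.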

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d, ToolCallIndicator)

Six presentational components for the /ask chat surface:
- Markdown: react-markdown + remark-gfm with internal link rewriting
- ChatMessage: user/assistant bubble with role-based styling
- ChatInput: textarea + Send, Enter-to-send (Shift+Enter newline)
- SuggestedPromptChips: starter prompts shown on empty thread
- ToolCallIndicator: inline "browsing the catalog…" while tools fire
- ChatThread: scrollable container with smart auto-scroll heuristic

Sized so the ask-shell composition stays small. No business logic
in these — they accept handlers and render.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Composes thread + chips + input. v5 useChat differences handled:
input state is local, transport is DefaultChatTransport configured
to /api/ask, sends via sendMessage({ text }). Adapts UIMessage[]
parts shape into our ThreadEntry[] so tool-call indicators
interleave with assistant text in the same order the model
emitted them. Friendly error banner for 503/429/network.
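
The parts-to-entries adaptation can be sketched as below. The local types are simplified stand-ins rather than the SDK's own `UIMessage` type, and the sketch assumes tool parts carry a `type` of the form `tool-<name>`; `ThreadEntry`'s real shape is not shown in this PR text.

```typescript
// Simplified stand-ins for the SDK message shape (assumption, not the SDK types).
interface MessagePart { type: string; text?: string }
interface ChatMessageLike { role: "user" | "assistant"; parts: MessagePart[] }

type ThreadEntry =
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool"; toolName: string };

function toThreadEntries(messages: ChatMessageLike[]): ThreadEntry[] {
  const entries: ThreadEntry[] = [];
  for (const message of messages) {
    // Walk parts in order so tool indicators stay interleaved with text.
    for (const part of message.parts) {
      if (part.type === "text" && part.text) {
        entries.push({ kind: "text", role: message.role, text: part.text });
      } else if (part.type.startsWith("tool-")) {
        entries.push({ kind: "tool", toolName: part.type.slice("tool-".length) });
      }
      // Other part types are ignored in this sketch.
    }
  }
  return entries;
}
```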

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RSC page gates on askEnabled() server-side (defense in depth with
the route handler's 503). noindex metadata since the preview isn't
SEO content. Scoped not-found for any future sub-routes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inserts the new tab between Platform and About so it reads as a
product surface. Hidden by default — NEXT_PUBLIC_ASK_ENABLED=1
required for the link to appear. Independent gate from
ANTHROPIC_API_KEY (which controls the route) so we can deploy the
backend without surfacing the tab, or vice versa.
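
A minimal sketch of the two independent gates, assuming hypothetical helper names (`askRouteEnabled`, `askNavEnabled`) and an env object passed in explicitly; the real helpers are not shown here.

```typescript
type Env = Record<string, string | undefined>;

// Route gate: the API route only serves when a key is configured.
function askRouteEnabled(env: Env): boolean {
  return Boolean(env.ANTHROPIC_API_KEY);
}

// Nav gate: the tab only renders when the public flag is exactly "1".
function askNavEnabled(env: Env): boolean {
  return env.NEXT_PUBLIC_ASK_ENABLED === "1";
}
```

Because neither function reads the other's variable, the backend can be live with the tab hidden, or the tab shown while the route still answers 503.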

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mocks the AI SDK v5 UI message stream so the chat flow exercises
end-to-end without a live Anthropic key. Tests skip gracefully if
the feature flag is off. Mobile viewport test runs unconditionally
and asserts no horizontal overflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, env entries like ANTHROPIC_API_KEY='' (empty string)
tripped the min(20) check, failing parseEnv for any caller (tests
that set the var to '' deliberately, dotenv files with placeholder
'KEY=' rows, etc.). The preprocess() short-circuits empty strings
to undefined so optional() applies cleanly.
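
The behavior can be illustrated with a dependency-free stand-in. This is not the project's zod schema; with zod the same idea is usually expressed by wrapping the optional string schema in `z.preprocess`. The function names and error text are assumptions.

```typescript
// Dependency-free stand-in for the preprocess step; the real schema is not shown.
function emptyToUndefined(value: string | undefined): string | undefined {
  return value === "" ? undefined : value;
}

function parseOptionalKey(raw: string | undefined, minLength = 20): string | undefined {
  const value = emptyToUndefined(raw);
  if (value === undefined) return undefined; // optional: absent is fine
  if (value.length < minLength) {
    throw new Error(`expected at least ${minLength} characters`);
  }
  return value;
}
```

With the normalization in place, a placeholder `KEY=` row parses as "not set" instead of failing the length check.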

Caught by the unit-test sweep at Task 13.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented May 12, 2026


| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| ndi-cloud-app-web | Ready | Preview, Comment | May 20, 2026 2:35pm |


CI e2e on PR #160 caught a pre-existing footer-layout bug at
viewports <~400px: the mailto link `info@walthamdatascience.com`
(unbreakable string) expands its grid column to its intrinsic
min-content width, overflowing the page horizontally by ~23px.

This has actually been live on every marketing page on mobile
since launch — never caught because no prior e2e checked
document.documentElement.scrollWidth vs clientWidth. The new
/ask test surfaced it, and the fix is the same 2-class change
that helps everywhere: min-w-0 on the grid item lets it shrink,
break-words on long links lets them wrap.
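
The overflow check the e2e test relies on reduces to a width comparison. A minimal sketch, assuming a hypothetical helper that receives the two measurements (in a browser test they would be read from `document.documentElement`):

```typescript
// Minimal shape so the check can run outside a browser.
interface ScrollMetrics { scrollWidth: number; clientWidth: number }

// Returns how many pixels the content overflows horizontally (0 if none).
function horizontalOverflowPx(el: ScrollMetrics): number {
  return Math.max(0, el.scrollWidth - el.clientWidth);
}
```

Asserting that this value is 0 at a 375px viewport is the kind of check that would have flagged the footer link.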

Verified: ask.spec.ts mobile-viewport test now passes (375x667).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
audriB and others added 5 commits May 12, 2026 19:45
Adds the dependency + env entry for the build-time RAG index that
the next commits will land. Matches vh-lab/shrek-lab's choice of
voyage-4-large @ 1024-d so the same key works across all three
chatbots.

devDependency (not dependencies) — the SDK is build-time-only.
Runtime query embedding will use Voyage's REST API via fetch so
the edge bundle stays clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three-tier metadata pattern adapted from vh-lab/shrek-lab:

1. `lib/ai/dataset-metadata.json` — hand-curated sidecar mapping
   dataset IDs to {highlights, keywords, notableMethods, piContext}.
   Author facts the catalog API doesn't expose (e.g., "this is
   the only public tree shrew V1 dataset") and they end up in the
   embedded chunk text so semantic queries can find them.

2. `scripts/build-ask-index.mjs` — one-shot build:
   - Paginates the catalog
   - Enriches each dataset with the summary endpoint
   - Composes a document string per dataset (catalog + sidecar)
   - Batch-embeds via Voyage AI voyage-4-large (1024-d, same as
     vh-lab + shrek-lab so the key is shared)
   - Writes lib/ai/dataset-index.json (committed to git)

3. `lib/ai/dataset-index.json` — empty placeholder. Run the script
   to populate. Runtime tool returns gracefully when entries=[].

Run with: pnpm --filter @ndi-cloud/web build-ask-index

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the runtime side of the experimental Ask chat's RAG layer:

- lib/ai/index-loader.ts: loads dataset-index.json, lazily promotes
  embeddings to Float32Array, exposes cosineSimilarity + topKByVector.
  Tested with synthetic 3-d fixtures so the geometry is reasonable.

- lib/ai/voyage-client.ts: runtime query embedding via Voyage REST API
  (no SDK at runtime — keeps the bundle clean). 8s timeout matches
  the other tool handlers. Pinned to voyage-4-large to match the
  build-time script + vh-lab + shrek-lab.

- lib/ai/tools.ts: new 6th tool semantic_search_datasets({query, limit}).
  Embeds the query, ranks against the pre-baked index, returns top-K
  with score + curated metadata. Graceful errors for: empty index,
  no API key, embed failure, dim mismatch.

- lib/ai/system-prompt.ts: teaches Claude when to pick which tool —
  concept-vs-substring is the key heuristic. Fall-back instructions
  if semantic_search returns an error.

- app/api/ask/route.ts: runtime: 'nodejs' (was 'edge'). The
  dataset-index.json import will be multi-MB once populated;
  Node's 250 MB limit gives plenty of headroom vs. edge's 4 MB.
  60s maxDuration covers up to 4 tool roundtrips + streaming.
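
For reference, a minimal sketch of the cosine ranking that index-loader.ts is described as exposing. The helper names come from the commit text, but the signatures and the `IndexEntry` shape are assumptions.

```typescript
interface IndexEntry { id: string; embedding: Float32Array }

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat its similarity as 0.
  return denom === 0 ? 0 : dot / denom;
}

function topKByVector(query: Float32Array, entries: IndexEntry[], k: number) {
  return entries
    .map((e) => ({ id: e.id, score: cosineSimilarity(query, e.embedding) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}
```

A full scan like this is linear in the number of entries, which is reasonable for one chunk per dataset in a small catalog.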

23 new unit tests across 4 test files. Build + lint + typecheck
+ all 1031 unit tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to the original 2026-05-11 design spec. Documents:
- What was copied from vh-lab + shrek-lab (three-tier metadata,
  Voyage AI provider, build-time embedding)
- What was deliberately simplified for our scale (flat JSON vs
  pgvector, one chunk per dataset vs section-aware chunking,
  cosine-only vs hybrid+rerank)
- The manual refresh workflow (set VOYAGE_API_KEY, run script,
  commit + push, Vercel auto-redeploys)
- Failure-mode UX (every RAG failure falls back to keyword search;
  the chat never breaks because RAG is unavailable)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict TS + the project's '--max-warnings=0' eslint config required
non-null assertions on array-index accesses + dropping the unused
`beforeEach` import. No runtime impact, no behavioral change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rank)

Replaces the flat-JSON + pure-cosine first pass with a faithful copy
of vh-lab + shrek-lab architecture. Every retrieval-quality component
matches: same DB engine, same indexes, same RRF constants, same
reranker.

What changed:

- DROP lib/ai/index-loader.ts + dataset-index.json — flat JSON gone
- ADD lib/ai/db/{pool.ts, schema.sql} — Postgres connection +
  chunks + chunks_staging + rag_versions tables, IVFFlat (cosine,
  lists=100), GIN tsvector index
- ADD lib/ai/hybrid-retrieval.ts — parallel vector + BM25 lanes,
  RRF merge at k=60, ivfflat.probes=10 at query time
- UPDATE lib/ai/voyage-client.ts — adds rerank() alongside
  embedQuery(); both via REST, voyage-4-large + rerank-2.5
- UPDATE lib/ai/tools.ts — semantic_search_datasets runs the full
  4-stage pipeline (embed → hybrid → RRF → rerank); soft-degrades
  to RRF-only if rerank fails
- REWRITE scripts/build-ask-index.mjs — staged ingest into Postgres
  with atomic promote (mirrors vh-lab's
  promote_staging_to_production_sync); REINDEX after promote
- ADD DATABASE_URL to env schema
- UPDATE design addendum with final architecture + setup steps +
  cost + failure modes
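
The RRF merge step can be sketched as follows. This is a generic reciprocal rank fusion over ranked ID lists with k=60 as stated above; the function name and the input/output shapes are assumptions, not the contents of hybrid-retrieval.ts.

```typescript
const RRF_K = 60;

// Each lane is a list of chunk IDs ordered best-first (for example, one
// from the vector lane and one from the text lane).
function rrfMerge(lanes: string[][], k: number = RRF_K): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const lane of lanes) {
    lane.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

RRF uses only rank positions, so the two lanes' raw scores never need to be put on a common scale.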

Setup (one-time):
  1. Railway → +Add → PostgreSQL → copy DATABASE_URL
  2. psql $DATABASE_URL -f apps/web/lib/ai/db/schema.sql
  3. Set DATABASE_URL + VOYAGE_API_KEY on Vercel Preview
  4. export DATABASE_URL=... && export VOYAGE_API_KEY=...
     pnpm --filter @ndi-cloud/web build-ask-index

Local verification:
  ✅ 1031 unit tests (10 new tests + 12 updated for new pipeline)
  ✅ Lint + typecheck clean
  ✅ Production build succeeds
  ✅ Bundle ratchet still under baseline

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The voyageai npm SDK ships ESM with directory-style sub-imports that
Node's strict ESM resolver rejects (ERR_UNSUPPORTED_DIR_IMPORT). The
runtime client in lib/ai/voyage-client.ts already calls the REST API
directly; aligning the build script removes the broken dep entirely.

Same Voyage endpoints, same auth, same response shape — just no
SDK indirection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@audriB audriB changed the title feat: experimental Ask chat (Shrek demo, branch-only) [DO NOT MERGE — experimental] Ask chat for Shrek demo (audri's green-light required) May 13, 2026
@audriB

audriB commented May 13, 2026

🛑 DO NOT MERGE — experimental branch only

Per audri 2026-05-13: this branch is for Shrek-demo experimentation. No merge without explicit green light from @audriB.

State as of this comment:

Phase 1 of the scientific-depth plan (apps/web/docs/specs/
2026-05-13-ask-scientific-depth-plan.md). Every tool result now
carries a `references: Reference[]` array, the LLM is taught to
emit `[^N]` footnote markers tied to those references, and the
chat UI renders them as clickable chips that deep-link into the
Document Explorer.

Foundation pieces:

- lib/ai/references.ts  Reference type + makeReference helpers +
  parseFootnotes() that extracts [^N]: [title](url) — class defs
  from message body into a Map<number, Reference>
- lib/ai/tools.ts  every existing tool (list_published_datasets,
  get_dataset, get_dataset_summary, get_dataset_class_counts,
  get_facets, semantic_search_datasets) now returns a `references`
  array alongside its data payload. Each cites the dataset
  overview (catalog) or facet surface
- lib/ai/system-prompt.ts  adds CITATION section: [^N] footnotes
  required, ### Sources panel required, never fabricate a citation
- components/ai/CitationChip.tsx  small inline [N] chip with
  hover tooltip (title + snippet + class badge), opens reference
  URL in new tab
- components/ai/SourcesPanel.tsx  bottom-of-message
  deduplicated references list with class badges
- components/ai/Markdown.tsx  parses footnotes from raw content,
  customizes remark-gfm rendering: footnote-ref <sup><a> becomes
  CitationChip; default footnote-section is suppressed in favor
  of SourcesPanel; "### Sources" h3 stripped to avoid duplicates
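
Footnote extraction along the lines of parseFootnotes() might look like this sketch. It only handles definition lines of the form `[^N]: [title](url)` and ignores the trailing class label; the function name, regex, and `FootnoteRef` shape are assumptions, not the contents of references.ts.

```typescript
interface FootnoteRef { title: string; url: string }

function parseFootnoteDefs(body: string): Map<number, FootnoteRef> {
  const refs = new Map<number, FootnoteRef>();
  // Matches lines such as: [^2]: [Dataset overview](https://example.org/d/1)
  const pattern = /^\[\^(\d+)\]:\s*\[([^\]]+)\]\(([^)\s]+)\)/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(body)) !== null) {
    refs.set(Number(match[1]), { title: match[2], url: match[3] });
  }
  return refs;
}
```

Inline markers like `[^1]` inside a sentence are not matched because the pattern is anchored to the start of a line and requires the trailing colon.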

Tests:

- 1045 unit tests pass (+14 new: references shape, footnote
  parsing, tool reference attachment, system-prompt citation
  clauses)
- Lint + typecheck + build clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
audriB and others added 30 commits May 18, 2026 17:23
Two scientific use cases:
  - Subject join: pair measurements from two ontologyTableRow
    groups by subjectDocumentIdentifier
  - Treatment join: pair a measurement with the subject's
    treatment label (walks treatment / treatment_drug /
    treatment_transfer class chain)
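
A minimal sketch of the subject join, assuming one measurement per subject in each group and a simplified row shape. Only `subjectDocumentIdentifier` is taken from the spec text; the other names are hypothetical.

```typescript
interface Measurement { subjectDocumentIdentifier: string; value: number }

function joinBySubject(xs: Measurement[], ys: Measurement[]) {
  // Index the y-side by subject so each x row is matched in O(1).
  const yBySubject = new Map<string, number>();
  for (const y of ys) yBySubject.set(y.subjectDocumentIdentifier, y.value);

  const pairs: { subject: string; x: number; y: number }[] = [];
  const matched = new Set<string>();
  let xOnly = 0;
  for (const x of xs) {
    const y = yBySubject.get(x.subjectDocumentIdentifier);
    if (y === undefined) {
      xOnly++;
      continue;
    }
    pairs.push({ subject: x.subjectDocumentIdentifier, x: x.value, y });
    matched.add(x.subjectDocumentIdentifier);
  }
  // Subjects that appear only on the y-side.
  const yOnly = yBySubject.size - matched.size;
  return { pairs, unjoined: { xOnly, yOnly } };
}
```

Reporting the unjoined counts alongside the pairs keeps unpaired subjects visible instead of silently dropping them.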

Spec covers backend service (cross_table_pairs method on
TabularQueryService), router endpoint, cloud-app tool handler
(cross_table_query), chat-tools registration, ScatterChart
component, and BehavioralComparePanel mode-toggle.

Acceptance + test plan included. Implementation lands as
follow-up commits this same arc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First wave of S5.3 (cross-table joins) on cloud-app.

NEW: apps/web/lib/ndi/tools/cross-table-query.ts
  - Tool handler mirroring tabular-query.ts pattern
  - zod input schema with joinOn enum ("subject" | "treatment")
  - POSTs to /api/datasets/:id/cross-table-query
  - Returns pair_count, unjoined, group_summary, chart_payload (for
    the scatter-chart fence), references, empty_hint for retries

NEW: apps/web/app/api/datasets/[id]/cross-table-query/route.ts
  - Thin proxy route following the tabular-query/route.ts pattern

NEW: apps/web/tests/unit/ai/tools/cross-table-query.test.ts
  - 9 tests covering subject-join, treatment-join, groupBy
    aggregation, empty-hint surfacing, input validation,
    groupOrder pass-through

Remaining S5.3 work: chat-tools.ts registration, ScatterChart
component, BehavioralComparePanel mode toggle, backend service +
router (waiting on BE-A agent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When tables have many columns (Bhar's subject table is 43 cols
post-F-1b), the body H-scrolls but the header row stays static.
Users scroll right to see column 30's data, but column 30's
header is hidden left — they can't tell what column the data
belongs to.

Fix: CSS-only — add `min-width: max-content` to the `<table>` inside
VirtualizedTable's scroll container. With `w-full` alone, some
browsers honor `width: 100%` over cell intrinsic widths and squeeze
columns rather than growing the table; the scroll container then
never triggers an H-scrollbar. With `min-width: max-content` set,
the table naturally grows to fit cell content, and because both
`<thead>` (with `sticky top-0` for vertical pinning) and `<tbody>`
live inside the SAME `<table>` inside the SAME `overflow-auto`
scroll container, horizontal scroll moves header + body together —
column titles stay aligned with their cells.

Narrow tables (3-5 cols) where natural content width is smaller
than the container still render at `w-full` — `min-width:
max-content` only kicks in when content overflows. No regression.

Test added: tests/unit/components/ui/VirtualizedTable.test.tsx
verifies the inline style is applied, the sticky thead class is
preserved, and the thead+tbody share the same scroll-container
parent. Covers both the wide (43-col Bhar) and narrow (3-col)
case. Real H-scroll behavior is browser-driven; jsdom doesn't lay
out tables, so visual verification is owed to the Playwright E2E
suite against `/datasets/69bc5ca11d547b1f6d083761/tables/subject`.

This applies to the catalog summary tables (SummaryTableView,
MyDatasetsTable). WorkspaceDataGrid already had a JS scroll-sync
fix landed in fc1b8a8 (header lives outside the body's
overflow-auto, so it needs the JS sync rather than CSS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second wave of S5.3 (cross-table joins) on cloud-app:

NEW: apps/web/components/ndi/charts/ScatterChart.tsx (~270 lines)
  - Plotly scatter for joinOn=subject (numeric X vs numeric Y,
    color by group when groupBy set)
  - Strip plot for joinOn=treatment (numeric X vs categorical Y
    treatment labels)
  - Fetches via TanStack Query against /api/datasets/:id/cross-table-query
  - Shared color palette with ViolinChart for consistent
    cross-chart group coloring
  - Surfaces unjoined count in figcaption when non-zero (so users
    see "5 subjects unpaired (x-only: 3, y-only: 2)")
  - Empty / loading / error states with testid wrappers

NEW: apps/web/tests/unit/components/charts/ScatterChart.test.tsx
  - 6 tests: subject-join, treatment-join, empty, error, loading,
    unjoined-figcaption

MODIFIED: apps/web/lib/ai/chat-tools.ts
  - Registers cross_table_query tool with description directing the
    LLM to use it when the user names TWO measurements (or one +
    a treatment), with the retry loop pattern + scatter-chart fence

Remaining S5.3 work: BehavioralComparePanel mode toggle, backend
service + router (waiting on BE-A agent to finish F-8 + F-1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LLM emits cross_table_query results inside a ```scatter-chart
fence. Markdown.tsx now recognizes the fence + parses the payload
via `parseScatterChartPayload` and mounts <ScatterChart /> inline
in the chat message. Falls back to default code styling on
malformed payloads — same defensive pattern as parseViolinChartPayload.

Also unwraps the <pre> wrapper that react-markdown otherwise puts
around fenced code blocks, so the chart's overflow + figcaption
render cleanly.

Adds ScatterChart.displayName = 'ScatterChart' for the
childIsChartComponent identity test (mirrors ViolinChart pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the CROSS-TABLE / PAIRED COMPARISONS section to the SYSTEM_PROMPT
guiding the LLM to:
  - Use cross_table_query when the user names TWO distinct
    measurements/axes (vs tabular_query for single-axis groupBy)
  - Discriminate joinOn=subject (numeric × numeric scatter) vs
    joinOn=treatment (numeric × categorical strip plot)
  - Embed the chart_payload inside a scatter-chart fence
  - Surface unjoined counts explicitly when non-zero

Also tightens the cross_table_query chat-tools description so the
yVariableContains hint doesn't carry quoted snake_case field names
(which the tool-descriptions lint flags as un-explained NDI schema
names). Replaced with prose-form: "treatment reference / mixture /
name fields".

2221 tests + lint + build green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a mode toggle at the top of the BehavioralComparePanel that
switches between the existing single-table flow (tabular_query →
ViolinChart) and the new cross-table flow (cross_table_query →
ScatterChart).

Single-table mode behavior + existing tests unchanged.

Cross-table mode:
  - Form: xVariableContains + yVariableContains + joinOn radio +
    groupBy + groupOrder + title
  - Run posts to /api/datasets/:id/cross-table-query
  - Result renders ScatterChart (subject-join scatter / treatment-
    join strip plot) + per-group count summary table
  - Empty-hint retry loop mirrors tabular_query (separate
    test-id prefix so both modes' picks are independently asserted)
  - Mode toggle resets BOTH form sets + both committedArgs slots so
    no stale input silently fires on the next Run
  - F-4 useQuery with stable committedArgs preserved — two queries,
    each enabled only when its mode is active

+7 new tests for the cross-table flow (mode toggle, x+y+joinOn
validation, POST body shape, ScatterChart render, empty-hint retry
with picked column as groupBy, form-reset on mode switch, result-
clear on mode switch after a successful run).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous turn crashed mid-arc (BE-A backend agent OOM'd
during F-1; my S5.3 backend service was wiped by an earlier git
reset from the same agent). All cloud-app work + 4 of 5 backend
tickets are pushed and safe; 2 items remain for the completion
run:

  - F-1 backend (curated /tables/stimulus projection) — 241-line
    integration-test stub recovered to
    docs/specs/2026-05-18-f1-stimulus-projection-stub.diff
    (service + router never implemented)
  - S5.3 backend (cross_table_pairs service + POST /cross-table-query
    route) — cloud-app side fully wired and waiting; backend was
    lost. Full design spec + response contract added to the
    handoff doc.

Plus two verification curls (F-6 0-count regression, B6
cross-dataset audit) added to the checklist.

Adds explicit "Step 1 → Step 2 → Step 3" framing matching the
user's stated plan:
  1. Finish the completion run (this turn's remaining items)
  2. Run the exhaustive test matrix
  3. Tools-along-boundaries canvas redesign (held for design Q&A)

Plus operational guardrails to prevent re-hitting the same git
reset / parallel-agent collision / rate-limit pitfalls we burned
this session on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…= verifications

User cross-checked the handoff against their pre-crash context recap
and surfaced four items needing reconciliation:

1. F-1 was already implemented in 0231851 (earlier this arc).
   STIMULUS_COLUMNS + router alias + projection function all live in
   the backend tree. The BE-A agent's recovered test stub is for
   ADDITIONAL pagination-invariant integration tests, not the
   implementation itself. Updated F-1 detail to reflect this and
   re-classified as "optional integration tests" rather than
   "implementation pending."

2. F-6 verification ran this turn with specific numbers — Bhar=0
   (expected, no spike data), Francesconi=1604, Haley=4156. Locked
   in to the handoff as "Verifications run this turn — locked-in
   results" so the next session doesn't re-run.

3. B6 cross-dataset audit ran clean (all 8 datasets). Surfaced a
   SEPARATE upstream issue: Dabrowska returns totalDocuments=0
   from /summary — appears to be cloud-side sync, NOT B6-related.
   Added to "Surfaced this turn but NOT actioned" section.

4. ?className= vs ?class= mystery — confirmed NOT a bug.
   Cloud-app sends ?class= correctly via
   lib/api/documents.ts::useDocuments. The earlier curl was a typo
   on the curl side. Locked in so future sessions don't re-investigate.

After this update, the post-compaction completion checklist
collapses to ONE substantive backend item: S5.3 cross_table_pairs
service + POST /cross-table-query route (cloud-app side fully
wired; backend implementation lost to git reset, design captured
in full in this handoff).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-crash START HERE block was task-focused but didn't
re-state the operational essentials a fresh agent needs before
ANY action: which repo, which branch, which env IDs, which creds,
which sacred rules, which CI gates.

Adds a Pre-flight checklist right after the user-stated plan
intro covering:
  - Repo paths + branches (cloud-app on feat/experimental-ask-chat,
    backend on feat/ndi-python-phase-a)
  - Production vs experimental env IDs (Vercel + Railway), with
    explicit "DO NOT touch production" framing
  - The 7 sacred rules (NEVER push to main, hook discipline,
    author rule, Co-Authored-By trailer)
  - CI gates per repo (pnpm lint+typecheck+test+build for
    cloud-app; ruff+mypy+pytest for backend)
  - Cred status — all 3 burned mid-arc; ~1h recovery; test matrix
    will likely need a fresh cred from the user
  - Operational gotchas: pnpm-lock at repo root, Vercel/Railway
    deploy-wait windows, Railway-agent get-logs structlog quirk

The next agent reads this checklist before diving into the S5.3
backend implementation and won't accidentally push to main, skip
hooks, target production env, or burn creds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend re-implementation of cross_table_pairs + POST
/cross-table-query route landed in commit 7157bde on
feat/ndi-python-phase-a. The post-crash completion checklist is now
empty of substantive work. Next session moves to Step 2 — the
exhaustive test matrix.

Updates to apps/web/docs/reviews/2026-05-19b-post-handoff-execution.md:
- New "🟩 IF YOU'RE THE SESSION AFTER S5.3 BACKEND LANDS" block at
  the top, listing verifications + remaining deferred items
- Legacy completion-checklist row for S5.3 marked struck-through
  with commit ref
- S5.3 detail section headed "✅ SHIPPED in commit 7157bde"
  preserving the design for historical reference + linking to the
  shipped impl + tests
- Pre-flight checklist updated: backend HEAD = 7157bde
- Update history entry for the s5.3-completion session

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit closes the post-crash completion run. Every deferred
item from the post-S5.3 checklist (aside from S-1 through S-4 SDK
upstream asks held by user) has now been resolved:

- S5.3 backend cross_table_pairs + POST route — commit 7157bde
- F-1 integration tests (column shape + alias + pagination) —
  commit f6ecb83
- F-4 stable query keys — verified already shipped (grep audit)
- Mobile <375px thorough audit — found no remaining issues beyond
  what fd44603 already shipped. Added exhaustive grep matrix to
  the handoff doc.
- Card gap thorough audit — verified harmonious space-y / gap
  rhythm across components/datasets/, components/ui/Card.tsx, and
  components/workspace/PanelCard.tsx. No code changes needed.
- Dabrowska totalDocuments=0 — diagnosed as upstream cloud-node
  state (isPublished:true + documentCount:0 + empty documents
  array on BOTH prod + experimental envs). Backend is correct;
  flagged for cloud-node team.

Branch state at close:
- Cloud-app feat/experimental-ask-chat: HEAD updated by this commit
- Backend feat/ndi-python-phase-a: HEAD f6ecb83 (1128 tests)

Next step per the user's plan: Step 2 — the exhaustive test matrix
(8 datasets x ~10 panels x 17 chat tools + G2/G3 + Safari verifies).
Wait for fresh test creds before starting; current 3 are
rate-limited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test-matrix Agent A surfaced a regression in the dataset overview
page: Haley's overview rendered sessions=3 while
/api/datasets/682e7772cdf3f24938176fac/summary correctly returns
counts.sessions=2 (B6-filtered).

Root cause: the 2026-04-28 +1-session correction in
overview-content.tsx unconditionally re-sourced counts.sessions
from raw classCounts.session, clobbering B6's backend
parent-session filter. For Haley the synthesizer returns 2 (parent
filtered), but classCounts.session is still the raw 3.

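Fix: gate the override on summary < raw. When summary < raw the backend
has already filtered, so the summary value is kept; counts.sessions is
re-sourced from class-counts only when the backend has NOT already
filtered. Preserves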
the original wrapper-subtract-1 case (session_in_a_dataset only)
and Bhar's no-op (summary == raw == 2 unchanged).
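
One reading of the gate, as a sketch: the function name and signature are hypothetical, and the pure-wrapper subtract-1 path is deliberately not modeled because its details are not given here.

```typescript
function resolveSessionCount(summarySessions: number, rawClassCountSessions: number): number {
  // summary < raw means the backend already filtered parent sessions: trust it.
  if (summarySessions < rawClassCountSessions) return summarySessions;
  // Otherwise re-source from the raw class count, as the earlier correction did.
  return rawClassCountSessions;
}
```

With the numbers from this commit, a Haley-like input (summary 2, raw 3) keeps 2, and a Bhar-like input (summary 2, raw 2) is unchanged.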

Tests (3 new in dataset-detail-shells.test.tsx > OverviewContent >
session-count override (B6 compatibility)):
- Haley-like: trusts B6-filtered summary, ignores raw class-counts.session
- Bhar-like: summary == raw, override is a no-op
- pure-wrapper: synthesizer fell back to session_in_a_dataset → subtract 1

CI: pnpm lint clean, pnpm typecheck clean, 2231 tests pass (was 2228 + 3).

Refs: audit/2026-05-19-test-matrix/agent-A.md "Haley sessions count
stuck at 3" finding

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-completion-run test matrix executed via 3 parallel Playwright
agents (datasets 1-4, datasets 5-8, 17 chat tools). All three agents
hit AUTH_RATE_LIMITED (HTTP 429) within ~5 logins, gating most UI
coverage. The matrix still produced rich findings via the public
catalog UI + same-origin public read endpoints.

8 NEW BUGS surfaced:
- NEW-1 P0: Catalog Overview Sessions count override undid B6 filter
  (Haley showed sessions=3 vs /summary correctly returns 2). FIXED in
  commit 3e0c28d earlier this session.
- NEW-2 P0: Workspace router silently substitutes to default
  workspace (68839b1f...) when user lacks org access. No 403, no
  notice. Burns rate-limit budget via /create-account redirect
  cascade. OPEN — recommended fix in report.
- NEW-3 P1: Dataset card header numberOfSubjects (281) disagrees with
  COUNTS panel (0) on Dabrowska. Two surfaces sourced from different
  endpoints. OPEN.
- NEW-4 P1: Cmd+K from workspace opens DIFFERENT workspace. May be
  same root cause as NEW-2. OPEN.
- NEW-5 P1: Vercel preview auth instability — session cookies appear
  to fail re-validation after 30-60s. Agent B hypothesis: cookie
  domain scoping bug despite cookie_attrs.py looking correct. OPEN.
- NEW-6 P3: .playwright-mcp/ snapshots persisted plaintext passwords.
  21 files affected, all scrubbed in-place this session. Never
  committed to git (gitignored). FIXED.
- NEW-7 P2: Placeholder DOI text "https://doi.org://10.1000/123456789"
  on DS6/7/8 (data-ingest pipeline issue). OPEN.
- NEW-8 P2: DS8 (Mukherjee gustatory) is a 99-byte stub with 0
  sessions/epochs. Probably shouldn't be marked Published. OPEN.

Coverage delivered:
- 4/8 datasets catalog UI verified live
- 1/8 datasets workspace shell verified live
- 4/8 datasets API characterized + known-good demo doc IDs harvested
- 0/17 chat tools exercised (all blocked at login)

Branch state at end of run:
- Cloud-app feat/experimental-ask-chat: HEAD 3e0c28d (Haley fix)
- Backend feat/ndi-python-phase-a: HEAD f6ecb83 (unchanged)

Next session priorities (in order):
1. Fix NEW-2 workspace router substitution (highest-impact bug)
2. Investigate NEW-5 Vercel preview auth instability
3. Re-run test matrix with fresh creds + NEW-2/NEW-5 fixed
4. Then Step 3 — Tools-along-boundaries canvas redesign

Full report at apps/web/docs/reviews/2026-05-19-test-matrix-results.md
(includes known-good demo doc IDs appendix for tutorial handout).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test-matrix Agent A surfaced an inconsistency on Dabrowska's
catalog page: the hero strip shows "Subjects: 281" (from the
dataset record's precomputed numberOfSubjects field, sourced from
the paper's reported sample size) while the synthesized COUNTS
panel below correctly shows "Subjects: 0" (live-computed from
documents, which haven't been ingested upstream — Dabrowska's
isPublished=true + documentCount=0 published-but-empty state).

Two surfaces on one page disagreeing about subject count misleads
users about what the dataset actually contains.

Fix: treat documentCount === 0 as the authoritative signal that
the documents-of-record are absent, and suppress the precomputed
numberOfSubjects fact in the hero. The synthesized COUNTS panel
correctly shows 0 — the hero now stays silent on subjects when
documents are zero.

Other facts (Documents, Size, License) still render with their
honest values. When documents come back, numberOfSubjects renders
again automatically.
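
A minimal sketch of the suppression rule described above, not the actual DatasetDetailHero code. Only `documentCount` and `numberOfSubjects` come from the commit message; the `DatasetRecord` and `HeroFact` shapes and the `buildHeroFacts` name are hypothetical.

```typescript
// Hypothetical record shape: only documentCount and numberOfSubjects are
// named in the commit message; license is included to show an unaffected fact.
interface DatasetRecord {
  documentCount: number;
  numberOfSubjects?: number;
  license?: string;
}

interface HeroFact {
  label: string;
  value: string;
}

// Build the hero facts. The precomputed subject count is shown only when
// documents of record exist (documentCount > 0).
function buildHeroFacts(dataset: DatasetRecord): HeroFact[] {
  const facts: HeroFact[] = [
    { label: "Documents", value: String(dataset.documentCount) },
  ];

  const documentsAbsent = dataset.documentCount === 0;
  if (!documentsAbsent && dataset.numberOfSubjects !== undefined) {
    facts.push({ label: "Subjects", value: String(dataset.numberOfSubjects) });
  }

  // Other facts keep rendering with their honest values.
  if (dataset.license) {
    facts.push({ label: "License", value: dataset.license });
  }
  return facts;
}

// Dabrowska-like case: published but empty, so the hero stays silent on subjects.
const dabrowskaLike = buildHeroFacts({ documentCount: 0, numberOfSubjects: 281 });
console.log(dabrowskaLike.map((f) => f.label));
```

Because the check reads `documentCount` on every call, the Subjects fact reappears without further changes once documents are ingested.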

Tests: 1 new in DatasetDetailHero.test.tsx pinning the
Dabrowska-like (documentCount=0, numberOfSubjects=281) case.
Existing tests for happy-path Subjects rendering (with
documentCount>0) preserved.

CI: pnpm lint clean, pnpm typecheck clean, 2232 tests pass.

Refs: apps/web/docs/reviews/2026-05-19-test-matrix-results.md NEW-3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Update the test-matrix synthesis to reflect that NEW-3 was also
fixed inline this session (alongside NEW-1 + NEW-6 already noted).

Updates:
- 2026-05-19-test-matrix-results.md
  - TL;DR now shows 3 fixes shipped this session
  - NEW-3 section reclassified as FIXED with commit ref 1583a33
  - Recommendations section updated to reflect NEW-3 done +
    deeper investigation notes for NEW-2 and NEW-5
- 2026-05-19b-post-handoff-execution.md
  - New "IF YOU'RE THE SESSION AFTER TEST MATRIX LANDS" block at
    the top with the 8-bug status table and next-session
    priorities
  - Previous "all completion work" block demoted to legacy

Test-matrix fix summary across this session:
- 3e0c28d — NEW-1 P0 Overview Sessions count B6 compatibility
- 1583a33 — NEW-3 P1 Dabrowska hero/COUNTS disagreement
- (security) NEW-6 P3 .playwright-mcp/ password scrub (local-only,
  21 files scrubbed in-place, never committed to git per gitignore)

5 bugs remain OPEN (NEW-2 P0; NEW-4, NEW-5 P1; NEW-7, NEW-8 P2
data-ingest). NEW-2 (workspace router substitution) is the
highest-impact and should be the next session's first priority.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NEW-5 investigation (curl-verified this session):
- /api/auth/csrf from preview Origin returns host-only cookie (no
  Domain attribute) ✅
- /api/auth/csrf from apex Origin returns Domain=.ndi-cloud.com ✅
- cookie_attrs.py logic is correct; Agent B's "cookies scoped
  wrong" hypothesis was wrong.
- BUT: GET / on the preview URL itself returns HTTP 401 with
  _vercel_sso_nonce. The preview is gated by Vercel SSO
  Deployment Protection. Playwright agents authenticated via a
  Vercel SSO token saved in Chromium state; when that token expired
  (~1h TTL) every subsequent navigation became a 401 challenge.
  The "session loss" agents observed was Vercel-layer SSO, not
  NDI-layer session.
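
For reference, the cookie-scoping rule verified by the two /api/auth/csrf checks above can be sketched as follows. This is a TypeScript illustration only; the real logic lives in cookie_attrs.py. The function name is hypothetical, and treating subdomains of the apex the same as the apex itself is an assumption.

```typescript
// Apex domain taken from the curl result above (Domain=.ndi-cloud.com).
const APEX_DOMAIN = "ndi-cloud.com";

// Returns the Domain attribute to set on the cookie, or undefined for a
// host-only cookie (no Domain attribute at all).
// Assumption: subdomains of the apex are scoped like the apex itself.
function cookieDomainForOrigin(origin: string): string | undefined {
  const host = new URL(origin).hostname.toLowerCase();
  const onApex = host === APEX_DOMAIN || host.endsWith(`.${APEX_DOMAIN}`);
  return onApex ? `.${APEX_DOMAIN}` : undefined;
}

// A preview origin (hostname here is made up) gets a host-only cookie.
console.log(cookieDomainForOrigin("https://some-preview.vercel.app"));
console.log(cookieDomainForOrigin("https://ndi-cloud.com"));
```

Matching on `.${APEX_DOMAIN}` rather than a bare suffix keeps look-alike hosts such as `evil-ndi-cloud.com` host-only.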

Root cause: Vercel SSO Deployment Protection on the preview.
Fix is operational, not code — set up a Vercel Automation Bypass
Token on the project's Deployment Protection settings.

NEW-2 reclassification (P0 → P1, likely-secondary-of-NEW-5):
Audited every workspace route handler + middleware-equivalent
surface in this session — no code path substitutes dataset IDs.
With NEW-5 root-caused, the URL substitution agents observed is
most likely a Vercel SSO redirect chain artifact, not a
workspace-router bug. Final verdict deferred until the test
matrix re-runs with the bypass token; if NEW-2 still reproduces
it's a real bug.

Updates:
- 2026-05-19-test-matrix-results.md
  - NEW-5 section: full investigation log + 3 operational fix
    options (recommended: Vercel bypass token)
  - NEW-2 section: reclassified as likely-secondary; audit
    record of all surfaces checked
  - TL;DR: now reflects 1 root-cause + 1 reclassification
  - Recommendations: bypass-token + matrix re-run is the #1
    next-session priority
- 2026-05-19b-post-handoff-execution.md
  - New "🟪 IF YOU'RE THE SESSION AFTER NEW-2/NEW-5 INVESTIGATION"
    block at the top with the curl results + USER ACTION REQUIRED
    callout for the bypass token
  - Previous "🟦 IF YOU'RE THE SESSION AFTER TEST MATRIX LANDS"
    demoted to superseded

Next session priorities (per the new top block):
1. Confirm user has set up the Vercel Automation Bypass Token
2. Re-run the test matrix with the token plumbed through
3. If NEW-2/NEW-4 still reproduce: investigate as real code bugs
4. Then Step 3 — canvas redesign (held for design Q&A)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix: when safeFetchDataset returns null (dataset doesn't exist,
user lacks access, OR transient network blip), WorkspaceShell rendered
the bare 24-char datasetId as the h1. The user had no signal as to
whether the dataset was missing, gated, or just slow.

Test-matrix Agent A NEW-3 follow-up: users who navigate to a workspace
they can't access see a confusing bare-hex h1 with no recovery path.

Fix: when data is null, the h1 still renders the datasetId (preserved
for share-link debuggability — operators need to be able to read the
id back to the user from the URL), but with a fallback notice below
explaining the degraded state + offering a link to the public catalog
detail page (anonymous-readable, works even when workspace API paths
are gated) plus a "browse all datasets" fallback.

The H1 styling for the null-data case is now `font-mono` so the
unparseable hex id is visually distinct from a real title.

CI: pnpm lint clean, pnpm typecheck clean, 2232 tests pass.

Refs: apps/web/docs/reviews/2026-05-19-test-matrix-results.md
NEW-2/NEW-3 follow-up UX improvement

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-by-step instructions for setting up the Vercel
Protection-Bypass-for-Automation token on the ndi-cloud-app-web
project so the next test-matrix re-run can sustain Playwright
sessions on the preview deploy.

Without this setup, automated test agents cannot reliably
exercise the preview because Vercel's SSO Deployment Protection
gate (~1h TTL) expires mid-session, masquerading as NDI session
loss.

New doc: apps/web/docs/operations/vercel-automation-bypass-setup.md

Covers:
- The 3-minute Vercel UI setup
- Two equivalent agent code patterns (cookie set vs header per
  request)
- Plumbing strategy for the test-matrix dispatcher
- Security posture (token scope, revocation, audit logs)
- Verification curl commands
- Alternative: disable Deployment Protection on the experimental
  preview branch (security trade-off discussion)

Also updated the post-handoff doc to reference the new setup
guide from the "🔑 USER ACTION REQUIRED" callout.

Refs: apps/web/docs/reviews/2026-05-19-test-matrix-results.md NEW-5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User pushed back on the earlier follow-up claim (commit 5559e53)
that NEW-2/NEW-5 were caused by Vercel SSO Deployment Protection.
The pushback is correct: plain curl returns 401 because it has
no cookies, but Playwright agents AND real Chrome both reached
/my and the workspace shell past any Vercel-layer auth in this
session AND in prior sessions (e.g. the late-evening panel-
exercise pass that landed the patch-clamp 21 sweeps and
BehavioralTrack 1985-segment renders). Vercel SSO is not the
test-matrix blocker.

Updates:
- 2026-05-19-test-matrix-results.md
  - NEW-5 section: Vercel SSO claim retracted; reopened with
    "real cause unknown"; standing hypothesis is test-cred
    org-access limitation
  - NEW-2 section: removed "likely-secondary-of-NEW-5"
    classification; reopened as P0 with unknown root cause
  - TL;DR: reflects the retraction
  - Recommendations: ask user to reproduce in actual Chrome
    BEFORE chasing as a product bug
- 2026-05-19b-post-handoff-execution.md
  - New "🟫 IF YOU'RE THE SESSION AFTER NEW-2/NEW-5
    INVESTIGATION (with retraction)" block at the top,
    replacing the bad-claim "🟪" block
  - Final status table now shows NEW-2/NEW-4/NEW-5 all OPEN
- DELETED: apps/web/docs/operations/vercel-automation-bypass-setup.md
  (the underlying claim it documented was wrong)

What remains good from the investigation:
- /api/auth/csrf cookie-attrs verification (host-only on preview
  Origin, Domain on apex) ✅
- Workspace route-handler code audit (no substitution found in
  cloud-app routing code) ✅
- The WorkspaceShell friendly fallback commit d06e9e2 (legit UX
  improvement regardless of the root-cause confusion)

Next session: ask the user to reproduce in actual Chrome with
their own creds before chasing NEW-2/NEW-4/NEW-5 as product
bugs. If they don't reproduce there, they're Playwright-agent
or test-cred-specific artifacts.

CI: 2232 cloud-app tests still pass; no code changes in this
commit (docs + deletion only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ills

Two fixes addressing direct user feedback from the post-test-matrix
session:

1. Documents picker left-click now auto-fills the matching panel
   (user: "the selector never works — you select a document, you
   always have to manually copy paste it").

   Root cause: DocumentsPicker was passing primaryId={null} +
   onPrimaryChange={()=>undefined} to WorkspaceDataGrid. Single-click
   on a row did NOTHING. All 4 specific pickers (Subjects / Sessions
   / Probes / Stimuli) wired primary selection correctly — Documents
   picker was the only gap.

   Fix: new `lib/workspace/class-to-selection-key.ts` maps each NDI
   doc class to the workspace's 5-key selection dimension. The
   Documents picker reads from selection[targetSlot] for primaryId
   and writes to selection[targetSlot] on click. For unmapped
   classes (treatment, ontologyTableRow, daqsystem, etc.) the
   picker shows a hint banner pointing the user at right-click "Set
   as…".

   Mapping highlights:
   - imageStack → session (so VideoPlaybackPanel auto-fills)
   - subject / openminds_subject → subject
   - element / probe / probe_location → probe
   - element_epoch / epoch / epochfiles_ingested / etc. → session
     (mirrors the backend _CLASS_ALIASES chain)
   - stimulus_presentation / stimulus_response → stimulus
   - vmspikesummary / neuron_extracellular / *_tuning_calc → unit

2. Video panel extended to also render still images (user: "if we
   have a video viewer that takes image stacks, why not also let
   the same tool show images?").

   VideoPlaybackPanel now branches on formatOntology:
   - NCIT:C190180 (MP4 video, Bhar use case) → ImageStackVideoViewer
   - NCIT:C70631 / NCIT:C85437 (PNG-family stills, Haley use case)
     → ImageViewer (PIL-decoded, zoom + frame stepper)
   - anything else → friendly "unsupported" message naming the
     ontology codes the panel DOES support

   Title now reads "Media playback"; icon picks Image vs Video
   based on the doc shape. Empty-state copy explains both
   subdomains. Filename + component name preserved
   (VideoPlaybackPanel.tsx) for import stability.
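
The two fixes above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `lib/workspace/class-to-selection-key.ts` or VideoPlaybackPanel.tsx: the five key names are inferred from the mapping highlights, only the classes listed there are included, and the function names are hypothetical.

```typescript
// The five selection keys, inferred from the mapping highlights above.
type SelectionKey = "subject" | "session" | "probe" | "stimulus" | "unit";

// Only the classes named in the commit message; the real table is larger.
const CLASS_TO_SELECTION_KEY = new Map<string, SelectionKey>([
  ["imageStack", "session"],
  ["subject", "subject"],
  ["openminds_subject", "subject"],
  ["element", "probe"],
  ["probe", "probe"],
  ["probe_location", "probe"],
  ["element_epoch", "session"],
  ["epoch", "session"],
  ["epochfiles_ingested", "session"],
  ["stimulus_presentation", "stimulus"],
  ["stimulus_response", "stimulus"],
  ["vmspikesummary", "unit"],
  ["neuron_extracellular", "unit"],
]);

// Returns the selection slot for a doc class, or null for unmapped classes
// (treatment, ontologyTableRow, daqsystem, ...), which get the hint banner.
function classToSelectionKey(className: string): SelectionKey | null {
  const direct = CLASS_TO_SELECTION_KEY.get(className);
  if (direct !== undefined) return direct;
  // The "*_tuning_calc" family maps to unit.
  if (className.endsWith("_tuning_calc")) return "unit";
  return null;
}

type MediaViewer = "ImageStackVideoViewer" | "ImageViewer" | "unsupported";

// Branch on formatOntology exactly as listed in fix 2.
function pickMediaViewer(formatOntology: string): MediaViewer {
  switch (formatOntology) {
    case "NCIT:C190180": // MP4 video
      return "ImageStackVideoViewer";
    case "NCIT:C70631": // PNG-family stills
    case "NCIT:C85437":
      return "ImageViewer";
    default:
      return "unsupported";
  }
}

console.log(classToSelectionKey("imageStack"), pickMediaViewer("NCIT:C70631"));
```

A `Map` is used instead of a plain object so that class names such as `constructor` cannot accidentally resolve to inherited properties.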

Tests:
- 10 new in `tests/unit/lib/workspace/class-to-selection-key.test.ts`
  (class → slot mapping, null returns, case-sensitivity contract,
  round-trip consistency)
- 7 new in DocumentsPicker.test.tsx (primary-select wiring per class,
  unmapped-class hint banner, no-op onPrimaryChange for unmapped)
- 4 new in VideoPlaybackPanel.test.tsx (image branch — PNG
  formatOntology mounts ImageViewer; mask formatOntology too; loading
  skeleton during /data/image fetch; error fallback)
- Existing tests updated to match new copy: "Pick a media document"
  empty-state, "doesn't contain renderable media" unsupported
  message, NCIT:C999999 (truly-unsupported) fixture for the
  unsupported-format test

CI: 2253 cloud-app tests pass (was 2232 + 21 new); lint clean;
typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…able_query

Steve's "Show code" feedback: he wanted to see the snippet load data
from the cloud + plot it, with intervention points between the two so
users can do something different mid-pipeline. Three gaps surfaced:

1. **fetch_signal had a TODO for picking the binary file off
   doc.files** — Steve specifically flagged "I guess it should go as
   far as loading the data from the cloud." The new flow has 4 named
   steps:
     Step 1: fetch the doc
     Step 2: pick the binary file off doc.files
       - Skips metadata files (channel_list.bin etc.) per the prior
         smart-binary-picker work
       - Prefers .nbf / .vhsb / .dat / .bin; largest file by size
     Step 3: download the bytes via fetch_cloud_file (Python) /
       getFile (MATLAB)
     Step 4: decoder note pointing at .nbf_read / .vhsb_read / numpy
       (the natural intervention point)

2. **get_document had NO mapping** — Video/Media panel emitted toolName
   "get_document" which fell to the default TODO. Now branches by
   imageStack.formatOntology:
     - NCIT:C190180 (MP4 video) → download file for local playback
     - NCIT:C70631 / NCIT:C85437 (PNG-family) → PIL decode +
       matplotlib (Python) / imread + imagesc (MATLAB)

3. **cross_table_query had NO mapping** — BehavioralCompare cross-mode
   (S5.3) emitted "cross_table_query" which fell to default TODO. Now
   emits a clean pandas (Python) / containers.Map (MATLAB) pipeline:
     Step 1: fetch ontologyTableRow docs via ndi_query "isa"
     Step 2: find X + Y columns by substring match (mirrors the
       backend _find_matching_group logic)
     Step 3: inner-join on subjectDocumentIdentifier (subject join)
       OR fetch treatment-class labels (treatment join)
     Step 4: matplotlib scatter / gscatter — colored by groupBy
       when set

All three branches in both Python AND MATLAB. Steve will see the same
structure in either tab.
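
The Step 2 heuristic for fetch_signal (skip metadata files, prefer known binary extensions, take the largest) might look like the sketch below. The emitted snippets are Python and MATLAB; this TypeScript version only illustrates the selection rule. It assumes a flat `{name, size}` file projection, reads "prefers" as "filter to preferred extensions, else fall back to any non-metadata file", and uses hypothetical names throughout.

```typescript
// Assumed flat projection of a doc's files; the real shape may differ.
interface FileEntry {
  name: string;
  size: number;
}

// Only channel_list.bin is named in the commit message; others are elided there.
const METADATA_FILES = new Set(["channel_list.bin"]);
const PREFERRED_EXTENSIONS = [".nbf", ".vhsb", ".dat", ".bin"];

function largest(files: FileEntry[]): FileEntry | null {
  if (files.length === 0) return null;
  return files.reduce((best, f) => (f.size > best.size ? f : best));
}

// Pick the binary payload: drop metadata files, then take the largest file
// with a preferred extension, falling back to the largest remaining file.
function pickBinaryFile(files: FileEntry[]): FileEntry | null {
  const payload = files.filter((f) => !METADATA_FILES.has(f.name.toLowerCase()));
  const preferred = payload.filter((f) =>
    PREFERRED_EXTENSIONS.some((ext) => f.name.toLowerCase().endsWith(ext)),
  );
  return largest(preferred) ?? largest(payload);
}

const picked = pickBinaryFile([
  { name: "channel_list.bin", size: 9999 },
  { name: "a.nbf", size: 100 },
  { name: "b.vhsb", size: 500 },
]);
console.log(picked?.name);
```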

CI: 2260 cloud-app tests pass (was 2253 + 7 new); ruff/lint clean;
typecheck clean. 56 code-export tests including the 7 new ones pin
the contract: "no TODO for fetch_signal pick-the-file", "branch on
NCIT:C190180 vs NCIT:C70631 for get_document", "fetch_treatment +
strip plot for cross_table_query treatment join", etc.

Pattern (Steve's bar):
  1. Fetch the doc(s)
  2. Pick / extract the relevant fields
  3. Compute or transform
  4. Plot
  → Each step is a separate block with a banner comment, so the user
    can stop / introspect / modify between any two steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The user pushed back (correctly) on the prior turn's claim that the
Show-Code snippets were "good enough to send to Steve." The
snippets have the right STRUCTURE (data-load → extract → plot with
named "Step N" banners), but they likely do NOT run end-to-end
because:
- No install header
- No auth flow shown
- Some referenced NDI-python API names may not exist (the existing
  audit comments in the code already flagged this in places)
- Binary decoders may need separate packages
- NDI-matlab audit is even thinner

The user requested we hold further Show-Code generator changes
until a deep audit lands. This commit captures the deep-dive scope
in a fresh top block on the existing handoff doc.

New block ("🟧 IF YOU'RE THE POST-COMPACTION AGENT (Show-Code
DEEP-DIVE arc)") covers:

- Branch state with exact SHAs (cloud-app: 4a0ddd7; backend:
  f6ecb83)
- Sacred rules brief (NEVER push to main, author rule, etc.)
- 9 bugs / improvements shipped this multi-turn arc (don't redo)
- Retracted misdiagnoses (Vercel SSO was NOT NEW-5; NEW-2 confirmed
  by user as Playwright artifact, not product bug)
- 5 OPEN bugs with status (NEW-2/4/5 = Playwright-specific per
  user verification in real Chrome; NEW-7/8 = data-ingest)
- Explicit user HELDs (more Show-Code work, canvas redesign, S-1
  to S-4, test matrix re-run)
- 9 deep-dive topics in priority order, starting with NDI-python
  public API audit (output:
  apps/web/docs/operations/ndi-python-api-audit.md)
- "What to do FIRST" — verify branch state, read the whole doc,
  ask the user to pick a starting topic, do NOT touch the
  generator until the audit lands
- Test cred status (all 3 likely burned; ~1h recovery)
- CI state at close (2260 cloud-app + 1128 backend tests passing)
- Operational gotchas (pnpm-lock at root, Vercel/Railway redeploy
  windows, structlog filter quirk)

The original "🟫" / "🟦" / "🟩" blocks are preserved below for
historical context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three deep audits + a memory/crash investigation landed this session;
this commit ships the audit-driven fixes to the Show-Code snippet
generators so the emitted Python + MATLAB actually runs against the
published SDKs.

Top bugs surfaced + fixed (full table in code-export-coverage-matrix.md):
- pip install ndi-python was wrong (package is `ndi`, not on PyPI); use
  pip install git+https://github.com/Waltham-Data-Science/NDI-python.git
- ndiqueryAll(datasetId, …) is wrong; first arg is scope literal.
  Five Python emitters now use ndiqueryAll("public", …) + post-filter.
- fetch_cloud_file(uri) is wrong; real sig is (uri, target_path) -> bool.
  Three Python emitters now write to ~/.ndi/cache/<datasetId>/ and check.
- nbf_read doesn't exist in vlt; use ndicompress.expand_ephys for .nbf.
- vhsb_read takes (fo, x0, x1), not (path); fixed in both languages.
- /api/facets is a Next.js route → Python hits via urllib + Bearer;
  MATLAB errors explicitly with S-3 PR pointer.
- MATLAB getFile sig is (downloadUrl, localPath, ...) after
  getFileDetails — fixed in fetch_signal / fetch_image / get_document.
- Canonical snake_case (treatment.numeric_value,
  vmspikesummary.sample_times,
  stimulus_presentation.presentation_time.onset) vs
  cloud-app's camelCase projection — every accessor now checks both.
- MATLAB getDocument flat vs bulkFetch wrapped envelope — added
  _doc_body / _vm_body unwrap helpers.
- MATLAB cross_table_query was passing q.searchstructure (wrong) +
  'page_size' (wrong) — fixed to q + 'pageSize' + bulkFetch hydration.
- No auth pre-flight — Python docstring lists USERNAME/PASSWORD or
  TOKEN/ORGANIZATION_ID env vars; MATLAB %% Step 0 guards the path
  then calls ndi.cloud.authenticate().
- ndi_dataset_overview was hitting the default TODO — added the
  emitter to both languages (composes getDataset + documentClassCounts).

Test surface: 65 code-export tests pass (32 Python + 33 MATLAB), of
which 9 new pinning tests protect the audit-driven shapes from silent
regression. Full suite 2269 passing, lint + typecheck clean.

New ops docs:
- ndi-python-api-audit.md — per-emitter audit, file:line grounded
- ndi-matlab-api-audit.md — same for MATLAB (matters most for Steve)
- code-export-coverage-matrix.md — synthesis, panel × tool matrix, the
  S-1 through S-4 SDK upstream PR asks
- 2026-05-19c-memory-crash-investigation.md — root-cause for the user's
  computer crashes (15 GB locked .claude/worktrees + 226 MB live JSONL
  transcript + 6 GB colima VM; cloud-app code is NOT a culprit)

Handoff doc updated with 🟪 block summarizing this session and what's
still held (live verification, Modal UX, S-1 → S-4 upstream PRs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the cheap-but-real layer of the co-versioning safety idea documented
in code-export-coverage-matrix.md.

The snippet generators emit SDK names by string. The audit docs verify
each name against the published SDK source by file:line. Those two
artifacts are hand-written and could drift apart silently. This commit
bridges the gap with:

- lib/ndi/code-export/sdk-surface.json — the AUDITED truth: every import,
  function name, signature, and audit_ref for both NDI-python and
  NDI-matlab, plus a `_explicitly_does_not_exist` list of names we
  must NEVER emit (e.g. `vlt.file.custom_file_formats.nbf_read`,
  `ndi.database.openbinarydoc`).

- tests/unit/ai/code-export/sdk-surface.test.ts — 46 assertions that
  invoke every emitter once, then check the produced snippet against
  every entry in sdk-surface.json. CI fails if the generator emits a
  banned name OR stops emitting an audited one.
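
The shape of that check can be sketched as below. This is a simplified illustration, not the contents of sdk-surface.test.ts: the `SdkSurface` type flattens the JSON into two string lists, and `checkSnippet` is a hypothetical name. The banned names are the two quoted above; `ndicompress.expand_ephys` is taken from the earlier audit commit.

```typescript
// Hypothetical flattening of sdk-surface.json into two lists.
interface SdkSurface {
  audited: string[]; // names a given emitter is expected to produce
  banned: string[]; // names from _explicitly_does_not_exist
}

// Returns a list of problems; an empty list means the snippet passes.
function checkSnippet(snippet: string, surface: SdkSurface): string[] {
  const problems: string[] = [];
  for (const name of surface.banned) {
    if (snippet.includes(name)) problems.push(`emits banned name: ${name}`);
  }
  for (const name of surface.audited) {
    if (!snippet.includes(name)) problems.push(`missing audited name: ${name}`);
  }
  return problems;
}

const surface: SdkSurface = {
  audited: ["ndicompress.expand_ephys"],
  banned: ["vlt.file.custom_file_formats.nbf_read", "ndi.database.openbinarydoc"],
};

const goodSnippet = "import ndicompress\ndata = ndicompress.expand_ephys(path)";
console.log(checkSnippet(goodSnippet, surface));
```

Plain substring matching is deliberately strict: a banned token is flagged wherever it appears in the emitted text, including inside comments or error-message strings.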

Total test surface: 111 code-export tests (32 Python + 33 MATLAB +
46 co-versioning). Full suite 2315 passing.

The DYNAMIC layer (pytest against installed NDI-python in CI) waits
for NDI-python to publish to PyPI — sketched in §"Co-versioning safety
idea" of the coverage matrix doc.

Minor fix: matlab get_facets error message reworded so it no longer
embeds the literal `ndi.cloud.api.datasets.getFacets` token (which the
co-versioning check flags as banned). The error still points at the
S-3 PR ask.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Curling https://ndb-v2-experimental.up.railway.app for Bhar's imageStack
returned the doc body's `files` field in the canonical NDI shape:

  body.files = {
    file_list: ["imageStack"],
    file_info: {name, locations: {location, uid, ...}}
  }

— NOT a list of {uri, name, size} entries (the projection the previous
emitters assumed). Two real bugs uncovered:

1. doc.get("files") returns a dict, not a list. The previous "files = doc.get('files') or []"
   pattern iterated dict KEYS instead of file entries.

2. The location string is a raw pre-signed S3 URL by default, NOT an
   ndic:// URI. fetch_cloud_file REQUIRES the ndic:// form (it calls
   parse_ndic_uri). Users must first call
   ndi.cloud.filehandler.updateFileInfoForRemoteFiles(body, datasetId)
   to rewrite locations to the ndic:// form in-place.

Applied the fix to all three Python file-touching emitters
(fetch_signal, fetch_image, get_document):

- Unwrap envelope: body = doc.get("data") if isinstance(...) else doc
- Call updateFileInfoForRemoteFiles(body, datasetId) to normalize URIs
- Walk file_info defensively (dict or list); same for locations
- Pick the first location's .location as the ndic:// URI

Same shape fix applied to MATLAB emitters (fetch_signal, get_document):
walk doc.files.file_info as a struct or struct array; parse the
.locations.location URI; extract fileUID for getFileDetails → getFile.
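
The defensive walk over the canonical `files` shape can be sketched as follows. This is a TypeScript illustration of the traversal only; the emitters themselves are Python and MATLAB. The type names and `firstFileLocation` are hypothetical, and treating the envelope as "a `data` field holding an object" is an assumption. It does not perform the `updateFileInfoForRemoteFiles` rewrite, so the returned string may still be a pre-signed URL rather than an ndic:// URI.

```typescript
type OneOrMany<T> = T | T[];

interface FileLocation {
  location?: string;
  uid?: string;
}

interface FileInfo {
  name?: string;
  locations?: OneOrMany<FileLocation>;
}

interface DocFiles {
  file_list?: string[];
  file_info?: OneOrMany<FileInfo>;
}

// Normalize "single object or array" into an array.
function asArray<T>(value: OneOrMany<T> | undefined | null): T[] {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? (value as T[]) : [value as T];
}

// Unwrap a wrapped envelope: use doc.data when it is a plain object
// (assumed envelope shape), else the doc itself.
function unwrapBody(doc: Record<string, unknown>): Record<string, unknown> {
  const data = doc["data"];
  const isObject = typeof data === "object" && data !== null && !Array.isArray(data);
  return isObject ? (data as Record<string, unknown>) : doc;
}

// Return the first non-empty location string, or null if none exists.
function firstFileLocation(doc: Record<string, unknown>): string | null {
  const files = unwrapBody(doc)["files"] as DocFiles | undefined;
  for (const info of asArray<FileInfo>(files?.file_info)) {
    for (const loc of asArray<FileLocation>(info.locations)) {
      if (typeof loc.location === "string" && loc.location.length > 0) {
        return loc.location;
      }
    }
  }
  return null;
}

const example = {
  files: {
    file_list: ["imageStack"],
    file_info: { name: "imageStack", locations: { location: "ndic://example", uid: "u1" } },
  },
};
console.log(firstFileLocation(example));
```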

New: updateFileInfoForRemoteFiles added to sdk-surface.json (cited in
filehandler.py:51-118).

Live-verification finding documented in code-export-coverage-matrix.md
§"Live verification finding" with the actual curl response shape +
both surprises.

Test surface: 112 code-export tests pass (32 Python + 33 MATLAB + 47
co-versioning). Full suite 2316 passing; lint + typecheck clean.

What's still NOT end-to-end verified: actually running the snippets
in a real Python/MATLAB kernel against a (dataset, doc) pair. That is
Topic #6's natural next step, but the shape gap that would have
crashed the snippets at the first file access is now closed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a fresh "GitHub Template arc" top block to the post-handoff
execution doc. Captures the design pivot (Steve + Eivind brainstorm
→ green-light to prototype the template repo), what landed this
session (the local `ndi-analysis-template` repo at commit 3fb2567),
and the ordered punch list for the next session: push to GitHub
under the chosen org, mark as Template, add 6 more plot modules, do
the cloud-app NextAuth + button work, then ZIP/Colab/Codespaces
deep-links.

Also notes the side-effect of this session's memory cleanup:
`pnpm store prune` invalidated `apps/web/node_modules` hardlinks,
so the next agent touching cloud-app code must run `pnpm install`
first.

Marks the prior "🟪 Show-Code audit + fixes" block as superseded so
the next agent doesn't double back into emitter work — the template
repo subsumes that direction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the cloud-app side of the GitHub Template arc from
2026-05-19b-post-handoff-execution.md. Adds two buttons next to
every workspace panel's existing ShowCodeButton + every chat
tool-call message:

  - "Open in GitHub" → POST /api/github/create-analysis-repo
      → octokit.rest.repos.createUsingTemplate against the private
        Waltham-Data-Science/ndi-analysis-template repo
      → commits current_analysis.py pre-populated with the panel's
        exact args via a thin generator
        (lib/ndi/code-export/current-analysis.ts) that imports
        plots.plot_X functions from the template
  - "Download as ZIP" → POST /api/github/download-analysis-zip
      → fetches the template tarball via a server-side PAT
        (GITHUB_APP_TOKEN) since the template is private
      → repacks as a .zip with current_analysis.py injected at the
        slug-prefixed root

OAuth lives in lib/github/oauth.ts as a "linked-account" cookie
(HttpOnly + AES-256-GCM via GITHUB_TOKEN_ENCRYPTION_KEY) rather
than NextAuth, so the existing FastAPI cookie-session auth stays
the single source of truth. /api/github/status surfaces the
merged "configured + linked" verdict for the client.
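
A minimal sketch of sealing a token with AES-256-GCM for an HttpOnly cookie, using Node's built-in `node:crypto`. This is not the contents of lib/github/oauth.ts: the function names, the `iv || tag || ciphertext` layout, and the base64url encoding are assumptions, and how GITHUB_TOKEN_ENCRYPTION_KEY is decoded into 32 bytes is left to the caller.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_BYTES = 12; // recommended nonce size for GCM
const TAG_BYTES = 16; // GCM authentication tag size

// Encrypt a token; output layout (an assumption): iv || tag || ciphertext.
function sealToken(token: string, key: Buffer): string {
  if (key.length !== 32) throw new Error("AES-256-GCM needs a 32-byte key");
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(token, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString("base64url");
}

// Decrypt a sealed value; returns null on any tampering or key mismatch.
function openToken(sealed: string, key: Buffer): string | null {
  try {
    const raw = Buffer.from(sealed, "base64url");
    if (raw.length < IV_BYTES + TAG_BYTES) return null;
    const iv = raw.subarray(0, IV_BYTES);
    const tag = raw.subarray(IV_BYTES, IV_BYTES + TAG_BYTES);
    const ciphertext = raw.subarray(IV_BYTES + TAG_BYTES);
    const decipher = createDecipheriv("aes-256-gcm", key, iv);
    decipher.setAuthTag(tag);
    const plain = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
    return plain.toString("utf8");
  } catch {
    return null; // authentication failed or input malformed
  }
}

const demoKey = randomBytes(32);
const sealedDemo = sealToken("example-token", demoKey);
console.log(openToken(sealedDemo, demoKey) === "example-token");
```

Returning `null` instead of throwing lets a route treat a corrupted or stale cookie the same as "not linked".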

Button is gated client-side on NEXT_PUBLIC_GITHUB_INTEGRATION_ENABLED
and server-side on the three env vars (GITHUB_CLIENT_ID,
GITHUB_CLIENT_SECRET, GITHUB_APP_TOKEN). Renders disabled with a
tooltip when unset — never crashes.
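
The two-sided gate can be sketched as below. The env var names come from the paragraph above; the function names, the `"true"` flag value, the tooltip copy, and the idea that the client learns the server verdict as a boolean (for example from /api/github/status) are assumptions for illustration.

```typescript
type Env = Record<string, string | undefined>;

const REQUIRED_SERVER_VARS = [
  "GITHUB_CLIENT_ID",
  "GITHUB_CLIENT_SECRET",
  "GITHUB_APP_TOKEN",
] as const;

// Server side: all three vars must be present and non-blank.
function isServerConfigured(env: Env): boolean {
  return REQUIRED_SERVER_VARS.every((name) => (env[name] ?? "").trim() !== "");
}

// Client side: assumes the flag is enabled by the literal string "true".
function isClientEnabled(env: Env): boolean {
  return env["NEXT_PUBLIC_GITHUB_INTEGRATION_ENABLED"] === "true";
}

interface ButtonState {
  disabled: boolean;
  tooltip?: string;
}

// Never throws: a missing flag or missing server config yields a disabled
// button with an explanatory tooltip (copy here is illustrative).
function githubButtonState(clientEnv: Env, serverConfigured: boolean): ButtonState {
  if (!isClientEnabled(clientEnv) || !serverConfigured) {
    return { disabled: true, tooltip: "GitHub integration is not configured" };
  }
  return { disabled: false };
}

console.log(githubButtonState({}, isServerConfigured({})));
```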

Tests (+51): create-analysis-repo (7), download-analysis-zip (5),
slug helpers (8), oauth helpers (16), current-analysis emitter
(7), OpenInGitHubButton component (5). Total cloud-app suite:
2367 passing across 191 files. lint + typecheck + build clean.

ADR-010 documents the decision tree and the 12 new files. COMPLIANCE
gains a §8 External services row for the new GitHub integration.

Phase 1 of the workflow — non-functional in production until the
user provisions GITHUB_CLIENT_ID etc. on Vercel Preview. The
template repo itself was pushed to Waltham-Data-Science last
session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marks the arc ~95% complete and rewrites the "what remains" section
into a short user-side punch list (env vars on Vercel Preview, pin
smoke doc IDs, add CI secrets, license pick, Colab/Codespaces
deep-links).

The three pillars built across this multi-agent session:

1. Python template — Waltham-Data-Science/ndi-analysis-template
   (private, GitHub Template flag set), 9 plot modules + 68 unit
   tests + 10 smoke scaffolded. Commits 3fb2567 + 2fb1ac6.

2. MATLAB template — Waltham-Data-Science/ndi-analysis-template-matlab
   (private, GitHub Template flag set), 9 plotXxx.m functions + 3-job
   CI matrix via matlab-actions/setup-matlab. Commit 872f4e8.

3. Cloud-app side — 6 new API routes (/api/github/*) +
   OpenInGitHubButton on all 10 panel + chat surfaces +
   linked-account OAuth + ADR-010 + 51 new tests. Commit 4e85ef8 on
   feat/experimental-ask-chat. 2367/2367 cloud-app tests passing
   across 191 files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds apps/web/docs/HANDOFF.md as the single source of truth for the
next session — covers all four repos (cloud-app, ndb-v2,
ndi-analysis-template Python + MATLAB), sacred rules, test creds,
production vs experimental state, the GitHub Template arc, the
recent Railway outage + recovery procedure, recent commit timeline,
and a prioritized punch list for what's left to do.

Marks 21 prior dated docs (handoffs, audits, plans, reviews) as
SUPERSEDED with a one-line header pointing back to HANDOFF.md.
Files retained for archaeology — git history is the safety net.

Deletes 5 truly-redundant artifacts:
  - 2026-05-14-pre-compact-handoff.md (V1)
  - 2026-05-14-pre-compact-handoff-v2.md (V2)
  - 2026-05-15-pre-compact-handoff-and-execution-plan.md (dup of master plan)
  - 2026-05-16-pre-compact-handoff.md (superseded multiple times)
  - 2026-05-18-f1-stimulus-projection-stub.diff (already-applied binary patch)

CLAUDE.md's "Where to read next" section now points at HANDOFF.md
rather than a stale laundry list of dated docs. The S5.3 line is
updated to reflect that it SHIPPED on feat/ndi-python-phase-a
(commit 7157bde on the backend).

Cleans up workspace-snapshot.md (stale Playwright artifact at repo
root) + adds a gitignore guard so similar snapshots can't accidentally
get committed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>