[DO NOT MERGE — experimental] Ask chat for Shrek demo (audri's green-light required)#160
Draft
audriB wants to merge 188 commits into
Design for an anonymous public chatbot demo over the published NDI Commons catalog. Showcase target: Shrek (existing LabChat customer, prospect for data services). Lives behind a feature branch + dual env gate so the demo can be reviewed on a Vercel preview without ever touching production. Scope is intentionally tight to keep the demo throwaway-safe: anonymous-only, public-data-only, ephemeral conversation, 5 tools backed by existing FastAPI public endpoints, no MongoDB schema changes, no auth changes. Companion impl plan generated next via superpowers:writing-plans. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
13-task TDD-style plan covering the full build: deps + env + flag, rate-limiter, system prompt, tool handlers, route handler, chat components, page assembly, nav integration, e2e smoke, build + PR. Companion to 2026-05-11-experimental-ask-chat-design.md. Will be executed inline next. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the dependency set for the experimental Ask chat (Vercel AI SDK v5 + Anthropic provider + react-markdown + @ai-sdk/react for the hooks), extends the zod env schema with two new optional vars (ANTHROPIC_API_KEY for the route gate, NEXT_PUBLIC_ASK_ENABLED for nav visibility), and lands the feature-flag helpers + unit tests. No runtime surface changes yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simple in-memory sliding-window rate limiter: 10 requests / 10 min per IP. Documents the edge-runtime caveat (memory is per instance) and the swap path to Vercel KV if this ever escapes prototype scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
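A minimal sketch of a per-IP sliding-window limiter with the 10 requests / 10 minutes budget from the commit above. The names `allowRequest`, `hits`, `WINDOW_MS`, and `LIMIT` are illustrative assumptions, not the branch's actual code; the map lives in process memory, which is the per-instance caveat the commit mentions.

```typescript
// Sliding-window limiter: at most LIMIT requests per WINDOW_MS per IP.
// State is held in process memory, so each server instance counts separately.
const WINDOW_MS = 10 * 60 * 1000; // 10 minutes
const LIMIT = 10;

// IP -> timestamps (ms) of requests that were allowed.
const hits = new Map<string, number[]>();

function allowRequest(ip: string, now: number = Date.now()): boolean {
  const cutoff = now - WINDOW_MS;
  // Drop timestamps that have slid out of the window.
  const recent = (hits.get(ip) ?? []).filter((t) => t > cutoff);
  if (recent.length >= LIMIT) {
    hits.set(ip, recent);
    return false; // caller would answer with HTTP 429
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```

Rejected requests are not recorded, so a blocked client regains budget as soon as its oldest allowed request leaves the window.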
Hand-tuned for scope-locking + anti-fabrication + identity-anchoring. Tests pin the critical clauses so a future edit can't accidentally strip a safety guarantee. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each tool proxies to an existing FastAPI public endpoint with
zod-validated input, 8s timeout, anonymous fetch, and { error }
fallback on failure. Tools are also exported as AI SDK tool()
definitions for direct binding to streamText.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Streams Claude Sonnet completions via the AI SDK with 5 tools bound. Fails closed on missing API key (503), rate-limited per IP (429), and validates body shape (400). Uses AI SDK v5's stopWhen + stepCountIs (replaces v4's maxSteps) and convertToModelMessages to bridge UIMessage<->ModelMessage at the boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
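The fail-closed checks listed above can be sketched as one pure function. This is an illustration only: the check order, the `gateAskRequest` name, and the `{ messages: [...] }` body shape are assumptions, not taken from the route handler.

```typescript
// Result of the pre-flight checks that run before any model call.
type Gate =
  | { ok: true }
  | { ok: false; status: 503 | 429 | 400; reason: string };

interface GateInput {
  apiKey: string | undefined; // e.g. the ANTHROPIC_API_KEY env value
  rateLimited: boolean;       // outcome of the per-IP limiter
  body: unknown;              // parsed JSON request body
}

// Hypothetical body check: an object carrying a `messages` array.
function hasMessagesArray(body: unknown): body is { messages: unknown[] } {
  return (
    typeof body === "object" &&
    body !== null &&
    Array.isArray((body as { messages?: unknown }).messages)
  );
}

function gateAskRequest(input: GateInput): Gate {
  // Fail closed: without a key the route is treated as unavailable.
  if (!input.apiKey) return { ok: false, status: 503, reason: "not configured" };
  if (input.rateLimited) return { ok: false, status: 429, reason: "rate limited" };
  if (!hasMessagesArray(input.body)) return { ok: false, status: 400, reason: "invalid body" };
  return { ok: true };
}
```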
…d, ToolCallIndicator)
Six presentational components for the /ask chat surface:
- Markdown: react-markdown + remark-gfm with internal link rewriting
- ChatMessage: user/assistant bubble with role-based styling
- ChatInput: textarea + Send, Enter-to-send (Shift+Enter newline)
- SuggestedPromptChips: starter prompts shown on empty thread
- ToolCallIndicator: inline "browsing the catalog…" while tools fire
- ChatThread: scrollable container with smart auto-scroll heuristic
Sized so the ask-shell composition stays small. No business logic in these: they accept handlers and render.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Composes thread + chips + input. v5 useChat differences handled:
input state is local, transport is DefaultChatTransport configured
to /api/ask, sends via sendMessage({ text }). Adapts UIMessage[]
parts shape into our ThreadEntry[] so tool-call indicators
interleave with assistant text in the same order the model
emitted them. Friendly error banner for 503/429/network.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RSC page gates on askEnabled() server-side (defense in depth with the route handler's 503). noindex metadata since the preview isn't SEO content. Scoped not-found for any future sub-routes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inserts the new tab between Platform and About so it reads as a product surface. Hidden by default — NEXT_PUBLIC_ASK_ENABLED=1 required for the link to appear. Independent gate from ANTHROPIC_API_KEY (which controls the route) so we can deploy the backend without surfacing the tab, or vice versa. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
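The two independent gates described above might be modelled as below. The helper names are hypothetical, and treating any non-empty key as "route enabled" is an assumption; only the env variable names and the `=1` value come from the commit messages.

```typescript
type Env = Record<string, string | undefined>;

// Route gate: the backend answers only when an API key is configured.
function askRouteEnabled(env: Env): boolean {
  return Boolean(env.ANTHROPIC_API_KEY);
}

// Nav gate: the tab is hidden unless the public flag is exactly "1".
function askNavVisible(env: Env): boolean {
  return env.NEXT_PUBLIC_ASK_ENABLED === "1";
}
```

Keeping the two checks separate is what allows a deploy with the backend live but the tab hidden, or the reverse.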
Mocks the AI SDK v5 UI message stream so the chat flow exercises end-to-end without a live Anthropic key. Tests skip gracefully if the feature flag is off. Mobile viewport test runs unconditionally and asserts no horizontal overflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously, env entries like ANTHROPIC_API_KEY='' (empty string) tripped the min(20) check, failing parseEnv for any caller (tests that set the var to '' deliberately, dotenv files with placeholder 'KEY=' rows, etc.). The preprocess() short-circuits empty strings to undefined so optional() applies cleanly. Caught by the unit-test sweep at Task 13. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI e2e on PR #160 caught a pre-existing footer-layout bug at viewports <~400px: the mailto link `info@walthamdatascience.com` (unbreakable string) expands its grid column to its intrinsic min-content width, overflowing the page horizontally by ~23px. This has actually been live on every marketing page on mobile since launch — never caught because no prior e2e checked document.documentElement.scrollWidth vs clientWidth. The new /ask test surfaced it, and the fix is the same 2-class change that helps everywhere: min-w-0 on the grid item lets it shrink, break-words on long links lets them wrap. Verified: ask.spec.ts mobile-viewport test now passes (375x667). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the dependency + env entry for the build-time RAG index that the next commits will land. Matches vh-lab/shrek-lab's choice of voyage-4-large @ 1024-d so the same key works across all three chatbots. devDependency (not dependencies) — the SDK is build-time-only. Runtime query embedding will use Voyage's REST API via fetch so the edge bundle stays clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three-tier metadata pattern adapted from vh-lab/shrek-lab:
1. `lib/ai/dataset-metadata.json` — hand-curated sidecar mapping
dataset IDs to {highlights, keywords, notableMethods, piContext}.
Author facts the catalog API doesn't expose (e.g., "this is
the only public tree shrew V1 dataset") and they end up in the
embedded chunk text so semantic queries can find them.
2. `scripts/build-ask-index.mjs` — one-shot build:
- Paginates the catalog
- Enriches each dataset with the summary endpoint
- Composes a document string per dataset (catalog + sidecar)
- Batch-embeds via Voyage AI voyage-4-large (1024-d, same as
vh-lab + shrek-lab so the key is shared)
- Writes lib/ai/dataset-index.json (committed to git)
3. `lib/ai/dataset-index.json` — empty placeholder. Run the script
to populate. Runtime tool returns gracefully when entries=[].
Run with: pnpm --filter @ndi-cloud/web build-ask-index
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the runtime side of the experimental Ask chat's RAG layer:
- lib/ai/index-loader.ts: loads dataset-index.json, lazily promotes
embeddings to Float32Array, exposes cosineSimilarity + topKByVector.
Tested with synthetic 3-d fixtures so the geometry is reasonable.
- lib/ai/voyage-client.ts: runtime query embedding via Voyage REST API
(no SDK at runtime — keeps the bundle clean). 8s timeout matches
the other tool handlers. Pinned to voyage-4-large to match the
build-time script + vh-lab + shrek-lab.
- lib/ai/tools.ts: new 6th tool semantic_search_datasets({query, limit}).
Embeds the query, ranks against the pre-baked index, returns top-K
with score + curated metadata. Graceful errors for: empty index,
no API key, embed failure, dim mismatch.
- lib/ai/system-prompt.ts: teaches Claude when to pick which tool —
concept-vs-substring is the key heuristic. Fall-back instructions
if semantic_search returns an error.
- app/api/ask/route.ts: runtime: 'nodejs' (was 'edge'). The
dataset-index.json import will be multi-MB once populated;
Node's 250 MB limit gives plenty of headroom vs. edge's 4 MB.
60s maxDuration covers up to 4 tool roundtrips + streaming.
23 new unit tests across 4 test files. Build + lint + typecheck
+ all 1031 unit tests green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
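For reference, cosine ranking over a small in-memory index can be sketched as follows. The function names match the ones the commit mentions, but the bodies and the `IndexEntry` shape are assumptions for illustration rather than the contents of lib/ai/index-loader.ts.

```typescript
// Hypothetical entry shape; the real index carries more metadata.
interface IndexEntry {
  id: string;
  embedding: Float32Array;
}

function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    // `?? 0` keeps this compiling under noUncheckedIndexedAccess.
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors score 0
}

// Brute-force top-K: score every entry, sort descending, keep the first k.
function topKByVector(
  query: Float32Array,
  entries: IndexEntry[],
  k: number,
): Array<{ id: string; score: number }> {
  return entries
    .map((e) => ({ id: e.id, score: cosineSimilarity(query, e.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan like this is reasonable for a catalog of tens or hundreds of datasets; it stops being attractive once the index grows large enough to need an approximate-nearest-neighbor structure.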
Companion to the original 2026-05-11 design spec. Documents:
- What was copied from vh-lab + shrek-lab (three-tier metadata, Voyage AI provider, build-time embedding)
- What was deliberately simplified for our scale (flat JSON vs pgvector, one chunk per dataset vs section-aware chunking, cosine-only vs hybrid+rerank)
- The manual refresh workflow (set VOYAGE_API_KEY, run script, commit + push, Vercel auto-redeploys)
- Failure-mode UX (every RAG failure falls back to keyword search; the chat never breaks because RAG is unavailable)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strict TS + the project's '--max-warnings=0' eslint config required non-null assertions on array-index accesses + dropping the unused `beforeEach` import. No runtime impact, no behavioral change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rank)
Replaces the flat-JSON + pure-cosine first pass with a faithful copy
of vh-lab + shrek-lab architecture. Every retrieval-quality component
matches: same DB engine, same indexes, same RRF constants, same
reranker.
What changed:
- DROP lib/ai/index-loader.ts + dataset-index.json — flat JSON gone
- ADD lib/ai/db/{pool.ts, schema.sql} — Postgres connection +
chunks + chunks_staging + rag_versions tables, IVFFlat (cosine,
lists=100), GIN tsvector index
- ADD lib/ai/hybrid-retrieval.ts — parallel vector + BM25 lanes,
RRF merge at k=60, ivfflat.probes=10 at query time
- UPDATE lib/ai/voyage-client.ts — adds rerank() alongside
embedQuery(); both via REST, voyage-4-large + rerank-2.5
- UPDATE lib/ai/tools.ts — semantic_search_datasets runs the full
4-stage pipeline (embed → hybrid → RRF → rerank); soft-degrades
to RRF-only if rerank fails
- REWRITE scripts/build-ask-index.mjs — staged ingest into Postgres
with atomic promote (mirrors vh-lab's
promote_staging_to_production_sync); REINDEX after promote
- ADD DATABASE_URL to env schema
- UPDATE design addendum with final architecture + setup steps +
cost + failure modes
Setup (one-time):
1. Railway → +Add → PostgreSQL → copy DATABASE_URL
2. psql $DATABASE_URL -f apps/web/lib/ai/db/schema.sql
3. Set DATABASE_URL + VOYAGE_API_KEY on Vercel Preview
4. export DATABASE_URL=... && export VOYAGE_API_KEY=...
pnpm --filter @ndi-cloud/web build-ask-index
Local verification:
✅ 1031 unit tests (10 new tests + 12 updated for new pipeline)
✅ Lint + typecheck clean
✅ Production build succeeds
✅ Bundle ratchet still under baseline
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
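The RRF merge step named above (k=60) follows the standard reciprocal-rank-fusion formula, score(d) = Σ 1 / (k + rank), summed over the lanes in which a document appears. Below is a minimal sketch over ranked ID lists; `rrfMerge` and its signature are assumptions, not the code in lib/ai/hybrid-retrieval.ts.

```typescript
// Reciprocal rank fusion over any number of ranked lanes.
// Each lane is a list of document IDs, best match first.
function rrfMerge(
  lanes: string[][],
  k: number = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const lane of lanes) {
    lane.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

Because only ranks enter the formula, the vector lane's cosine distances and the text lane's relevance scores never need to be put on a common scale.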
The voyageai npm SDK ships ESM with directory-style sub-imports that Node's strict ESM resolver rejects (ERR_UNSUPPORTED_DIR_IMPORT). The runtime client in lib/ai/voyage-client.ts already calls the REST API directly; aligning the build script removes the broken dep entirely. Same Voyage endpoints, same auth, same response shape — just no SDK indirection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🛑 DO NOT MERGE: experimental branch only
Per audri 2026-05-13: this branch is for Shrek-demo experimentation. No merge without explicit green light from @audriB. State as of this comment:
Phase 1 of the scientific-depth plan (apps/web/docs/specs/2026-05-13-ask-scientific-depth-plan.md). Every tool result now carries a `references: Reference[]` array, the LLM is taught to emit `[^N]` footnote markers tied to those references, and the chat UI renders them as clickable chips that deep-link into the Document Explorer.
Foundation pieces:
- lib/ai/references.ts: Reference type + makeReference helpers + parseFootnotes() that extracts [^N]: [title](url) — class defs from message body into a Map<number, Reference>
- lib/ai/tools.ts: every existing tool (list_published_datasets, get_dataset, get_dataset_summary, get_dataset_class_counts, get_facets, semantic_search_datasets) now returns a `references` array alongside its data payload. Each cites the dataset overview (catalog) or facet surface
- lib/ai/system-prompt.ts: adds CITATION section ([^N] footnotes required, ### Sources panel required, never fabricate a citation)
- components/ai/CitationChip.tsx: small inline [N] chip with hover tooltip (title + snippet + class badge), opens reference URL in new tab
- components/ai/SourcesPanel.tsx: bottom-of-message deduplicated references list with class badges
- components/ai/Markdown.tsx: parses footnotes from raw content, customizes remark-gfm rendering: footnote-ref <sup><a> becomes CitationChip; default footnote-section is suppressed in favor of SourcesPanel; "### Sources" h3 stripped to avoid duplicates
Tests:
- 1045 unit tests pass (+14 new: references shape, footnote parsing, tool reference attachment, system-prompt citation clauses)
- Lint + typecheck + build clean
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
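A rough sketch of footnote-definition parsing in the spirit of parseFootnotes(). It handles only the `[^N]: [title](url)` prefix and ignores the trailing class label; the regex and the two-field `Reference` shape are simplifying assumptions, not the implementation in lib/ai/references.ts.

```typescript
// Simplified stand-in for the Reference type (the real one has more fields).
interface Reference {
  title: string;
  url: string;
}

function parseFootnotes(markdown: string): Map<number, Reference> {
  const refs = new Map<number, Reference>();
  // One definition per line, e.g. "[^2]: [Some title](/some/url)".
  const pattern = /^\[\^(\d+)\]:\s*\[([^\]]+)\]\(([^)\s]+)\)/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const [, n = "", title = "", url = ""] = match;
    refs.set(Number(n), { title, url });
  }
  return refs;
}
```

Inline markers such as `[^1]` in running text are not matched, because the pattern requires the marker at the start of a line followed by a colon.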
Two scientific use cases:
- Subject join: pair measurements from two ontologyTableRow
groups by subjectDocumentIdentifier
- Treatment join: pair a measurement with the subject's
treatment label (walks treatment / treatment_drug /
treatment_transfer class chain)
Spec covers backend service (cross_table_pairs method on
TabularQueryService), router endpoint, cloud-app tool handler
(cross_table_query), chat-tools registration, ScatterChart
component, and BehavioralComparePanel mode-toggle.
Acceptance + test plan included. Implementation lands as
follow-up commits this same arc.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
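The subject join in the spec above amounts to pairing two measurement sets on a shared subject key and reporting what failed to pair. The sketch below assumes one row per subject per table and a simplified `Row` shape; the real service works on ontologyTableRow documents, and none of these names come from the backend.

```typescript
// Simplified row: one numeric measurement keyed by subject.
interface Row {
  subjectDocumentIdentifier: string;
  value: number;
}

interface JoinResult {
  pairs: Array<{ subject: string; x: number; y: number }>;
  unjoined: { xOnly: number; yOnly: number };
}

function subjectJoin(xRows: Row[], yRows: Row[]): JoinResult {
  // Index the y side by subject (a later duplicate overwrites an earlier one).
  const yBySubject = new Map<string, number>();
  for (const row of yRows) yBySubject.set(row.subjectDocumentIdentifier, row.value);

  const pairs: JoinResult["pairs"] = [];
  const matched = new Set<string>();
  let xOnly = 0;
  for (const row of xRows) {
    const y = yBySubject.get(row.subjectDocumentIdentifier);
    if (y === undefined) {
      xOnly++; // subject measured on the x side only
      continue;
    }
    pairs.push({ subject: row.subjectDocumentIdentifier, x: row.value, y });
    matched.add(row.subjectDocumentIdentifier);
  }
  // y-side subjects that never found an x partner.
  const yOnly = yBySubject.size - matched.size;
  return { pairs, unjoined: { xOnly, yOnly } };
}
```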
First wave of S5.3 (cross-table joins) on cloud-app.
NEW: apps/web/lib/ndi/tools/cross-table-query.ts
- Tool handler mirroring tabular-query.ts pattern
- zod input schema with joinOn enum ("subject" | "treatment")
- POSTs to /api/datasets/:id/cross-table-query
- Returns pair_count, unjoined, group_summary, chart_payload (for
the scatter-chart fence), references, empty_hint for retries
NEW: apps/web/app/api/datasets/[id]/cross-table-query/route.ts
- Thin proxy route following the tabular-query/route.ts pattern
NEW: apps/web/tests/unit/ai/tools/cross-table-query.test.ts
- 9 tests covering subject-join, treatment-join, groupBy
aggregation, empty-hint surfacing, input validation,
groupOrder pass-through
Remaining S5.3 work: chat-tools.ts registration, ScatterChart
component, BehavioralComparePanel mode toggle, backend service +
router (waiting on BE-A agent).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When tables have many columns (Bhar's subject table is 43 cols post-F-1b), the body H-scrolls but the header row stays static. Users scroll right to see column 30's data, but column 30's header is hidden to the left, so they can't tell which column the data belongs to.
Fix: CSS-only. Add `min-width: max-content` to the `<table>` inside VirtualizedTable's scroll container. With `w-full` alone, some browsers honor `width: 100%` over cell intrinsic widths and squeeze columns rather than growing the table; the scroll container then never triggers an H-scrollbar. With `min-width: max-content` set, the table naturally grows to fit cell content, and because both `<thead>` (with `sticky top-0` for vertical pinning) and `<tbody>` live inside the SAME `<table>` inside the SAME `overflow-auto` scroll container, horizontal scroll moves header + body together, so column titles stay aligned with their cells.
Narrow tables (3-5 cols) where natural content width is smaller than the container still render at `w-full`; `min-width: max-content` only kicks in when content overflows. No regression.
Test added: tests/unit/components/ui/VirtualizedTable.test.tsx verifies the inline style is applied, the sticky thead class is preserved, and the thead+tbody share the same scroll-container parent. Covers both the wide (43-col Bhar) and narrow (3-col) case. Real H-scroll behavior is browser-driven; jsdom doesn't lay out tables, so visual verification is owed to the Playwright E2E suite against `/datasets/69bc5ca11d547b1f6d083761/tables/subject`.
This applies to the catalog summary tables (SummaryTableView, MyDatasetsTable). WorkspaceDataGrid already had a JS scroll-sync fix landed in fc1b8a8 (header lives outside the body's overflow-auto, so it needs the JS sync rather than CSS).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second wave of S5.3 (cross-table joins) on cloud-app:
NEW: apps/web/components/ndi/charts/ScatterChart.tsx (~270 lines)
- Plotly scatter for joinOn=subject (numeric X vs numeric Y,
color by group when groupBy set)
- Strip plot for joinOn=treatment (numeric X vs categorical Y
treatment labels)
- Fetches via TanStack Query against /api/datasets/:id/cross-table-query
- Shared color palette with ViolinChart for consistent
cross-chart group coloring
- Surfaces unjoined count in figcaption when non-zero (so users
see "5 subjects unpaired (x-only: 3, y-only: 2)")
- Empty / loading / error states with testid wrappers
NEW: apps/web/tests/unit/components/charts/ScatterChart.test.tsx
- 6 tests: subject-join, treatment-join, empty, error, loading,
unjoined-figcaption
MODIFIED: apps/web/lib/ai/chat-tools.ts
- Registers cross_table_query tool with description directing the
LLM to use it when the user names TWO measurements (or one +
a treatment), with the retry loop pattern + scatter-chart fence
Remaining S5.3 work: BehavioralComparePanel mode toggle, backend
service + router (waiting on BE-A agent to finish F-8 + F-1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The LLM emits cross_table_query results inside a scatter-chart fence. Markdown.tsx now recognizes the fence, parses the payload via `parseScatterChartPayload`, and mounts <ScatterChart /> inline in the chat message. Falls back to default code styling on malformed payloads, the same defensive pattern as parseViolinChartPayload.
Also unwraps the <pre> wrapper that react-markdown otherwise puts around fenced code blocks, so the chart's overflow + figcaption render cleanly.
Adds ScatterChart.displayName = 'ScatterChart' for the childIsChartComponent identity test (mirrors ViolinChart pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the CROSS-TABLE / PAIRED COMPARISONS section to the SYSTEM_PROMPT
guiding the LLM to:
- Use cross_table_query when the user names TWO distinct
measurements/axes (vs tabular_query for single-axis groupBy)
- Discriminate joinOn=subject (numeric × numeric scatter) vs
joinOn=treatment (numeric × categorical strip plot)
- Embed the chart_payload inside a scatter-chart fence
- Surface unjoined counts explicitly when non-zero
Also tightens the cross_table_query chat-tools description so the
yVariableContains hint doesn't carry quoted snake_case field names
(which the tool-descriptions lint flags as un-explained NDI schema
names). Replaced with prose-form: "treatment reference / mixture /
name fields".
2221 tests + lint + build green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a mode toggle at the top of the BehavioralComparePanel that
switches between the existing single-table flow (tabular_query →
ViolinChart) and the new cross-table flow (cross_table_query →
ScatterChart).
Single-table mode behavior + existing tests unchanged.
Cross-table mode:
- Form: xVariableContains + yVariableContains + joinOn radio +
groupBy + groupOrder + title
- Run posts to /api/datasets/:id/cross-table-query
- Result renders ScatterChart (subject-join scatter / treatment-
join strip plot) + per-group count summary table
- Empty-hint retry loop mirrors tabular_query (separate
test-id prefix so both modes' picks are independently asserted)
- Mode toggle resets BOTH form sets + both committedArgs slots so
no stale input silently fires on the next Run
- F-4 useQuery with stable committedArgs preserved — two queries,
each enabled only when its mode is active
+7 new tests for the cross-table flow (mode toggle, x+y+joinOn
validation, POST body shape, ScatterChart render, empty-hint retry
with picked column as groupBy, form-reset on mode switch, result-
clear on mode switch after a successful run).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous turn crashed mid-arc (BE-A backend agent OOM'd
during F-1; my S5.3 backend service was wiped by an earlier git
reset from the same agent). All cloud-app work + 4 of 5 backend
tickets are pushed and safe; 2 items remain for the completion
run:
- F-1 backend (curated /tables/stimulus projection) — 241-line
integration-test stub recovered to
docs/specs/2026-05-18-f1-stimulus-projection-stub.diff
(service + router never implemented)
- S5.3 backend (cross_table_pairs service + POST /cross-table-query
route) — cloud-app side fully wired and waiting; backend was
lost. Full design spec + response contract added to the
handoff doc.
Plus two verification curls (F-6 0-count regression, B6
cross-dataset audit) added to the checklist.
Adds explicit "Step 1 → Step 2 → Step 3" framing matching the
user's stated plan:
1. Finish the completion run (this turn's remaining items)
2. Run the exhaustive test matrix
3. Tools-along-boundaries canvas redesign (held for design Q&A)
Plus operational guardrails to prevent re-hitting the same git
reset / parallel-agent collision / rate-limit pitfalls we burned
this session on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…= verifications
User cross-checked the handoff against their pre-crash context recap and surfaced four items needing reconciliation:
1. F-1 was already implemented in 0231851 (earlier this arc). STIMULUS_COLUMNS + router alias + projection function all live in the backend tree. The BE-A agent's recovered test stub is for ADDITIONAL pagination-invariant integration tests, not the implementation itself. Updated F-1 detail to reflect this and re-classified as "optional integration tests" rather than "implementation pending."
2. F-6 verification ran this turn with specific numbers: Bhar=0 (expected, no spike data), Francesconi=1604, Haley=4156. Locked in to the handoff as "Verifications run this turn — locked-in results" so the next session doesn't re-run.
3. B6 cross-dataset audit ran clean (all 8 datasets). Surfaced a SEPARATE upstream issue: Dabrowska returns totalDocuments=0 from /summary, which appears to be cloud-side sync, NOT B6-related. Added to "Surfaced this turn but NOT actioned" section.
4. ?className= vs ?class= mystery: confirmed NOT a bug. Cloud-app sends ?class= correctly via lib/api/documents.ts::useDocuments. The earlier curl was a typo on the curl side. Locked in so future sessions don't re-investigate.
After this update, the post-compaction completion checklist collapses to ONE substantive backend item: S5.3 cross_table_pairs service + POST /cross-table-query route (cloud-app side fully wired; backend implementation lost to git reset, design captured in full in this handoff).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-crash START HERE block was task-focused but didn't
re-state the operational essentials a fresh agent needs before
ANY action: which repo, which branch, which env IDs, which creds,
which sacred rules, which CI gates.
Adds a Pre-flight checklist right after the user-stated plan
intro covering:
- Repo paths + branches (cloud-app on feat/experimental-ask-chat,
backend on feat/ndi-python-phase-a)
- Production vs experimental env IDs (Vercel + Railway), with
explicit "DO NOT touch production" framing
- The 7 sacred rules (NEVER push to main, hook discipline,
author rule, Co-Authored-By trailer)
- CI gates per repo (pnpm lint+typecheck+test+build for
cloud-app; ruff+mypy+pytest for backend)
- Cred status — all 3 burned mid-arc; ~1h recovery; test matrix
will likely need a fresh cred from the user
- Operational gotchas: pnpm-lock at repo root, Vercel/Railway
deploy-wait windows, Railway-agent get-logs structlog quirk
The next agent reads this checklist before diving into the S5.3
backend implementation and won't accidentally push to main, skip
hooks, target production env, or burn creds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backend re-implementation of cross_table_pairs + POST /cross-table-query route landed in commit 7157bde on feat/ndi-python-phase-a. The post-crash completion checklist is now empty of substantive work. Next session moves to Step 2, the exhaustive test matrix.
Updates to apps/web/docs/reviews/2026-05-19b-post-handoff-execution.md:
- New "🟩 IF YOU'RE THE SESSION AFTER S5.3 BACKEND LANDS" block at the top, listing verifications + remaining deferred items
- Legacy completion-checklist row for S5.3 marked struck-through with commit ref
- S5.3 detail section headed "✅ SHIPPED in commit 7157bde" preserving the design for historical reference + linking to the shipped impl + tests
- Pre-flight checklist updated: backend HEAD = 7157bde
- Update history entry for the s5.3-completion session
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit closes the post-crash completion run. Every deferred item from the post-S5.3 checklist (aside from S-1 through S-4 SDK upstream asks held by user) has now been resolved:
- S5.3 backend cross_table_pairs + POST route: commit 7157bde
- F-1 integration tests (column shape + alias + pagination): commit f6ecb83
- F-4 stable query keys: verified already shipped (grep audit)
- Mobile <375px thorough audit: found no remaining issues beyond what fd44603 already shipped. Added exhaustive grep matrix to the handoff doc.
- Card gap thorough audit: verified harmonious space-y / gap rhythm across components/datasets/, components/ui/Card.tsx, and components/workspace/PanelCard.tsx. No code changes needed.
- Dabrowska totalDocuments=0: diagnosed as upstream cloud-node state (isPublished:true + documentCount:0 + empty documents array on BOTH prod + experimental envs). Backend is correct; flagged for cloud-node team.
Branch state at close:
- Cloud-app feat/experimental-ask-chat: HEAD updated by this commit
- Backend feat/ndi-python-phase-a: HEAD f6ecb83 (1128 tests)
Next step per the user's plan: Step 2, the exhaustive test matrix (8 datasets x ~10 panels x 17 chat tools + G2/G3 + Safari verifies). Wait for fresh test creds before starting; current 3 are rate-limited.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test-matrix Agent A surfaced a regression in the dataset overview page: Haley's overview rendered sessions=3 while /api/datasets/682e7772cdf3f24938176fac/summary correctly returns counts.sessions=2 (B6-filtered).
Root cause: the 2026-04-28 +1-session correction in overview-content.tsx unconditionally re-sourced counts.sessions from raw classCounts.session, clobbering B6's backend parent-session filter. For Haley the synthesizer returns 2 (parent filtered), but classCounts.session is still the raw 3.
Fix: gate the override on summary < raw. When the summary count is already below the raw class count, the backend has filtered and the summary is trusted; the override re-sources from class-counts only when the backend has NOT already filtered. Preserves the original wrapper-subtract-1 case (session_in_a_dataset only) and Bhar's no-op (summary == raw == 2 unchanged).
Tests (3 new in dataset-detail-shells.test.tsx > OverviewContent > session-count override (B6 compatibility)):
- Haley-like: trusts B6-filtered summary, ignores raw class-counts.session
- Bhar-like: summary == raw, override is a no-op
- pure-wrapper: synthesizer fell back to session_in_a_dataset → subtract 1
CI: pnpm lint clean, pnpm typecheck clean, 2231 tests pass (was 2228 + 3).
Refs: audit/2026-05-19-test-matrix/agent-A.md "Haley sessions count stuck at 3" finding
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Post-completion-run test matrix executed via 3 parallel Playwright agents (datasets 1-4, datasets 5-8, 17 chat tools). All three agents hit AUTH_RATE_LIMITED (HTTP 429) within ~5 logins, gating most UI coverage. The matrix still produced rich findings via the public catalog UI + same-origin public read endpoints.
8 NEW BUGS surfaced:
- NEW-1 P0: Catalog Overview Sessions count override undid B6 filter (Haley showed sessions=3 vs /summary correctly returns 2). FIXED in commit 3e0c28d earlier this session.
- NEW-2 P0: Workspace router silently substitutes to default workspace (68839b1f...) when user lacks org access. No 403, no notice. Burns rate-limit budget via /create-account redirect cascade. OPEN; recommended fix in report.
- NEW-3 P1: Dataset card header numberOfSubjects (281) disagrees with COUNTS panel (0) on Dabrowska. Two surfaces sourced from different endpoints. OPEN.
- NEW-4 P1: Cmd+K from workspace opens DIFFERENT workspace. May be same root cause as NEW-2. OPEN.
- NEW-5 P1: Vercel preview auth instability: session cookies appear to fail re-validation after 30-60s. Agent B hypothesis: cookie domain scoping bug despite cookie_attrs.py looking correct. OPEN.
- NEW-6 P3: .playwright-mcp/ snapshots persisted plaintext passwords. 21 files affected, all scrubbed in-place this session. Never committed to git (gitignored). FIXED.
- NEW-7 P2: Placeholder DOI text "https://doi.org://10.1000/123456789" on DS6/7/8 (data-ingest pipeline issue). OPEN.
- NEW-8 P2: DS8 (Mukherjee gustatory) is a 99-byte stub with 0 sessions/epochs. Probably shouldn't be marked Published. OPEN.
Coverage delivered:
- 4/8 datasets catalog UI verified live
- 1/8 datasets workspace shell verified live
- 4/8 datasets API characterized + known-good demo doc IDs harvested
- 0/17 chat tools exercised (all blocked at login)
Branch state at end of run:
- Cloud-app feat/experimental-ask-chat: HEAD 3e0c28d (Haley fix)
- Backend feat/ndi-python-phase-a: HEAD f6ecb83 (unchanged)
Next session priorities (in order):
1. Fix NEW-2 workspace router substitution (highest-impact bug)
2. Investigate NEW-5 Vercel preview auth instability
3. Re-run test matrix with fresh creds + NEW-2/NEW-5 fixed
4. Then Step 3: Tools-along-boundaries canvas redesign
Full report at apps/web/docs/reviews/2026-05-19-test-matrix-results.md (includes known-good demo doc IDs appendix for tutorial handout).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test-matrix Agent A surfaced an inconsistency on Dabrowska's catalog page: the hero strip shows "Subjects: 281" (from the dataset record's precomputed numberOfSubjects field, sourced from the paper's reported sample size) while the synthesized COUNTS panel below correctly shows "Subjects: 0" (live-computed from documents, which haven't been ingested upstream: Dabrowska's isPublished=true + documentCount=0 published-but-empty state). Two surfaces on one page disagreeing about subject count misleads users about the dataset's contents-of-record state.
Fix: treat documentCount === 0 as the authoritative signal that the documents-of-record are absent, and suppress the precomputed numberOfSubjects fact in the hero. The synthesized COUNTS panel correctly shows 0, and the hero now stays silent on subjects when documents are zero. Other facts (Documents, Size, License) still render with their honest values. When documents come back, numberOfSubjects renders again automatically.
Tests: 1 new in DatasetDetailHero.test.tsx pinning the Dabrowska-like (documentCount=0, numberOfSubjects=281) case. Existing tests for happy-path Subjects rendering (with documentCount>0) preserved.
CI: pnpm lint clean, pnpm typecheck clean, 2232 tests pass.
Refs: apps/web/docs/reviews/2026-05-19-test-matrix-results.md NEW-3
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
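The suppression rule above reduces to a small conditional. This sketch uses a hypothetical `heroFacts` helper and a trimmed-down record type; the real hero component also renders Size and License and is not reproduced here.

```typescript
// Trimmed-down dataset record (the field names follow the commit message).
interface DatasetRecord {
  documentCount: number;
  numberOfSubjects?: number;
}

interface HeroFact {
  label: string;
  value: string;
}

function heroFacts(dataset: DatasetRecord): HeroFact[] {
  const facts: HeroFact[] = [
    { label: "Documents", value: String(dataset.documentCount) },
  ];
  // documentCount === 0 means the documents-of-record are absent, so the
  // precomputed subject count is withheld rather than shown as fact.
  if (dataset.documentCount !== 0 && dataset.numberOfSubjects !== undefined) {
    facts.push({ label: "Subjects", value: String(dataset.numberOfSubjects) });
  }
  return facts;
}
```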
Update the test-matrix synthesis to reflect that NEW-3 was also fixed inline this session (alongside NEW-1 + NEW-6 already noted).
Updates:
- 2026-05-19-test-matrix-results.md
  - TL;DR now shows 3 fixes shipped this session
  - NEW-3 section reclassified as FIXED with commit ref 1583a33
  - Recommendations section updated to reflect NEW-3 done + deeper investigation notes for NEW-2 and NEW-5
- 2026-05-19b-post-handoff-execution.md
  - New "IF YOU'RE THE SESSION AFTER TEST MATRIX LANDS" block at the top with the 8-bug status table and next-session priorities
  - Previous "all completion work" block demoted to legacy
Test-matrix fix summary across this session:
- 3e0c28d: NEW-1 P0 Overview Sessions count B6 compatibility
- 1583a33: NEW-3 P1 Dabrowska hero/COUNTS disagreement
- (security) NEW-6 P3 .playwright-mcp/ password scrub (local-only, 21 files scrubbed in-place, never committed to git per gitignore)
5 bugs remain OPEN (NEW-2, NEW-4, NEW-5 P1; NEW-7, NEW-8 P2 data-ingest). NEW-2 (workspace router substitution) is the highest-impact and should be the next session's first priority.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NEW-5 investigation (curl-verified this session):
- /api/auth/csrf from preview Origin returns host-only cookie (no
Domain attribute) ✅
- /api/auth/csrf from apex Origin returns Domain=.ndi-cloud.com ✅
- cookie_attrs.py logic is correct; Agent B's "cookies scoped
wrong" hypothesis was wrong.
- BUT: GET / on the preview URL itself returns HTTP 401 with
_vercel_sso_nonce. The preview is gated by Vercel SSO
Deployment Protection. Playwright agents authenticated via a
Vercel SSO token saved in Chromium state; when that token expired
(~1h TTL), every subsequent navigation became a 401 challenge.
The "session loss" agents observed was Vercel-layer SSO, not
NDI-layer session.
Root cause: Vercel SSO Deployment Protection on the preview.
Fix is operational, not code — set up a Vercel Automation Bypass
Token on the project's Deployment Protection settings.
NEW-2 reclassification (P0 → P1, likely-secondary-of-NEW-5):
Audited every workspace route handler + middleware-equivalent
surface in this session — no code path substitutes dataset IDs.
With NEW-5 root-caused, the URL substitution agents observed is
most likely a Vercel SSO redirect chain artifact, not a
workspace-router bug. Final verdict deferred until the test
matrix re-runs with the bypass token; if NEW-2 still reproduces
it's a real bug.
Updates:
- 2026-05-19-test-matrix-results.md
- NEW-5 section: full investigation log + 3 operational fix
options (recommended: Vercel bypass token)
- NEW-2 section: reclassified as likely-secondary; audit
record of all surfaces checked
- TL;DR: now reflects 1 root-cause + 1 reclassification
- Recommendations: bypass-token + matrix re-run is the #1
next-session priority
- 2026-05-19b-post-handoff-execution.md
- New "🟪 IF YOU'RE THE SESSION AFTER NEW-2/NEW-5 INVESTIGATION"
block at the top with the curl results + USER ACTION REQUIRED
callout for the bypass token
- Previous "🟦 IF YOU'RE THE SESSION AFTER TEST MATRIX LANDS"
demoted to superseded
Next session priorities (per the new top block):
1. Confirm user has set up the Vercel Automation Bypass Token
2. Re-run the test matrix with the token plumbed through
3. If NEW-2/NEW-4 still reproduce: investigate as real code bugs
4. Then Step 3 — canvas redesign (held for design Q&A)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix: when safeFetchDataset returns null (dataset doesn't exist, user lacks access, OR transient network blip), WorkspaceShell rendered the bare 24-char datasetId as the h1. The user had no signal as to whether the dataset was missing, gated, or just slow.
Test-matrix Agent A NEW-3 follow-up: users who navigate to a workspace they can't access see a confusing bare-hex h1 with no recovery path.
Fix: when data is null, the h1 still renders the datasetId (preserved for share-link debuggability, since operators need to be able to read the id back to the user from the URL), but with a fallback notice below explaining the degraded state + offering a link to the public catalog detail page (anonymous-readable, works even when workspace API paths are gated) plus a "browse all datasets" fallback. The h1 styling for the null-data case is now `font-mono` so the unparseable hex id is visually distinct from a real title.
CI: pnpm lint clean, pnpm typecheck clean, 2232 tests pass.
Refs: apps/web/docs/reviews/2026-05-19-test-matrix-results.md NEW-2/NEW-3 follow-up UX improvement
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step-by-step instructions for setting up the Vercel Protection-Bypass-for-Automation token on the ndi-cloud-app-web project so the next test-matrix re-run can sustain Playwright sessions on the preview deploy. Without this setup, automated test agents cannot reliably exercise the preview because Vercel's SSO Deployment Protection gate (~1h TTL) expires mid-session, masquerading as NDI session loss.
New doc: apps/web/docs/operations/vercel-automation-bypass-setup.md
Covers:
- The 3-minute Vercel UI setup
- Two equivalent agent code patterns (cookie set vs header per request)
- Plumbing strategy for the test-matrix dispatcher
- Security posture (token scope, revocation, audit logs)
- Verification curl commands
- Alternative: disable Deployment Protection on the experimental preview branch (security trade-off discussion)
Also updated the post-handoff doc to reference the new setup guide from the "🔑 USER ACTION REQUIRED" callout.
Refs: apps/web/docs/reviews/2026-05-19-test-matrix-results.md NEW-5
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User pushed back on the earlier follow-up claim (commit 5559e53) that NEW-2/NEW-5 were caused by Vercel SSO Deployment Protection. The pushback is correct: plain curl returns 401 because it has no cookies, but Playwright agents AND real Chrome both reached /my and the workspace shell past any Vercel-layer auth in this session AND in prior sessions (e.g. the late-evening panel-exercise pass that landed the patch-clamp 21 sweeps and BehavioralTrack 1985-segment renders). Vercel SSO is not the test-matrix blocker.
Updates:
- 2026-05-19-test-matrix-results.md
  - NEW-5 section: Vercel SSO claim retracted; reopened with "real cause unknown"; standing hypothesis is test-cred org-access limitation
  - NEW-2 section: removed "likely-secondary-of-NEW-5" classification; reopened as P0 with unknown root cause
  - TL;DR: reflects the retraction
  - Recommendations: ask user to reproduce in actual Chrome BEFORE chasing as a product bug
- 2026-05-19b-post-handoff-execution.md
  - New "🟫 IF YOU'RE THE SESSION AFTER NEW-2/NEW-5 INVESTIGATION (with retraction)" block at the top, replacing the bad-claim "🟪" block
  - Final status table now shows NEW-2/NEW-4/NEW-5 all OPEN
- DELETED: apps/web/docs/operations/vercel-automation-bypass-setup.md (the underlying claim it documented was wrong)
What remains good from the investigation:
- /api/auth/csrf cookie-attrs verification (host-only on preview Origin, Domain on apex) ✅
- Workspace route-handler code audit (no substitution found in cloud-app routing code) ✅
- The WorkspaceShell friendly fallback commit d06e9e2 (legit UX improvement regardless of the root-cause confusion)
Next session: ask the user to reproduce in actual Chrome with their own creds before chasing NEW-2/NEW-4/NEW-5 as product bugs. If they don't reproduce there, they're Playwright-agent or test-cred-specific artifacts.
CI: 2232 cloud-app tests still pass; no code changes in this commit (docs + deletion only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ills
Two fixes addressing direct user feedback from the post-test-matrix
session:
1. Documents picker left-click now auto-fills the matching panel
(user: "the selector never works — you select a document, you
always have to manually copy paste it").
Root cause: DocumentsPicker was passing primaryId={null} +
onPrimaryChange={()=>undefined} to WorkspaceDataGrid. Single-click
on a row did NOTHING. All 4 specific pickers (Subjects / Sessions
/ Probes / Stimuli) wired primary selection correctly — Documents
picker was the only gap.
Fix: new `lib/workspace/class-to-selection-key.ts` maps each NDI
doc class to the workspace's 5-key selection dimension. The
Documents picker reads from selection[targetSlot] for primaryId
and writes to selection[targetSlot] on click. For unmapped
classes (treatment, ontologyTableRow, daqsystem, etc.) the
picker shows a hint banner pointing the user at right-click "Set
as…".
Mapping highlights:
- imageStack → session (so VideoPlaybackPanel auto-fills)
- subject / openminds_subject → subject
- element / probe / probe_location → probe
- element_epoch / epoch / epochfiles_ingested / etc. → session
(mirrors the backend _CLASS_ALIASES chain)
- stimulus_presentation / stimulus_response → stimulus
- vmspikesummary / neuron_extracellular / *_tuning_calc → unit
2. Video panel extended to also render still images (user: "if we
have a video viewer that takes image stacks, why not also let
the same tool show images?").
VideoPlaybackPanel now branches on formatOntology:
- NCIT:C190180 (MP4 video, Bhar use case) → ImageStackVideoViewer
- NCIT:C70631 / NCIT:C85437 (PNG-family stills, Haley use case)
→ ImageViewer (PIL-decoded, zoom + frame stepper)
- anything else → friendly "unsupported" message naming the
ontology codes the panel DOES support
Title now reads "Media playback"; icon picks Image vs Video
based on the doc shape. Empty-state copy explains both
subdomains. Filename + component name preserved
(VideoPlaybackPanel.tsx) for import stability.
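As a rough illustration of the two fixes above, both reduce to small lookup functions. The sketch below is not the contents of `lib/workspace/class-to-selection-key.ts`; `classToSelectionKey`, `pickMediaViewer`, and the type names are hypothetical, the table holds only the classes listed under "Mapping highlights", and exact-case matching is assumed.

```typescript
// The five selection dimensions named in the mapping highlights.
type SelectionKey = "subject" | "session" | "probe" | "stimulus" | "unit";

// Only the classes listed in the commit message; the real table may hold more.
const CLASS_TO_KEY = new Map<string, SelectionKey>([
  ["imageStack", "session"],
  ["subject", "subject"],
  ["openminds_subject", "subject"],
  ["element", "probe"],
  ["probe", "probe"],
  ["probe_location", "probe"],
  ["element_epoch", "session"],
  ["epoch", "session"],
  ["epochfiles_ingested", "session"],
  ["stimulus_presentation", "stimulus"],
  ["stimulus_response", "stimulus"],
  ["vmspikesummary", "unit"],
  ["neuron_extracellular", "unit"],
]);

// Returns the selection slot for a doc class, or null for unmapped classes
// (the picker would then show the right-click "Set as…" hint banner).
function classToSelectionKey(docClass: string): SelectionKey | null {
  const direct = CLASS_TO_KEY.get(docClass);
  if (direct !== undefined) return direct;
  // The *_tuning_calc family maps to the unit slot.
  if (docClass.endsWith("_tuning_calc")) return "unit";
  return null;
}

type MediaViewer = "ImageStackVideoViewer" | "ImageViewer" | "unsupported";

const VIDEO_ONTOLOGY = "NCIT:C190180"; // MP4 video
const STILL_ONTOLOGIES = new Set(["NCIT:C70631", "NCIT:C85437"]); // PNG-family stills

// Chooses which viewer the media panel mounts for a given formatOntology.
function pickMediaViewer(formatOntology: string): MediaViewer {
  if (formatOntology === VIDEO_ONTOLOGY) return "ImageStackVideoViewer";
  if (STILL_ONTOLOGIES.has(formatOntology)) return "ImageViewer";
  return "unsupported";
}
```

Returning `null` rather than a default slot keeps unmapped classes such as `treatment` from silently filling the wrong panel.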
Tests:
- 10 new in `tests/unit/lib/workspace/class-to-selection-key.test.ts`
(class → slot mapping, null returns, case-sensitivity contract,
round-trip consistency)
- 7 new in DocumentsPicker.test.tsx (primary-select wiring per class,
unmapped-class hint banner, no-op onPrimaryChange for unmapped)
- 4 new in VideoPlaybackPanel.test.tsx (image branch — PNG
formatOntology mounts ImageViewer; mask formatOntology too; loading
skeleton during /data/image fetch; error fallback)
- Existing tests updated to match new copy: "Pick a media document"
empty-state, "doesn't contain renderable media" unsupported
message, NCIT:C999999 (truly-unsupported) fixture for the
unsupported-format test
CI: 2253 cloud-app tests pass (was 2232 + 21 new); lint clean;
typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…able_query
Steve's "Show code" feedback: he wanted to see the snippet load data
from the cloud + plot it, with intervention points between the two so
users can do something different mid-pipeline. Three gaps surfaced:
1. **fetch_signal had a TODO for picking the binary file off
doc.files** — Steve specifically flagged "I guess it should go as
far as loading the data from the cloud." The new flow has 4 named
steps:
Step 1: fetch the doc
Step 2: pick the binary file off doc.files
- Skips metadata files (channel_list.bin etc.) per the prior
smart-binary-picker work
- Prefers .nbf / .vhsb / .dat / .bin; largest file by size
Step 3: download the bytes via fetch_cloud_file (Python) /
getFile (MATLAB)
Step 4: decoder note pointing at .nbf_read / .vhsb_read / numpy
(the natural intervention point)
2. **get_document had NO mapping** — Video/Media panel emitted toolName
"get_document" which fell to the default TODO. Now branches by
imageStack.formatOntology:
- NCIT:C190180 (MP4 video) → download file for local playback
- NCIT:C70631 / NCIT:C85437 (PNG-family) → PIL decode +
matplotlib (Python) / imread + imagesc (MATLAB)
3. **cross_table_query had NO mapping** — BehavioralCompare cross-mode
(S5.3) emitted "cross_table_query" which fell to default TODO. Now
emits a clean pandas (Python) / containers.Map (MATLAB) pipeline:
Step 1: fetch ontologyTableRow docs via ndi_query "isa"
Step 2: find X + Y columns by substring match (mirrors the
backend _find_matching_group logic)
Step 3: inner-join on subjectDocumentIdentifier (subject join)
OR fetch treatment-class labels (treatment join)
Step 4: matplotlib scatter / gscatter — colored by groupBy
when set
All three branches in both Python AND MATLAB. Steve will see the same
structure in either tab.
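The "pick the binary file" rule from fetch_signal Step 2 can be sketched as follows. This is an illustration only: it is written in TypeScript rather than the emitted Python/MATLAB, `FileEntry` and `pickBinaryFile` are hypothetical names, the skip list contains just the one metadata file the commit names, and the reading "restrict to preferred extensions when any match, then take the largest" is an assumption about how the two preferences combine.

```typescript
// Hypothetical flattened file entry used only for this illustration.
interface FileEntry {
  name: string;
  size: number; // bytes
}

// Only channel_list.bin is named in the commit message ("etc." is not expanded here).
const METADATA_FILES = new Set(["channel_list.bin"]);
const PREFERRED_EXTENSIONS = [".nbf", ".vhsb", ".dat", ".bin"];

function hasPreferredExtension(name: string): boolean {
  const lower = name.toLowerCase();
  return PREFERRED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Picks the file most likely to hold the signal, or null if none qualifies.
function pickBinaryFile(files: FileEntry[]): FileEntry | null {
  // Step A: drop known metadata files.
  const candidates = files.filter((f) => !METADATA_FILES.has(f.name));
  // Step B: prefer recognised binary extensions when any are present.
  const preferred = candidates.filter((f) => hasPreferredExtension(f.name));
  const pool = preferred.length > 0 ? preferred : candidates;
  if (pool.length === 0) return null;
  // Step C: the largest remaining file wins.
  return pool.reduce((best, f) => (f.size > best.size ? f : best));
}
```

Keeping the picker as its own step gives users a natural place to override the choice before any bytes are downloaded.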
CI: 2260 cloud-app tests pass (was 2253 + 7 new); ruff/lint clean;
typecheck clean. 56 code-export tests including the 7 new ones pin
the contract: "no TODO for fetch_signal pick-the-file", "branch on
NCIT:C190180 vs NCIT:C70631 for get_document", "fetch_treatment +
strip plot for cross_table_query treatment join", etc.
Pattern (Steve's bar):
1. Fetch the doc(s)
2. Pick / extract the relevant fields
3. Compute or transform
4. Plot
→ Each step is a separate block with a banner comment, so the user
can stop / introspect / modify between any two steps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The user pushed back (correctly) on the prior turn's claim that the
Show-Code snippets were "good enough to send to Steve." The
snippets have the right STRUCTURE (data-load → extract → plot with
named "Step N" banners), but they likely do NOT run end-to-end
because:
- No install header
- No auth flow shown
- Some referenced NDI-python API names may not exist (the existing
audit comments in the code already flagged this in places)
- Binary decoders may need separate packages
- NDI-matlab audit is even thinner
The user requested we hold further Show-Code generator changes
until a deep audit lands. This commit captures the deep-dive scope
in a fresh top block on the existing handoff doc.
New block ("🟧 IF YOU'RE THE POST-COMPACTION AGENT (Show-Code
DEEP-DIVE arc)") covers:
- Branch state with exact SHAs (cloud-app: 4a0ddd7; backend:
f6ecb83)
- Sacred rules brief (NEVER push to main, author rule, etc.)
- 9 bugs / improvements shipped this multi-turn arc (don't redo)
- Retracted misdiagnoses (Vercel SSO was NOT NEW-5; NEW-2 confirmed
by user as Playwright artifact, not product bug)
- 5 OPEN bugs with status (NEW-2/4/5 = Playwright-specific per
user verification in real Chrome; NEW-7/8 = data-ingest)
- Explicit user HELDs (more Show-Code work, canvas redesign, S-1
to S-4, test matrix re-run)
- 9 deep-dive topics in priority order, starting with NDI-python
public API audit (output: apps/web/docs/operations/
ndi-python-api-audit.md)
- "What to do FIRST" — verify branch state, read the whole doc,
ask the user to pick a starting topic, do NOT touch the
generator until the audit lands
- Test cred status (all 3 likely burned ~1h recovery)
- CI state at close (2260 cloud-app + 1128 backend tests passing)
- Operational gotchas (pnpm-lock at root, Vercel/Railway redeploy
windows, structlog filter quirk)
The original "🟫" / "🟦" / "🟩" blocks are preserved below for
historical context.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three deep audits + a memory/crash investigation landed this session; this commit ships the audit-driven fixes to the Show-Code snippet generators so the emitted Python + MATLAB actually runs against the published SDKs.
Top bugs surfaced + fixed (full table in code-export-coverage-matrix.md):
- pip install ndi-python was wrong (package is `ndi`, not on PyPI); use pip install git+https://github.com/Waltham-Data-Science/NDI-python.git
- ndiqueryAll(datasetId, …) is wrong; first arg is scope literal. Five Python emitters now use ndiqueryAll("public", …) + post-filter.
- fetch_cloud_file(uri) is wrong; real sig is (uri, target_path) -> bool. Three Python emitters now write to ~/.ndi/cache/<datasetId>/ and check.
- nbf_read doesn't exist in vlt; use ndicompress.expand_ephys for .nbf.
- vhsb_read takes (fo, x0, x1), not (path); fixed in both languages.
- /api/facets is a Next.js route → Python hits via urllib + Bearer; MATLAB errors explicitly with S-3 PR pointer.
- MATLAB getFile sig is (downloadUrl, localPath, ...) after getFileDetails; fixed in fetch_signal / fetch_image / get_document.
- Canonical snake_case (treatment.numeric_value, vmspikesummary.sample_times, stimulus_presentation.presentation_time.onset) vs cloud-app's camelCase projection: every accessor now checks both.
- MATLAB getDocument flat vs bulkFetch wrapped envelope: added _doc_body / _vm_body unwrap helpers.
- MATLAB cross_table_query was passing q.searchstructure (wrong) + 'page_size' (wrong); fixed to q + 'pageSize' + bulkFetch hydration.
- No auth pre-flight: Python docstring lists USERNAME/PASSWORD or TOKEN/ORGANIZATION_ID env vars; MATLAB %% Step 0 guards the path then calls ndi.cloud.authenticate().
- ndi_dataset_overview was hitting the default TODO; added the emitter to both languages (composes getDataset + documentClassCounts).
Test surface: 65 code-export tests pass (32 Python + 33 MATLAB), of which 9 new pinning tests protect the audit-driven shapes from silent regression. Full suite 2269 passing, lint + typecheck clean.
New ops docs:
- ndi-python-api-audit.md: per-emitter audit, file:line grounded
- ndi-matlab-api-audit.md: same for MATLAB (matters most for Steve)
- code-export-coverage-matrix.md: synthesis, panel × tool matrix, the S-1 through S-4 SDK upstream PR asks
- 2026-05-19c-memory-crash-investigation.md: root-cause for the user's computer crashes (15 GB locked .claude/worktrees + 226 MB live JSONL transcript + 6 GB colima VM; cloud-app code is NOT a culprit)
Handoff doc updated with 🟪 block summarizing this session and what's still held (live verification, Modal UX, S-1 → S-4 upstream PRs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the cheap-but-real layer of the co-versioning safety idea documented in code-export-coverage-matrix.md.
The snippet generators emit SDK names by string. The audit docs verify each name against the published SDK source by file:line. Those two artifacts are hand-written and could drift apart silently. This commit bridges the gap with:
- lib/ndi/code-export/sdk-surface.json: the AUDITED truth, i.e. every import, function name, signature, and audit_ref for both NDI-python and NDI-matlab, plus a `_explicitly_does_not_exist` list of names we must NEVER emit (e.g. `vlt.file.custom_file_formats.nbf_read`, `ndi.database.openbinarydoc`).
- tests/unit/ai/code-export/sdk-surface.test.ts: 46 assertions that invoke every emitter once, then check the produced snippet against every entry in sdk-surface.json. CI fails if the generator emits a banned name OR stops emitting an audited one.
Total test surface: 111 code-export tests (32 Python + 33 MATLAB + 46 co-versioning). Full suite 2315 passing.
The DYNAMIC layer (pytest against installed NDI-python in CI) waits for NDI-python to publish to PyPI; it is sketched in §"Co-versioning safety idea" of the coverage matrix doc.
Minor fix: matlab get_facets error message reworded so it no longer embeds the literal `ndi.cloud.api.datasets.getFacets` token (which the co-versioning check flags as banned). The error still points at the S-3 PR ask.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
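A minimal sketch of the static co-versioning check described above, assuming a simplified surface file. The `SdkSurface` shape, the `audited` key, and `checkSnippets` are hypothetical; only the `_explicitly_does_not_exist` key and the example SDK names appear in the commit message. Plain substring matching is assumed, which is consistent with the note that an error message embedding a banned token gets flagged.

```typescript
// Hypothetical, simplified view of sdk-surface.json.
interface SdkSurface {
  audited: string[]; // names the emitters are expected to produce
  _explicitly_does_not_exist: string[]; // names that must never be emitted
}

interface SurfaceViolation {
  kind: "banned-name-emitted" | "audited-name-missing";
  name: string;
}

// Compares every emitted snippet against the audited surface.
function checkSnippets(snippets: string[], surface: SdkSurface): SurfaceViolation[] {
  const emitted = snippets.join("\n");
  const violations: SurfaceViolation[] = [];

  // A banned name anywhere in the output is a failure, even inside a
  // comment or an error string.
  for (const name of surface._explicitly_does_not_exist) {
    if (emitted.includes(name)) {
      violations.push({ kind: "banned-name-emitted", name });
    }
  }

  // An audited name that no emitter produces any more signals drift
  // between the generator and the audit docs.
  for (const name of surface.audited) {
    if (!emitted.includes(name)) {
      violations.push({ kind: "audited-name-missing", name });
    }
  }
  return violations;
}
```

A test file could run every emitter once, pass the outputs to `checkSnippets`, and assert that the returned list is empty.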
Curling https://ndb-v2-experimental.up.railway.app for Bhar's imageStack returned the doc body's `files` field in the canonical NDI shape:
    body.files = { file_list: ["imageStack"], file_info: {name, locations: {location, uid, ...}} }
This is NOT a list of {uri, name, size} entries (the projection the previous emitters assumed).
Two real bugs uncovered:
1. doc.get("files") returns a dict, not a list. The previous "files = doc.get('files') or []" pattern iterated dict KEYS instead of file entries.
2. The location string is a raw pre-signed S3 URL by default, NOT an ndic:// URI. fetch_cloud_file REQUIRES the ndic:// form (it calls parse_ndic_uri). Users must first call ndi.cloud.filehandler.updateFileInfoForRemoteFiles(body, datasetId) to rewrite locations to the ndic:// form in-place.
Applied the fix to all three Python file-touching emitters (fetch_signal, fetch_image, get_document):
- Unwrap envelope: body = doc.get("data") if isinstance(...) else doc
- Call updateFileInfoForRemoteFiles(body, datasetId) to normalize URIs
- Walk file_info defensively (dict or list); same for locations
- Pick the first location's .location as the ndic:// URI
Same shape fix applied to MATLAB emitters (fetch_signal, get_document): walk doc.files.file_info as a struct or struct array; parse the .locations.location URI; extract fileUID for getFileDetails → getFile.
New: updateFileInfoForRemoteFiles added to sdk-surface.json (cited in filehandler.py:51-118).
Live-verification finding documented in code-export-coverage-matrix.md §"Live verification finding" with the actual curl response shape + both surprises.
Test surface: 112 code-export tests pass (32 Python + 33 MATLAB + 47 co-versioning). Full suite 2316 passing; lint + typecheck clean.
What's still NOT end-to-end verified: actually running the snippets in a real Python/MATLAB kernel against a (dataset, doc) pair. That is Topic #6's natural next step, but the shape gap that would have crashed the snippets at the first file-access is now closed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
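The defensive walk over the canonical `files` shape can be illustrated as below. This is a TypeScript rendering of the shape handling, not the emitted Python; the interface and function names are hypothetical, the fields are limited to those shown in the curl finding, and the URI rewrite done by `updateFileInfoForRemoteFiles` is deliberately left out because it lives in the SDK.

```typescript
// Field names follow the shape reported in the commit message.
interface FileLocation {
  location: string;
  uid?: string;
}

interface FileInfo {
  name: string;
  locations?: FileLocation | FileLocation[];
}

interface DocBody {
  files?: {
    file_list?: string[];
    file_info?: FileInfo | FileInfo[];
  };
}

// Normalizes "single object or list of objects" to a list.
function asArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Accepts either a flat body or a { data: body } envelope.
function unwrapBody(doc: DocBody | { data: DocBody }): DocBody {
  return "data" in doc ? doc.data : doc;
}

// Returns the first location string found, or null when the doc has no files.
function firstLocation(doc: DocBody | { data: DocBody }): string | null {
  const body = unwrapBody(doc);
  for (const info of asArray(body.files?.file_info)) {
    for (const loc of asArray(info.locations)) {
      if (loc.location) return loc.location;
    }
  }
  return null;
}

// fetch_cloud_file needs the ndic:// form, so callers can check before downloading.
function isNdicUri(location: string): boolean {
  return location.startsWith("ndic://");
}
```

Treating `file_info` and `locations` as "one or many" avoids the original failure mode of iterating dictionary keys instead of file entries.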
Adds a fresh "GitHub Template arc" top block to the post-handoff execution doc. Captures the design pivot (Steve + Eivind brainstorm → green-light to prototype the template repo), what landed this session (the local `ndi-analysis-template` repo at commit 3fb2567), and the ordered punch list for the next session: push to GitHub under the chosen org, mark as Template, add 6 more plot modules, do the cloud-app NextAuth + button work, then ZIP/Colab/Codespaces deep-links.
Also notes the side-effect of this session's memory cleanup: `pnpm store prune` invalidated `apps/web/node_modules` hardlinks, so the next agent touching cloud-app code must run `pnpm install` first.
Marks the prior "🟪 Show-Code audit + fixes" block as superseded so the next agent doesn't double back into emitter work, since the template repo subsumes that direction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the cloud-app side of the GitHub Template arc from
2026-05-19b-post-handoff-execution.md. Adds two buttons next to
every workspace panel's existing ShowCodeButton + every chat
tool-call message:
- "Open in GitHub" → POST /api/github/create-analysis-repo
→ octokit.rest.repos.createUsingTemplate against the private
Waltham-Data-Science/ndi-analysis-template repo
→ commits current_analysis.py pre-populated with the panel's
exact args via a thin generator (lib/ndi/code-export/
current-analysis.ts) that imports plots.plot_X functions
from the template
- "Download as ZIP" → POST /api/github/download-analysis-zip
→ fetches the template tarball via a server-side PAT
(GITHUB_APP_TOKEN) since the template is private
→ repacks as a .zip with current_analysis.py injected at the
slug-prefixed root
OAuth lives in lib/github/oauth.ts as a "linked-account" cookie
(HttpOnly + AES-256-GCM via GITHUB_TOKEN_ENCRYPTION_KEY) rather
than NextAuth, so the existing FastAPI cookie-session auth stays
the single source of truth. /api/github/status surfaces the
merged "configured + linked" verdict for the client.
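For readers unfamiliar with the pattern, an AES-256-GCM sealed cookie value can be sketched with Node's built-in crypto module. This is a sketch under assumptions, not the contents of lib/github/oauth.ts: `sealToken`, `openToken`, and the `iv.tag.ciphertext` base64url layout are hypothetical, and the key is assumed to be 32 raw bytes derived from GITHUB_TOKEN_ENCRYPTION_KEY by some means not shown here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts a token into "iv.tag.ciphertext" (base64url parts).
// The layout is an illustrative assumption.
function sealToken(token: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(token, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString("base64url")).join(".");
}

// Decrypts a sealed value; returns null if it is malformed or was tampered with.
function openToken(sealed: string, key: Buffer): string | null {
  const parts = sealed.split(".");
  if (parts.length !== 3) return null;
  const [ivPart, tagPart, dataPart] = parts as [string, string, string];
  try {
    const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(ivPart, "base64url"));
    decipher.setAuthTag(Buffer.from(tagPart, "base64url"));
    const plaintext = Buffer.concat([
      decipher.update(Buffer.from(dataPart, "base64url")),
      decipher.final(), // throws when the auth tag does not verify
    ]);
    return plaintext.toString("utf8");
  } catch {
    return null;
  }
}
```

The sealed string would then be set as the value of an HttpOnly cookie; because GCM authenticates the ciphertext, a modified cookie fails to open instead of yielding a corrupted token.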
Button is gated client-side on NEXT_PUBLIC_GITHUB_INTEGRATION_ENABLED
and server-side on the three env vars (GITHUB_CLIENT_ID,
GITHUB_CLIENT_SECRET, GITHUB_APP_TOKEN). Renders disabled with a
tooltip when unset — never crashes.
Tests (+51): create-analysis-repo (7), download-analysis-zip (5),
slug helpers (8), oauth helpers (16), current-analysis emitter
(7), OpenInGitHubButton component (5). Total cloud-app suite:
2367 passing across 191 files. lint + typecheck + build clean.
ADR-010 documents the decision tree and the 12 new files. COMPLIANCE
gains a §8 External services row for the new GitHub integration.
Phase 1 of the workflow — non-functional in production until the
user provisions GITHUB_CLIENT_ID etc. on Vercel Preview. The
template repo itself was pushed to Waltham-Data-Science last
session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Marks the arc ~95% complete and rewrites the "what remains" section into a short user-side punch list (env vars on Vercel Preview, pin smoke doc IDs, add CI secrets, license pick, Colab/Codespaces deep-links).
The three pillars built across this multi-agent session:
1. Python template: Waltham-Data-Science/ndi-analysis-template (private, GitHub Template flag set), 9 plot modules + 68 unit tests + 10 smoke scaffolded. Commits 3fb2567 + 2fb1ac6.
2. MATLAB template: Waltham-Data-Science/ndi-analysis-template-matlab (private, GitHub Template flag set), 9 plotXxx.m functions + 3-job CI matrix via matlab-actions/setup-matlab. Commit 872f4e8.
3. Cloud-app side: 6 new API routes (/api/github/*) + OpenInGitHubButton on all 10 panel + chat surfaces + linked-account OAuth + ADR-010 + 51 new tests. Commit 4e85ef8 on feat/experimental-ask-chat. 2367/2367 cloud-app tests passing across 191 files.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds apps/web/docs/HANDOFF.md as the single source of truth for the next session. It covers all four repos (cloud-app, ndb-v2, ndi-analysis-template Python + MATLAB), sacred rules, test creds, production vs experimental state, the GitHub Template arc, the recent Railway outage + recovery procedure, recent commit timeline, and a prioritized punch list for what's left to do.
Marks 21 prior dated docs (handoffs, audits, plans, reviews) as SUPERSEDED with a one-line header pointing back to HANDOFF.md. Files retained for archaeology; git history is the safety net.
Deletes 5 truly-redundant artifacts:
- 2026-05-14-pre-compact-handoff.md (V1)
- 2026-05-14-pre-compact-handoff-v2.md (V2)
- 2026-05-15-pre-compact-handoff-and-execution-plan.md (dup of master plan)
- 2026-05-16-pre-compact-handoff.md (superseded multiple times)
- 2026-05-18-f1-stimulus-projection-stub.diff (already-applied binary patch)
CLAUDE.md's "Where to read next" section now points at HANDOFF.md rather than a stale laundry list of dated docs. The S5.3 line is updated to reflect that it SHIPPED on feat/ndi-python-phase-a (commit 7157bde on the backend).
Cleans up workspace-snapshot.md (stale Playwright artifact at repo root) + adds a gitignore guard so similar snapshots can't accidentally get committed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Experimental public-facing chatbot at /ask over the published NDI Commons catalog. Built for the Shrek upsell demo (he's already buying LabChat; pitch is "you can also chat over your experiment data on NDI Cloud").
Scope is deliberately tight:
- ANTHROPIC_API_KEY (route) + NEXT_PUBLIC_ASK_ENABLED (nav)
Production impact when this PR sits in draft: ZERO. Both env flags must be set, and the PR is intentionally not merging to main without explicit Audri review.
Spec: apps/web/docs/specs/2026-05-11-experimental-ask-chat-design.md
Impl plan: apps/web/docs/plans/2026-05-11-experimental-ask-chat-impl.md
What's new
- /ask page (route-group: marketing): 'use client' chat shell using @ai-sdk/react v5 useChat
- POST /api/ask edge route (streaming) with feature flag + per-IP rate limit + zod-validated tools
- lib/ai/ modules: tools (5 tools backed by existing public catalog endpoints), system-prompt, rate-limit, feature-flag, anthropic-client
- components/ai/ chat primitives (Markdown, Message, Thread, Input, Chips, ToolCallIndicator)
Test plan
Local (all done):
- pnpm lint --max-warnings=0)
- tsc -b --noEmit)
- /api/ask shows as edge function, /ask as static (gates render-time)
Preview (Audri to verify on Vercel preview URL):
- ANTHROPIC_API_KEY + NEXT_PUBLIC_ASK_ENABLED=1 on the preview deployment env
- /ask: Ask tab visible in nav, chat loads
Cost / risk
🤖 Generated with Claude Code