[DO NOT MERGE — experimental] feat(ndi-python): Phase A integration by audriB · Pull Request #112 · Waltham-Data-Science/ndi-data-browser-v2

audriB · 2026-05-13T22:15:21Z

⚠️ DO NOT MERGE

This branch backs the experimental Railway environment used for a byte-for-byte audit comparison with production. It is triple-protected: draft state + DO NOT MERGE title prefix + this comment. The merge gate is the audit passing, not CI green alone.

Summary

Phase A of the NDI-python integration. Installs the NDI stack on the Railway image (vlt + did + ndr + vhlab-toolbox-python + ndi-compress) with --no-deps to skip matplotlib (~50-70 MB) and opencv (~80 MB), avoiding ~150 MB of image growth compared with the naive install. Wires three new capabilities into existing services:

- VHSB text-tag decoding via vlt.file.vhsb_read — unblocks Haley + any future VHSB dataset for QuickPlot + signal-chart. Today these always soft-error with the legacy "vlt library not available" message.
- NDI-compressed binary handling via ndicompress.expand_* — supports the .nbf.tgz wrapper. Today these silently fail in _parse_nbf.
- Ontology fallback in OntologyService — when existing external providers miss, falls back to NDI's bundled lookup, which knows lab-specific terms (WBStrain, Cre lines, NDIC) that public OLS providers don't have.

Architecture: graceful degradation

All NDI calls go through a new services/ndi_python_service.py that:

- Lazy-imports the NDI stack only on first call (cold-start neutral)
- Caches the import success/failure as a module flag
- Returns None on miss, error, or "NDI not installed" — callers fall through to their existing legacy paths

So even if the NDI install fails in production, the public surface keeps working. The audit script in apps/web/scripts/audit-public-api.mjs is meant to confirm this empirically once it has been run against production and the experimental environment.

Files

| File | Change |
| --- | --- |
| `infra/Dockerfile` | + `git` apt dep + 5 `pip install --no-deps` lines + import sanity check |
| `backend/requirements.txt` | + 7 NDI runtime deps (handpicked, no matplotlib/opencv) |
| `backend/pyproject.toml` | mirror of requirements.txt + mypy override for NDI/vlt/ndicompress |
| `backend/services/ndi_python_service.py` | NEW — three thin wrappers + import guard |
| `backend/services/binary_service.py` | dispatch order rewritten: compression → VHSB → inline → NBF |
| `backend/services/ontology_service.py` | NDI fallback when existing providers return a stub |
| `backend/tests/unit/test_ndi_python_service.py` | NEW — 19 unit tests (no NDI install required) |
| `docs/plans/2026-05-13-ndi-python-integration.md` | Phase A/B/C strategy + recon findings |
| `docs/plans/2026-05-13-railway-experimental-env-runbook.md` | dashboard walkthrough for the experimental env |

Test plan

Merge gate

This PR merges only when:

- CI is green AND
- The audit shows byte-identical responses on the public surface (with the expected exception of previously-soft-erroring endpoints now returning real data) AND
- The user (Audri) explicitly green-lights the merge

🤖 Generated with Claude Code

…n/ontology

Adds the NDI-python stack (vlt + did + ndr + vhlab-toolbox-python + ndi-compress) to the Railway image and wires three new capabilities into existing services:

1. VHSB text-tag decoding via vlt.file.vhsb_read — unblocks Haley + any other VHSB-formatted dataset in QuickPlot / signal-chart. Today every VHSB request returns the legacy "vlt library not available" soft error because the early-return prefix check on b"This " never falls through to a real decoder.
2. NDI-compressed binary detection + decompression via ndicompress.expand_* — handles the .nbf.tgz wrapper format. Today these silently fail in _parse_nbf.
3. NDI ontology fallback in OntologyService — fires when the existing external providers miss. NDI's bundled lookup knows lab-specific terms (NDIC, WBStrain, Cre lines) that public OLS providers don't.

Image size cost: ~80-100 MB. We pip install with --no-deps and handpick the runtime deps in requirements.txt + pyproject.toml so matplotlib (~50-70 MB) and opencv (~80 MB) — both declared by DID-python + tutorials but never imported by our paths — stay out of the image.

Strategy: ALL changes go through a new services/ndi_python_service.py that lazy-imports + degrades gracefully when the stack isn't installed (returns None on every call, callers fall through to their legacy paths). So even on a build where the NDI install fails, the public surface keeps working.

Branch protection: this work targets a separate "experimental" Railway environment for byte-for-byte audit against production before any merge to main. See docs/plans/2026-05-13-railway-experimental-env-runbook.md for the dashboard-step walkthrough.
Test plan:

- [x] 19 new unit tests cover the dispatch logic with NDI uninstalled (CI doesn't pip-install NDI; tests use sys.modules mocking)
- [x] All 44 existing binary tests + 535 other tests pass unchanged
- [x] mypy strict mode clean (with ndi/vlt/ndicompress in missing-imports override matching the scipy + opentelemetry pattern)
- [x] ruff lint clean
- [ ] Build the experimental Railway env from this branch
- [ ] Run apps/web/scripts/audit-public-api.mjs (Layer 1) + the e2e audit-public-pages spec (Layer 2 + 3) against production vs experimental — see plan doc for the workflow

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@main

… strict-boot

Adds the first Plotly-charts pipeline backend half: a tabular_query endpoint that aggregates ontologyTableRow documents into per-group statistics (mean, median, std, q1/q3, min/max, count) plus the raw values needed for a violin-with-jitter render. Powers Dabrowska EPM and Fear-Potentiated Startle plots, Bhar chemotaxis violins, and any future categorical-by-group comparison.

Endpoint:

    GET /api/datasets/{id}/tabular_query
      ?variableNameContains=ElevatedPlusMaze   # required substring
      &groupBy=treatment_group                 # optional
      &groupOrder=Saline,CNO                   # optional CSV

Pipeline:

1. SummaryTableService.ontology_tables(...) returns one group per distinct variableNames schema (already exists, used by Document Explorer)
2. tabular_query_service finds the first group with a column whose key/label matches the substring
3. Buckets rows by groupBy column (or single 'all' bucket if unset)
4. Computes stats on the full value list, then stride-samples down to MAX_VALUES_PER_GROUP=500 for the violin's jitter overlay (stats stay accurate)
5. Caps at MAX_GROUPS=20 with first-seen ordering + explicit group_order override

Hygiene fixes folded into this commit:

- Pin all five NDI-python git deps to specific SHAs in infra/Dockerfile (replaces @main which silently picks up upstream changes on every redeploy). SHAs captured 2026-05-13. To upgrade: re-run `git ls-remote <repo> HEAD` and bump.
- Strict-on-boot NDI check: when NDI_PYTHON_REQUIRED=1 (set by the Dockerfile), lifespan hard-fails if vlt/ndicompress/ndi.ontology aren't importable. Catches broken images at boot instead of letting the chat silently degrade with every NDI tool returning None.

Tests: 21 new unit tests cover the violin aggregation math + edge cases (no matches, no group_by, group_order overrides, group cap, value cap with accurate stats, NaN/Inf skipping, empty substring, no ontology docs). All 559 backend tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
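Pipeline step 4 (stats on everything, then stride-sample) can be sketched as below. This is a minimal illustration, not the service's code: `group_stats` is a hypothetical name, and it assumes population standard deviation and linear-interpolation quartiles, neither of which the commit specifies.

```python
import math
import statistics
from typing import Sequence

MAX_VALUES_PER_GROUP = 500  # cap quoted in the commit message


def group_stats(raw: Sequence[object]) -> dict:
    """Summary stats over ALL finite numeric values, plus a stride-sampled jitter list."""
    values = [float(v) for v in raw
              if isinstance(v, (int, float)) and not isinstance(v, bool)
              and math.isfinite(v)]  # skips NaN/Inf and non-numeric cells
    n = len(values)
    if n == 0:
        return {"count": 0, "values": []}
    ordered = sorted(values)
    if n >= 2:
        # method="inclusive" matches linear-interpolation percentiles (numpy's default).
        q1, _, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    else:
        q1 = q3 = ordered[0]
    # Stride-sample only AFTER the stats are computed, so the stats stay exact.
    step = max(1, math.ceil(n / MAX_VALUES_PER_GROUP))
    return {
        "count": n,
        "mean": statistics.fmean(values),
        "median": statistics.median(ordered),
        "std": statistics.pstdev(values),  # assumption: population std (ddof=0)
        "q1": q1,
        "q3": q3,
        "min": ordered[0],
        "max": ordered[-1],
        "values": values[::step],  # at most MAX_VALUES_PER_GROUP points
    }
```

For example, 1000 values yield a stride of 2 and exactly 500 jitter points, while `count`, `mean`, and the quartiles are still computed over all 1000.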

Smoke-testing the new endpoint against the experimental Railway deploy surfaced an issue with real-world data: ontologyTableRow tables routinely have multiple columns sharing a topic prefix (e.g. 'ElevatedPlusMaze: Test Identifier' + 'ElevatedPlusMaze: Open Arm Entries' + …). The first-match logic picked the test-identifier column → no numeric values → empty violin response ('no numeric values in matched column').

Fix: score each matching column by how many rows have finite-numeric values, and pick the column with the most numeric rows across all matching groups. Ties broken by first-seen order (groups are already sorted by row count desc upstream).

Adds one test (test_violin_groups_prefers_numeric_column_over_identifier) covering the real-world layout. All 19 tabular_query tests pass; 561 backend tests pass overall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
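A minimal sketch of the column scoring follows. It assumes each ontology-table group is a dict with `columns` (each having `key` and optional `label`) and `rows` (dicts keyed by column key), and that substring matching is case-insensitive. `pick_numeric_column` and `is_finite_number` are hypothetical names.

```python
import math
from typing import Iterable, Optional


def is_finite_number(v: object) -> bool:
    """True for ints/floats (not bools) or strings that parse to a finite float."""
    if isinstance(v, bool) or v is None:
        return False
    try:
        return math.isfinite(float(v))
    except (TypeError, ValueError):
        return False


def pick_numeric_column(groups: Iterable[dict], needle: str) -> Optional[tuple[int, str]]:
    """Return (group_index, column_key) for the matching column with most numeric rows."""
    needle = needle.lower()
    best: Optional[tuple[int, str]] = None
    best_score = -1
    for gi, group in enumerate(groups):
        for col in group["columns"]:
            key = col["key"]
            label = col.get("label") or ""
            if needle not in key.lower() and needle not in label.lower():
                continue
            score = sum(is_finite_number(row.get(key)) for row in group["rows"])
            if score > best_score:  # strict '>' keeps the first-seen column on ties
                best, best_score = (gi, key), score
    return best
```

With an identifier column and an open-arm-entries column sharing the `ElevatedPlusMaze` prefix, the identifier column scores 0 and loses to the numeric one.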

Smoke-testing against real Dabrowska EPM data showed that the LLM naturally calls groupBy='Treatment', but the actual ontology column key is 'Treatment_CNOOrSalineAdministration'. Exact match returns empty groups, and the user/LLM has no way to know the canonical key.

Fix: resolve groupBy via the same substring-match strategy as the value column. Exact key match wins; then substring against keys; then substring against labels. When nothing matches, return empty + a meta listing the column keys we DO have so the caller can retry.

Verified live: 'Treatment' now resolves to 'Treatment_CNOOrSalineAdministration', returning Saline (n=22, mean=5.86) + CNO (n=23, mean=5.09) for EPM open-arm-north entries.

Tests: +2 cases (substring resolution + unresolvable-with-available). All 562 backend tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
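The three-tier groupBy resolution can be sketched as below, assuming columns are dicts with `key` and optional `label` and that substring matching is case-insensitive; `resolve_group_by` is a hypothetical name.

```python
from typing import Optional


def resolve_group_by(requested: str, columns: list[dict]) -> tuple[Optional[str], list[str]]:
    """Map a loose groupBy name to a canonical column key.

    Order: exact key, then substring against keys, then substring against labels.
    Returns (key, []) on success, or (None, available_keys) so the caller can retry.
    """
    keys = [c["key"] for c in columns]
    if requested in keys:
        return requested, []
    needle = requested.lower()
    for c in columns:
        if needle in c["key"].lower():
            return c["key"], []
    for c in columns:
        if needle in (c.get("label") or "").lower():
            return c["key"], []
    return None, keys
```

Returning the available keys on a miss is what lets the LLM retry with a real column name instead of guessing again.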

…ding

Three backend additions wiring the new labchat chat tools.

## image_service + /api/datasets/:id/documents/:docId/image

Pillow-based decoder for TIFF/PNG/JPEG/GIF documents (with NDI-native .nim formats flagged as 'unsupported' rather than failing). Downsamples >512px via Pillow thumbnail; multi-frame support via `frame=N`. Returns structured envelope with width/height/min/max/format + downsampled flag. Errors get errorKind={notfound, decode, unsupported}. Same SSRF hardening as existing binary download path (cloud-allowlisted hosts only). 18 unit tests.

## summary_table_service.distinct_summary

Backend now computes per-column distinct value counts + top-K most common values across ALL rows of a class-table response (not just the page slice). Cached under the same TTL as columns+rows. Caps at DISTINCT_SUMMARY_MAX_ROWS=10_000 (skips with a meta sentinel for very large tables). Top-K = 5 per column. The LLM sees this as `distinctSummary` on the query_documents tool result. Existing test extended with TestHashable + TestBuildDistinctSummary covering: Dabrowska 49-row collapse case, multi-value top-K, None/dict/malformed handling, skip-when-too-many path.

## Sprint 1.5: dataset_binding_service + /api/datasets/:id/ndi_overview

Lazy in-memory LRU cache (max 5) wrapping ndi.cloud.orchestration.downloadDataset. Per-dataset asyncio.Lock prevents concurrent cold-load races. 90s cold-load timeout, 5GB on-disk soft cap with warn log. Returns: element_count, subject_count, total epoch_count across all elements, first 50 element {name, type} pairs, cache_hit + cache_age. Pre-warm at app boot (production/preview only) for the 3 demo datasets — fire-and-forget asyncio.create_task, cancellable on shutdown. is_ndi_available() extended via is_dataset_binding_available() probing ndi.dataset + ndi.cloud.orchestration imports.

Endpoint surface: GET /api/datasets/:id/ndi_overview with limit_reads rate limit. 503 envelope { error, reason } on any failure so the frontend can fall back to ndi_query gracefully. 60s request timeout. 12 unit tests + 2 live-cloud integration tests (gated behind LIVE_NDI_TESTS=1).

## NDI cache directory

NDI_CACHE_DIR env (default /tmp/ndi-cache). Already writable by the ndb user in the Dockerfile. /tmp is ephemeral on Railway redeploys — pre-warm at boot mitigates for the 3 demo datasets; user-driven cold-loads of other datasets pay the 10-30s tax.

## Open risk

downloadDataset against the published demo datasets ANONYMOUSLY (no service-account token) is unverified in the Railway environment. If auth is required, the 503 fallback keeps chat usable. Plan B: add a service-account token via Railway env vars and retry.

Combined backend test count: 603 unit + 110 integration (was 562 unit + 79 integration pre-wave). All green.
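The LRU-plus-per-dataset-lock pattern from the Sprint 1.5 section can be sketched as follows. This is a minimal illustration under stated assumptions: `BindingCache` is a hypothetical class, `loader` stands in for the real call that wraps `ndi.cloud.orchestration.downloadDataset` (its shape is assumed), and the on-disk soft cap and boot pre-warm are omitted.

```python
import asyncio
from collections import OrderedDict
from typing import Any, Awaitable, Callable

MAX_CACHED = 5             # LRU size quoted in the commit
COLD_LOAD_TIMEOUT_S = 90   # cold-load timeout quoted in the commit


class BindingCache:
    """Bounded LRU of loaded datasets; one asyncio.Lock per dataset id."""

    def __init__(self, loader: Callable[[str], Awaitable[Any]]) -> None:
        self._loader = loader
        self._cache: "OrderedDict[str, Any]" = OrderedDict()
        self._locks: dict[str, asyncio.Lock] = {}

    async def get(self, dataset_id: str) -> tuple[Any, bool]:
        """Return (dataset, cache_hit)."""
        if dataset_id in self._cache:
            self._cache.move_to_end(dataset_id)  # mark as most recently used
            return self._cache[dataset_id], True
        lock = self._locks.setdefault(dataset_id, asyncio.Lock())
        async with lock:
            # Another task may have finished the same cold load while we waited.
            if dataset_id in self._cache:
                self._cache.move_to_end(dataset_id)
                return self._cache[dataset_id], True
            ds = await asyncio.wait_for(self._loader(dataset_id), COLD_LOAD_TIMEOUT_S)
            self._cache[dataset_id] = ds
            if len(self._cache) > MAX_CACHED:
                self._cache.popitem(last=False)  # evict the least recently used entry
            return ds, False
```

The re-check inside the lock is what turns two concurrent requests for the same cold dataset into one download and one cache hit.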

The previous response shape gave the frontend a single arbitrary docId (`source.document_id = doc_ids[0]`) for the entire aggregation — misleading when the chart is summarizing dozens of rows across 2+ groups. The user can verify the column being compared via the ontology-tables view, but couldn't drill into specific group examples ("show me one Saline row, one CNO row").

This change threads per-row docIds through the bucketing so each output group surfaces:

- docIds: list[str] (capped at MAX_DOC_IDS_PER_GROUP=3)
- totalRows: int (true contributing row count)

`source.document_id` is preserved for backwards-compat but is no longer the primary citation path.

Implementation notes:

- `SummaryTableService.ontology_tables` already returns docIds parallel to rows (same index). The service now enumerates rows by index and routes each row's docId to the appropriate bucket.
- Missing-docId desync is tolerated silently (rather than padding with bogus IDs). If the projection ever returns fewer docIds than rows, affected buckets just get shorter docId lists.
- 3 docIds per group is enough for the chat to build per-group sample-row chips without flooding the citation panel. The complete set of contributing rows is reachable from the primary table-view citation.

Tests:

- Extended test_violin_groups_basic to assert per-group docIds + totalRows on the standard 2-group Saline/CNO shape.
- Added test_violin_groups_per_group_doc_id_cap covering the 3-docId cap — Saline contributes 10 rows but only 3 docIds surface; CNO with 2 rows surfaces both.
- Added test_violin_groups_missing_doc_ids_tolerated covering the desync case — service must not crash or invent IDs.

23/23 unit tests pass.

This is the backend half of the user-reported "chart citation seems to point to one arbitrary row, not the column or table of entries being aggregated" fix. The frontend (cloud-app) side will land separately and consume the new per-group docIds.
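The index-parallel docId routing can be sketched as below, assuming rows are dicts and that a missing groupBy means a single `all` bucket. `bucket_doc_ids` is a hypothetical helper, not the service's real function.

```python
from collections import defaultdict
from typing import Optional

MAX_DOC_IDS_PER_GROUP = 3  # cap quoted in the commit


def bucket_doc_ids(rows: list[dict], doc_ids: list[str],
                   group_key: Optional[str]) -> dict:
    """Route each row's docId (same index as the row) into its group bucket.

    If doc_ids is shorter than rows, affected buckets just get fewer docIds;
    no placeholder IDs are invented.
    """
    total: dict[str, int] = defaultdict(int)
    ids: dict[str, list[str]] = defaultdict(list)
    for i, row in enumerate(rows):
        group = str(row.get(group_key)) if group_key else "all"
        total[group] += 1  # totalRows counts every contributing row
        if i < len(doc_ids) and len(ids[group]) < MAX_DOC_IDS_PER_GROUP:
            ids[group].append(doc_ids[i])
    return {g: {"docIds": ids[g], "totalRows": total[g]} for g in total}
```

Note that `totalRows` keeps counting past the cap, so the chart can say "n=10" while citing only three sample rows.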

Accidentally staged in the previous commit (6aebed9). Identical content to file_format.py modulo a unicode hyphen in a docstring. CI hygiene gate would reject it; removing now.

… data browser

User observation: 'I can point to at least one place where ontology is not resolving in the data browser.'

Root cause: ``OntologyService.lookup`` short-circuits on cache hit BEFORE reaching the NDI-python fallback added in Phase A (2026-05-13). Pre-Phase-A lookups of lab-specific terms like ``WBStrain:00000001`` or ``NDIC:1`` returned an empty stub (``label=None``, ``definition=None``) from the legacy provider, and those stubs got cached. ``ONTOLOGY_CACHE_TTL_DAYS`` defaults to 30, so the stale stubs would persist for ~a month and prevent the new fallback from ever firing — every consumer of the ontology service (DatasetSummaryCard pills, FacetPanel filters, SummaryTableView cells, ChatBot's lookup_ontology tool) sees the unresolved CURIE until the stub expires.

The fix:

- ``lookup`` now treats stubs as cache MISSES: a stub (label=None AND definition=None) does NOT short-circuit. The legacy provider + NDI-python fallback both run again, and on a real hit the new ``self.cache.set`` OVERWRITES the stub with the resolved entry.
- ``batch_lookup`` inherits the fix automatically (it delegates to ``self.lookup`` per term).

Stuck stubs heal on first use — no admin endpoint or cache wipe needed. Worst case for a term that NEITHER the legacy providers NOR NDI-python can resolve: we keep the stub and re-attempt on every call. That's negligible network traffic (NDI-python is local; legacy providers have their own throttling) and only happens for genuinely unknown terms.

Tests: 6 new tests covering:

- Real cache hit still short-circuits (no upstream calls)
- Stub cache entry triggers retry → NDI-python wins → cache overwritten with real hit
- Both legacy and NDI-python failing keeps the existing stub (no redundant write)
- Fresh term with legacy hit doesn't call NDI-python (it's a fallback, not a co-resolver)
- Fresh term where legacy returns stub falls through to NDI-python
- Batch lookup inherits the bypass behavior

After this lands and Railway redeploys, every data-browser surface that displays ontology terms (Dataset hero pills, facet filters, summary table cells, chat lookup_ontology) automatically benefits without any frontend changes.
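The stub-as-miss rule can be sketched with a plain dict standing in for the ontology cache. `Term`, `lookup`, `legacy` and `ndi_fallback` are hypothetical stand-ins for the service's real types and providers; the sketch only mirrors the behaviours listed in the tests above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Term:
    curie: str
    label: Optional[str] = None
    definition: Optional[str] = None

    @property
    def is_stub(self) -> bool:
        return self.label is None and self.definition is None


def lookup(curie: str, cache: dict[str, Term],
           legacy: Callable[[str], Term],
           ndi_fallback: Callable[[str], Optional[Term]]) -> Term:
    """Only a real (non-stub) cache hit short-circuits; stubs count as misses."""
    cached = cache.get(curie)
    if cached is not None and not cached.is_stub:
        return cached
    result = legacy(curie)
    if result.is_stub:
        # NDI-python is a fallback, not a co-resolver: consulted only on a legacy miss.
        fallback = ndi_fallback(curie)
        if fallback is not None and not fallback.is_stub:
            result = fallback
    if not result.is_stub:
        cache[curie] = result   # overwrites any stale stub
    elif cached is None:
        cache[curie] = result   # first sighting: remember the stub once
    return result
```

Because `batch_lookup` calls `lookup` per term, the same healing applies to batches without extra code.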

…ar_query error envelope

Aggregated from audit agents that returned with backend findings.

## Critical — every consumer of OntologyService was leaking raw CURIEs

* **_fetch_wormbase echoed strain_id as label**: pre-fix, the WormBase provider returned `label=strain_id` (e.g. "00000001") WITHOUT actually scraping the strain page. This is a "label-is-truthy" stub that bypassed the NDI-python fallback — `WBStrain:00000001` showed up as "00000001" on the Bhar dataset hero pills, the /query Strains facet, and the OpenmindsSubject table. The visual UX audit caught this as a P0. Fix: return label=None so the NDI-python fallback fires (which knows the strain name → "N2 wild-type"). The dataset_summary_service's `_enrich_ontology_labels` then writes the real label onto the term and the cache stub gets overwritten on first use.
* **UBERON / GO / OBI missing from _OLS_PROVIDERS dict**: pre-fix, the provider allowlist was `{CL, NCBITaxon, CHEBI, PATO, EFO}`. UBERON is the most common prefix in NDI data (every brain_region and probe_location), but UBERON requests fell through to a stub. `UBERON:0001870` showed `label=null` on every popover, hero pill, and column rendering. OLS4 has the same query endpoint for every OBO ontology — adding UBERON/GO/OBI to the dict is a 3-line fix that unblocks the entire OBO ontology family.

## Important — tabular_query router error envelope

* **tabular_query 500 → typed 503 on cloud errors**: the router raised cloud-client exceptions through FastAPI's global handler, producing opaque 500 JSON envelopes the chat tool layer couldn't surface usefully. Now mapped to 503 with `{error, errorKind, reason}` matching the discipline of /ndi_overview. Adds structured log at `tabular_query.cloud_error`.

## Tests

* 29 backend tests pass (23 tabular_query + 6 ontology_service).
* All 611 unit tests pass overall.

## What this restores

After Railway redeploys:

- WBStrain pills resolve to actual strain names ("N2 wild-type" instead of "00000001") across all data-browser surfaces.
- UBERON popovers resolve ("frontal cortex" instead of null).
- Cloud failures on the violin chart endpoint produce a typed error the chat can paraphrase, not an opaque 500.

## Combined with the frontend commit (293ddea)

That commit fixed the chat's lookup_ontology field-name mismatch (was reading res.name when backend returns res.label, so every ontology lookup reported "found: false"). Together, these two commits make ontology resolution functional end-to-end — the user-reported "ontology not resolving in the data browser" issue, plus several similar bugs the chat-side tool was hitting.

…solve

Visual-UX audit (a395 P0 #3, 2026-05-14) reported every anonymous summary-table view triggered a 403 from /api/ontology/batch-lookup, falling back to label-only display and surfacing a "1 warning · Some entries lack canonical ontology IDs" banner.

The endpoint is POST-shaped because the body holds an array of up to 200 CURIEs (keeps the URL clean), but functionally it's a read-only lookup — no state mutation, idempotent. Pre-fix, anonymous visitors hit it on every dataset render before they'd had a chance to GET /api/auth/csrf, so the double-submit token check failed closed.

Adding /api/ontology/batch-lookup to EXEMPT_PATHS lets anonymous requests through. The endpoint remains rate-limited via the existing limit_reads dependency in the router, so abuse vectors stay bounded (200 CURIE batches × per-IP read budget).

Verification:

- Backend unit test added: asserts /api/ontology/batch-lookup is in EXEMPT_PATHS so the exemption can't be silently lost in a refactor.
- All 612 backend unit tests pass (+1 from the new test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
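A minimal sketch of a double-submit CSRF check with a path exemption, to show why anonymous POSTs failed closed before the fix. `csrf_allows`, `SAFE_METHODS` and this `EXEMPT_PATHS` constant are illustrative assumptions, not the backend's actual middleware code.

```python
import hmac
from typing import Optional

# Assumed shape; the real EXEMPT_PATHS lives in the backend's CSRF middleware.
EXEMPT_PATHS = frozenset({"/api/ontology/batch-lookup"})
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def csrf_allows(method: str, path: str,
                cookie_token: Optional[str], header_token: Optional[str]) -> bool:
    """Double-submit check: the header token must equal the cookie token.

    Safe methods and exempt read-only POST endpoints skip the check; rate
    limiting (limit_reads) still applies to them separately.
    """
    if method.upper() in SAFE_METHODS or path in EXEMPT_PATHS:
        return True
    if not cookie_token or not header_token:
        return False  # fail closed, e.g. an anonymous visitor who never fetched a token
    return hmac.compare_digest(cookie_token, header_token)  # constant-time compare
```

The exemption is safe only because the endpoint has no side effects; a state-changing POST should never be added to the set.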

Two fixes in the ontology + facet pipelines surfaced by the visual-UX audit (Bhar dataset overview + /datasets / /query facet rail).

WBStrain scrape (~P1, demo-relevant)
------------------------------------

NDI-python's WBStrain provider returns a URL but no label, so every WBStrain CURIE on the Bhar overview resolved to the bare strain ID ("WBStrain:00000001") instead of the strain name ("N2"). Add a scrape fallback inside `OntologyService._fetch_wormbase` that GETs the canonical wormbase.org strain page and parses the strain name from `<title>` (primary anchor: `<title>N2 (strain) - WormBase ...</title>`) with a page-title-breadcrumb regex as the secondary parser. 5s timeout; any failure (Cloudflare interstitial, 404, parse miss, network error) falls through to `label=None` so the existing NDI-python fallback path still fires. Result is cached via the existing OntologyCache so each strain page is hit at most once per 30-day TTL. No new dependency — pure httpx+re inside the existing async client.

Six unit tests cover the happy path, breadcrumb fallback, Cloudflare-interstitial blocking, 404 fallthrough, network-error fallthrough, and end-to-end caching.

Caenorhabditis facet dedup (visual-UX audit row #6 / a395)
----------------------------------------------------------

`/datasets` and `/query` showed two "Caenorhabditis elegans" chips because one contributing dataset reported the species with `ontologyId=NCBITaxon:6239` and another with `ontologyId=None`. The prior dedupe keyspace was disjoint — `oid::NCBITaxon:6239` and `norm::caenorhabditis elegans` lived in separate slots — so both surfaced as distinct chips.

Backend fix (not frontend): future API consumers like the chat tool get clean facet data instead of having to re-dedupe at every read site.

Refactored `_add_ontology_term` to register all candidate keys (oid, abbrev, norm) as aliases on the bucket. Lookup walks them in priority order (oid > abbrev > norm). The norm/abbrev alias merge honours an asymmetric guard: distinct ontologyIds with the same label stay distinct (different providers can legitimately catalog the same name as different concepts — preserves `test_ontology_id_still_takes_priority_over_label_normalization`). On promotion, the surviving entry inherits the labeled side's ontologyId.

Verification
------------

- 628 backend unit tests pass, +8 from new tests (6 ontology scrape + 2 facet labeled/unlabeled merge).
- ruff clean on touched files.
- Bug repro before fix: 2 chips. After fix: 1 chip with NCBITaxon:6239 attribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
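The alias-key dedup with its asymmetric guard can be sketched as below. `add_term` and the bucket dict shape are hypothetical; the sketch only illustrates the rules stated above (priority oid > abbrev > norm, distinct ontologyIds never merge via a label alias, promotion inherits the ID).

```python
from typing import Optional


def add_term(buckets: list[dict], aliases: dict[str, dict], label: str,
             oid: Optional[str] = None, abbrev: Optional[str] = None) -> dict:
    """Register a facet term, merging labeled and unlabeled sightings of one concept."""
    keys = []  # alias keys in priority order: oid > abbrev > norm
    if oid:
        keys.append(f"oid::{oid}")
    if abbrev:
        keys.append(f"abbrev::{abbrev.strip().lower()}")
    keys.append(f"norm::{label.strip().lower()}")

    bucket = None
    for key in keys:
        candidate = aliases.get(key)
        if candidate is None:
            continue
        # Asymmetric guard: two different ontology IDs never merge via a label/abbrev alias.
        if oid and candidate["ontologyId"] and candidate["ontologyId"] != oid:
            continue
        bucket = candidate
        break

    if bucket is None:
        bucket = {"label": label, "ontologyId": oid, "count": 0}
        buckets.append(bucket)
    elif oid and not bucket["ontologyId"]:
        bucket["ontologyId"] = oid  # promotion: inherit the labeled side's ID
    bucket["count"] += 1
    for key in keys:
        aliases.setdefault(key, bucket)  # every candidate key now points at this bucket
    return bucket
```

Registering every key on the bucket is what closes the disjoint-keyspace gap: an unlabeled sighting finds the bucket via `norm::`, whichever order the two datasets arrive in.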

Two targeted fixes for the experimental /ask chat tool wiring.

## Task 1 — ndi_overview 503 envelope now carries a stable `code`

`DatasetBindingService` records the most-recent cold-load failure on `self._last_failure = (code, message)` and exposes it via `last_failure()`. The `/ndi_overview` router surfaces the code alongside the existing `reason` string so the chat tool (and operators in the dashboard) can tell `phase_a_unavailable` apart from `binding_unavailable`, `cache_dir_unwritable`, `cold_load_timeout`, or `cold_load_failed`. The generic 503 → "use ndi_query" fallback semantics are unchanged.

Also adds an explicit `is_dataset_binding_available()` short-circuit so the binding never tries to import `ndi.cloud.orchestration` lazily after the boot probe already marked it missing — surfaces as a clean `binding_unavailable` 503 instead of bubbling an ImportError through the cold-load timeout.

Production effect on the experimental Railway: `/ndi_overview` for 67f723d574f5f79c6062389d goes from a generic "NDI-python dataset materialization failed or is not configured on this server" string to `code=binding_unavailable, reason="ndi.dataset / ndi.cloud.orchestration not importable"` — same 503 status, just diagnosable.

## Task 2 — probe→element / epoch→element_epoch class alias

`SummaryTableService._build_single_class` now retries an `isa <alias>` query when the literal class returns 0 IDs. Pinned to:

- probe → element
- epoch → element_epoch

The projection key stays the user-requested class so `PROBE_COLUMNS` / `EPOCH_COLUMNS` are emitted regardless. The resolved class is logged (`table.single.alias_hit`, `resolved_class=` in the build log) for forensics.

Smoke-tested against Dabrowska BNST (id 67f723d574f5f79c6062389d, 0 probe docs + 606 element docs): pre-fix `GET /api/datasets/.../tables/probe` returned 0 rows even though `summary.probeTypes` listed patch-Vm / patch-I / stimulator. Post-fix the same endpoint returns 606 element rows under the probe column shape — which is what the chat's `query_documents(className="probe")` tool consumer already expects. Legacy datasets that DO emit `probe` directly (Van Hooser) skip the alias path entirely.

Coverage: 4 new alias tests in test_summary_table_class_alias.py + 7 new `last_failure()` contract tests in test_dataset_binding_service.py. 628 unit tests pass (was 612 on this branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
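The alias retry in Task 2 reduces to a small lookup-then-retry. In this sketch `query_isa` is a hypothetical stand-in for the real `isa <class>` cloud query and `ids_for_class` is an illustrative name; the alias pins are the two quoted above.

```python
from typing import Callable

# Alias pins quoted in the commit message.
CLASS_ALIASES = {"probe": "element", "epoch": "element_epoch"}


def ids_for_class(class_name: str,
                  query_isa: Callable[[str], list[str]]) -> tuple[list[str], str]:
    """Return (doc_ids, resolved_class), retrying with the alias only on zero hits.

    The caller keeps projecting with the requested class's columns
    (e.g. PROBE_COLUMNS) whichever class actually resolved.
    """
    ids = query_isa(class_name)
    if ids or class_name not in CLASS_ALIASES:
        return ids, class_name  # literal hit, or no alias to try
    alias = CLASS_ALIASES[class_name]
    return query_isa(alias), alias
```

Returning the resolved class alongside the IDs is what makes the `resolved_class=` log line possible without a second query.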

Moves the Gantt-style treatment-timeline tool from the Next.js chat layer (apps/web/lib/ndi/tools/treatment-timeline.ts) into the backend so the heart of NDI processing lives next to ndi-python. The TS handler now shrinks to a thin proxy that forwards {datasetId, title, maxSubjects} to this endpoint and reshapes the raw response into the chat-specific chart_payload envelope.

## New endpoint

POST /api/datasets/{dataset_id}/treatment-timeline

Body (pydantic v2 with camelCase + snake_case aliases):

    {title?, maxSubjects? | max_subjects?}   # default 30, hard cap 100

Response (raw — no chart_payload framing on the backend):

    {
      datasetId,
      title?,
      items: [{subject, treatment, start, end}],
      total_subjects,
      total_treatments,
      temporal_source: "explicit" | "ordinal" | "mixed",
      empty_hint?: {reason, available_columns?}
    }

Auth: get_current_session dependency — public datasets work anonymously, private datasets honor the session cookie. Rate-limited under the standard reads bucket via Depends(limit_reads).

## Orchestration

1. PRIMARY: SummaryTableService.single_class(dataset_id, "treatment") returns rows with treatmentName + subjectDocumentIdentifier + numericValue + stringValue + treatmentOntology.
2. FALLBACK (only if primary empty): TabularQueryService.violin_groups(dataset_id, "Treatment", ...) synthesizes one row per group with subject="group:<name>".
3. Per-subject ordinal counter assigns [i, i+1] timing when no explicit start/end is present on a row.
4. temporal_source distinguishes all-explicit vs all-ordinal vs mixed so the LLM can disclose the timing caveat in prose.
5. empty_hint is set only when both backends are empty OR when rows came back but none had a usable subject+treatment pair.

## Tests

32 unit tests in test_treatment_timeline_service.py covering:

- Pure helpers: subject/treatment label fallback, explicit-timing extraction (numericValue array/scalar, startDate/endDate pair, ISO string parsing), temporal-source classification.
- Primary happy path: 5 treatments x 3 subjects, explicit timing.
- Ordinal timing: rows without numericValue → per-subject [i, i+1].
- Mixed timing: some explicit, some ordinal → temporal_source='mixed'.
- maxSubjects=30 cap on 50 distinct subjects → trims to 30.
- Primary empty + fallback hits → synthetic group:<name> rows.
- Primary empty + fallback empty → empty_hint with reason + available_columns.
- Edge cases: rows missing subject/treatment dropped silently, unplottable-rows variant of empty_hint, primary failure falls through to fallback, both failures still return well-typed response with empty_hint set (not 500).

## Deviations from TS

- chart_payload + references_summary stay in the TS proxy. The Python endpoint returns RAW data so the workspace can consume it directly without unwrapping chat-specific framing.
- References array (per-subject citation chips) likewise stays in the TS layer — that's chat-tool framing, not data.
- Otherwise byte-for-byte semantics: subject ordering, ordinal counter mechanics, temporal_source discriminator, empty_hint reasons, and the [start, end] field ordering all match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
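Orchestration steps 3 and 4 (ordinal slots plus the `temporal_source` discriminator) can be sketched as below. `assign_timing` is a hypothetical helper; it assumes rows already carry `subject`, `treatment` and optional numeric `start`/`end`, and that the per-subject counter advances only on rows without explicit timing, which the commit does not state.

```python
from collections import defaultdict
from typing import Optional


def assign_timing(rows: list[dict]) -> tuple[list[dict], str]:
    """Give every row [start, end]; rows lacking explicit timing get per-subject ordinals."""
    counters: dict[str, int] = defaultdict(int)
    items, explicit, ordinal = [], 0, 0
    for row in rows:
        start: Optional[float] = row.get("start")
        end: Optional[float] = row.get("end")
        if start is not None and end is not None:
            explicit += 1
        else:
            i = counters[row["subject"]]
            counters[row["subject"]] = i + 1
            start, end = i, i + 1  # ordinal slot [i, i+1] for this subject
            ordinal += 1
        items.append({"subject": row["subject"], "treatment": row["treatment"],
                      "start": start, "end": end})
    if explicit and ordinal:
        source = "mixed"
    elif explicit:
        source = "explicit"
    else:
        source = "ordinal"  # also returned for empty input in this sketch
    return items, source
```

The discriminator lets the chat disclose when the x-axis is synthetic ordering rather than real time.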

Port the spike-summary discovery + per-unit shaping from the TS chat tool (apps/web/lib/ndi/tools/fetch-spike-summary.ts) to a Railway service so the heart of NDI processing lives next to ndi-python.

- backend/services/spike_summary_service.py — three-mode discovery (unit_doc_id / unit_name_match / bare-scan), per-unit ``data.vmspikesummary.spike_times`` extraction with the canonical field-path probe order (spike_times → spiketimes → sample_times), numpy-backed ISI computation in ms, stride-sample caps mirroring the TS handler (5000 spikes/unit, 5000 ISIs/unit), kind-gated field omission so raster-only / isi-only callers get compact responses.
- backend/routers/spike_summary.py — POST /api/datasets/{id}/spike-summary; reuses get_current_session + limit_reads, returns raw per-unit data (no chat-specific chart_payloads — TS layer reshapes), translates cloud failures to a typed 503 envelope.
- 27 unit tests covering single-doc path, query path, max_units cap, stride-sample cap, t_window filter, kind gating, soft-error envelope on decode failure, camelCase alias round-trip, and the spike-times field-path fallback chain.

Bypasses QueryService.execute and calls cloud.ndiquery directly because QueryService's scope validator enforces a Mongo-ObjectId regex that the existing dataset_id path-validator already covers (matches the PivotService pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
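The field-path probe, ISI computation and stride cap can be sketched in pure Python as below. The function names are hypothetical. The sketch assumes spike times are in seconds, takes the first field present even if it is empty, and sorts before differencing; the real service uses numpy and may differ on all three points.

```python
from typing import Optional

FIELD_PROBE_ORDER = ("spike_times", "spiketimes", "sample_times")  # order from the commit
MAX_PER_UNIT = 5000  # spike and ISI caps quoted in the commit


def extract_spike_times(doc: dict) -> Optional[list[float]]:
    """Probe data.vmspikesummary.<field> in the documented order; first list found wins."""
    summary = (doc.get("data") or {}).get("vmspikesummary") or {}
    for field in FIELD_PROBE_ORDER:
        value = summary.get(field)
        if isinstance(value, list):
            return [float(t) for t in value]
    return None


def stride_cap(values: list[float], cap: int = MAX_PER_UNIT) -> list[float]:
    """Keep every k-th value so at most `cap` remain."""
    step = max(1, -(-len(values) // cap))  # ceiling division
    return values[::step]


def isis_ms(spike_times_s: list[float]) -> list[float]:
    """Inter-spike intervals in ms (pure-Python stand-in for the numpy version)."""
    ts = sorted(spike_times_s)
    return [(b - a) * 1000.0 for a, b in zip(ts, ts[1:])]
```

Capping happens per unit, so a single very active unit cannot blow up the response for the others.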

Add a new aggregator endpoint that turns one vmspikesummary unit + one stimulus document into a binned firing-rate-per-bin response, the canonical sensory-neuroscience visualization. Mirrors the spike-summary / treatment-timeline ports: TS chat tool becomes a thin proxy, heart of NDI processing stays next to ndi-python on Railway.

- backend/services/psth_service.py — orchestration entry point ``compute_psth(request, ...)``. Resolves the unit (probes data.vmspikesummary.{spike_times, spiketimes, sample_times} then falls back to BinaryService.get_timeseries for non-inlined trains) and the stimulus events (canonical NDI paths first: data.stimulus_presentation.presentations[*].time_started, data.stimulus_response.responses[*].stim_time, then data.events and top-level events for preprocessed payloads). numpy.histogram does the binning across all trial-relative spike times. Returns parallel bin_centers / counts / mean_rate_hz arrays plus n_trials, n_spikes, unit_name, doc-id provenance, optional per_trial_raster when include_raster=True. Server-side caps: bin_size_ms >= 1 ms, (t1-t0) <= 10 s, N_bins <= 1000.
- backend/routers/psth.py — POST /api/datasets/{id}/psth. Reuses get_current_session + limit_reads + DatasetId validator. Returns raw scientific data (no chart_payload/references decoration — TS layer reshapes for the chat fence; workspace consumes directly). Soft-error envelope (error + error_kind) for invalid_window / decode_failed / no_events / empty_window cases so the chat tool branches on error_kind instead of crashing. Cloud-tier exceptions translate to a typed 503 envelope, matching /spike-summary.
- backend/tests/unit/test_psth_service.py — 30 tests covering pure helpers (spike-times + event extraction across all four doc-class paths, bin layout, raster cap, window validation) and the compute_psth integration (happy-path parallel arrays, empty-window envelope, include_raster, cap enforcement, decode_failed + no_events soft errors, binary fallback, camelCase alias).
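The core PSTH computation can be sketched without numpy as below. `psth` is a hypothetical function and a pure-Python stand-in for the `numpy.histogram` step: it assumes spike and event times in seconds, a window in ms relative to each event, and half-open bins (numpy closes its last bin on the right). Mapping the cap violations onto `invalid_window` is an assumption.

```python
import math
from typing import Sequence

# Server-side caps quoted in the commit message.
MIN_BIN_MS = 1.0
MAX_WINDOW_MS = 10_000.0
MAX_BINS = 1000


def psth(spike_times_s: Sequence[float], event_times_s: Sequence[float],
         t0_ms: float, t1_ms: float, bin_ms: float) -> dict:
    """Peri-stimulus time histogram: spikes re-aligned to each event, then binned."""
    if bin_ms < MIN_BIN_MS or t1_ms <= t0_ms or (t1_ms - t0_ms) > MAX_WINDOW_MS:
        return {"error": "invalid window or bin size", "error_kind": "invalid_window"}
    n_bins = math.ceil((t1_ms - t0_ms) / bin_ms)
    if n_bins > MAX_BINS:
        return {"error": "too many bins", "error_kind": "invalid_window"}
    if not event_times_s:
        return {"error": "no stimulus events", "error_kind": "no_events"}

    counts = [0] * n_bins
    for ev in event_times_s:            # one trial per stimulus event
        for t in spike_times_s:
            rel_ms = (t - ev) * 1000.0  # trial-relative time in ms
            if t0_ms <= rel_ms < t1_ms:
                idx = min(int((rel_ms - t0_ms) // bin_ms), n_bins - 1)
                counts[idx] += 1
    if sum(counts) == 0:
        return {"error": "no spikes in window", "error_kind": "empty_window"}

    n_trials = len(event_times_s)
    bin_s = bin_ms / 1000.0
    return {
        "bin_centers": [t0_ms + (i + 0.5) * bin_ms for i in range(n_bins)],
        "counts": counts,
        "mean_rate_hz": [c / (n_trials * bin_s) for c in counts],  # spikes/s per trial
        "n_trials": n_trials,
        "n_spikes": sum(counts),
    }
```

Dividing by `n_trials * bin_s` turns pooled counts into a mean rate in Hz, so PSTHs with different trial counts stay comparable.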

Live verification (2026-05-18) shows the treatment-timeline endpoint already returns 56 items / 28 subjects for Haley (`682e7772cdf3f24938176fac`): F-1e's merge-all-rows chain walker plus `_row_treatment`'s literal-`treatment` branch surface the 56 food-restriction onset/offset docs correctly, and `_pick_subject_label` + `_pick_treatment_label` accept their values.

What was NOT working: `temporal_source` came back `"ordinal"` for every Haley row because `_parse_iso_datetime` couldn't read the MATLAB `datestr` default format (`"03-Nov-2023 07:53:00"`) that Haley's `treatment.string_value` carries. As a result, the Gantt's x-axis showed synthesised ordinal slots 0..N instead of real wall-clock onset/offset events. This was accurate but visually misleading: every subject's onset appeared at "slot 0" rather than at its true date in early November 2023.

This commit:

- Adds a MATLAB `datestr` fallback to `_parse_iso_datetime`. ISO still wins via `datetime.fromisoformat` (regression-pinned); only inputs that fail ISO parsing try the MATLAB formats (`%d-%b-%Y %H:%M:%S` and the date-only `%d-%b-%Y`). The locale assumption (`C`/`en_*`) matches the Railway container.
- Tightens the `dt: datetime | None` annotation so mypy doesn't flag the dual-source assignment.
- Adds 7 unit tests:
  - Three covering the new `_parse_iso_datetime` MATLAB branch (`dd-MMM-yyyy hh:mm:ss`, date-only, garbage still returns None).
  - One ISO regression pin so the new branch can't accidentally shadow `fromisoformat`.
  - One end-to-end test via `_extract_explicit_timing`.
  - Two `_fetch_primary_rows` integration tests covering literal-only (Haley shape, 56 rows, single contributing class) and merged literal+subclass (the chain merges rows from multiple classes).
- Adds 2 `_row_treatment` projection tests that pin the literal-`treatment` branch against the exact Haley doc shape (curl'd from the experimental backend) and an ISO-flavoured variant.

Cache schema unchanged. The summary-table response shape is identical (no new columns). Only the timeline endpoint's `temporal_source` value can shift, from `"ordinal"` to `"explicit"` or `"mixed"`, for datasets that emit MATLAB datestr string values, and the timeline endpoint does not cache.

Acceptance:

- `/api/datasets/682e7772cdf3f24938176fac/treatment-timeline` POST returned `total_subjects=28`, `total_treatments=56`, `temporal_source="ordinal"` pre-fix; post-fix, `temporal_source` will be `"explicit"` (all 56 rows parse). Live curl confirmed the endpoint is non-empty.
- 1026 tests pass (up from 1017; +9 new).
- Lint + typecheck baseline preserved (pre-existing N802 + 2 type errors in untouched files unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
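The ISO-first, MATLAB-second parse order described above can be sketched as follows. This is a minimal illustration and not the actual `_parse_iso_datetime` body. The name `parse_iso_or_matlab` is hypothetical. Because `strptime`'s `%b` directive is locale-dependent, the sketch assumes the same `C`/`en_*` locale the commit relies on.

```python
from datetime import datetime
from typing import Optional

# MATLAB datestr defaults, tried only after ISO 8601 parsing fails.
_MATLAB_DATESTR_FORMATS = ("%d-%b-%Y %H:%M:%S", "%d-%b-%Y")


def parse_iso_or_matlab(value: object) -> Optional[datetime]:
    """Hypothetical stand-in for `_parse_iso_datetime`.

    Returns None for non-strings, blanks, and unparseable text.
    Assumes a C/en_* locale so `%b` matches English month abbreviations.
    """
    if not isinstance(value, str) or not value.strip():
        return None
    text = value.strip()
    try:
        # ISO always wins when it parses, so the fallback cannot shadow it.
        return datetime.fromisoformat(text)
    except ValueError:
        pass
    for fmt in _MATLAB_DATESTR_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None
```

Because `fromisoformat` runs first, an ISO string never reaches the MATLAB branch. That ordering is the property the ISO regression pin guards.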

The `_pick_default_signal_ref` heuristic that landed in e03d470 fixed the `channel_list.bin`-instead-of-`.nbf` bug on `get_timeseries`. This commit sweeps the rest of the binary-decode endpoints.

Audited endpoints and their disposition:

- `/api/datasets/{id}/documents/{id}/signal` (signal.py): already delegates to `BinaryService.get_timeseries`, which uses the smart pick, so it benefits transitively. No code change.
- `/api/datasets/{id}/spike-summary` (spike_summary.py): reads spike_times from the JSON body inline (no binary file decode). No file pick involved.
- `/api/datasets/{id}/documents/{id}/image` (image.py via `ImageService.fetch_image`): picked `refs[0]` blindly. FIXED via the new `_pick_default_image_ref` (Pillow-aligned extension list, same metadata blocklist as the signal pick).
- `/api/datasets/{id}/psth` (psth.py via psth_service): uses `binary_service.get_timeseries`, so it benefits transitively. No code change.
- `/api/datasets/{id}/documents/{id}/data/image` (binary.py via `BinaryService.get_image`, the Document Explorer's image viewer): same `refs[0]` bug. FIXED to use `_pick_default_image_ref`.
- `/api/datasets/{id}/documents/{id}/data/raw` and `/data/raw` with Range (`BinaryService.get_raw` + `get_raw_response`): by design, this is the imageStack passthrough, where the caller has already established that the doc IS a raw-bytes blob. The contract is "stream refs[0] verbatim", and changing it risks breaking imageStack flows. Left alone per the legacy contract.
- `/api/datasets/{id}/documents/{id}/data/video` (binary.py via `BinaryService.get_video_url`): videos are single-file in practice; multi-file video docs aren't a real NDI shape. Left alone.
- `/api/visualize/distribution`: pure aggregation/stats. No binary file decoding.

Implementation: refactors `_pick_default_signal_ref` to delegate to a new shared `_pick_ref_by_extension` helper. It also adds the new `_pick_default_image_ref` consumer, which runs the same step-1/step-2/step-3 heuristic against `_DECODABLE_IMAGE_EXTENSIONS` = `.tif .tiff .png .jpg .jpeg .gif`. Both share the existing `_KNOWN_METADATA_FILENAMES` blocklist, so `channel_list.bin` / `meta.json` / `channels.json` etc. are skipped regardless of which decoder is choosing.

10 new unit tests in `test_binary_default_image_pick.py`, mirroring `test_binary_default_signal_pick.py`, cover:

- TIFF/PNG/JPEG/GIF variants picked
- signal extensions NOT picked for image decode
- case-insensitive matching
- a suffix with a non-alphanumeric tail (`frame.tif_1`)
- single-file legacy fallback
- all-metadata fallback
- a pin that step 1 (extension match) wins over step 2 (non-metadata fallback) when both apply

Cache schema unchanged. Response shapes are unchanged on both endpoints; only file-pick selection changed. A doc that previously surfaced `errorKind=unsupported` because the picker returned a JSON sidecar will now succeed with a valid image payload. A live signal-endpoint smoke test confirmed the fix: Francesconi daqreader `68d6e54703a03f5cfdac8f07` returns `format=nbf_compressed`, `sample_count=100` against `/signal?downsample=100`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A "real" session is one with ≥1 other doc whose `depends_on.value` points at its ndiId. Parent/aggregate session docs have zero downstream references. These are administrative containers like Haley's `haley_2025` parent, ingested 10h after the two leaf recordings `haley_2025_Celegans` and `_Ecoli`. MATLAB enumerates only leaf sessions; the cloud's raw class count includes parents.

Pre-fix on Haley (682e7772cdf3f24938176fac):

- counts.sessions = 3 (raw)
- the tutorial documents 2 recording sessions
- the workspace Sessions picker rendered 3 rows, one of them unusable

Implementation:

- Adds `DatasetSummaryService._count_real_sessions`, which fetches session docs via the existing `_fetch_class_bounded("session")` primitive. It then fires one `depends_on * [ndiId]` ndiquery per session against the cloud's indexed reverse-dep path. Sessions with `totalItems > 0` are real; the rest are filtered out.
- Skip conditions:
  1. `counts.sessions <= 1`: nothing to filter.
  2. `counts.totalDocuments <= counts.sessions`: no non-session docs that could be downstream (newly published catalog, test fixture, etc.). Don't waste ndiquery calls only to fail open.
  3. `counts.sessions > _MAX_SESSIONS_FILTER_WALK` (50): safety cap; multi-day series virtually always have downstream refs.
- Runs as an additional gather leg alongside the existing openminds_subject / probe_location / element fan-out, so it adds no wall-clock latency on the hot path. The 3-10 indexed reverse-dep queries take hundreds of ms, while the structured-facts legs dominate at multi-second scale.
- Fail-open semantics:
  - `_fetch_class_bounded` raises → keep raw count + typed warning
  - Per-session reverse-dep ndiquery raises → that session is counted as real
  - Every session looks unreferenced (real_count == 0) → keep raw count (probably a flaky cloud, not a genuine "all parents" dataset). Emits a structured warning log.
- Observability: a `dataset_summary.session_filter` log line whenever the filtered count differs from raw, recording raw_count, filtered_count, and parent_or_aggregate_sessions.

Cache schema unchanged (v7). `counts.sessions` is an int field that already existed; only its value can shift for affected datasets. Existing cached summaries refresh naturally within their 24h TTL.

+12 unit tests:

- Canonical Haley case (3 → 2)
- Skip on counts.sessions <= 1
- Skip on totalDocuments <= sessions (pure-session test fixture)
- Skip over the 50-session safety cap
- Fail-open: all-zero downstream → keep raw count
- Fail-open: reverse-dep ndiquery 503 → session counted real
- Fail-open: session-class fetch fails → keep raw + warning
- 5 helper unit tests (constant, `_filtered_sessions_or_warn` paths, `_identity_int`)

Backend: 1036 → 1048 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
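The skip conditions and fail-open rules above can be sketched with `asyncio.gather(..., return_exceptions=True)`. That option lets a failed per-session query be counted as real instead of aborting the whole walk. The two callables are hypothetical stand-ins: `fetch_session_ids` for `_fetch_class_bounded("session")`, and `count_downstream` for the per-session `depends_on` ndiquery. Logging is omitted.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

MAX_SESSIONS_FILTER_WALK = 50  # mirrors _MAX_SESSIONS_FILTER_WALK


async def count_real_sessions(
    raw_sessions: int,
    total_documents: int,
    fetch_session_ids: Callable[[], Awaitable[Sequence[str]]],
    count_downstream: Callable[[str], Awaitable[int]],
) -> int:
    # Skip: nothing to filter, or no non-session docs that could be downstream.
    if raw_sessions <= 1 or total_documents <= raw_sessions:
        return raw_sessions
    # Skip: over the safety cap.
    if raw_sessions > MAX_SESSIONS_FILTER_WALK:
        return raw_sessions
    try:
        session_ids = await fetch_session_ids()
    except Exception:
        return raw_sessions  # fail open: session-class fetch failed
    # One reverse-dependency count per session, issued concurrently.
    results = await asyncio.gather(
        *(count_downstream(sid) for sid in session_ids), return_exceptions=True
    )
    # A failed query counts the session as real (fail open per session).
    real = sum(1 for r in results if isinstance(r, BaseException) or r > 0)
    # All-unreferenced looks like a flaky cloud, not an all-parents dataset.
    return real if real > 0 else raw_sessions
```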

The B6 fix in 058107a changes the VALUE of counts.sessions for any dataset with parent/aggregate session docs (Haley: 3 → 2). Existing cached summaries under the v1 prefix would persist for up to 24h (full-success TTL), serving the pre-filter count across the rollout window. Bumping the cache prefix invalidates all v1 entries immediately; the next request runs the producer and writes a v2 entry with the filtered count. The response SHAPE is identical — only the value shifts — so the model's `schemaVersion` literal stays `summary:v1` (clients consuming that field don't need to recompile). Only the cache key namespace changes. Test pins updated. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…atasets

Live verification of B6 on Haley (058107a + cache prefix v2 bump 9523950) showed counts.sessions still = 3. Every Haley session returns 0 downstream depends_on refs, because the lab encodes session identity in `session.reference` strings rather than via the depends_on graph:

- haley_2025_Celegans (leaf)
- haley_2025_Ecoli (leaf)
- haley_2025 (parent/aggregate container)

The depends_on heuristic correctly returns 0 for all 3, triggering the fail-open path and preserving raw count = 3. The right answer is 2.

Adds a structural fallback that fires ONLY when the depends_on heuristic returns 0 across all sessions. A session is a PARENT iff its `session.reference` is a strict prefix (separated by `_`) of some OTHER session's reference in the same dataset. This is intentionally narrower than "any naming pattern": it requires a SIBLING that extends this reference. A lone `haley_2025` without `haley_2025_<species>` siblings stays counted as real. Multi-level trees collapse correctly to the deepest leaves.

Helpers added:

- `_session_reference(doc)`: extracts `data.session.reference`, with fallbacks through `session_reference` and `name`
- `_filter_by_reference_prefix(session_docs)`: returns the leaf count, or None when ambiguous (missing refs, all refs identical)

The two heuristics now compose:

1. depends_on returns ≥1 real → use that count (canonical signal)
2. depends_on returns 0 → try prefix-suffix; if conclusive, use it
3. Both inconclusive → fail open with raw count + audit log

The `via:` field in the structured log records which heuristic fired (`depends_on` vs `reference_prefix`), so operators can audit the rollout across all 8 published datasets.

+10 new unit tests:

- Canonical Haley case (3 → 2 via prefix)
- No-parent shape (2 leaves stay real)
- Missing reference → None (bail to fail-open)
- All sessions share the same reference → None
- Underscore separator required (no false positives on `haley` → `haley2025`)
- 4-level hierarchy collapses to the deepest leaves
- Single session → None
- `_session_reference` extracts via the session.reference / .session_reference / .name fallback chain, and returns None when the block is absent

Backend: 1048 → 1058 tests passing. mypy --strict baseline preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
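The prefix rule reduces to a pairwise `startswith` check. The sketch below uses a hypothetical `filter_by_reference_prefix` that takes already-extracted reference strings. It skips the `_session_reference` extraction step, so it is not the actual `_filter_by_reference_prefix` signature.

```python
from typing import Optional, Sequence


def filter_by_reference_prefix(references: Sequence[Optional[str]]) -> Optional[int]:
    """Count leaf sessions, or return None when the signal is ambiguous.

    A reference is a parent iff some other reference extends it with "_"
    (e.g. "haley_2025" -> "haley_2025_Ecoli"). Ambiguous means: fewer than
    two sessions, any missing reference, or all references identical.
    """
    if len(references) < 2:
        return None
    if any(not ref for ref in references):
        return None
    refs = [ref for ref in references if ref]  # narrows Optional[str] to str
    if len(set(refs)) == 1:
        return None
    leaves = 0
    for ref in refs:
        # startswith(ref + "_") implies a strictly longer, different reference.
        is_parent = any(other.startswith(ref + "_") for other in refs)
        if not is_parent:
            leaves += 1
    return leaves
```

Requiring the `_` separator is what keeps `haley` from being treated as a parent of `haley2025`.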

The cc64299 commit added the session.reference prefix-suffix fallback, but the v2 cache entries from 058107a + 9523950 still have the pre-fallback Haley count = 3. Bumping again invalidates those so the next request runs the new filter and writes a v3 entry with sessions = 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Live Haley still returns sessions=3 after the prefix-fallback ship. That means either (a) the heuristic branch never fires, or (b) it sees session docs without the expected `data.session.reference` shape. Adds a structured diagnostic log to capture exactly what `_count_real_sessions` sees in production:

- fetched_session_docs count
- prefix-filter computed value (or None)
- sample doc top-level keys
- sample data-level keys
- extracted references for each session

Bumps the cache prefix to v4 to force the next request to re-run the producer through the new code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Railway's structured-log feed doesn't surface our structlog output. This commit temporarily pushes the prefix-fallback diagnostic into extractionWarnings so the next curl shows exactly:

- fetched_session_docs count
- prefix_filtered_value (None / N / raw)
- sample_refs extracted
- sample_data_keys (to check whether the doc shape has the expected `session` block)

Bumps the cache prefix to v5 so the next request runs a fresh build through this diagnostic path. The warning push will be reverted once B6 is verified working end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous diagnostic was inside the prefix-fallback block, but that branch only fires when depends_on yields real_count == 0. If depends_on returns >0 in production (unlikely per manual dependency-graph curls, but apparently happening), the function returns early and the diagnostic never fires. This commit moves the B6-DEBUG warning to right after real_count is computed so it always surfaces. It also reports the full results list, so we can see which sessions are flagged True vs False vs exceptions. Bumps the cache to v6 to force a fresh build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…v7 cache

The diagnostic revealed the production behavior on Haley: ALL 3 sessions return >0 via depends_on, even the parent. This is because Haley publishes a `dataset_session_info` admin doc whose depends_on points at the parent session's ndiId. There is no doc-class-name oracle to distinguish an "admin-only reference" from an "experimental-data reference", so depends_on alone returns the raw count.

New composition policy:

1. Compute the prefix-suffix filtered count (None / N).
2. If prefix returns a conclusive `0 < N < raw_count`, USE IT regardless of the depends_on result. The structural signal "session B's name extends session A's name with `_`" is hard to satisfy coincidentally: sibling sessions don't typically share parent-extending names unless there's a real parent/child relationship.
3. Else, use the depends_on count if > 0 (canonical for labs that use the dependency graph for session identity).
4. Else, fail open with the raw count.

Removes the temporary B6-DEBUG diagnostic warnings. Bumps the cache prefix to v7 to invalidate v6 entries cached during the diagnostic rollout.

+2 new tests pin the composition:

- `test_prefix_refines_depends_on_when_parent_has_admin_ref`: THE canonical Haley production case. All 3 sessions are True via depends_on (the parent is referenced by the admin doc), prefix returns 2, and the composition picks 2.
- `test_depends_on_used_when_prefix_inconclusive`: when sessions lack reference fields entirely, prefix returns None and depends_on canonically filters.

Backend: 1058 → 1060 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
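The four-step policy fits in one small function. This sketch assumes the prefix and depends_on counts have already been computed. The `via` labels `reference_prefix` and `depends_on` come from the earlier commit's log field; `fail_open` is a hypothetical label for the last branch.

```python
from typing import Optional, Tuple


def compose_session_count(
    raw_count: int, prefix_count: Optional[int], depends_on_count: int
) -> Tuple[int, str]:
    """Return (count, via) following the composition policy."""
    # Step 2: a conclusive structural refinement wins, even over depends_on.
    if prefix_count is not None and 0 < prefix_count < raw_count:
        return prefix_count, "reference_prefix"
    # Step 3: otherwise trust the dependency graph when it found anything.
    if depends_on_count > 0:
        return depends_on_count, "depends_on"
    # Step 4: fail open with the raw count.
    return raw_count, "fail_open"
```

In the Haley case the call is `compose_session_count(3, 2, 3)`, which picks the prefix count of 2 even though depends_on flags all three sessions.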

`probe` is a Python runtime alias for `element` (per services/class_aliases.py). Many datasets emit no literal `probe` documents at all. Francesconi (67f723d574f5f79c6062389d) is the canonical case, with 0 probe docs, 606 element docs, and 3 probeType facets. As a result, `counts.probes` rendered 0 on the snapshot tile and contradicted the catalog's probeTypes facet.

The fallback path in `_counts_from_raw` was already in place from the prior arc. This commit:

- tightens the log event name to `dataset_summary.probes_alias_resolved` and includes both raw + aliased values for observability
- bumps SUMMARY_KEY_PREFIX v7 → v8 to invalidate stale entries cached before the fallback shipped (response SHAPE unchanged; only the `counts.probes` value can shift)
- adds three regression tests pinning the three branches:
  - literal probe non-zero → use literal
  - literal probe zero + element non-zero → use element (alias hit)
  - both zero → 0

The cache-prefix bump also requires updating the v7 literal in test_user_cache_keys_are_isolated to v8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
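The three branches pinned by the regression tests can be sketched as below. The function name and the flat `class_counts` mapping are hypothetical. The real service logs through structlog rather than the stdlib `logging` module used here.

```python
import logging
from typing import Mapping

log = logging.getLogger("dataset_summary")


def resolve_probe_count(class_counts: Mapping[str, int]) -> int:
    """`probe` aliases `element`: fall back when there are no literal probes."""
    literal = class_counts.get("probe", 0)
    if literal > 0:
        return literal  # branch 1: literal probe docs exist
    aliased = class_counts.get("element", 0)
    if aliased > 0:
        # branch 2: alias hit; record both values for observability
        log.info("dataset_summary.probes_alias_resolved raw=%d aliased=%d", literal, aliased)
        return aliased
    return 0  # branch 3: neither class present
```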

…etch

`/ndiquery` may return slim `{id, datasetId}` pairs (or bare id strings under a single-dataset scope) instead of full doc bodies, depending on search scope. The pre-F-7 code path read `body.get("documents")` and fed those entries straight to `_extract_numeric(doc, "data.subject.weight_grams")`, which silently returns None for slim refs (no `data.*`). Aggregations against datasets where ndiquery returned slim refs would therefore report numeric_matches=0 even when valueField was present in every body.

Fix: after ndiquery returns, classify each entry:

- Full bodies pass through (no-op fast path, zero extra cloud calls).
- Slim refs queue for chunked `bulk_fetch` (BULK_FETCH_MAX=500 per call), grouped by dataset_id since bulk_fetch is per-dataset.
- Bare id strings under a single-dataset scope are attributed to that scope. Under any other scope they're unattributable (bulk_fetch needs a dataset_id) and are dropped with a structured warning.

Concurrency is bounded by MAX_CONCURRENT_BULK_FETCH=6, matching summary_table_service.

Two structured log events for observability:

- aggregate_documents.hydrated_via_bulk_fetch (info; per-call summary)
- aggregate_documents.bare_ids_dropped (warning; cloud-shape anomaly)

Numeric equivalence is pinned by a test. The same fixture is run as (a) full-body inline and (b) slim-refs-then-bulk-fetch. Both produce byte-equal {count, mean, median, std, min, max} per group, identical numeric_matches, and identical datasets_contributing.

Tests (+6):

- test_hydrates_slim_refs_via_bulk_fetch: bulk_fetch IS called when refs are slim
- test_full_body_path_skips_bulk_fetch_no_op: happy-path latency untouched
- test_per_doc_vs_bulk_numeric_equivalence: the regression pin
- test_hydration_chunks_at_bulk_fetch_max: 600 refs → 2 batches (500 + 100)
- test_hydration_chunks_per_dataset: refs spanning 2 datasets fan out into 2 bulk_fetch calls
- test_hydration_handles_bare_id_strings_under_single_dataset_scope

No cache schema bump (response shape unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
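The classify-then-chunk flow above can be sketched in two parts. The first is a pure planning step that groups slim refs per dataset and chunks at `BULK_FETCH_MAX`. The second is an executor that bounds concurrency with an `asyncio.Semaphore`. `plan_hydration`, `hydrate`, and the `bulk_fetch(dataset_id, ids)` signature are assumptions, not the service's actual API. The structured log events are omitted.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Dict, List, Optional, Sequence, Tuple

BULK_FETCH_MAX = 500
MAX_CONCURRENT_BULK_FETCH = 6

Batch = Tuple[str, List[str]]  # (dataset_id, chunk of doc ids)


def plan_hydration(
    entries: Sequence[Any], single_dataset_scope: Optional[str] = None
) -> Tuple[List[Dict[str, Any]], List[Batch], int]:
    """Split entries into full bodies, per-dataset id batches, and a drop count."""
    full_bodies: List[Dict[str, Any]] = []
    ids_by_dataset: Dict[str, List[str]] = defaultdict(list)
    dropped = 0
    for entry in entries:
        if isinstance(entry, dict) and "data" in entry:
            full_bodies.append(entry)  # fast path: nothing to fetch
        elif isinstance(entry, dict) and "id" in entry and "datasetId" in entry:
            ids_by_dataset[entry["datasetId"]].append(entry["id"])
        elif isinstance(entry, str) and single_dataset_scope is not None:
            ids_by_dataset[single_dataset_scope].append(entry)
        else:
            dropped += 1  # bare id with no attributable dataset, or unknown shape
    batches = [
        (dataset_id, ids[i : i + BULK_FETCH_MAX])
        for dataset_id, ids in ids_by_dataset.items()
        for i in range(0, len(ids), BULK_FETCH_MAX)
    ]
    return full_bodies, batches, dropped


async def hydrate(
    batches: Sequence[Batch],
    bulk_fetch: Callable[[str, List[str]], Awaitable[List[Dict[str, Any]]]],
) -> List[Dict[str, Any]]:
    """Run all batches with at most MAX_CONCURRENT_BULK_FETCH in flight."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_BULK_FETCH)

    async def run(dataset_id: str, ids: List[str]) -> List[Dict[str, Any]]:
        async with semaphore:
            return await bulk_fetch(dataset_id, ids)

    results = await asyncio.gather(*(run(d, ids) for d, ids in batches))
    return [doc for batch in results for doc in batch]
```

Keeping the planning step pure makes the chunking rules (600 refs → 500 + 100, one batch group per dataset) testable without mocking the cloud.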

The /tabular_query router exposes both GET and POST variants, so the cloud-app's POST proxy at /api/datasets/[id]/tabular-query can forward its body 1:1 without GET-to-POST translation. Both routes share the same `_dispatch` helper and the same `TabularQueryBody` Pydantic validator (which mirrors the GET query-param contract). The responses therefore MUST be byte-identical for equivalent input. The dispatcher refactor + body validator landed during the prior arc; this commit pins the contract with integration tests.

Tests (+4 against the existing app_and_cloud fixture):

- test_tabular_query_get_and_post_return_identical_shape: the same full-param request (variableNameContains + groupBy + groupOrder) yields byte-equal response bodies.
- test_tabular_query_get_and_post_handle_optional_params_identically: omitting optional params likewise produces equivalent output.
- test_tabular_query_post_rejects_missing_variable_name: the POST body validator surfaces the typed 400 VALIDATION_ERROR envelope.
- test_tabular_query_get_rejects_missing_variable_name: the GET path surfaces the same envelope.

The integration tests use the empty-ontology happy path (`_install_empty_ontology_mocks`), so the assertions don't depend on the cloud's real document corpus. Both routes still exercise `_dispatch` → `TabularQueryService.violin_groups` end-to-end, just through the `_empty_response` branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Re-implements the S5.3 backend after a git reset in the prior session, caused by an agent collision, discarded the ~400 LOC of service code. The full design is preserved in apps/web/docs/reviews/2026-05-19b-post-handoff-execution.md and was re-implemented deterministically from that spec.

Backend changes:

- backend/services/tabular_query_service.py
  - MAX_PAIRS = 1000 cap (matches MAX_GROUPS × MAX_VALUES_PER_GROUP scale)
  - _TREATMENT_CLASS_CHAIN, _SUBJECT_KEY, _TREATMENT_LABEL_FIELDS constants
  - _find_matching_group(..., exclude_group_idx=): the Y-side cross-table search is forced into a DIFFERENT group than X (violin semantics are preserved when exclude_group_idx=None)
  - cross_table_pairs orchestrator + _cross_table_pairs_subject + _cross_table_pairs_treatment + _build_treatment_subject_map
  - 9 module-level helpers: _empty_pairs_response, _index_of_group, _build_subject_value_map, _build_subject_group_map, _columns_for_pair_group_by, _inner_join_pairs, _inner_join_treatment_pairs, _order_pairs_by_group, _pick_treatment_label_for_needle
- backend/routers/tabular_query.py
  - CrossTableQueryBody Pydantic model (xVariableContains, yVariableContains, joinOn, groupBy?, groupOrder?)
  - POST /api/datasets/{dataset_id}/cross-table-query handler with a 503 envelope on CloudInternalError/CloudUnreachable/CloudTimeout (mirrors the violin path's discipline)

Tests (backend/tests/unit/test_tabular_query_service_cross_table.py):

- 52 new unit tests covering the subject-join happy path, groupBy resolution (X-first-then-Y), groupOrder, treatment-join with auto-color fallback, treatment chain walking with last-write-wins, flaky single_class recovery, empty-state diagnostics, the MAX_PAIRS cap, and the 9 helper functions individually.

The response contract matches the cloud-app's BackendCrossTableResponse type at apps/web/lib/ndi/tools/cross-table-query.ts. The cloud-app side was already wired and waiting (handlers + scatter chart + panel toggle all pushed previously).

CI: ruff clean; mypy --strict reports 4 PRE-EXISTING errors in untouched files (zero new errors from this change); 1125 tests pass (was 1060 + F-8 pin pre-arc).

Refs: 2026-05-19b-post-handoff-execution.md "S5.3 detail"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
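A subject-join inner join with the `MAX_PAIRS` cap might look like the following sketch. It assumes each side has already been reduced to one value per subject; the real `_build_subject_value_map` may handle duplicates differently. Sorting subjects for deterministic output is an illustrative choice the commit does not state, and groupBy handling is left out. `inner_join_pairs` is a hypothetical stand-in for `_inner_join_pairs`.

```python
from typing import Dict, List, Tuple

MAX_PAIRS = 1000

Pair = Tuple[str, float, float]  # (subject, x value, y value)


def inner_join_pairs(
    x_by_subject: Dict[str, float],
    y_by_subject: Dict[str, float],
    max_pairs: int = MAX_PAIRS,
) -> Tuple[List[Pair], bool]:
    """Pair X and Y values for subjects present on BOTH sides.

    Returns (pairs, truncated), where truncated is True when the cap cut
    off some matching subjects.
    """
    common = sorted(x_by_subject.keys() & y_by_subject.keys())
    truncated = len(common) > max_pairs
    pairs = [(s, x_by_subject[s], y_by_subject[s]) for s in common[:max_pairs]]
    return pairs, truncated
```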

Applies the F-1 stub diff preserved at apps/web/docs/specs/2026-05-18-f1-stimulus-projection-stub.diff (241 lines), which pins:

- the /tables/stimulus_presentation column shape (the six STIMULUS_COLUMNS keys + row content from the depends_on edge + presentations list)
- the /tables/stimulus short form resolving to stimulus_presentation via the _CLASS_ALIASES chain when the literal class returns 0 IDs
- Stream 5.8 pagination respected (page/pageSize) WITHOUT re-fanning the cloud (proves cache-once-slice-in-memory)

respx 0.23.1 fix: the original alias test used `router.post("/ndiquery").mock(side_effect=_ndiquery)` with a 2-arg callable signature, which hung indefinitely under pytest 9.x + asyncio_mode=auto. It was rewritten to use FIFO route ordering with two chained `.respond()` calls: the specific predicate (via `json__searchstructure__0__param1`) first, the generic fallback second. It now passes in 0.17s. Same end-state contract, no test debt.

CI: 1128 backend tests pass (was 1125 + the 3 new F-1 tests); ruff clean.

The F-1 backend implementation has been live on Railway since commit 0231851; this commit only adds integration tests pinning the contract.

Refs: 2026-05-19b-post-handoff-execution.md "F-1 detail"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 13, 2026 23:03 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 01:54 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 01:59 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 02:04 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 18:13 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 18:37 Inactive

chore: remove inadvertent Finder dupe file_format 2.py

bfba171

Accidentally staged in the previous commit (6aebed9). Identical content to file_format.py modulo a unicode hyphen in a docstring. CI hygiene gate would reject it; removing now.

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 18:38 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 19:00 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 19:20 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 20:14 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 20:50 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 20:52 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 22:19 Inactive

railway-app Bot had a problem deploying to ndi-data-browser-v2 / experimental May 14, 2026 22:21 Failure

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 22:26 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 19:02 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 19:09 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:11 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:16 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:27 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:30 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:39 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:44 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:49 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:55 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 21:23 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 21:30 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 21:35 Inactive

railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 19, 2026 02:25 Inactive

railway-app Bot deployed to ndi-data-browser-v2 / experimental May 19, 2026 02:48 View deployment


audriB wants to merge 54 commits into `main` from `feat/ndi-python-phase-a`

audriB commented May 13, 2026
