[DO NOT MERGE — experimental] feat(ndi-python): Phase A integration#112
Draft
audriB wants to merge 54 commits into
…n/ontology
Adds the NDI-python stack (vlt + did + ndr + vhlab-toolbox-python + ndi-compress)
to the Railway image and wires three new capabilities into existing services:
1. VHSB text-tag decoding via vlt.file.vhsb_read — unblocks Haley + any
other VHSB-formatted dataset in QuickPlot / signal-chart. Today every
VHSB request returns the legacy "vlt library not available" soft error
because the early-return prefix check on b"This " never falls through
to a real decoder.
2. NDI-compressed binary detection + decompression via ndicompress.expand_*
— handles the .nbf.tgz wrapper format. Today these silently fail in
_parse_nbf.
3. NDI ontology fallback in OntologyService — fires when the existing
external providers miss. NDI's bundled lookup knows lab-specific terms
(NDIC, WBStrain, Cre lines) that public OLS providers don't.
Image size cost: ~80-100 MB. We pip install with --no-deps and handpick the
runtime deps in requirements.txt + pyproject.toml so matplotlib (~50-70 MB)
and opencv (~80 MB) — both declared by DID-python + tutorials but never
imported by our paths — stay out of the image.
Strategy: ALL changes go through a new services/ndi_python_service.py that
lazy-imports + degrades gracefully when the stack isn't installed (returns
None on every call, callers fall through to their legacy paths). So even
on a build where the NDI install fails, the public surface keeps working.
Branch protection: this work targets a separate "experimental" Railway
environment for byte-for-byte audit against production before any merge to
main. See docs/plans/2026-05-13-railway-experimental-env-runbook.md for
the dashboard-step walkthrough.
Test plan:
- [x] 19 new unit tests cover the dispatch logic with NDI uninstalled (CI
doesn't pip-install NDI; tests use sys.modules mocking)
- [x] All 44 existing binary tests + 535 other tests pass unchanged
- [x] mypy strict mode clean (with ndi/vlt/ndicompress in missing-imports
override matching the scipy + opentelemetry pattern)
- [x] ruff lint clean
- [ ] Build the experimental Railway env from this branch
- [ ] Run apps/web/scripts/audit-public-api.mjs (Layer 1) + the e2e
audit-public-pages spec (Layer 2 + 3) against production vs
experimental — see plan doc for the workflow
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… strict-boot
Adds the first Plotly-charts pipeline backend half: a tabular_query
endpoint that aggregates ontologyTableRow documents into per-group
statistics (mean, median, std, q1/q3, min/max, count) plus the raw
values needed for a violin-with-jitter render. Powers Dabrowska EPM
and Fear-Potentiated Startle plots, Bhar chemotaxis violins, and any
future categorical-by-group comparison.
Endpoint:
GET /api/datasets/{id}/tabular_query
?variableNameContains=ElevatedPlusMaze # required substring
&groupBy=treatment_group # optional
&groupOrder=Saline,CNO # optional CSV
Pipeline:
1. SummaryTableService.ontology_tables(...) returns one group per
distinct variableNames schema (already exists, used by Document
Explorer)
2. tabular_query_service finds the first group with a column whose
key/label matches the substring
3. Buckets rows by groupBy column (or single 'all' bucket if unset)
4. Computes stats on the full value list, then stride-samples
down to MAX_VALUES_PER_GROUP=500 for the violin's jitter overlay
(stats stay accurate)
5. Caps at MAX_GROUPS=20 with first-seen ordering + explicit
group_order override
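Step 4 of the pipeline (stats on the full list, then stride-sampling for the jitter overlay) might look like the sketch below. It uses only the standard library; `summarize_group` and `stride_sample` are hypothetical names, and the choice of sample standard deviation and inclusive quartiles is an assumption, since the commit message does not say which variants the service uses.

```python
import math
import statistics
from typing import Dict, List

MAX_VALUES_PER_GROUP = 500  # cap named in the commit message


def stride_sample(values: List[float], cap: int) -> List[float]:
    """Keep at most `cap` evenly spaced values, preserving order."""
    if len(values) <= cap:
        return list(values)
    stride = math.ceil(len(values) / cap)
    return values[::stride]


def summarize_group(values: List[float]) -> Dict[str, object]:
    """Compute stats on ALL finite values, then downsample only the raw list."""
    finite = [v for v in values if math.isfinite(v)]  # skip NaN/Inf
    if not finite:
        return {"count": 0, "values": []}
    if len(finite) > 1:
        q1, _, q3 = statistics.quantiles(finite, n=4, method="inclusive")
        std = statistics.stdev(finite)  # assumption: sample std
    else:
        q1 = q3 = finite[0]
        std = 0.0
    return {
        "count": len(finite),
        "mean": statistics.fmean(finite),
        "median": statistics.median(finite),
        "std": std,
        "q1": q1,
        "q3": q3,
        "min": min(finite),
        "max": max(finite),
        # Only the overlay values are capped; the stats above stay exact.
        "values": stride_sample(finite, MAX_VALUES_PER_GROUP),
    }
```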
Hygiene fixes folded into this commit:
- Pin all five NDI-python git deps to specific SHAs in infra/Dockerfile
(replaces @main which silently picks up upstream changes on every
redeploy). SHAs captured 2026-05-13. To upgrade: re-run
`git ls-remote <repo> HEAD` and bump.
- Strict-on-boot NDI check: when NDI_PYTHON_REQUIRED=1 (set by the
Dockerfile), lifespan hard-fails if vlt/ndicompress/ndi.ontology
aren't importable. Catches broken images at boot instead of letting
the chat silently degrade with every NDI tool returning None.
Tests: 21 new unit tests cover the violin aggregation math + edge
cases (no matches, no group_by, group_order overrides, group cap,
value cap with accurate stats, NaN/Inf skipping, empty substring,
no ontology docs). All 559 backend tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke-testing the new endpoint against the experimental Railway
deploy surfaced an issue with real-world data: ontologyTableRow
tables routinely have multiple columns sharing a topic prefix
(e.g. 'ElevatedPlusMaze: Test Identifier' + 'ElevatedPlusMaze:
Open Arm Entries' + …). The first-match logic picked the test-
identifier column → no numeric values → empty violin response
('no numeric values in matched column').
Fix: score each matching column by how many rows have finite-
numeric values, and pick the column with the most numeric rows
across all matching groups. Ties broken by first-seen order
(groups are already sorted by row count desc upstream).
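A rough sketch of the scoring fix, assuming a simplified group shape of `{"columns": [...], "rows": [dict, ...]}` (the real ontologyTableRow projection is not shown in this PR text, so the shape and the name `pick_value_column` are illustrative only):

```python
import math
from typing import Any, Dict, List, Optional, Tuple


def _is_finite_number(value: Any) -> bool:
    # bool is a subclass of int; exclude it so flags are not counted as data.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def pick_value_column(
    groups: List[Dict[str, Any]], substring: str
) -> Optional[Tuple[int, str]]:
    """Return (group_index, column) for the matching column with the most numeric rows."""
    needle = substring.lower()
    best: Optional[Tuple[int, str]] = None
    best_score = 0
    for group_index, group in enumerate(groups):
        for column in group["columns"]:
            if needle not in column.lower():
                continue
            score = sum(
                1 for row in group["rows"] if _is_finite_number(row.get(column))
            )
            # Strict '>' keeps the first-seen column when scores tie.
            if score > best_score:
                best, best_score = (group_index, column), score
    return best
```

A column with zero numeric rows never wins, so an identifier-only match yields `None` rather than an empty violin.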
Adds one test (test_violin_groups_prefers_numeric_column_over_
identifier) covering the real-world layout. All 19 tabular_query
tests pass; 561 backend tests pass overall.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smoke-testing against real Dabrowska EPM data showed that the LLM
naturally calls groupBy='Treatment' but the actual ontology column key
is 'Treatment_CNOOrSalineAdministration'. Exact-match returns empty
groups; the user/LLM has no way to know the canonical key.
Fix: resolve groupBy via the same substring-match strategy as the value
column. Exact key match wins; then substring against keys; then
substring against labels. When nothing matches, return empty + a meta
listing the column keys we DO have so the caller can retry.
Verified live: 'Treatment' now resolves to
'Treatment_CNOOrSalineAdministration', returning Saline (n=22,
mean=5.86) + CNO (n=23, mean=5.09) for EPM open-arm-north entries.
Tests: +2 cases (substring resolution + unresolvable-with-available).
All 562 backend tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
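The three-tier groupBy resolution described in this commit could be sketched like this. The function name and the `key -> label` mapping are hypothetical, and case-insensitive substring matching is an assumption; the commit message only fixes the priority order.

```python
from typing import Dict, List, Optional, Tuple


def resolve_group_by(
    requested: str, columns: Dict[str, str]
) -> Tuple[Optional[str], List[str]]:
    """Resolve a caller-supplied groupBy to a canonical column key.

    `columns` maps column key -> human-readable label.
    Returns (resolved_key_or_None, available_keys) so the caller can retry.
    """
    available = list(columns)
    needle = requested.strip().lower()
    if not needle:
        return None, available
    # 1. Exact key match wins.
    if requested in columns:
        return requested, available
    # 2. Substring against keys (assumption: case-insensitive).
    for key in columns:
        if needle in key.lower():
            return key, available
    # 3. Substring against labels.
    for key, label in columns.items():
        if needle in label.lower():
            return key, available
    return None, available
```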
…ding
Three backend additions wiring the new labchat chat tools.
## image_service + /api/datasets/:id/documents/:docId/image
Pillow-based decoder for TIFF/PNG/JPEG/GIF documents (with NDI-native
.nim formats flagged as 'unsupported' rather than failing). Downsamples
>512px via Pillow thumbnail; multi-frame support via `frame=N`. Returns
structured envelope with width/height/min/max/format + downsampled flag.
Errors get errorKind={notfound, decode, unsupported}. Same SSRF
hardening as existing binary download path (cloud-allowlisted hosts
only). 18 unit tests.
## summary_table_service.distinct_summary
Backend now computes per-column distinct value counts + top-K most
common values across ALL rows of a class-table response (not just the
page slice). Cached under the same TTL as columns+rows. Caps at
DISTINCT_SUMMARY_MAX_ROWS=10_000 (skips with a meta sentinel for very
large tables). Top-K = 5 per column.
The LLM sees this as `distinctSummary` on the query_documents tool
result. Existing test extended with TestHashable + TestBuildDistinctSummary
covering: Dabrowska 49-row collapse case, multi-value top-K, None/dict/
malformed handling, skip-when-too-many path.
## Sprint 1.5: dataset_binding_service + /api/datasets/:id/ndi_overview
Lazy in-memory LRU cache (max 5) wrapping ndi.cloud.orchestration.
downloadDataset. Per-dataset asyncio.Lock prevents concurrent cold-load
races. 90s cold-load timeout, 5GB on-disk soft cap with warn log.
Returns: element_count, subject_count, total epoch_count across all
elements, first 50 element {name, type} pairs, cache_hit + cache_age.
Pre-warm at app boot (production/preview only) for the 3 demo
datasets — fire-and-forget asyncio.create_task, cancellable on
shutdown. is_ndi_available() extended via is_dataset_binding_available()
probing ndi.dataset + ndi.cloud.orchestration imports.
Endpoint surface: GET /api/datasets/:id/ndi_overview with limit_reads
rate limit. 503 envelope { error, reason } on any failure so the
frontend can fall back to ndi_query gracefully. 60s request timeout.
12 unit tests + 2 live-cloud integration tests (gated behind
LIVE_NDI_TESTS=1).
## NDI cache directory
NDI_CACHE_DIR env (default /tmp/ndi-cache). Already writable by the
ndb user in the Dockerfile. /tmp is ephemeral on Railway redeploys;
pre-warm at boot mitigates this for the 3 demo datasets, while
user-driven cold-loads of other datasets pay the 10-30s tax.
## Open risk
downloadDataset against the published demo datasets ANONYMOUSLY (no
service-account token) is unverified in the Railway environment. If
auth is required, the 503 fallback keeps chat usable. Plan B: add a
service-account token via Railway env vars and retry.
Combined backend test count: 603 unit + 110 integration (was 562
unit + 79 integration pre-wave). All green.
The previous response shape gave the frontend a single arbitrary docId
(`source.document_id = doc_ids[0]`) for the entire aggregation —
misleading when the chart is summarizing dozens of rows across 2+
groups. The user can verify the column being compared via the
ontology-tables view, but couldn't drill into specific group examples
("show me one Saline row, one CNO row").
This change threads per-row docIds through the bucketing so each
output group surfaces:
- docIds: list[str] (capped at MAX_DOC_IDS_PER_GROUP=3)
- totalRows: int (true contributing row count)
`source.document_id` is preserved for backwards-compat but is no
longer the primary citation path.
Implementation notes:
- `SummaryTableService.ontology_tables` already returns docIds
parallel to rows (same index). The service now enumerates rows by
index and routes each row's docId to the appropriate bucket.
- Missing-docId desync is tolerated silently (rather than padding
with bogus IDs). If the projection ever returns fewer docIds than
rows, affected buckets just get shorter docId lists.
- 3 docIds per group is enough for the chat to build per-group
sample-row chips without flooding the citation panel. The
complete set of contributing rows is reachable from the primary
table-view citation.
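The per-row docId threading in the notes above might be sketched as below. `bucket_doc_ids` is a hypothetical helper that takes each row's group label plus the parallel docId list; the real service routes rows inside its existing bucketing loop.

```python
from typing import Any, Dict, List, Optional

MAX_DOC_IDS_PER_GROUP = 3  # cap named in the commit message


def bucket_doc_ids(
    group_labels: List[str], doc_ids: List[Optional[str]]
) -> Dict[str, Dict[str, Any]]:
    """Route each row's docId to its group, capping the surfaced IDs per group."""
    buckets: Dict[str, Dict[str, Any]] = {}
    for index, label in enumerate(group_labels):
        bucket = buckets.setdefault(label, {"docIds": [], "totalRows": 0})
        bucket["totalRows"] += 1  # true contributing row count, never capped
        # Tolerate a short docId list: no padding, no invented IDs.
        doc_id = doc_ids[index] if index < len(doc_ids) else None
        if doc_id and len(bucket["docIds"]) < MAX_DOC_IDS_PER_GROUP:
            bucket["docIds"].append(doc_id)
    return buckets
```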
Tests:
- Extended test_violin_groups_basic to assert per-group docIds +
totalRows on the standard 2-group Saline/CNO shape.
- Added test_violin_groups_per_group_doc_id_cap covering the 10-row
cap behavior — Saline contributes 10 rows but only 3 docIds
surface; CNO with 2 rows surfaces both.
- Added test_violin_groups_missing_doc_ids_tolerated covering the
desync case — service must not crash or invent IDs.
23/23 unit tests pass.
This is the backend half of the user-reported "chart citation seems
to point to one arbitrary row, not the column or table of entries
being aggregated" fix. The frontend (cloud-app) side will land
separately and consume the new per-group docIds.
Accidentally staged in the previous commit (6aebed9). Identical content to file_format.py modulo a unicode hyphen in a docstring. CI hygiene gate would reject it; removing now.
… data browser
User observation: 'I can point to at least one place where ontology
is not resolving in the data browser.'
Root cause: ``OntologyService.lookup`` short-circuits on cache hit
BEFORE reaching the NDI-python fallback added in Phase A
(2026-05-13). Pre-Phase-A lookups of lab-specific terms like
``WBStrain:00000001`` or ``NDIC:1`` returned an empty stub
(``label=None``, ``definition=None``) from the legacy provider, and
those stubs got cached. ``ONTOLOGY_CACHE_TTL_DAYS`` defaults to 30,
so the stale stubs would persist for ~a month and prevent the new
fallback from ever firing — every consumer of the ontology service
(DatasetSummaryCard pills, FacetPanel filters, SummaryTableView
cells, ChatBot's lookup_ontology tool) sees the un-resolved CURIE
until the stub expires.
The fix:
- ``lookup`` now treats stubs as cache MISSES: a stub (label=None
AND definition=None) does NOT short-circuit. The legacy provider
+ NDI-python fallback both run again, and on a real hit the new
``self.cache.set`` OVERWRITES the stub with the resolved entry.
- ``batch_lookup`` inherits the fix automatically (it delegates to
``self.lookup`` per term).
Stuck stubs heal on first use — no admin endpoint or cache wipe
needed. Worst case for a term that NEITHER the legacy providers
NOR NDI-python can resolve: we keep the stub and re-attempt on
every call. That's negligible network traffic (NDI-python is local;
legacy providers have their own throttling) and only happens for
genuinely unknown terms.
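The stub-as-miss behavior can be illustrated with a small synchronous sketch. The real `OntologyService.lookup` is presumably async and uses its own cache class; here the cache is a plain dict, and `legacy` / `ndi_fallback` are hypothetical callables standing in for the providers.

```python
from typing import Callable, Dict, Optional

Entry = Dict[str, Optional[str]]
Provider = Callable[[str], Optional[Entry]]


def is_stub(entry: Optional[Entry]) -> bool:
    """A stub has neither a label nor a definition."""
    return entry is None or (
        entry.get("label") is None and entry.get("definition") is None
    )


def lookup(
    curie: str, cache: Dict[str, Entry], legacy: Provider, ndi_fallback: Provider
) -> Entry:
    cached = cache.get(curie)
    if cached is not None and not is_stub(cached):
        return cached  # real hit: short-circuit, no upstream calls
    result = legacy(curie)
    if is_stub(result):
        result = ndi_fallback(curie)  # fallback only, not a co-resolver
    if is_stub(result):
        if cached is None:
            cached = {"label": None, "definition": None}
            cache[curie] = cached
        return cached  # keep the existing stub, no redundant write
    cache[curie] = result  # overwrite the stale stub with the real entry
    return result
```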
Tests: 6 new tests covering:
- Real cache hit still short-circuits (no upstream calls)
- Stub cache entry triggers retry → NDI-python wins → cache
overwritten with real hit
- Both legacy and NDI-python failing keeps the existing stub (no
redundant write)
- Fresh term with legacy hit doesn't call NDI-python (it's a
fallback, not a co-resolver)
- Fresh term where legacy returns stub falls through to NDI-python
- Batch lookup inherits the bypass behavior
After this lands and Railway redeploys, every data-browser surface
that displays ontology terms (Dataset hero pills, facet filters,
summary table cells, chat lookup_ontology) automatically benefits
without any frontend changes.
…ar_query error envelope
Aggregated from audit agents that returned with backend findings.
## Critical — every consumer of OntologyService was leaking raw CURIEs
* **_fetch_wormbase echoed strain_id as label**: pre-fix, the
WormBase provider returned `label=strain_id` (e.g. "00000001")
WITHOUT actually scraping the strain page. This is a "label-is-
truthy" stub that bypassed the NDI-python fallback —
`WBStrain:00000001` showed up as "00000001" on the Bhar dataset
hero pills, the /query Strains facet, and the OpenmindsSubject
table. The visual UX audit caught this as a P0. Fix: return
label=None so the NDI-python fallback fires (which knows the
strain name → "N2 wild-type"). The dataset_summary_service's
`_enrich_ontology_labels` then writes the real label onto the
term and the cache stub gets overwritten on first use.
* **UBERON / GO / OBI missing from _OLS_PROVIDERS dict**: pre-fix,
the provider allowlist was `{CL, NCBITaxon, CHEBI, PATO, EFO}`.
UBERON is the most common prefix in NDI data (every brain_region
and probe_location), but UBERON requests fell through to a
stub. `UBERON:0001870` showed `label=null` on every popover, hero
pill, and column rendering. OLS4 has the same query endpoint for
every OBO ontology — adding UBERON/GO/OBI to the dict is a
3-line fix that unblocks the entire OBO ontology family.
## Important — tabular_query router error envelope
* **tabular_query 500 → typed 503 on cloud errors**: the router
raised cloud-client exceptions through FastAPI's global handler,
producing opaque 500 JSON envelopes the chat tool layer couldn't
surface usefully. Now mapped to 503 with `{error, errorKind,
reason}` matching the discipline of /ndi_overview. Adds
structured log at `tabular_query.cloud_error`.
## Tests
* 29 backend tests pass (23 tabular_query + 6 ontology_service).
* All 611 unit tests pass overall.
## What this restores
After Railway redeploys:
- WBStrain pills resolve to actual strain names ("N2 wild-type"
instead of "00000001") across all data-browser surfaces.
- UBERON popovers resolve ("frontal cortex" instead of null).
- Cloud failures on the violin chart endpoint produce a typed
error the chat can paraphrase, not an opaque 500.
## Combined with the frontend commit (293ddea)
That commit fixed the chat's lookup_ontology field-name mismatch
(was reading res.name when backend returns res.label, so every
ontology lookup reported "found: false"). Together, these
two commits make ontology resolution functional end-to-end —
the user-reported "ontology not resolving in the data browser"
issue, plus several similar bugs the chat-side tool was hitting.
…solve
Visual-UX audit (a395 P0 #3, 2026-05-14) reported every anonymous
summary-table view triggered a 403 from /api/ontology/batch-lookup,
falling back to label-only display and surfacing a
"1 warning · Some entries lack canonical ontology IDs" banner.
The endpoint is POST-shaped because the body holds an array of up to
200 CURIEs (keeps the URL clean), but functionally it's a read-only
lookup: no state mutation, idempotent. Pre-fix, anonymous visitors hit
it on every dataset render before they'd had a chance to GET
/api/auth/csrf, so the double-submit token check failed closed.
Adding /api/ontology/batch-lookup to EXEMPT_PATHS lets anonymous
requests through. The endpoint remains rate-limited via the existing
limit_reads dependency in the router, so abuse vectors stay bounded
(200 CURIE batches × per-IP read budget).
Verification:
- Backend unit test added: asserts /api/ontology/batch-lookup is in
EXEMPT_PATHS so the exemption can't be silently lost in a refactor.
- All 612 backend unit tests pass (+1 from the new test).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes in the ontology + facet pipelines surfaced by the visual-UX
audit (Bhar dataset overview + /datasets / /query facet rail).
WBStrain scrape (~P1, demo-relevant)
------------------------------------
NDI-python's WBStrain provider returns a URL but no label, so every
WBStrain CURIE on the Bhar overview resolved to the bare strain ID
("WBStrain:00000001") instead of the strain name ("N2"). Add a scrape
fallback inside `OntologyService._fetch_wormbase` that GETs the canonical
wormbase.org strain page and parses the strain name from `<title>`
(primary anchor: `<title>N2 (strain) - WormBase ...</title>`) with a
page-title-breadcrumb regex as the secondary parser. 5s timeout; any
failure (Cloudflare interstitial, 404, parse miss, network error) falls
through to `label=None` so the existing NDI-python fallback path still
fires. Result is cached via the existing OntologyCache so each strain
page is hit at most once per 30-day TTL. No new dependency — pure
httpx+re inside the existing async client. Six unit tests cover the
happy path, breadcrumb fallback, Cloudflare-interstitial blocking,
404 fallthrough, network-error fallthrough, and end-to-end caching.
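The primary `<title>` parse could look something like this sketch. It covers only the title anchor quoted above, not the HTTP fetch or the breadcrumb fallback, and the regex is an assumption rather than the one in `_fetch_wormbase`.

```python
import re
from typing import Optional

# Matches "<title>N2 (strain) - WormBase ...</title>" and captures "N2".
# [^<] keeps the capture inside the <title> element.
_TITLE_RE = re.compile(r"<title>\s*([^<]+?)\s*\(strain\)", re.IGNORECASE)


def parse_strain_name(html: str) -> Optional[str]:
    """Return the strain name from a strain page, or None on any parse miss."""
    match = _TITLE_RE.search(html)
    if match is None:
        return None
    name = match.group(1).strip()
    return name or None
```

A Cloudflare interstitial or a 404 page has no `(strain)` marker in its title, so it yields `None` and the caller can fall through to `label=None`.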
Caenorhabditis facet dedup (visual-UX audit row #6 / a395)
----------------------------------------------------------
`/datasets` and `/query` showed two "Caenorhabditis elegans" chips
because one contributing dataset reported the species with
`ontologyId=NCBITaxon:6239` and another with `ontologyId=None`. The
prior dedupe keyspace was disjoint — `oid::NCBITaxon:6239` and
`norm::caenorhabditis elegans` lived in separate slots — so both
surfaced as distinct chips. Backend fix (not frontend): future API
consumers like the chat tool get clean facet data instead of having to
re-dedupe at every read site.
Refactored `_add_ontology_term` to register all candidate keys (oid,
abbrev, norm) as aliases on the bucket. Lookup walks them in priority
order (oid > abbrev > norm). The norm/abbrev alias merge honours an
asymmetric guard: distinct ontologyIds with the same label stay
distinct (different providers can legitimately catalog the same name
as different concepts — preserves
`test_ontology_id_still_takes_priority_over_label_normalization`).
On promotion, the surviving entry inherits the labeled side's
ontologyId.
Verification
------------
- 628 backend unit tests pass, +8 from new tests (6 ontology scrape +
2 facet labeled/unlabeled merge).
- ruff clean on touched files.
- Bug repro before fix: 2 chips. After fix: 1 chip with NCBITaxon:6239
attribution.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two targeted fixes for the experimental /ask chat tool wiring.
## Task 1: ndi_overview 503 envelope now carries a stable `code`
`DatasetBindingService` records the most-recent cold-load failure on
`self._last_failure = (code, message)` and exposes it via
`last_failure()`. The `/ndi_overview` router surfaces the code
alongside the existing `reason` string so the chat tool (and operators
in the dashboard) can tell `phase_a_unavailable` apart from
`binding_unavailable`, `cache_dir_unwritable`, `cold_load_timeout`, or
`cold_load_failed`. The generic 503 → "use ndi_query" fallback
semantics are unchanged.
Also adds an explicit `is_dataset_binding_available()` short-circuit so
the binding never tries to import `ndi.cloud.orchestration` lazily
after the boot probe already marked it missing. This surfaces as a
clean `binding_unavailable` 503 instead of bubbling an ImportError
through the cold-load timeout.
Production effect on the experimental Railway: `/ndi_overview` for
67f723d574f5f79c6062389d goes from a generic "NDI-python dataset
materialization failed or is not configured on this server" string to
`code=binding_unavailable, reason="ndi.dataset /
ndi.cloud.orchestration not importable"` (same 503 status, just
diagnosable).
## Task 2: probe→element / epoch→element_epoch class alias
`SummaryTableService._build_single_class` now retries an `isa <alias>`
query when the literal class returns 0 IDs. Pinned to:
- probe → element
- epoch → element_epoch
The projection key stays the user-requested class so `PROBE_COLUMNS` /
`EPOCH_COLUMNS` are emitted regardless. The resolved class is logged
(`table.single.alias_hit`, `resolved_class=` in the build log) for
forensics.
Smoke-tested against Dabrowska BNST (id 67f723d574f5f79c6062389d, 0
probe docs + 606 element docs): pre-fix
`GET /api/datasets/.../tables/probe` returned 0 rows even though
`summary.probeTypes` listed patch-Vm / patch-I / stimulator. Post-fix
the same endpoint returns 606 element rows under the probe column
shape, which is what the chat's `query_documents(className="probe")`
tool consumer already expects. Legacy datasets that DO emit `probe`
directly (Van Hooser) skip the alias path entirely.
Coverage: 4 new alias tests in test_summary_table_class_alias.py + 7
new `last_failure()` contract tests in test_dataset_binding_service.py.
628 unit tests pass (was 612 on this branch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the Gantt-style treatment-timeline tool from the Next.js chat
layer (apps/web/lib/ndi/tools/treatment-timeline.ts) into the backend
so the heart of NDI processing lives next to ndi-python. The TS
handler now shrinks to a thin proxy that forwards
{datasetId, title, maxSubjects} to this endpoint and reshapes the
raw response into the chat-specific chart_payload envelope.
## New endpoint
POST /api/datasets/{dataset_id}/treatment-timeline
Body (pydantic v2 with camelCase + snake_case aliases):
{title?, maxSubjects? | max_subjects?} # default 30, hard cap 100
Response (raw — no chart_payload framing on the backend):
{
datasetId, title?, items: [{subject, treatment, start, end}],
total_subjects, total_treatments,
temporal_source: "explicit" | "ordinal" | "mixed",
empty_hint?: {reason, available_columns?}
}
Auth: get_current_session dependency — public datasets work
anonymously, private datasets honor the session cookie. Rate-limited
under the standard reads bucket via Depends(limit_reads).
## Orchestration
1. PRIMARY: SummaryTableService.single_class(dataset_id, "treatment")
returns rows with treatmentName + subjectDocumentIdentifier +
numericValue + stringValue + treatmentOntology.
2. FALLBACK (only if primary empty):
TabularQueryService.violin_groups(dataset_id, "Treatment", ...)
synthesizes one row per group with subject="group:<name>".
3. Per-subject ordinal counter assigns [i, i+1] timing when no
explicit start/end is present on a row.
4. temporal_source distinguishes all-explicit vs all-ordinal vs
mixed so the LLM can disclose the timing caveat in prose.
5. empty_hint is set only when both backends are empty OR when rows
came back but none had a usable subject+treatment pair.
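Steps 3 and 4 of the orchestration (the per-subject ordinal counter and the `temporal_source` discriminator) can be sketched as follows. The row shape and `assign_timing` name are hypothetical. Two details are assumptions: explicit rows do not advance the ordinal counter, and an empty result is reported as "ordinal".

```python
from typing import Any, Dict, List, Tuple


def assign_timing(rows: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], str]:
    """Fill in [i, i+1] slots per subject where a row lacks explicit start/end."""
    next_slot: Dict[str, int] = {}
    items: List[Dict[str, Any]] = []
    n_explicit = 0
    for row in rows:
        subject = row["subject"]
        start, end = row.get("start"), row.get("end")
        if start is not None and end is not None:
            n_explicit += 1
        else:
            # Assumption: only ordinal rows advance the per-subject counter.
            i = next_slot.get(subject, 0)
            start, end = i, i + 1
            next_slot[subject] = i + 1
        items.append(
            {"subject": subject, "treatment": row["treatment"], "start": start, "end": end}
        )
    if items and n_explicit == len(items):
        source = "explicit"
    elif n_explicit == 0:
        source = "ordinal"
    else:
        source = "mixed"
    return items, source
```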
## Tests
32 unit tests in test_treatment_timeline_service.py covering:
- Pure helpers: subject/treatment label fallback, explicit-timing
extraction (numericValue array/scalar, startDate/endDate pair, ISO
string parsing), temporal-source classification.
- Primary happy path: 5 treatments x 3 subjects, explicit timing.
- Ordinal timing: rows without numericValue → per-subject [i, i+1].
- Mixed timing: some explicit, some ordinal → temporal_source='mixed'.
- maxSubjects=30 cap on 50 distinct subjects → trims to 30.
- Primary empty + fallback hits → synthetic group:<name> rows.
- Primary empty + fallback empty → empty_hint with reason +
available_columns.
- Edge cases: rows missing subject/treatment dropped silently,
unplottable-rows variant of empty_hint, primary failure falls
through to fallback, both failures still return well-typed
response with empty_hint set (not 500).
## Deviations from TS
- chart_payload + references_summary stay in the TS proxy. The
Python endpoint returns RAW data so the workspace can consume it
directly without unwrapping chat-specific framing.
- References array (per-subject citation chips) likewise stays in
the TS layer — that's chat-tool framing, not data.
- Otherwise byte-for-byte semantics: subject ordering, ordinal
counter mechanics, temporal_source discriminator, empty_hint
reasons, and the [start, end] field ordering all match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port the spike-summary discovery + per-unit shaping from the TS chat
tool (apps/web/lib/ndi/tools/fetch-spike-summary.ts) to a Railway
service so the heart of NDI processing lives next to ndi-python.
- backend/services/spike_summary_service.py — three-mode discovery
(unit_doc_id / unit_name_match / bare-scan), per-unit
``data.vmspikesummary.spike_times`` extraction with the canonical
field-path probe order (spike_times → spiketimes → sample_times),
numpy-backed ISI computation in ms, stride-sample caps mirroring
the TS handler (5000 spikes/unit, 5000 ISIs/unit), kind-gated
field omission so raster-only / isi-only callers get compact
responses.
- backend/routers/spike_summary.py — POST
/api/datasets/{id}/spike-summary; reuses get_current_session +
limit_reads, returns raw per-unit data (no chat-specific
chart_payloads — TS layer reshapes), translates cloud failures
to a typed 503 envelope.
- 27 unit tests covering single-doc path, query path, max_units
cap, stride-sample cap, t_window filter, kind gating, soft-error
envelope on decode failure, camelCase alias round-trip,
and the spike-times field-path fallback chain.
Bypasses QueryService.execute and calls cloud.ndiquery directly
because QueryService's scope validator enforces a Mongo-ObjectId
regex that the existing dataset_id path-validator already covers
(matches the PivotService pattern).
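The field-path probe and ISI computation can be illustrated with a small sketch. The commit says the service is numpy-backed; this version uses only the standard library to stay self-contained, assumes spike times are in seconds, and uses hypothetical function names.

```python
import math
from typing import Any, Dict, List, Optional

MAX_ISIS_PER_UNIT = 5000  # cap named in the commit message
_SPIKE_FIELDS = ("spike_times", "spiketimes", "sample_times")  # probe order


def extract_spike_times(doc: Dict[str, Any]) -> Optional[List[float]]:
    """Return the first non-empty spike-time field under data.vmspikesummary."""
    summary = doc.get("data", {}).get("vmspikesummary", {})
    for field in _SPIKE_FIELDS:
        values = summary.get(field)
        if values:
            return [float(v) for v in values]
    return None


def isi_ms(spike_times_s: List[float]) -> List[float]:
    """Inter-spike intervals in milliseconds (assumes input in seconds)."""
    ordered = sorted(spike_times_s)
    return [(b - a) * 1000.0 for a, b in zip(ordered, ordered[1:])]


def stride_cap(values: List[float], cap: int = MAX_ISIS_PER_UNIT) -> List[float]:
    """Keep at most `cap` evenly spaced values."""
    if len(values) <= cap:
        return list(values)
    return values[:: math.ceil(len(values) / cap)]
```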
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a new aggregator endpoint that turns one vmspikesummary unit + one
stimulus document into a binned firing-rate-per-bin response, the
canonical sensory-neuroscience visualization. Mirrors the
spike-summary / treatment-timeline ports: TS chat tool becomes a thin
proxy, heart of NDI processing stays next to ndi-python on Railway.
- backend/services/psth_service.py — orchestration entry point
``compute_psth(request, ...)``. Resolves the unit (probes
data.vmspikesummary.{spike_times, spiketimes, sample_times} then
falls back to BinaryService.get_timeseries for non-inlined trains)
and the stimulus events (canonical NDI paths first:
data.stimulus_presentation.presentations[*].time_started,
data.stimulus_response.responses[*].stim_time, then data.events and
top-level events for preprocessed payloads). numpy.histogram does
the binning across all trial-relative spike times. Returns parallel
bin_centers / counts / mean_rate_hz arrays plus n_trials, n_spikes,
unit_name, doc-id provenance, optional per_trial_raster when
include_raster=True. Server-side caps: bin_size_ms >= 1 ms,
(t1-t0) <= 10 s, N_bins <= 1000.
- backend/routers/psth.py — POST /api/datasets/{id}/psth. Reuses
get_current_session + limit_reads + DatasetId validator. Returns
raw scientific data (no chart_payload/references decoration — TS
layer reshapes for the chat fence; workspace consumes directly).
Soft-error envelope (error + error_kind) for invalid_window /
decode_failed / no_events / empty_window cases so the chat tool
branches on error_kind instead of crashing. Cloud-tier exceptions
translate to a typed 503 envelope, matching /spike-summary.
- backend/tests/unit/test_psth_service.py — 30 tests covering pure
helpers (spike-times + event extraction across all four doc-class
paths, bin layout, raster cap, window validation) and the
compute_psth integration (happy-path parallel arrays, empty-window
envelope, include_raster, cap enforcement, decode_failed +
no_events soft errors, binary fallback, camelCase alias).
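The binning step can be sketched in pure Python as below. The commit states that the service uses `numpy.histogram`; this stand-in uses half-open bins throughout (numpy closes the last bin on the right), takes all times in seconds, and the name `psth_bins` is hypothetical.

```python
from typing import Dict, List


def psth_bins(
    spike_times: List[float],
    event_times: List[float],
    t0: float,
    t1: float,
    bin_size: float,
) -> Dict[str, object]:
    """Bin trial-relative spike times into a peri-stimulus time histogram."""
    n_bins = int(round((t1 - t0) / bin_size))
    counts = [0] * n_bins
    for event in event_times:
        for spike in spike_times:
            rel = spike - event  # spike time relative to this trial's onset
            if t0 <= rel < t1:
                index = min(int((rel - t0) / bin_size), n_bins - 1)
                counts[index] += 1
    n_trials = len(event_times)
    return {
        "bin_centers": [t0 + (i + 0.5) * bin_size for i in range(n_bins)],
        "counts": counts,
        # Rate = spikes per bin, averaged over trials, divided by bin width.
        "mean_rate_hz": [
            c / (n_trials * bin_size) if n_trials else 0.0 for c in counts
        ],
        "n_trials": n_trials,
        "n_spikes": sum(counts),
    }
```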
Live verification (2026-05-18) shows the treatment-timeline endpoint
already returns 56 items / 28 subjects for Haley
(`682e7772cdf3f24938176fac`) — F-1e's merge-all-rows chain walker
plus `_row_treatment`'s literal-`treatment` branch surface the 56
food-restriction-onset/offset docs correctly, and `_pick_subject_label`
+ `_pick_treatment_label` accept their values.
What was NOT working: `temporal_source` came back `"ordinal"` for
every Haley row because `_parse_iso_datetime` couldn't read the
MATLAB `datestr` default format (`"03-Nov-2023 07:53:00"`) that
Haley's `treatment.string_value` carries. Result: the Gantt's
x-axis showed synthesised ordinal slots 0..N instead of real
wall-time onset/offset events — accurate but visually misleading
(every subject's onset appeared at "slot 0" rather than its true
date in early November 2023).
This commit:
- Adds a MATLAB `datestr` fallback to `_parse_iso_datetime`. ISO
still wins via `datetime.fromisoformat` (regression-pinned);
only inputs that fail ISO try the MATLAB formats
(`%d-%b-%Y %H:%M:%S` and the date-only `%d-%b-%Y`). Locale
assumption (`C`/`en_*`) matches the Railway container shape.
- Tightens the `dt: datetime | None` annotation so mypy doesn't
flag the dual-source assignment.
- Adds 7 unit tests:
- Three covering the new `_parse_iso_datetime` MATLAB branch
(`dd-MMM-yyyy hh:mm:ss`, date-only, garbage-still-None).
- One ISO regression pin so the new branch can't accidentally
shadow `fromisoformat`.
- One end-to-end via `_extract_explicit_timing`.
- Two `_fetch_primary_rows` integration tests covering
literal-only (Haley shape, 56 rows, single contributing class)
and merged literal+subclass (chain merges rows from multiple
classes).
- Adds 2 `_row_treatment` projection tests that pin the literal-
`treatment` branch against the exact Haley doc shape (curl'd
from the experimental backend) and an ISO-flavoured variant.
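The ISO-first, MATLAB-second parse order might look like this sketch. `parse_datetime` is a hypothetical stand-in for `_parse_iso_datetime`; the two format strings are the ones quoted in this commit message, and `%b` relies on an English (`C`/`en_*`) locale as noted above.

```python
from datetime import datetime
from typing import Optional

# MATLAB datestr default format, plus the date-only variant.
_MATLAB_FORMATS = ("%d-%b-%Y %H:%M:%S", "%d-%b-%Y")


def parse_datetime(text: str) -> Optional[datetime]:
    """Parse ISO 8601 first; fall back to MATLAB datestr formats; else None."""
    candidate = text.strip()
    if not candidate:
        return None
    try:
        return datetime.fromisoformat(candidate)  # ISO always wins
    except ValueError:
        pass
    for fmt in _MATLAB_FORMATS:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    return None
```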
Cache schema unchanged. The summary-table response shape is
identical (no new columns); only the timeline endpoint's
`temporal_source` value can shift from `"ordinal"` to `"explicit"`
or `"mixed"` for datasets that emit MATLAB datestr stringValues,
and the timeline endpoint does not cache.
Acceptance:
- `/api/datasets/682e7772cdf3f24938176fac/treatment-timeline` POST
returned `total_subjects=28`, `total_treatments=56`,
`temporal_source="ordinal"` pre-fix; post-fix temporal_source
will be `"explicit"` (all 56 rows parse). Live curl confirmed
the endpoint is non-empty.
- 1026 tests pass (up from 1017; +9 new).
- Lint + typecheck baseline preserved (pre-existing N802 + 2
type errors in untouched files unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `_pick_default_signal_ref` heuristic that landed in e03d470 fixed
the `channel_list.bin`-instead-of-`.nbf` bug on `get_timeseries`. This
commit sweeps the rest of the binary-decode endpoints.
Audited endpoints and their disposition:
- `/api/datasets/{id}/documents/{id}/signal` (signal.py): already
delegated to `BinaryService.get_timeseries`, which uses the smart
pick. Already benefits transitively; no code change.
- `/api/datasets/{id}/spike-summary` (spike_summary.py): reads
spike_times from the JSON body inline (no binary file decode). No
file-pick involved.
- `/api/datasets/{id}/documents/{id}/image` (image.py via
ImageService.fetch_image): picked `refs[0]` blindly. FIXED via new
`_pick_default_image_ref` (Pillow-aligned extension list, same
metadata blocklist as the signal pick).
- `/api/datasets/{id}/psth` (psth.py via psth_service): uses
`binary_service.get_timeseries`. Already benefits transitively; no
code change.
- `/api/datasets/{id}/documents/{id}/data/image` (binary.py via
BinaryService.get_image, Document Explorer's image viewer): same
`refs[0]` bug. FIXED to use `_pick_default_image_ref`.
- `/api/datasets/{id}/documents/{id}/data/raw` and `/data/raw` w/ Range
(BinaryService.get_raw + get_raw_response): by design, this is the
imageStack passthrough where the caller has already established the
doc IS a raw-bytes blob. The contract is "stream refs[0] verbatim";
changing this risks breaking imageStack flows. Left alone per the
legacy contract.
- `/api/datasets/{id}/documents/{id}/data/video` (binary.py via
BinaryService.get_video_url): videos are single-file in practice;
multi-file video docs aren't a real NDI shape. Left alone.
- `/api/visualize/distribution`: pure aggregation/stats. No binary file
decoding.
Implementation: refactors `_pick_default_signal_ref` to delegate to a
new shared `_pick_ref_by_extension` helper. Adds the new
`_pick_default_image_ref` consumer (same step-1/step-2/step-3 heuristic
against `_DECODABLE_IMAGE_EXTENSIONS` = `.tif .tiff .png .jpg .jpeg
.gif`). Both share the existing `_KNOWN_METADATA_FILENAMES` blocklist:
channel_list.bin / meta.json / channels.json etc. are skipped
regardless of which decoder is choosing.
10 new unit tests in `test_binary_default_image_pick.py` mirroring
`test_binary_default_signal_pick.py`: TIFF/PNG/JPEG/GIF variants
picked, signal extensions NOT picked for image decode,
case-insensitive matching, suffix-with-non-alphanumeric tail
(`frame.tif_1`), single-file legacy fallback, all-metadata fallback,
and a pin that step-1 (extension match) wins over step-2 (non-metadata
fallback) when both apply.
Cache schema unchanged. Response shapes unchanged on both endpoints;
only file-pick selection changed, so a doc that previously surfaced
`errorKind=unsupported` because the picker returned a JSON sidecar will
now succeed with a valid image payload.
Live signal-endpoint smoke confirmed post-fix (Francesconi daqreader
`68d6e54703a03f5cfdac8f07` returns `format=nbf_compressed`,
`sample_count=100` against `/signal?downsample=100`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
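For reviewers, the three-step file pick described above might look roughly like the sketch below. It is an illustration, not the code in `binary_service.py`: the function names, the exact blocklist contents beyond the three filenames quoted in the commit message, and the reading of step 3 as "fall back to the first ref" are all assumptions.

```python
import re
from typing import Optional, Sequence

# Extension list quoted in the commit message.
DECODABLE_IMAGE_EXTENSIONS = (".tif", ".tiff", ".png", ".jpg", ".jpeg", ".gif")
# Only these three names are quoted in the commit; the real blocklist may be longer.
KNOWN_METADATA_FILENAMES = frozenset({"channel_list.bin", "meta.json", "channels.json"})


def has_extension(name: str, extensions: Sequence[str]) -> bool:
    """Case-insensitive match that also accepts a non-alphanumeric tail
    after the extension, e.g. `frame.tif_1`."""
    lowered = name.lower()
    return any(
        re.search(re.escape(ext) + r"(?![a-z0-9])", lowered) for ext in extensions
    )


def pick_ref_by_extension(
    filenames: Sequence[str], extensions: Sequence[str]
) -> Optional[str]:
    """Hypothetical stand-in for the shared helper named in the commit."""
    if not filenames:
        return None
    candidates = [n for n in filenames if n.lower() not in KNOWN_METADATA_FILENAMES]
    # Step 1: first non-metadata file with a decodable extension.
    for name in candidates:
        if has_extension(name, extensions):
            return name
    # Step 2: first file that is not a known metadata sidecar.
    if candidates:
        return candidates[0]
    # Step 3 (assumed): legacy behaviour, the first ref.
    return filenames[0]
```

Under these assumptions, `pick_ref_by_extension(["channel_list.bin", "frame.tif_1"], DECODABLE_IMAGE_EXTENSIONS)` selects `frame.tif_1`, and a document whose refs are all metadata files still returns its first ref.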
A "real" session is one with ≥1 other doc carrying depends_on.value
pointing at its ndiId. Parent / aggregate session docs
(administrative containers like Haley's `haley_2025` parent, ingested
10h after the two leaf recordings `haley_2025_Celegans` and
`_Ecoli`) have zero downstream references. MATLAB enumerates only
leaf sessions; the cloud's raw class count includes parents.
Pre-fix on Haley (682e7772cdf3f24938176fac):
- counts.sessions = 3 (raw)
- tutorial documents 2 recording sessions
- workspace Sessions picker rendered 3 rows, one unusable
Implementation:
- Adds `DatasetSummaryService._count_real_sessions` that fetches
session docs via the existing `_fetch_class_bounded("session")`
primitive, then fires one `depends_on * [ndiId]` ndiquery per
session against the cloud's indexed reverse-dep path. Sessions
with `totalItems > 0` are real; the rest are filtered out.
- Skip conditions:
1. `counts.sessions <= 1` — nothing to filter.
2. `counts.totalDocuments <= counts.sessions` — no non-session
docs that could be downstream (newly-published catalog, test
fixture, etc.). Don't waste ndiquery calls only to fail-open.
3. `counts.sessions > _MAX_SESSIONS_FILTER_WALK` (50) — safety
cap; multi-day series virtually always have downstream refs.
- Runs as an additional gather leg alongside the existing
openminds_subject / probe_location / element fanout, so it adds
zero wall-clock latency on the hot path (3-10 indexed
reverse-dep queries ≈ hundreds of ms; the structured-facts legs
dominate at multi-second scale).
- Fail-open semantics:
* `_fetch_class_bounded` raises → keep raw count + typed warning
* Per-session reverse-dep ndiquery raises → that session is
counted as real
* Every session looks unreferenced (real_count == 0) → keep raw
count (probably a flaky cloud, not a real "all parents"
dataset). Emits structured warning log.
- Observability: `dataset_summary.session_filter` log line whenever
the filtered count differs from raw, recording raw_count,
filtered_count, and parent_or_aggregate_sessions.
Cache schema unchanged (v7). `counts.sessions` is just an int field
that already existed; only its value can shift for affected
datasets. Existing cached summaries refresh naturally within their
24h TTL.
+12 unit tests:
- Canonical Haley case (3 → 2)
- Skip on counts.sessions <= 1
- Skip on totalDocuments <= sessions (pure-session test fixture)
- Skip over the 50-session safety cap
- Fail-open: all-zero downstream → keep raw count
- Fail-open: reverse-dep ndiquery 503 → session counted real
- Fail-open: session-class fetch fails → keep raw + warning
- 5 helper unit tests (constant, _filtered_sessions_or_warn paths,
_identity_int)
Backend: 1036 → 1048 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
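The skip conditions and fail-open rules listed in this commit can be summarised in a small synchronous sketch. The real `_count_real_sessions` runs its reverse-dep queries as a concurrent gather leg; the sequential loop, the `has_downstream` callback, and the function names here are assumptions for illustration only.

```python
from typing import Callable, Sequence

MAX_SESSIONS_FILTER_WALK = 50  # safety cap quoted in the commit message


def should_skip_filter(sessions: int, total_documents: int) -> bool:
    """Mirror of the three skip conditions."""
    if sessions <= 1:
        return True  # nothing to filter
    if total_documents <= sessions:
        return True  # no non-session docs that could be downstream
    if sessions > MAX_SESSIONS_FILTER_WALK:
        return True  # too many sessions to walk
    return False


def count_real_sessions(
    raw_count: int,
    total_documents: int,
    session_ids: Sequence[str],
    has_downstream: Callable[[str], bool],
) -> int:
    """Count sessions with at least one downstream reference, failing open."""
    if should_skip_filter(raw_count, total_documents):
        return raw_count
    real = 0
    for session_id in session_ids:
        try:
            if has_downstream(session_id):
                real += 1
        except Exception:
            # Per-session query failure: count that session as real.
            real += 1
    # Every session looks unreferenced: keep the raw count.
    return real if real > 0 else raw_count
```

With three session ids where only the parent has no downstream references, this returns 2, which is the 3 → 2 shift the commit describes for Haley.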
The B6 fix in 058107a changes the VALUE of counts.sessions for any
dataset with parent/aggregate session docs (Haley: 3 → 2). Existing
cached summaries under the v1 prefix would persist for up to 24h
(full-success TTL), serving the pre-filter count across the rollout
window. Bumping the cache prefix invalidates all v1 entries
immediately; the next request runs the producer and writes a v2 entry
with the filtered count.
The response SHAPE is identical (only the value shifts), so the
model's `schemaVersion` literal stays `summary:v1` (clients consuming
that field don't need to recompile). Only the cache key namespace
changes.
Test pins updated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
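The difference between bumping the cache key namespace and bumping the response schema version can be shown with a toy in-memory cache. The key format, the `summary-cache:` prefix string, and the helper names below are hypothetical; only the `summary:v1` literal comes from the commit message.

```python
from typing import Any, Callable, Dict

SCHEMA_VERSION = "summary:v1"  # response-shape literal; unchanged by the bump
SUMMARY_KEY_PREFIX = "summary-cache:v2"  # hypothetical key namespace, bumped from v1


def summary_cache_key(dataset_id: str, prefix: str = SUMMARY_KEY_PREFIX) -> str:
    """Build the cache key; the `prefix:dataset_id` layout is an assumption."""
    return f"{prefix}:{dataset_id}"


def get_summary(
    cache: Dict[str, Any],
    dataset_id: str,
    producer: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the cached summary, running the producer on a miss."""
    key = summary_cache_key(dataset_id)
    if key not in cache:
        cache[key] = producer(dataset_id)
    return cache[key]
```

An entry written under the old prefix is never looked up again, so the first request after deploy runs the producer even though the stale entry has not yet expired.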
…atasets
Live verification of B6 on Haley (058107a + cache prefix v2 bump
9523950) showed counts.sessions still = 3: every Haley session returns
0 downstream depends_on refs because the lab encodes session identity
in `session.reference` strings rather than via the depends_on graph:
- haley_2025_Celegans (leaf)
- haley_2025_Ecoli (leaf)
- haley_2025 (parent / aggregate container)
The depends_on heuristic correctly returns 0 for all 3, triggering the
fail-open path and preserving raw count = 3. The right answer is 2.
Adds a structural fallback that fires ONLY when the depends_on
heuristic returns 0 across all sessions:
A session is a PARENT iff its `session.reference` is a strict prefix
(separated by '_') of some OTHER session's reference in the same
dataset.
Intentionally narrower than "any naming pattern": requires a SIBLING
that extends this reference. A lone `haley_2025` without
`haley_2025_<species>` siblings stays counted as real. Multi-level
trees collapse to the deepest leaves correctly.
Helpers added:
- `_session_reference(doc)`: extracts `data.session.reference` with
fallbacks through `session_reference` and `name`
- `_filter_by_reference_prefix(session_docs)`: returns leaf count, or
None when ambiguous (missing refs, all refs identical)
The two heuristics now compose:
1. depends_on returns ≥1 real → use that count (canonical signal)
2. depends_on returns 0 → try prefix-suffix; if conclusive use it
3. Both inconclusive → fail-open with raw count + audit log
The `via:` field in the structured log records which heuristic fired
(`depends_on` vs `reference_prefix`) so operators can audit the
rollout across all 8 published datasets.
+10 new unit tests:
- Canonical Haley case (3 → 2 via prefix)
- No-parent shape (2 leaves stay real)
- Missing reference → None (bail to fail-open)
- All sessions share same reference → None
- Underscore separator required (no false positives on `haley` →
`haley2025`)
- 4-level hierarchy collapses to deepest leaves
- Single session → None
- _session_reference extracts via session.reference /
.session_reference / .name fallback chain, returns None when block
absent
Backend: 1048 → 1058 tests passing. mypy --strict baseline preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
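The parent rule above ("strict prefix separated by `_` of some other session's reference") is compact enough to sketch directly. This version takes already-extracted reference strings rather than session documents, and the function name is a hypothetical stand-in for `_filter_by_reference_prefix`.

```python
from typing import Optional, Sequence


def filter_by_reference_prefix(references: Sequence[Optional[str]]) -> Optional[int]:
    """Return the number of leaf sessions, or None when the signal is ambiguous."""
    if len(references) < 2:
        return None  # a single session cannot have a sibling
    if any(not ref for ref in references):
        return None  # missing reference: bail to fail-open
    if len(set(references)) == 1:
        return None  # all references identical
    leaves = 0
    for i, ref in enumerate(references):
        # A parent is extended by some OTHER reference via an underscore.
        is_parent = any(
            j != i and other.startswith(ref + "_")
            for j, other in enumerate(references)
        )
        if not is_parent:
            leaves += 1
    return leaves
```

Requiring the `_` separator is what keeps `haley` from being treated as a parent of `haley2025`, and a chain such as `a`, `a_b`, `a_b_c`, `a_b_c_d` collapses to its single deepest leaf.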
The cc64299 commit added the session.reference prefix-suffix fallback,
but the v2 cache entries from 058107a + 9523950 still have the
pre-fallback Haley count = 3. Bumping again invalidates those so the
next request runs the new filter and writes a v3 entry with
sessions = 2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live Haley still returns sessions=3 after the prefix-fallback ship,
which means either (a) the heuristic branch never fires, or (b) it
sees session docs without the expected `data.session.reference` shape.
Add structured diagnostic log to capture exactly what
`_count_real_sessions` sees in production:
- fetched_session_docs count
- prefix-filter computed value (or None)
- sample doc top-level keys
- sample data-level keys
- extracted references for each session
Bumps cache prefix to v4 to force the next request to re-run the
producer through the new code path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Railway structured-log feed doesn't surface our structlog
output. Push the prefix-fallback diagnostic into extractionWarnings
temporarily so the next curl tells us exactly:
- fetched_session_docs count
- prefix_filtered_value (None / N / raw)
- sample_refs extracted
- sample_data_keys (so we can see if the doc shape has the
`session` block we expect)
Bumps cache prefix to v5 so the next request runs a fresh build
through this diagnostic path. Will revert the warning push once
B6 is verified working end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous diagnostic was inside the prefix-fallback block, but that
branch only fires when real_count == 0 from depends_on. If depends_on
returns >0 in production (unlikely per dependency-graph manual curls
but apparently happening), the function returns early and the
diagnostic never fires.
Moves the B6-DEBUG warning to right after real_count is computed so it
always surfaces. Reports the full results list so we can see which
sessions are flagged True vs False vs exceptions.
Bumps cache to v6 to force a fresh build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…v7 cache
The diagnostic revealed the production behavior on Haley: ALL 3
sessions return >0 via depends_on (even the parent), because
Haley publishes a `dataset_session_info` admin doc that
depends_on the parent session ndiId. Without the doc-class-name
oracle to distinguish "admin-only reference" from "experimental-
data reference", depends_on alone returns the raw count.
New composition policy:
1. Compute prefix-suffix filtered count (None / N).
2. If prefix returns a conclusive `0 < N < raw_count`, USE IT
regardless of depends_on result. The structural signal
"session B's name extends session A's name with `_`" is hard
to satisfy coincidentally — sibling sessions don't typically
share parent-extending names unless there's a real
parent/child relationship.
3. Else, use depends_on count if > 0 (canonical for labs that
use the dependency graph for session identity).
4. Else, fail-open with raw count.
Removes the temporary B6-DEBUG diagnostic warnings.
Bumps cache prefix to v7 to invalidate v6 entries cached during
the diagnostic rollout.
+2 new tests pin the composition:
- `test_prefix_refines_depends_on_when_parent_has_admin_ref` —
THE canonical Haley production case: all 3 sessions True via
depends_on (parent referenced by admin doc), prefix returns 2,
composition picks 2.
- `test_depends_on_used_when_prefix_inconclusive` — when sessions
lack reference fields entirely, prefix returns None and
depends_on canonically filters.
Backend: 1058 → 1060 tests passing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
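The four-step composition policy in this commit reduces to a short decision function. The sketch below assumes both heuristic results are already computed; the function name and the `fail_open` label are hypothetical, while `reference_prefix` and `depends_on` are the `via` values named in the earlier commit.

```python
from typing import Optional, Tuple


def compose_session_count(
    raw_count: int,
    prefix_count: Optional[int],
    depends_on_count: int,
) -> Tuple[int, str]:
    """Return (session count, heuristic that decided it)."""
    # Steps 1-2: a conclusive prefix result wins regardless of depends_on.
    if prefix_count is not None and 0 < prefix_count < raw_count:
        return prefix_count, "reference_prefix"
    # Step 3: otherwise trust the dependency graph when it found anything.
    if depends_on_count > 0:
        return depends_on_count, "depends_on"
    # Step 4: fail open with the raw count.
    return raw_count, "fail_open"
```

For the Haley production case (raw 3, prefix 2, depends_on 3) this yields `(2, "reference_prefix")`; when the prefix filter returns `None`, the depends_on count is used as before.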
`probe` is a Python runtime alias for `element` (per services/class_aliases.py).
Many datasets — Francesconi (67f723d574f5f79c6062389d) is the canonical case
with 0 probe + 606 element + 3 probeType facets — emit no literal `probe`
documents at all, so `counts.probes` rendered 0 on the snapshot tile and
contradicted the catalog's probeTypes facet.
The fallback path in `_counts_from_raw` was already in place from the prior
arc. This commit:
- tightens the log event name to `dataset_summary.probes_alias_resolved`
and includes both raw + aliased values for observability
- bumps SUMMARY_KEY_PREFIX v7 → v8 to invalidate stale entries cached
before the fallback shipped (response SHAPE unchanged; only the
`counts.probes` value can shift)
- adds three regression tests pinning the three branches:
literal probe non-zero → use literal
literal probe zero + element non-zero → use element (alias hit)
both zero → 0
Cache schema bump also requires updating the v7 literal in
test_user_cache_keys_are_isolated to v8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
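The three branches pinned by the regression tests can be written as one small function. This is a sketch of the rule as stated in the commit message, not the body of `_counts_from_raw`; the function and parameter names are assumptions.

```python
def resolve_probe_count(literal_probe: int, element: int) -> int:
    """Prefer literal `probe` docs; fall back to the `element` alias."""
    if literal_probe > 0:
        return literal_probe  # literal probe non-zero: use literal
    if element > 0:
        return element  # alias hit: probe is zero but element is not
    return 0  # both zero
```

For Francesconi (0 literal probe docs, 606 element docs) the alias branch applies and the count is 606 rather than 0.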
…etch
`/ndiquery` may return slim `{id, datasetId}` pairs (or bare id strings
under a single-dataset scope) instead of full doc bodies, depending on
search scope. The pre-F-7 code path read `body.get("documents")` and
fed those entries straight to `_extract_numeric(doc, "data.subject.weight_grams")`
— which silently returns None for slim refs (no `data.*`). Aggregations
against datasets where ndiquery returned slim refs would report
numeric_matches=0 even when valueField was present in every body.
Fix: after ndiquery returns, classify each entry. Full bodies pass
through (no-op fast path — zero extra cloud calls). Slim refs queue
for chunked `bulk_fetch` (BULK_FETCH_MAX=500 per call), grouped by
dataset_id since bulk_fetch is per-dataset. Bare id strings under a
single-dataset scope attribute to that scope; under any other scope
they're unattributable (bulk_fetch needs a dataset_id) and dropped
with a structured warning.
Concurrency bounded by MAX_CONCURRENT_BULK_FETCH=6, matching
summary_table_service. Two structured log events for observability:
- aggregate_documents.hydrated_via_bulk_fetch (info; per-call summary)
- aggregate_documents.bare_ids_dropped (warning; cloud-shape anomaly)
Numeric equivalence pinned by test: same fixture run as (a) full-body
inline and (b) slim-refs-then-bulk-fetch produces byte-equal
{count, mean, median, std, min, max} per group, identical
numeric_matches, identical datasets_contributing.
Tests (+6):
- test_hydrates_slim_refs_via_bulk_fetch — bulk_fetch IS called when
refs are slim
- test_full_body_path_skips_bulk_fetch_no_op — happy path latency
untouched
- test_per_doc_vs_bulk_numeric_equivalence — the regression pin
- test_hydration_chunks_at_bulk_fetch_max — 600 refs → 2 batches
(500 + 100)
- test_hydration_chunks_per_dataset — refs spanning 2 datasets fan
out into 2 bulk_fetch calls
- test_hydration_handles_bare_id_strings_under_single_dataset_scope
No cache schema bump (response shape unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
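The classify-then-chunk flow described in this commit might be sketched as follows. How the real service recognises a full body is not stated; treating "has a `data` key" as the marker is an assumption, as are the function names. Concurrency limiting (`MAX_CONCURRENT_BULK_FETCH`) is left out to keep the sketch synchronous.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Sequence, Tuple

BULK_FETCH_MAX = 500  # per-call cap quoted in the commit message


def classify_entries(
    entries: Sequence[Any], single_dataset_scope: Optional[str]
) -> Tuple[List[dict], Dict[str, List[str]], int]:
    """Split ndiquery results into full bodies, slim refs by dataset, and drops."""
    full_bodies: List[dict] = []
    slim_by_dataset: Dict[str, List[str]] = defaultdict(list)
    dropped = 0
    for entry in entries:
        if isinstance(entry, dict) and "data" in entry:
            full_bodies.append(entry)  # fast path: no extra cloud call
        elif isinstance(entry, dict) and "id" in entry and "datasetId" in entry:
            slim_by_dataset[entry["datasetId"]].append(entry["id"])
        elif isinstance(entry, str) and single_dataset_scope is not None:
            slim_by_dataset[single_dataset_scope].append(entry)
        else:
            dropped += 1  # unattributable: bulk_fetch needs a dataset id
    return full_bodies, dict(slim_by_dataset), dropped


def plan_bulk_fetches(
    slim_by_dataset: Dict[str, List[str]]
) -> List[Tuple[str, List[str]]]:
    """One (dataset_id, ids) batch per bulk_fetch call, capped at BULK_FETCH_MAX."""
    batches: List[Tuple[str, List[str]]] = []
    for dataset_id, ids in slim_by_dataset.items():
        for start in range(0, len(ids), BULK_FETCH_MAX):
            batches.append((dataset_id, ids[start:start + BULK_FETCH_MAX]))
    return batches
```

With 600 slim refs in one dataset the plan contains two batches of 500 and 100, and refs spanning two datasets always produce at least one batch per dataset.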
The /tabular_query router exposes both GET and POST variants so the
cloud-app's POST proxy at /api/datasets/[id]/tabular-query can forward
its body 1:1 without GET-vs-POST translation. Both routes share the
same `_dispatch` helper and the same `TabularQueryBody` Pydantic
validator (which mirrors the GET query-param contract), so the
responses MUST be byte-identical for equivalent input.
The dispatcher refactor + body validator landed during the prior
arc — this commit pins the contract with integration tests:
Tests (+4 against the existing app_and_cloud fixture):
- test_tabular_query_get_and_post_return_identical_shape — same
full-param request (variableNameContains + groupBy + groupOrder)
yields byte-equal response bodies.
- test_tabular_query_get_and_post_handle_optional_params_identically
— omitting optional params likewise produces equivalent output.
- test_tabular_query_post_rejects_missing_variable_name — POST body
validator surfaces the typed 400 VALIDATION_ERROR envelope.
- test_tabular_query_get_rejects_missing_variable_name — GET path
surfaces the same envelope.
The integration tests use the empty-ontology happy path
(`_install_empty_ontology_mocks`) so the assertions don't depend on
the cloud's real document corpus — both routes still exercise
`_dispatch` → `TabularQueryService.violin_groups` end-to-end, just
through the `_empty_response` branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-implements the S5.3 backend after the prior session's git reset
during agent collision discarded the ~400 LOC service code. Full
design preserved in apps/web/docs/reviews/2026-05-19b-post-handoff-execution.md
and re-implemented deterministically from that spec.
Backend changes:
- backend/services/tabular_query_service.py
- MAX_PAIRS = 1000 cap (matches MAX_GROUPS × MAX_VALUES_PER_GROUP scale)
- _TREATMENT_CLASS_CHAIN, _SUBJECT_KEY, _TREATMENT_LABEL_FIELDS constants
- _find_matching_group(..., exclude_group_idx=) — Y-side cross-table
search forces a DIFFERENT group than X (preserves violin semantics
when exclude_group_idx=None)
- cross_table_pairs orchestrator + _cross_table_pairs_subject +
_cross_table_pairs_treatment + _build_treatment_subject_map
- 9 module-level helpers: _empty_pairs_response, _index_of_group,
_build_subject_value_map, _build_subject_group_map,
_columns_for_pair_group_by, _inner_join_pairs,
_inner_join_treatment_pairs, _order_pairs_by_group,
_pick_treatment_label_for_needle
- backend/routers/tabular_query.py
- CrossTableQueryBody Pydantic model (xVariableContains,
yVariableContains, joinOn, groupBy?, groupOrder?)
- POST /api/datasets/{dataset_id}/cross-table-query handler with
503 envelope on CloudInternalError/CloudUnreachable/CloudTimeout
(mirrors the violin path's discipline)
Tests (backend/tests/unit/test_tabular_query_service_cross_table.py):
- 52 new unit tests covering subject-join happy path, groupBy
resolution (X-first-then-Y), groupOrder, treatment-join with
auto-color fallback, treatment chain walking with last-write-wins,
flaky single_class recovery, empty-state diagnostics, MAX_PAIRS
cap, and the 9 helper functions individually.
Response contract matches the cloud-app's BackendCrossTableResponse
type at apps/web/lib/ndi/tools/cross-table-query.ts — cloud-app side
was already wired and waiting (handlers + scatter chart + panel
toggle all pushed prior).
CI: ruff clean, mypy --strict reports 4 PRE-EXISTING errors in
untouched files (zero new errors from this change), 1125 tests
pass (was 1060 + F-8 pin pre-arc).
Refs: 2026-05-19b-post-handoff-execution.md "S5.3 detail"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
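As a rough illustration of the subject join, the sketch below inner-joins two per-subject value maps and stops at `MAX_PAIRS`. The actual `_inner_join_pairs` signature and the way the cap is enforced are not given in the commit message, so the dictionary inputs, the sorted iteration order, and truncation at the cap are all assumptions.

```python
from typing import Dict, List, Tuple

MAX_PAIRS = 1000  # cap quoted in the commit message


def inner_join_pairs(
    x_by_subject: Dict[str, float],
    y_by_subject: Dict[str, float],
    max_pairs: int = MAX_PAIRS,
) -> List[Tuple[str, float, float]]:
    """Keep only subjects that have both an X and a Y value."""
    pairs: List[Tuple[str, float, float]] = []
    for subject in sorted(x_by_subject):  # sorted for deterministic output
        if subject in y_by_subject:
            pairs.append((subject, x_by_subject[subject], y_by_subject[subject]))
            if len(pairs) >= max_pairs:
                break  # assumed behaviour: truncate at the cap
    return pairs
```

Subjects present on only one side are dropped, which is what makes this an inner join rather than a left or outer join.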
Applies the F-1 stub diff preserved at
apps/web/docs/specs/2026-05-18-f1-stimulus-projection-stub.diff
(241 lines) which pins:
- /tables/stimulus_presentation column shape (the six STIMULUS_COLUMNS
keys + row content from depends_on edge + presentations list)
- /tables/stimulus short-form resolves to stimulus_presentation via
the _CLASS_ALIASES chain when the literal class returns 0 IDs
- Stream 5.8 pagination respected (page/pageSize) WITHOUT re-fanning
the cloud (proves cache-once-slice-in-memory)
respx 0.23.1 fix: the original alias test used
`router.post("/ndiquery").mock(side_effect=_ndiquery)` with a 2-arg
callable signature. This hung indefinitely under pytest 9.x +
asyncio_mode=auto. Rewrote to use FIFO route ordering with two
chained `.respond()` calls (specific predicate via
`json__searchstructure__0__param1` first, generic fallback second)
— passes in 0.17s. Same end-state contract, no test debt.
CI: 1128 backend tests pass (was 1125 + the 3 new F-1 tests); ruff
clean. F-1 backend implementation has been live on Railway since
commit 0231851 — this commit only adds integration tests pinning the
contract.
Refs: 2026-05-19b-post-handoff-execution.md "F-1 detail"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This branch backs the experimental Railway environment for byte-for-byte audit comparison with production. Triple-protected: draft state + DO NOT MERGE title prefix + this comment. The merge gate is the audit passing, NOT CI green.
Summary
Phase A of the NDI-python integration. Installs the NDI stack on the Railway image (vlt + did + ndr + vhlab-toolbox-python + ndi-compress) with `--no-deps` to skip matplotlib (~50-70 MB) and opencv (~80 MB), saving ~150 MB image growth vs the naive install. Wires three new capabilities into existing services:
1. VHSB text-tag decoding via `vlt.file.vhsb_read`: unblocks Haley + any future VHSB dataset for QuickPlot + signal-chart. Today these always soft-error with the legacy "vlt library not available" message.
2. NDI-compressed binary detection + decompression via `ndicompress.expand_*`: supports the `.nbf.tgz` wrapper. Today these silently fail in `_parse_nbf`.
3. NDI ontology fallback in `OntologyService`: when existing external providers miss, falls back to NDI's bundled lookup, which knows lab-specific terms (WBStrain, Cre lines, NDIC) that public OLS providers don't have.
Architecture: graceful degradation
All NDI calls go through a new `services/ndi_python_service.py` that lazy-imports the stack and returns `None` on miss, error, or "NDI not installed"; callers fall through to their existing legacy paths.
So even if the NDI install fails in production, the public surface keeps working. The audit script in `apps/web/scripts/audit-public-api.mjs` proves this empirically.
Files
- `infra/Dockerfile`: `git` apt dep + 5 `pip install --no-deps` lines + import sanity check
- `backend/requirements.txt`
- `backend/pyproject.toml`
- `backend/services/ndi_python_service.py`
- `backend/services/binary_service.py`
- `backend/services/ontology_service.py`
- `backend/tests/unit/test_ndi_python_service.py`
- `docs/plans/2026-05-13-ndi-python-integration.md`
- `docs/plans/2026-05-13-railway-experimental-env-runbook.md`
Test plan
- `ruff check` clean
- `mypy --strict` clean
- … (`apps/web/scripts/audit-public-api.mjs` Layer 1, `tests/e2e/audit-public-pages.spec.ts` Layer 2+3)
- … `vlt_library` soft errors
Merge gate
This PR merges only when:
🤖 Generated with Claude Code
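The graceful-degradation contract described under Architecture can be sketched as a lazy import that never raises. This is an illustration, not the contents of `services/ndi_python_service.py`: the helper names are hypothetical, and calling `vlt.file.vhsb_read` with a single path argument is an assumption about its signature.

```python
import importlib
from typing import Any, Optional


def lazy_import(module_name: str) -> Optional[Any]:
    """Import a module on first use; return None if it is unavailable."""
    try:
        return importlib.import_module(module_name)
    except Exception:
        # Covers a missing package as well as errors raised at import time.
        return None


def read_vhsb(path: str) -> Optional[Any]:
    """Decode a VHSB file, or return None so the caller uses its legacy path."""
    vlt_file = lazy_import("vlt.file")
    if vlt_file is None:
        return None  # NDI stack not installed
    try:
        return vlt_file.vhsb_read(path)  # assumed single-argument call
    except Exception:
        return None  # decode error: degrade rather than fail the request
```

A caller would treat `None` as "no NDI answer" and continue with whatever it did before the integration, which is why a failed install leaves the public surface unchanged.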