
[DO NOT MERGE — experimental] feat(ndi-python): Phase A integration#112

Draft
audriB wants to merge 54 commits into main from feat/ndi-python-phase-a

Conversation

@audriB audriB commented May 13, 2026

⚠️ DO NOT MERGE

This branch backs the experimental Railway environment for byte-for-byte audit comparison with production. Triple-protected: draft state + DO NOT MERGE title prefix + this comment. The merge gate is the audit passing, NOT CI green.

Summary

Phase A of the NDI-python integration. Installs the NDI stack on the Railway image (vlt + did + ndr + vhlab-toolbox-python + ndi-compress) with --no-deps to skip matplotlib (~50-70 MB) and opencv (~80 MB), saving ~150 MB image growth vs the naive install. Wires three new capabilities into existing services:

  1. VHSB text-tag decoding via vlt.file.vhsb_read — unblocks Haley + any future VHSB dataset for QuickPlot + signal-chart. Today these always soft-error with the legacy "vlt library not available" message.
  2. NDI-compressed binary handling via ndicompress.expand_* — supports the .nbf.tgz wrapper. Today these silently fail in _parse_nbf.
  3. Ontology fallback in OntologyService — when existing external providers miss, falls back to NDI's bundled lookup which knows lab-specific terms (WBStrain, Cre lines, NDIC) that public OLS providers don't have.

Architecture: graceful degradation

All NDI calls go through a new services/ndi_python_service.py that:

  • Lazy-imports the NDI stack only on first call (cold-start neutral)
  • Caches the import success/failure as a module flag
  • Returns None on miss, error, or "NDI not installed" — callers fall through to their existing legacy paths

So even if the NDI install fails in production, the public surface keeps working. The audit script in apps/web/scripts/audit-public-api.mjs proves this empirically.

Files

| File | Change |
| --- | --- |
| infra/Dockerfile | + git apt dep + 5 pip install --no-deps lines + import sanity check |
| backend/requirements.txt | + 7 NDI runtime deps (handpicked, no matplotlib/opencv) |
| backend/pyproject.toml | mirror of requirements.txt + mypy override for NDI/vlt/ndicompress |
| backend/services/ndi_python_service.py | NEW: three thin wrappers + import guard |
| backend/services/binary_service.py | dispatch order rewritten: compression → VHSB → inline → NBF |
| backend/services/ontology_service.py | NDI fallback when existing providers return a stub |
| backend/tests/unit/test_ndi_python_service.py | NEW: 19 unit tests (no NDI install required) |
| docs/plans/2026-05-13-ndi-python-integration.md | Phase A/B/C strategy + recon findings |
| docs/plans/2026-05-13-railway-experimental-env-runbook.md | dashboard walkthrough for the experimental env |

Test plan

  • 19 new unit tests pass (sys.modules-mocked, CI doesn't pip-install NDI)
  • All 535 existing tests still pass — no regression detected on the 44 binary tests
  • ruff check clean
  • mypy --strict clean
  • CI green on Actions
  • Spin up experimental Railway env per the runbook
  • Run audit (apps/web/scripts/audit-public-api.mjs Layer 1, tests/e2e/audit-public-pages.spec.ts Layer 2+3)
  • Verify VHSB endpoints on Haley now return decoded data instead of vlt_library soft errors
  • Verify NBF endpoints are byte-identical to production (the audit will catch any divergence)
  • Merge ONLY after audit clean OR audit diffs documented as intentional

Merge gate

This PR merges only when:

  1. CI is green AND
  2. The audit shows byte-identical responses on the public surface (with the expected exception of previously-soft-erroring endpoints now returning real data) AND
  3. The user (Audri) explicitly green-lights the merge

🤖 Generated with Claude Code

…n/ontology

Adds the NDI-python stack (vlt + did + ndr + vhlab-toolbox-python + ndi-compress)
to the Railway image and wires three new capabilities into existing services:

  1. VHSB text-tag decoding via vlt.file.vhsb_read — unblocks Haley + any
     other VHSB-formatted dataset in QuickPlot / signal-chart. Today every
     VHSB request returns the legacy "vlt library not available" soft error
     because the early-return prefix check on b"This " never falls through
     to a real decoder.
  2. NDI-compressed binary detection + decompression via ndicompress.expand_*
     — handles the .nbf.tgz wrapper format. Today these silently fail in
     _parse_nbf.
  3. NDI ontology fallback in OntologyService — fires when the existing
     external providers miss. NDI's bundled lookup knows lab-specific terms
     (NDIC, WBStrain, Cre lines) that public OLS providers don't.

Image size cost: ~80-100 MB. We pip install with --no-deps and handpick the
runtime deps in requirements.txt + pyproject.toml so matplotlib (~50-70 MB)
and opencv (~80 MB) — both declared by DID-python + tutorials but never
imported by our paths — stay out of the image.

Strategy: ALL changes go through a new services/ndi_python_service.py that
lazy-imports + degrades gracefully when the stack isn't installed (returns
None on every call, callers fall through to their legacy paths). So even
on a build where the NDI install fails, the public surface keeps working.

Branch protection: this work targets a separate "experimental" Railway
environment for byte-for-byte audit against production before any merge to
main. See docs/plans/2026-05-13-railway-experimental-env-runbook.md for
the dashboard-step walkthrough.

Test plan:
- [x] 19 new unit tests cover the dispatch logic with NDI uninstalled (CI
      doesn't pip-install NDI; tests use sys.modules mocking)
- [x] All 44 existing binary tests + 535 other tests pass unchanged
- [x] mypy strict mode clean (with ndi/vlt/ndicompress in missing-imports
      override matching the scipy + opentelemetry pattern)
- [x] ruff lint clean
- [ ] Build the experimental Railway env from this branch
- [ ] Run apps/web/scripts/audit-public-api.mjs (Layer 1) + the e2e
      audit-public-pages spec (Layer 2 + 3) against production vs
      experimental — see plan doc for the workflow

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 13, 2026 23:03 Inactive
… strict-boot

Adds the first Plotly-charts pipeline backend half: a tabular_query
endpoint that aggregates ontologyTableRow documents into per-group
statistics (mean, median, std, q1/q3, min/max, count) plus the raw
values needed for a violin-with-jitter render. Powers Dabrowska EPM
and Fear-Potentiated Startle plots, Bhar chemotaxis violins, and any
future categorical-by-group comparison.

Endpoint:
  GET /api/datasets/{id}/tabular_query
      ?variableNameContains=ElevatedPlusMaze    # required substring
      &groupBy=treatment_group                   # optional
      &groupOrder=Saline,CNO                     # optional CSV

Pipeline:
  1. SummaryTableService.ontology_tables(...) returns one group per
     distinct variableNames schema (already exists, used by Document
     Explorer)
  2. tabular_query_service finds the first group with a column whose
     key/label matches the substring
  3. Buckets rows by groupBy column (or single 'all' bucket if unset)
  4. Computes stats on the full value list, then stride-samples
     down to MAX_VALUES_PER_GROUP=500 for the violin's jitter overlay
     (stats stay accurate)
  5. Caps at MAX_GROUPS=20 with first-seen ordering + explicit
     group_order override
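Step 4 of the pipeline above (stats on the full list, then stride-sampling) might look roughly like this. It is a stdlib-only sketch: the function names, the sample standard deviation, and the "inclusive" quartile method are assumptions, since the commit does not state which variants the service uses.

```python
import math
import statistics
from typing import Dict, List

MAX_VALUES_PER_GROUP = 500  # cap named in the commit message


def stride_sample(values: List[float], cap: int = MAX_VALUES_PER_GROUP) -> List[float]:
    """Keep every k-th value so that at most `cap` values remain."""
    if len(values) <= cap:
        return list(values)
    stride = math.ceil(len(values) / cap)
    return values[::stride]


def summarize_group(values: List[float]) -> Dict[str, object]:
    """Compute stats on the full list, then thin it for the jitter overlay."""
    finite = [v for v in values if math.isfinite(v)]  # skip NaN / Inf
    if not finite:
        return {"count": 0, "values": []}
    if len(finite) > 1:
        q1, median, q3 = statistics.quantiles(finite, n=4, method="inclusive")
        std = statistics.stdev(finite)  # sample std; an assumption
    else:
        q1 = median = q3 = finite[0]
        std = 0.0
    return {
        "count": len(finite),
        "mean": statistics.fmean(finite),
        "median": median,
        "std": std,
        "q1": q1,
        "q3": q3,
        "min": min(finite),
        "max": max(finite),
        "values": stride_sample(finite),  # sampled AFTER stats, so stats stay exact
    }
```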

Hygiene fixes folded into this commit:

- Pin all five NDI-python git deps to specific SHAs in infra/Dockerfile
  (replaces @main which silently picks up upstream changes on every
  redeploy). SHAs captured 2026-05-13. To upgrade: re-run
  `git ls-remote <repo> HEAD` and bump.

- Strict-on-boot NDI check: when NDI_PYTHON_REQUIRED=1 (set by the
  Dockerfile), lifespan hard-fails if vlt/ndicompress/ndi.ontology
  aren't importable. Catches broken images at boot instead of letting
  the chat silently degrade with every NDI tool returning None.

Tests: 21 new unit tests cover the violin aggregation math + edge
cases (no matches, no group_by, group_order overrides, group cap,
value cap with accurate stats, NaN/Inf skipping, empty substring,
no ontology docs). All 559 backend tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 01:54 Inactive
Smoke-testing the new endpoint against the experimental Railway
deploy surfaced an issue with real-world data: ontologyTableRow
tables routinely have multiple columns sharing a topic prefix
(e.g. 'ElevatedPlusMaze: Test Identifier' + 'ElevatedPlusMaze:
Open Arm Entries' + …). The first-match logic picked the test-
identifier column → no numeric values → empty violin response
('no numeric values in matched column').

Fix: score each matching column by how many rows have finite-
numeric values, and pick the column with the most numeric rows
across all matching groups. Ties broken by first-seen order
(groups are already sorted by row count desc upstream).
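The scoring rule can be sketched as below for a single table; the real service is described as scoring across all matching groups. `pick_value_column` and `is_finite_number` are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence


def is_finite_number(value: object) -> bool:
    """True for int/float values that are neither NaN nor infinite (bools excluded)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


def pick_value_column(
    rows: Sequence[Dict[str, object]], candidates: Sequence[str]
) -> Optional[str]:
    """Return the candidate column with the most finite-numeric rows."""
    best_key: Optional[str] = None
    best_score = 0
    for key in candidates:  # candidates arrive in first-seen order
        score = sum(1 for row in rows if is_finite_number(row.get(key)))
        if score > best_score:  # strict '>' keeps the earlier column on ties
            best_key, best_score = key, score
    return best_key  # None when no candidate has any numeric value
```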

Adds one test (test_violin_groups_prefers_numeric_column_over_
identifier) covering the real-world layout. All 19 tabular_query
tests pass; 561 backend tests pass overall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 01:59 Inactive
Smoke-testing against real Dabrowska EPM data showed that the LLM
naturally calls groupBy='Treatment' but the actual ontology column
key is 'Treatment_CNOOrSalineAdministration'. Exact-match returns
empty groups; the user/LLM has no way to know the canonical key.

Fix: resolve groupBy via the same substring-match strategy as the
value column. Exact key match wins; then substring against keys;
then substring against labels. When nothing matches, return empty
+ a meta listing the column keys we DO have so the caller can retry.
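A sketch of that resolution order (exact key, then substring against keys, then substring against labels). The function name, the `columns` key-to-label mapping, and the case-insensitive comparison are assumptions, not confirmed details of the service.

```python
from typing import Dict, List, Optional, Tuple


def resolve_group_by(
    requested: str, columns: Dict[str, str]
) -> Tuple[Optional[str], List[str]]:
    """columns maps column key -> label.

    Returns (resolved_key, available_keys); available_keys is only filled
    when nothing matched, so the caller can retry with a real key.
    """
    needle = requested.strip().lower()
    if not needle:
        return None, list(columns)
    if requested in columns:  # 1. exact key match wins
        return requested, []
    for key in columns:  # 2. substring against keys
        if needle in key.lower():
            return key, []
    for key, label in columns.items():  # 3. substring against labels
        if needle in label.lower():
            return key, []
    return None, list(columns)
```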

Verified live: 'Treatment' now resolves to 'Treatment_CNOOrSaline
Administration', returning Saline (n=22, mean=5.86) + CNO (n=23,
mean=5.09) for EPM open-arm-north entries.

Tests: +2 cases (substring resolution + unresolvable-with-available).
All 562 backend tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 02:04 Inactive
…ding

Three backend additions wiring the new labchat chat tools.

## image_service + /api/datasets/:id/documents/:docId/image

Pillow-based decoder for TIFF/PNG/JPEG/GIF documents (with NDI-native
.nim formats flagged as 'unsupported' rather than failing). Downsamples
>512px via Pillow thumbnail; multi-frame support via `frame=N`. Returns
structured envelope with width/height/min/max/format + downsampled flag.
Errors get errorKind={notfound, decode, unsupported}. Same SSRF
hardening as existing binary download path (cloud-allowlisted hosts
only). 18 unit tests.

## summary_table_service.distinct_summary

Backend now computes per-column distinct value counts + top-K most
common values across ALL rows of a class-table response (not just the
page slice). Cached under the same TTL as columns+rows. Caps at
DISTINCT_SUMMARY_MAX_ROWS=10_000 (skips with a meta sentinel for very
large tables). Top-K = 5 per column.
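As an illustration only, per-column distinct counts with a top-K list could be built with `collections.Counter`. The function names and the shape of the skip sentinel are hypothetical; only the two constants come from the text above.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Sequence

DISTINCT_SUMMARY_MAX_ROWS = 10_000
TOP_K = 5


def _hashable(value: Any) -> Any:
    """Turn dicts/lists into a stable string so they can be counted."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True, default=str)
    return value


def build_distinct_summary(
    rows: Sequence[Dict[str, Any]], columns: List[str]
) -> Dict[str, Any]:
    if len(rows) > DISTINCT_SUMMARY_MAX_ROWS:
        # Hypothetical sentinel shape for the "skip very large tables" path.
        return {"skipped": True, "reason": "too_many_rows"}
    summary: Dict[str, Any] = {}
    for col in columns:
        counts = Counter(
            _hashable(row.get(col)) for row in rows if row.get(col) is not None
        )
        summary[col] = {
            "distinctCount": len(counts),
            "top": counts.most_common(TOP_K),  # [(value, count), ...]
        }
    return summary
```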

The LLM sees this as `distinctSummary` on the query_documents tool
result. Existing test extended with TestHashable + TestBuildDistinctSummary
covering: Dabrowska 49-row collapse case, multi-value top-K, None/dict/
malformed handling, skip-when-too-many path.

## Sprint 1.5: dataset_binding_service + /api/datasets/:id/ndi_overview

Lazy in-memory LRU cache (max 5) wrapping ndi.cloud.orchestration.
downloadDataset. Per-dataset asyncio.Lock prevents concurrent cold-load
races. 90s cold-load timeout, 5GB on-disk soft cap with warn log.
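The cache-plus-lock pattern might be sketched as follows. `BindingCache` and the injected `loader` coroutine are hypothetical stand-ins; the actual call into ndi.cloud.orchestration is deliberately left out because its signature is not shown here, and the on-disk soft cap is omitted.

```python
import asyncio
from collections import OrderedDict
from typing import Any, Awaitable, Callable, Dict

MAX_ENTRIES = 5
COLD_LOAD_TIMEOUT_S = 90.0


class BindingCache:
    """In-memory LRU with one asyncio.Lock per dataset id."""

    def __init__(
        self,
        loader: Callable[[str], Awaitable[Any]],
        max_entries: int = MAX_ENTRIES,
        timeout_s: float = COLD_LOAD_TIMEOUT_S,
    ) -> None:
        self._loader = loader
        self._max = max_entries
        self._timeout = timeout_s
        self._entries: "OrderedDict[str, Any]" = OrderedDict()
        self._locks: Dict[str, asyncio.Lock] = {}

    async def get(self, dataset_id: str) -> Any:
        lock = self._locks.setdefault(dataset_id, asyncio.Lock())
        async with lock:  # concurrent callers for the same id wait here
            if dataset_id in self._entries:
                self._entries.move_to_end(dataset_id)  # mark as recently used
                return self._entries[dataset_id]
            value = await asyncio.wait_for(self._loader(dataset_id), self._timeout)
            self._entries[dataset_id] = value
            if len(self._entries) > self._max:
                self._entries.popitem(last=False)  # evict least recently used
            return value
```

Two concurrent requests for the same cold dataset then trigger a single load; the second one finds the entry already cached once it acquires the lock.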

Returns: element_count, subject_count, total epoch_count across all
elements, first 50 element {name, type} pairs, cache_hit + cache_age.

Pre-warm at app boot (production/preview only) for the 3 demo
datasets — fire-and-forget asyncio.create_task, cancellable on
shutdown. is_ndi_available() extended via is_dataset_binding_available()
probing ndi.dataset + ndi.cloud.orchestration imports.

Endpoint surface: GET /api/datasets/:id/ndi_overview with limit_reads
rate limit. 503 envelope { error, reason } on any failure so the
frontend can fall back to ndi_query gracefully. 60s request timeout.

12 unit tests + 2 live-cloud integration tests (gated behind
LIVE_NDI_TESTS=1).

## NDI cache directory

NDI_CACHE_DIR env var (default /tmp/ndi-cache). Already writable by the
ndb user in the Dockerfile. /tmp is ephemeral on Railway redeploys —
pre-warm at boot mitigates for the 3 demo datasets; user-driven
cold-loads of other datasets pay the 10-30s tax.

## Open risk

downloadDataset against the published demo datasets ANONYMOUSLY (no
service-account token) is unverified in the Railway environment. If
auth is required, the 503 fallback keeps chat usable. Plan B: add a
service-account token via Railway env vars and retry.

Combined backend test count: 603 unit + 110 integration (was 562
unit + 79 integration pre-wave). All green.
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 18:13 Inactive
The previous response shape gave the frontend a single arbitrary docId
(`source.document_id = doc_ids[0]`) for the entire aggregation —
misleading when the chart is summarizing dozens of rows across 2+
groups. The user can verify the column being compared via the
ontology-tables view, but couldn't drill into specific group examples
("show me one Saline row, one CNO row").

This change threads per-row docIds through the bucketing so each
output group surfaces:
  - docIds: list[str]   (capped at MAX_DOC_IDS_PER_GROUP=3)
  - totalRows: int       (true contributing row count)

`source.document_id` is preserved for backwards-compat but is no
longer the primary citation path.

Implementation notes:
- `SummaryTableService.ontology_tables` already returns docIds
  parallel to rows (same index). The service now enumerates rows by
  index and routes each row's docId to the appropriate bucket.
- Missing-docId desync is tolerated silently (rather than padding
  with bogus IDs). If the projection ever returns fewer docIds than
  rows, affected buckets just get shorter docId lists.
- 3 docIds per group is enough for the chat to build per-group
  sample-row chips without flooding the citation panel. The
  complete set of contributing rows is reachable from the primary
  table-view citation.
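A small sketch of the per-group docId routing described in these notes, assuming one group label per row and a docId list parallel to the rows. `bucket_doc_ids` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional, Sequence

MAX_DOC_IDS_PER_GROUP = 3


def bucket_doc_ids(
    group_labels: Sequence[str], doc_ids: Sequence[Optional[str]]
) -> Dict[str, Dict[str, Any]]:
    """group_labels[i] and doc_ids[i] describe row i; doc_ids may be shorter."""
    buckets: Dict[str, Dict[str, Any]] = {}
    for index, label in enumerate(group_labels):
        bucket = buckets.setdefault(label, {"docIds": [], "totalRows": 0})
        bucket["totalRows"] += 1  # true contributing row count
        # Tolerate a desync: a missing docId is skipped, never invented.
        doc_id = doc_ids[index] if index < len(doc_ids) else None
        if doc_id and len(bucket["docIds"]) < MAX_DOC_IDS_PER_GROUP:
            bucket["docIds"].append(doc_id)
    return buckets
```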

Tests:
- Extended test_violin_groups_basic to assert per-group docIds +
  totalRows on the standard 2-group Saline/CNO shape.
- Added test_violin_groups_per_group_doc_id_cap covering the 10-row
  cap behavior — Saline contributes 10 rows but only 3 docIds
  surface; CNO with 2 rows surfaces both.
- Added test_violin_groups_missing_doc_ids_tolerated covering the
  desync case — service must not crash or invent IDs.

23/23 unit tests pass.

This is the backend half of the user-reported "chart citation seems
to point to one arbitrary row, not the column or table of entries
being aggregated" fix. The frontend (cloud-app) side will land
separately and consume the new per-group docIds.
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 18:37 Inactive
Accidentally staged in the previous commit (6aebed9). Identical
content to file_format.py modulo a unicode hyphen in a docstring.
CI hygiene gate would reject it; removing now.
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 18:38 Inactive
… data browser

User observation: 'I can point to at least one place where ontology
is not resolving in the data browser.'

Root cause: ``OntologyService.lookup`` short-circuits on cache hit
BEFORE reaching the NDI-python fallback added in Phase A
(2026-05-13). Pre-Phase-A lookups of lab-specific terms like
``WBStrain:00000001`` or ``NDIC:1`` returned an empty stub
(``label=None``, ``definition=None``) from the legacy provider, and
those stubs got cached. ``ONTOLOGY_CACHE_TTL_DAYS`` defaults to 30,
so the stale stubs would persist for ~a month and prevent the new
fallback from ever firing — every consumer of the ontology service
(DatasetSummaryCard pills, FacetPanel filters, SummaryTableView
cells, ChatBot's lookup_ontology tool) sees the un-resolved CURIE
until the stub expires.

The fix:
- ``lookup`` now treats stubs as cache MISSES: a stub (label=None
  AND definition=None) does NOT short-circuit. The legacy provider
  + NDI-python fallback both run again, and on a real hit the new
  ``self.cache.set`` OVERWRITES the stub with the resolved entry.
- ``batch_lookup`` inherits the fix automatically (it delegates to
  ``self.lookup`` per term).

Stuck stubs heal on first use — no admin endpoint or cache wipe
needed. Worst case for a term that NEITHER the legacy providers
NOR NDI-python can resolve: we keep the stub and re-attempt on
every call. That's negligible network traffic (NDI-python is local;
legacy providers have their own throttling) and only happens for
genuinely unknown terms.
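The stub-as-miss rule can be illustrated with a plain dict as the cache. `Term`, `Resolver`, and this free-standing `lookup` are simplified stand-ins for the real service, with resolvers ordered legacy first and NDI-python last.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class Term:
    curie: str
    label: Optional[str] = None
    definition: Optional[str] = None

    @property
    def is_stub(self) -> bool:
        return self.label is None and self.definition is None


Resolver = Callable[[str], Optional[Term]]


def lookup(curie: str, cache: Dict[str, Term], resolvers: Sequence[Resolver]) -> Term:
    cached = cache.get(curie)
    if cached is not None and not cached.is_stub:
        return cached  # real hit: short-circuit, no upstream calls
    for resolve in resolvers:  # a later resolver runs only if earlier ones miss
        found = resolve(curie)
        if found is not None and not found.is_stub:
            cache[curie] = found  # overwrites any stale stub
            return found
    if cached is None:  # nothing cached yet: store a stub once
        cached = Term(curie)
        cache[curie] = cached
    return cached  # existing stub kept, no redundant write
```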

Tests: 6 new tests covering:
  - Real cache hit still short-circuits (no upstream calls)
  - Stub cache entry triggers retry → NDI-python wins → cache
    overwritten with real hit
  - Both legacy and NDI-python failing keeps the existing stub (no
    redundant write)
  - Fresh term with legacy hit doesn't call NDI-python (it's a
    fallback, not a co-resolver)
  - Fresh term where legacy returns stub falls through to NDI-python
  - Batch lookup inherits the bypass behavior

After this lands and Railway redeploys, every data-browser surface
that displays ontology terms (Dataset hero pills, facet filters,
summary table cells, chat lookup_ontology) automatically benefits
without any frontend changes.
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 19:00 Inactive
…ar_query error envelope

Aggregated from audit agents that returned with backend findings.

## Critical — every consumer of OntologyService was leaking raw CURIEs

* **_fetch_wormbase echoed strain_id as label**: pre-fix, the
  WormBase provider returned `label=strain_id` (e.g. "00000001")
  WITHOUT actually scraping the strain page. This is a "label-is-
  truthy" stub that bypassed the NDI-python fallback —
  `WBStrain:00000001` showed up as "00000001" on the Bhar dataset
  hero pills, the /query Strains facet, and the OpenmindsSubject
  table. The visual UX audit caught this as a P0. Fix: return
  label=None so the NDI-python fallback fires (which knows the
  strain name → "N2 wild-type"). The dataset_summary_service's
  `_enrich_ontology_labels` then writes the real label onto the
  term and the cache stub gets overwritten on first use.

* **UBERON / GO / OBI missing from _OLS_PROVIDERS dict**: pre-fix,
  the provider allowlist was `{CL, NCBITaxon, CHEBI, PATO, EFO}`.
  UBERON is the most common prefix in NDI data (every brain_region
  and probe_location), but UBERON requests fell through to a
  stub. `UBERON:0001870` showed `label=null` on every popover, hero
  pill, and column rendering. OLS4 has the same query endpoint for
  every OBO ontology — adding UBERON/GO/OBI to the dict is a
  3-line fix that unblocks the entire OBO ontology family.

## Important — tabular_query router error envelope

* **tabular_query 500 → typed 503 on cloud errors**: the router
  raised cloud-client exceptions through FastAPI's global handler,
  producing opaque 500 JSON envelopes the chat tool layer couldn't
  surface usefully. Now mapped to 503 with `{error, errorKind,
  reason}` matching the discipline of /ndi_overview. Adds
  structured log at `tabular_query.cloud_error`.

## Tests

* 29 backend tests pass (23 tabular_query + 6 ontology_service).
* All 611 unit tests pass overall.

## What this restores

After Railway redeploys:
- WBStrain pills resolve to actual strain names ("N2 wild-type"
  instead of "00000001") across all data-browser surfaces.
- UBERON popovers resolve ("frontal cortex" instead of null).
- Cloud failures on the violin chart endpoint produce a typed
  error the chat can paraphrase, not an opaque 500.

## Combined with the frontend commit (293ddea)

That commit fixed the chat's lookup_ontology field-name mismatch
(was reading res.name when backend returns res.label, so every
ontology lookup reported "found: false"). Together, these
two commits make ontology resolution functional end-to-end —
the user-reported "ontology not resolving in the data browser"
issue, plus several similar bugs the chat-side tool was hitting.
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 19:20 Inactive
…solve

Visual-UX audit (a395 P0 #3, 2026-05-14) reported every anonymous
summary-table view triggered a 403 from /api/ontology/batch-lookup,
falling back to label-only display and surfacing a "1 warning · Some
entries lack canonical ontology IDs" banner. The endpoint is POST-
shaped because the body holds an array of up to 200 CURIEs (keeps the
URL clean), but functionally it's a read-only lookup — no state
mutation, idempotent. Pre-fix, anonymous visitors hit it on every
dataset render before they'd had a chance to GET /api/auth/csrf, so
the double-submit token check failed closed.

Adding /api/ontology/batch-lookup to EXEMPT_PATHS lets anonymous
requests through. The endpoint remains rate-limited via the existing
limit_reads dependency in the router, so abuse vectors stay bounded
(200 CURIE batches × per-IP read budget).

Verification:
- Backend unit test added: asserts /api/ontology/batch-lookup is in
  EXEMPT_PATHS so the exemption can't be silently lost in a refactor.
- All 612 backend unit tests pass (+1 from the new test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 20:14 Inactive
Two fixes in the ontology + facet pipelines surfaced by the visual-UX
audit (Bhar dataset overview + /datasets / /query facet rail).

WBStrain scrape (~P1, demo-relevant)
------------------------------------
NDI-python's WBStrain provider returns a URL but no label, so every
WBStrain CURIE on the Bhar overview resolved to the bare strain ID
("WBStrain:00000001") instead of the strain name ("N2"). Add a scrape
fallback inside `OntologyService._fetch_wormbase` that GETs the canonical
wormbase.org strain page and parses the strain name from `<title>`
(primary anchor: `<title>N2 (strain) - WormBase ...</title>`) with a
page-title-breadcrumb regex as the secondary parser. 5s timeout; any
failure (Cloudflare interstitial, 404, parse miss, network error) falls
through to `label=None` so the existing NDI-python fallback path still
fires. Result is cached via the existing OntologyCache so each strain
page is hit at most once per 30-day TTL. No new dependency — pure
httpx+re inside the existing async client. Six unit tests cover the
happy path, breadcrumb fallback, Cloudflare-interstitial blocking,
404 fallthrough, network-error fallthrough, and end-to-end caching.
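A minimal sketch of the primary `<title>` parser only, using the title shape quoted above. The exact regex in `_fetch_wormbase` is not shown in this commit, so the pattern and the name `parse_strain_name` are assumptions; the HTTP fetch and the breadcrumb fallback are omitted.

```python
import re
from typing import Optional

# Matches e.g. "<title>N2 (strain) - WormBase ...</title>" and captures "N2".
# [^<] keeps the match inside the <title> element.
_TITLE_RE = re.compile(
    r"<title>\s*(?P<name>[^<]+?)\s*\(strain\)\s*-\s*WormBase",
    re.IGNORECASE,
)


def parse_strain_name(html: str) -> Optional[str]:
    """Return the strain name from the page title, or None on a parse miss."""
    match = _TITLE_RE.search(html)
    if match is None:
        return None  # e.g. an interstitial page; caller falls back to label=None
    name = match.group("name").strip()
    return name or None
```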

Caenorhabditis facet dedup (visual-UX audit row #6 / a395)
----------------------------------------------------------
`/datasets` and `/query` showed two "Caenorhabditis elegans" chips
because one contributing dataset reported the species with
`ontologyId=NCBITaxon:6239` and another with `ontologyId=None`. The
prior dedupe keyspace was disjoint — `oid::NCBITaxon:6239` and
`norm::caenorhabditis elegans` lived in separate slots — so both
surfaced as distinct chips. Backend fix (not frontend): future API
consumers like the chat tool get clean facet data instead of having to
re-dedupe at every read site.

Refactored `_add_ontology_term` to register all candidate keys (oid,
abbrev, norm) as aliases on the bucket. Lookup walks them in priority
order (oid > abbrev > norm). The norm/abbrev alias merge honours an
asymmetric guard: distinct ontologyIds with the same label stay
distinct (different providers can legitimately catalog the same name
as different concepts — preserves
`test_ontology_id_still_takes_priority_over_label_normalization`).
On promotion, the surviving entry inherits the labeled side's
ontologyId.

Verification
------------
- 628 backend unit tests pass, +8 from new tests (6 ontology scrape +
  2 facet labeled/unlabeled merge).
- ruff clean on touched files.
- Bug repro before fix: 2 chips. After fix: 1 chip with NCBITaxon:6239
  attribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 20:50 Inactive
Two targeted fixes for the experimental /ask chat tool wiring.

## Task 1 — ndi_overview 503 envelope now carries a stable `code`

`DatasetBindingService` records the most-recent cold-load failure on
`self._last_failure = (code, message)` and exposes it via
`last_failure()`. The `/ndi_overview` router surfaces the code alongside
the existing `reason` string so the chat tool (and operators in the
dashboard) can tell `phase_a_unavailable` apart from `binding_unavailable`,
`cache_dir_unwritable`, `cold_load_timeout`, or `cold_load_failed`. The
generic 503 → "use ndi_query" fallback semantics are unchanged.

Also adds an explicit `is_dataset_binding_available()` short-circuit so
the binding never tries to import `ndi.cloud.orchestration` lazily after
the boot probe already marked it missing — surfaces as a clean
`binding_unavailable` 503 instead of bubbling an ImportError through the
cold-load timeout.

Production effect on the experimental Railway: `/ndi_overview` for
67f723d574f5f79c6062389d goes from a generic "NDI-python dataset
materialization failed or is not configured on this server" string to
`code=binding_unavailable, reason="ndi.dataset / ndi.cloud.orchestration
not importable"` — same 503 status, just diagnosable.

## Task 2 — probe→element / epoch→element_epoch class alias

`SummaryTableService._build_single_class` now retries an `isa <alias>`
query when the literal class returns 0 IDs. Pinned to:

  probe → element
  epoch → element_epoch

The projection key stays the user-requested class so `PROBE_COLUMNS` /
`EPOCH_COLUMNS` are emitted regardless. The resolved class is logged
(`table.single.alias_hit`, `resolved_class=` in the build log) for
forensics.

Smoke-tested against Dabrowska BNST (id 67f723d574f5f79c6062389d, 0
probe docs + 606 element docs): pre-fix `GET /api/datasets/.../tables/
probe` returned 0 rows even though `summary.probeTypes` listed
patch-Vm / patch-I / stimulator. Post-fix the same endpoint returns 606
element rows under the probe column shape — which is what the chat's
`query_documents(className="probe")` tool consumer already expects.
Legacy datasets that DO emit `probe` directly (Van Hooser) skip the
alias path entirely.
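The alias retry reduces to a few lines. In this sketch `query_ids` is a hypothetical callable standing in for the `isa <class>` query, and `resolve_class_ids` is an illustrative name rather than the method in `SummaryTableService`.

```python
from typing import Callable, Dict, List, Tuple

# Pinned alias map from the commit message.
CLASS_ALIASES: Dict[str, str] = {"probe": "element", "epoch": "element_epoch"}


def resolve_class_ids(
    requested: str, query_ids: Callable[[str], List[str]]
) -> Tuple[str, List[str]]:
    """Return (resolved_class, ids); retry with the alias only on 0 IDs."""
    ids = query_ids(requested)
    if ids:
        return requested, ids  # datasets that emit the class directly skip the alias
    alias = CLASS_ALIASES.get(requested)
    if alias is None:
        return requested, []
    return alias, query_ids(alias)
```

The caller would still project with the requested class's column set and log the resolved class separately.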

Coverage: 4 new alias tests in test_summary_table_class_alias.py + 7
new `last_failure()` contract tests in test_dataset_binding_service.py.
628 unit tests pass (was 612 on this branch).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 20:52 Inactive
Moves the Gantt-style treatment-timeline tool from the Next.js chat
layer (apps/web/lib/ndi/tools/treatment-timeline.ts) into the backend
so the heart of NDI processing lives next to ndi-python. The TS
handler now shrinks to a thin proxy that forwards
{datasetId, title, maxSubjects} to this endpoint and reshapes the
raw response into the chat-specific chart_payload envelope.

## New endpoint

POST /api/datasets/{dataset_id}/treatment-timeline

Body (pydantic v2 with camelCase + snake_case aliases):
  {title?, maxSubjects? | max_subjects?}      # default 30, hard cap 100

Response (raw — no chart_payload framing on the backend):
  {
    datasetId, title?, items: [{subject, treatment, start, end}],
    total_subjects, total_treatments,
    temporal_source: "explicit" | "ordinal" | "mixed",
    empty_hint?: {reason, available_columns?}
  }

Auth: get_current_session dependency — public datasets work
anonymously, private datasets honor the session cookie. Rate-limited
under the standard reads bucket via Depends(limit_reads).

## Orchestration

1. PRIMARY: SummaryTableService.single_class(dataset_id, "treatment")
   returns rows with treatmentName + subjectDocumentIdentifier +
   numericValue + stringValue + treatmentOntology.
2. FALLBACK (only if primary empty):
   TabularQueryService.violin_groups(dataset_id, "Treatment", ...)
   synthesizes one row per group with subject="group:<name>".
3. Per-subject ordinal counter assigns [i, i+1] timing when no
   explicit start/end is present on a row.
4. temporal_source distinguishes all-explicit vs all-ordinal vs
   mixed so the LLM can disclose the timing caveat in prose.
5. empty_hint is set only when both backends are empty OR when rows
   came back but none had a usable subject+treatment pair.
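Steps 3 and 4 might be sketched like this. It assumes the ordinal counter advances only for rows without explicit timing and that an empty input is classified as "ordinal"; neither detail is stated in the commit, and `assign_timing` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def assign_timing(rows: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], str]:
    """Fill [start, end] per row and classify the temporal source."""
    next_slot: Dict[str, int] = defaultdict(int)  # per-subject ordinal counter
    items: List[Dict[str, Any]] = []
    explicit = ordinal = 0
    for row in rows:
        subject = row["subject"]
        if row.get("start") is not None and row.get("end") is not None:
            start, end = row["start"], row["end"]
            explicit += 1
        else:
            i = next_slot[subject]
            start, end = i, i + 1  # ordinal [i, i+1] slot
            next_slot[subject] = i + 1
            ordinal += 1
        items.append(
            {"subject": subject, "treatment": row["treatment"], "start": start, "end": end}
        )
    if explicit and ordinal:
        source = "mixed"
    elif explicit:
        source = "explicit"
    else:
        source = "ordinal"
    return items, source
```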

## Tests

32 unit tests in test_treatment_timeline_service.py covering:
- Pure helpers: subject/treatment label fallback, explicit-timing
  extraction (numericValue array/scalar, startDate/endDate pair, ISO
  string parsing), temporal-source classification.
- Primary happy path: 5 treatments x 3 subjects, explicit timing.
- Ordinal timing: rows without numericValue → per-subject [i, i+1].
- Mixed timing: some explicit, some ordinal → temporal_source='mixed'.
- maxSubjects=30 cap on 50 distinct subjects → trims to 30.
- Primary empty + fallback hits → synthetic group:<name> rows.
- Primary empty + fallback empty → empty_hint with reason +
  available_columns.
- Edge cases: rows missing subject/treatment dropped silently,
  unplottable-rows variant of empty_hint, primary failure falls
  through to fallback, both failures still return well-typed
  response with empty_hint set (not 500).

## Deviations from TS

- chart_payload + references_summary stay in the TS proxy. The
  Python endpoint returns RAW data so the workspace can consume it
  directly without unwrapping chat-specific framing.
- References array (per-subject citation chips) likewise stays in
  the TS layer — that's chat-tool framing, not data.
- Otherwise byte-for-byte semantics: subject ordering, ordinal
  counter mechanics, temporal_source discriminator, empty_hint
  reasons, and the [start, end] field ordering all match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 22:19 Inactive
Port the spike-summary discovery + per-unit shaping from the TS chat
tool (apps/web/lib/ndi/tools/fetch-spike-summary.ts) to a Railway
service so the heart of NDI processing lives next to ndi-python.

- backend/services/spike_summary_service.py — three-mode discovery
  (unit_doc_id / unit_name_match / bare-scan), per-unit
  ``data.vmspikesummary.spike_times`` extraction with the canonical
  field-path probe order (spike_times → spiketimes → sample_times),
  numpy-backed ISI computation in ms, stride-sample caps mirroring
  the TS handler (5000 spikes/unit, 5000 ISIs/unit), kind-gated
  field omission so raster-only / isi-only callers get compact
  responses.
- backend/routers/spike_summary.py — POST
  /api/datasets/{id}/spike-summary; reuses get_current_session +
  limit_reads, returns raw per-unit data (no chat-specific
  chart_payloads — TS layer reshapes), translates cloud failures
  to a typed 503 envelope.
- 27 unit tests covering single-doc path, query path, max_units
  cap, stride-sample cap, t_window filter, kind gating, soft-error
  envelope on decode failure, camelCase alias round-trip,
  and the spike-times field-path fallback chain.

Bypasses QueryService.execute and calls cloud.ndiquery directly
because QueryService's scope validator enforces a Mongo-ObjectId
regex that the existing dataset_id path-validator already covers
(matches the PivotService pattern).
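
A minimal sketch of the per-unit shaping described above, not the service's actual code. It assumes spike times are stored in seconds, and the helper names (`extract_spike_times`, `stride_sample`, `isi_ms`) are hypothetical:

```python
from __future__ import annotations

import numpy as np

MAX_SPIKES_PER_UNIT = 5000  # cap quoted in the commit message
FIELD_PROBE_ORDER = ("spike_times", "spiketimes", "sample_times")


def extract_spike_times(doc: dict) -> np.ndarray | None:
    """Probe data.vmspikesummary.* in the documented order; None when absent."""
    block = (doc.get("data") or {}).get("vmspikesummary") or {}
    for key in FIELD_PROBE_ORDER:
        values = block.get(key)
        if isinstance(values, (list, tuple)) and len(values) > 0:
            return np.asarray(values, dtype=float)
    return None


def stride_sample(values: np.ndarray, cap: int) -> np.ndarray:
    """Keep every k-th element so that at most `cap` elements remain."""
    if values.size <= cap:
        return values
    stride = -(-values.size // cap)  # ceiling division
    return values[::stride]


def isi_ms(spike_times_s: np.ndarray) -> np.ndarray:
    """Inter-spike intervals in milliseconds (assumes input in seconds)."""
    return np.diff(np.sort(spike_times_s)) * 1000.0
```

Stride sampling keeps the temporal spread of the train, whereas truncating to the first 5000 spikes would bias the raster toward the start of the recording.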

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a new aggregator endpoint that turns one vmspikesummary unit + one
stimulus document into a peri-stimulus time histogram (PSTH): a firing
rate per time bin, the canonical sensory-neuroscience visualization. Mirrors the
spike-summary / treatment-timeline ports: TS chat tool becomes a thin
proxy, heart of NDI processing stays next to ndi-python on Railway.

- backend/services/psth_service.py — orchestration entry point
  ``compute_psth(request, ...)``. Resolves the unit (probes
  data.vmspikesummary.{spike_times, spiketimes, sample_times} then
  falls back to BinaryService.get_timeseries for non-inlined trains)
  and the stimulus events (canonical NDI paths first:
  data.stimulus_presentation.presentations[*].time_started,
  data.stimulus_response.responses[*].stim_time, then data.events and
  top-level events for preprocessed payloads). numpy.histogram does
  the binning across all trial-relative spike times. Returns parallel
  bin_centers / counts / mean_rate_hz arrays plus n_trials, n_spikes,
  unit_name, doc-id provenance, optional per_trial_raster when
  include_raster=True. Server-side caps: bin_size_ms >= 1 ms,
  (t1-t0) <= 10 s, N_bins <= 1000.
- backend/routers/psth.py — POST /api/datasets/{id}/psth. Reuses
  get_current_session + limit_reads + DatasetId validator. Returns
  raw scientific data (no chart_payload/references decoration — TS
  layer reshapes for the chat fence; workspace consumes directly).
  Soft-error envelope (error + error_kind) for invalid_window /
  decode_failed / no_events / empty_window cases so the chat tool
  branches on error_kind instead of crashing. Cloud-tier exceptions
  translate to a typed 503 envelope, matching /spike-summary.
- backend/tests/unit/test_psth_service.py — 30 tests covering pure
  helpers (spike-times + event extraction across all four doc-class
  paths, bin layout, raster cap, window validation) and the
  compute_psth integration (happy-path parallel arrays, empty-window
  envelope, include_raster, cap enforcement, decode_failed +
  no_events soft errors, binary fallback, camelCase alias).
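
The binning step can be sketched as follows. This is an illustration under stated assumptions, not the contents of psth_service.py: all times are taken to be in seconds, `compute_psth_arrays` is a hypothetical name, `n_spikes` is read as "spikes that landed inside the window", and errors are raised as `ValueError` rather than returned in the soft-error envelope:

```python
from __future__ import annotations

import numpy as np

MIN_BIN_SIZE_MS = 1.0  # server-side caps quoted in the commit message
MAX_WINDOW_S = 10.0
MAX_BINS = 1000


def compute_psth_arrays(spike_times_s, event_times_s, t0, t1, bin_size_ms):
    """Bin trial-relative spike times into parallel PSTH arrays."""
    if bin_size_ms < MIN_BIN_SIZE_MS or t1 <= t0 or (t1 - t0) > MAX_WINDOW_S:
        raise ValueError("invalid_window")
    bin_s = bin_size_ms / 1000.0
    n_bins = int(round((t1 - t0) / bin_s))
    if not 1 <= n_bins <= MAX_BINS:
        raise ValueError("invalid_window")

    spikes = np.asarray(spike_times_s, dtype=float)
    events = np.asarray(event_times_s, dtype=float)
    if events.size == 0:
        raise ValueError("no_events")

    edges = t0 + bin_s * np.arange(n_bins + 1)
    # Every spike relative to every event: an (n_trials, n_spikes) matrix.
    relative = (spikes[np.newaxis, :] - events[:, np.newaxis]).ravel()
    counts, _ = np.histogram(relative, bins=edges)

    return {
        "bin_centers": ((edges[:-1] + edges[1:]) / 2.0).tolist(),
        "counts": counts.tolist(),
        # Spikes per bin, averaged over trials, divided by bin width.
        "mean_rate_hz": (counts / (events.size * bin_s)).tolist(),
        "n_trials": int(events.size),
        "n_spikes": int(counts.sum()),
    }
```

The broadcasted subtraction allocates n_trials × n_spikes floats, which is fine for a sketch but may need chunking for long recordings.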
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 14, 2026 22:26 Inactive
Live verification (2026-05-18) shows the treatment-timeline endpoint
already returns 56 items / 28 subjects for Haley
(`682e7772cdf3f24938176fac`) — F-1e's merge-all-rows chain walker
plus `_row_treatment`'s literal-`treatment` branch surface the 56
food-restriction-onset/offset docs correctly, and `_pick_subject_label`
+ `_pick_treatment_label` accept their values.

What was NOT working: `temporal_source` came back `"ordinal"` for
every Haley row because `_parse_iso_datetime` couldn't read the
MATLAB `datestr` default format (`"03-Nov-2023 07:53:00"`) that
Haley's `treatment.string_value` carries. Result: the Gantt's
x-axis showed synthesised ordinal slots 0..N instead of real
wall-time onset/offset events — accurate but visually misleading
(every subject's onset appeared at "slot 0" rather than its true
date in early November 2023).

This commit:

- Adds a MATLAB `datestr` fallback to `_parse_iso_datetime`. ISO
  still wins via `datetime.fromisoformat` (regression-pinned);
  only inputs that fail ISO try the MATLAB formats
  (`%d-%b-%Y %H:%M:%S` and the date-only `%d-%b-%Y`). Locale
  assumption (`C`/`en_*`) matches the Railway container shape.
- Tightens the `dt: datetime | None` annotation so mypy doesn't
  flag the dual-source assignment.
- Adds 7 unit tests:
    - Three covering the new `_parse_iso_datetime` MATLAB branch
      (`dd-MMM-yyyy hh:mm:ss`, date-only, garbage-still-None).
    - One ISO regression pin so the new branch can't accidentally
      shadow `fromisoformat`.
    - One end-to-end via `_extract_explicit_timing`.
    - Two `_fetch_primary_rows` integration tests covering
      literal-only (Haley shape, 56 rows, single contributing class)
      and merged literal+subclass (chain merges rows from multiple
      classes).
- Adds 2 `_row_treatment` projection tests that pin the literal-
  `treatment` branch against the exact Haley doc shape (curl'd
  from the experimental backend) and an ISO-flavoured variant.
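
A minimal sketch of the ISO-first, MATLAB-second parse order described above. The function name `parse_datetime` is a stand-in for `_parse_iso_datetime`, whose real body is not shown in this commit message; `%b` depends on an English or `C` locale, as noted:

```python
from __future__ import annotations

from datetime import datetime

# MATLAB `datestr` default format, then the date-only variant.
_MATLAB_FORMATS = ("%d-%b-%Y %H:%M:%S", "%d-%b-%Y")


def parse_datetime(value: object) -> datetime | None:
    """Try ISO 8601 first; fall back to MATLAB formats; None on failure."""
    if not isinstance(value, str):
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        pass
    for fmt in _MATLAB_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None
```

Trying `fromisoformat` first means an input that is valid ISO can never be reinterpreted by the MATLAB branch, which is what the regression pin guards.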

Cache schema unchanged. The summary-table response shape is
identical (no new columns); only the timeline endpoint's
`temporal_source` value can shift from `"ordinal"` to `"explicit"`
or `"mixed"` for datasets that emit MATLAB datestr stringValues,
and the timeline endpoint does not cache.

Acceptance:
- `/api/datasets/682e7772cdf3f24938176fac/treatment-timeline` POST
  returned `total_subjects=28`, `total_treatments=56`,
  `temporal_source="ordinal"` pre-fix; post-fix temporal_source
  will be `"explicit"` (all 56 rows parse). Live curl confirmed
  the endpoint is non-empty.
- 1026 tests pass (up from 1017; +9 new).
- Lint + typecheck baseline preserved (pre-existing N802 + 2
  type errors in untouched files unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 19:02 Inactive
The `_pick_default_signal_ref` heuristic that landed in e03d470
fixed the `channel_list.bin`-instead-of-`.nbf` bug on
`get_timeseries`. This commit sweeps the rest of the
binary-decode endpoints.

Audited endpoints and their disposition:

- `/api/datasets/{id}/documents/{id}/signal` (signal.py) — already
  delegated to `BinaryService.get_timeseries`, which uses the smart
  pick. Already benefits transitively; no code change.
- `/api/datasets/{id}/spike-summary` (spike_summary.py) — reads
  spike_times from the JSON body inline (no binary file decode).
  No file-pick involved.
- `/api/datasets/{id}/documents/{id}/image` (image.py via
  ImageService.fetch_image) — picked `refs[0]` blindly. FIXED via
  new `_pick_default_image_ref` (Pillow-aligned extension list,
  same metadata blocklist as the signal pick).
- `/api/datasets/{id}/psth` (psth.py via psth_service) — uses
  `binary_service.get_timeseries`. Already benefits transitively;
  no code change.
- `/api/datasets/{id}/documents/{id}/data/image` (binary.py via
  BinaryService.get_image, Document Explorer's image viewer) —
  same `refs[0]` bug. FIXED to use `_pick_default_image_ref`.
- `/api/datasets/{id}/documents/{id}/data/raw` and
  `/data/raw` w/ Range (BinaryService.get_raw +
  get_raw_response) — by design, this is the imageStack
  passthrough where the caller has already established the
  doc IS a raw-bytes blob. The contract is "stream refs[0]
  verbatim" — changing this risks breaking imageStack flows.
  Left alone per the legacy contract.
- `/api/datasets/{id}/documents/{id}/data/video` (binary.py via
  BinaryService.get_video_url) — videos are single-file in
  practice; multi-file video docs aren't a real NDI shape. Left
  alone.
- `/api/visualize/distribution` — pure aggregation/stats. No
  binary file decoding.

Implementation: refactors `_pick_default_signal_ref` to delegate
to a new shared `_pick_ref_by_extension` helper. Adds the new
`_pick_default_image_ref` consumer (same step-1/step-2/step-3
heuristic against `_DECODABLE_IMAGE_EXTENSIONS` =
`.tif .tiff .png .jpg .jpeg .gif`). Both share the existing
`_KNOWN_METADATA_FILENAMES` blocklist — channel_list.bin /
meta.json / channels.json etc. are skipped regardless of which
decoder is choosing.
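
One way to read the three-step heuristic, sketched on bare filenames. This is an assumption about the shape of `_pick_ref_by_extension`, not its actual code: step 1 is taken to be "first non-metadata file with a decodable extension", step 2 "first non-metadata file", step 3 "first file". The blocklist below holds only the three names quoted in this message:

```python
from __future__ import annotations

import re

DECODABLE_IMAGE_EXTENSIONS = (".tif", ".tiff", ".png", ".jpg", ".jpeg", ".gif")
KNOWN_METADATA_FILENAMES = frozenset(
    {"channel_list.bin", "meta.json", "channels.json"}
)


def has_extension(name: str, extensions: tuple[str, ...]) -> bool:
    """Case-insensitive match; tolerates a non-alphanumeric tail (frame.tif_1)."""
    lowered = name.lower()
    return any(
        re.search(re.escape(ext) + r"(?:[^a-z0-9].*)?$", lowered)
        for ext in extensions
    )


def pick_ref_by_extension(
    filenames: list[str], extensions: tuple[str, ...]
) -> str | None:
    if not filenames:
        return None
    non_metadata = [
        n for n in filenames if n.lower() not in KNOWN_METADATA_FILENAMES
    ]
    for name in non_metadata:  # step 1: extension match
        if has_extension(name, extensions):
            return name
    if non_metadata:  # step 2: first non-metadata file
        return non_metadata[0]
    return filenames[0]  # step 3: legacy refs[0] fallback
```

Sharing one helper between the signal and image picks keeps the blocklist in a single place, so a newly discovered sidecar filename only has to be added once.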

10 new unit tests in `test_binary_default_image_pick.py`
mirroring `test_binary_default_signal_pick.py`: TIFF/PNG/JPEG/
GIF variants picked, signal extensions NOT picked for image
decode, case-insensitive matching, suffix-with-non-alphanumeric
tail (`frame.tif_1`), single-file legacy fallback, all-metadata
fallback, and a pin that step-1 (extension match) wins over
step-2 (non-metadata fallback) when both apply.

Cache schema unchanged. Response shapes unchanged on both
endpoints — only file-pick selection changed, so a doc that
previously surfaced `errorKind=unsupported` because the picker
returned a JSON sidecar will now succeed with a valid image
payload.

Live signal-endpoint smoke confirmed post-fix (Francesconi
daqreader `68d6e54703a03f5cfdac8f07` returns
`format=nbf_compressed`, `sample_count=100` against
`/signal?downsample=100`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 19:09 Inactive
A "real" session is one with ≥1 other doc carrying depends_on.value
pointing at its ndiId. Parent / aggregate session docs
(administrative containers like Haley's `haley_2025` parent, ingested
10h after the two leaf recordings `haley_2025_Celegans` and
`haley_2025_Ecoli`) have zero downstream references. MATLAB enumerates only
leaf sessions; the cloud's raw class count includes parents.

Pre-fix on Haley (682e7772cdf3f24938176fac):
- counts.sessions = 3 (raw)
- tutorial documents 2 recording sessions
- workspace Sessions picker rendered 3 rows, one unusable

Implementation:

- Adds `DatasetSummaryService._count_real_sessions` that fetches
  session docs via the existing `_fetch_class_bounded("session")`
  primitive, then fires one `depends_on * [ndiId]` ndiquery per
  session against the cloud's indexed reverse-dep path. Sessions
  with `totalItems > 0` are real; the rest are filtered out.

- Skip conditions:
  1. `counts.sessions <= 1` — nothing to filter.
  2. `counts.totalDocuments <= counts.sessions` — no non-session
     docs that could be downstream (newly-published catalog, test
     fixture, etc.). Don't waste ndiquery calls only to fail-open.
  3. `counts.sessions > _MAX_SESSIONS_FILTER_WALK` (50) — safety
     cap; multi-day series virtually always have downstream refs.

- Runs as an additional gather leg alongside the existing
  openminds_subject / probe_location / element fanout, so it should
  add no wall-clock latency on the hot path (3-10 indexed
  reverse-dep queries ≈ hundreds of ms; the structured-facts legs
  dominate at multi-second scale).

- Fail-open semantics:
  * `_fetch_class_bounded` raises → keep raw count + typed warning
  * Per-session reverse-dep ndiquery raises → that session is
    counted as real
  * Every session looks unreferenced (real_count == 0) → keep raw
    count (probably a flaky cloud, not a real "all parents"
    dataset). Emits structured warning log.

- Observability: `dataset_summary.session_filter` log line whenever
  the filtered count differs from raw, recording raw_count,
  filtered_count, and parent_or_aggregate_sessions.
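
The skip conditions and fail-open rules above can be sketched like this. The callable `has_downstream` stands in for the per-session reverse-dep ndiquery and is hypothetical, as are the function names; the real `_count_real_sessions` also emits warnings and logs that are omitted here:

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

MAX_SESSIONS_FILTER_WALK = 50  # safety cap quoted in the commit message


def should_filter(sessions: int, total_documents: int) -> bool:
    """Apply the three skip conditions before spending any ndiquery calls."""
    return 1 < sessions <= MAX_SESSIONS_FILTER_WALK and total_documents > sessions


async def count_real_sessions(
    session_ids: list[str],
    has_downstream: Callable[[str], Awaitable[bool]],
) -> int:
    raw_count = len(session_ids)
    results = await asyncio.gather(
        *(has_downstream(sid) for sid in session_ids),
        return_exceptions=True,
    )
    # A failed lookup counts the session as real (fail-open per session).
    real = sum(1 for r in results if isinstance(r, BaseException) or r)
    # If nothing looks referenced, distrust the result and keep the raw count.
    return real if real > 0 else raw_count
```

`return_exceptions=True` is what makes the per-session fail-open possible: one failing query becomes a value in the results list instead of cancelling the whole gather.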

Cache schema unchanged (v7). `counts.sessions` is just an int field
that already existed; only its value can shift for affected
datasets. Existing cached summaries refresh naturally within their
24h TTL.

+12 unit tests:
- Canonical Haley case (3 → 2)
- Skip on counts.sessions <= 1
- Skip on totalDocuments <= sessions (pure-session test fixture)
- Skip over the 50-session safety cap
- Fail-open: all-zero downstream → keep raw count
- Fail-open: reverse-dep ndiquery 503 → session counted real
- Fail-open: session-class fetch fails → keep raw + warning
- 5 helper unit tests (constant, _filtered_sessions_or_warn paths,
  _identity_int)

Backend: 1036 → 1048 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:11 Inactive
The B6 fix in 058107a changes the VALUE of counts.sessions for any
dataset with parent/aggregate session docs (Haley: 3 → 2). Existing
cached summaries under the v1 prefix would persist for up to 24h
(full-success TTL), serving the pre-filter count across the rollout
window. Bumping the cache prefix invalidates all v1 entries
immediately; the next request runs the producer and writes a v2
entry with the filtered count.

The response SHAPE is identical — only the value shifts — so the
model's `schemaVersion` literal stays `summary:v1` (clients
consuming that field don't need to recompile). Only the cache key
namespace changes.
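
A toy illustration of why bumping the key prefix invalidates old entries without touching the payload. The key layout, the prefix strings, and the in-memory dict are all hypothetical; the real cache backend and key format are not shown in this message:

```python
from __future__ import annotations

from typing import Callable

_CACHE: dict[str, dict] = {}  # stand-in for the real cache backend


def summary_cache_key(prefix: str, dataset_id: str) -> str:
    return f"{prefix}:{dataset_id}"


def get_summary(
    prefix: str, dataset_id: str, producer: Callable[[str], dict]
) -> dict:
    """Return the cached summary, running the producer only on a miss."""
    key = summary_cache_key(prefix, dataset_id)
    if key not in _CACHE:
        _CACHE[key] = producer(dataset_id)
    return _CACHE[key]


def old_producer(_dataset_id: str) -> dict:
    return {"schemaVersion": "summary:v1", "counts": {"sessions": 3}}


def new_producer(_dataset_id: str) -> dict:
    # Same shape and schemaVersion; only the value differs.
    return {"schemaVersion": "summary:v1", "counts": {"sessions": 2}}
```

Old entries are never deleted in this scheme; they simply become unreachable under the new prefix and age out with their TTL.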

Test pins updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:16 Inactive
…atasets

Live verification of B6 on Haley (058107a + cache prefix v2 bump
9523950) showed counts.sessions still = 3 — every Haley session
returns 0 downstream depends_on refs because the lab encodes
session identity in `session.reference` strings rather than via
the depends_on graph:

  - haley_2025_Celegans  (leaf)
  - haley_2025_Ecoli     (leaf)
  - haley_2025           (parent / aggregate container)

The depends_on heuristic correctly returns 0 for all 3, triggering
the fail-open path and preserving raw count = 3. The right answer
is 2.

Adds a structural fallback that fires ONLY when the depends_on
heuristic returns 0 across all sessions:

  A session is a PARENT iff its `session.reference` is a strict
  prefix (separated by '_') of some OTHER session's reference in
  the same dataset.

Intentionally narrower than "any naming pattern" — requires a
SIBLING that extends this reference. A lone `haley_2025` without
`haley_2025_<species>` siblings stays counted as real. Multi-level
trees collapse to the deepest leaves correctly.
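
The parent rule can be sketched directly on a list of reference strings. The function name `count_leaf_sessions` is hypothetical, and the ambiguity checks follow the `None` cases listed for `_filter_by_reference_prefix` in this message:

```python
from __future__ import annotations


def count_leaf_sessions(references: list[str | None]) -> int | None:
    """Count sessions whose reference is not an '_'-separated prefix of another.

    Returns None when the input is ambiguous: fewer than two sessions,
    any missing reference, or all references identical.
    """
    if len(references) < 2:
        return None
    if any(not ref for ref in references):
        return None
    if len(set(references)) == 1:
        return None

    leaves = 0
    for i, ref in enumerate(references):
        is_parent = any(
            other.startswith(ref + "_")
            for j, other in enumerate(references)
            if j != i
        )
        if not is_parent:
            leaves += 1
    return leaves
```

Requiring the `_` separator is what keeps `haley` from being treated as the parent of `haley2025`.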

Helpers added:
  - `_session_reference(doc)` — extracts `data.session.reference`
    with fallbacks through `session_reference` and `name`
  - `_filter_by_reference_prefix(session_docs)` — returns leaf count,
    or None when ambiguous (missing refs, all refs identical)

The two heuristics now compose:
  1. depends_on returns ≥1 real → use that count (canonical signal)
  2. depends_on returns 0 → try prefix-suffix; if conclusive use it
  3. Both inconclusive → fail-open with raw count + audit log

The `via:` field in the structured log records which heuristic
fired (`depends_on` vs `reference_prefix`) so operators can audit
the rollout across all 8 published datasets.

+10 new unit tests:
  - Canonical Haley case (3 → 2 via prefix)
  - No-parent shape (2 leaves stay real)
  - Missing reference → None (bail to fail-open)
  - All sessions share same reference → None
  - Underscore separator required (no false positives on `haley` →
    `haley2025`)
  - 4-level hierarchy collapses to deepest leaves
  - Single session → None
  - _session_reference extracts via session.reference / .session_reference
    / .name fallback chain, returns None when block absent

Backend: 1048 → 1058 tests passing. mypy --strict baseline preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:27 Inactive
The cc64299 commit added the session.reference prefix-suffix
fallback, but the v2 cache entries from 058107a + 9523950 still
have the pre-fallback Haley count = 3. Bumping again invalidates
those so the next request runs the new filter and writes a v3
entry with sessions = 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:30 Inactive
Live Haley still returns sessions=3 after the prefix-fallback ship,
which means either (a) the heuristic branch never fires, or (b) it
sees session docs without the expected `data.session.reference`
shape. Add structured diagnostic log to capture exactly what
`_count_real_sessions` sees in production:
  - fetched_session_docs count
  - prefix-filter computed value (or None)
  - sample doc top-level keys
  - sample data-level keys
  - extracted references for each session

Bumps cache prefix to v4 to force the next request to re-run the
producer through the new code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:39 Inactive
The Railway structured-log feed doesn't surface our structlog
output. Push the prefix-fallback diagnostic into extractionWarnings
temporarily so the next curl tells us exactly:
  - fetched_session_docs count
  - prefix_filtered_value (None / N / raw)
  - sample_refs extracted
  - sample_data_keys (so we can see if the doc shape has the
    `session` block we expect)

Bumps cache prefix to v5 so the next request runs a fresh build
through this diagnostic path. Will revert the warning push once
B6 is verified working end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:44 Inactive
The previous diagnostic was inside the prefix-fallback block, but
that branch only fires when real_count == 0 from depends_on. If
depends_on returns >0 in production (the manual dependency-graph
curls suggested it would not, but it apparently does), the function
returns early and the diagnostic never fires.

Moves the B6-DEBUG warning to right after real_count is computed
so it always surfaces. Reports the full results list so we can
see which sessions are flagged True vs False vs exceptions.

Bumps cache to v6 to force a fresh build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:49 Inactive
…v7 cache

The diagnostic revealed the production behavior on Haley: ALL 3
sessions return >0 via depends_on (even the parent), because
Haley publishes a `dataset_session_info` admin doc that
depends_on the parent session ndiId. Without the doc-class-name
oracle to distinguish "admin-only reference" from "experimental-
data reference", depends_on alone returns the raw count.

New composition policy:

  1. Compute prefix-suffix filtered count (None / N).
  2. If prefix returns a conclusive `0 < N < raw_count`, USE IT
     regardless of depends_on result. The structural signal
     "session B's name extends session A's name with `_`" is hard
     to satisfy coincidentally — sibling sessions don't typically
     share parent-extending names unless there's a real
     parent/child relationship.
  3. Else, use depends_on count if > 0 (canonical for labs that
     use the dependency graph for session identity).
  4. Else, fail-open with raw count.
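
The four-step policy reduces to a small pure function. This is a sketch with a hypothetical name and signature; the `"raw"` label for the fail-open branch is an assumption, while `reference_prefix` and `depends_on` are the `via:` values named earlier in this PR:

```python
from __future__ import annotations


def compose_session_count(
    raw_count: int,
    prefix_count: int | None,
    depends_on_count: int,
) -> tuple[int, str]:
    """Return (sessions, via) following the composition policy."""
    # Step 2: a conclusive structural signal wins outright.
    if prefix_count is not None and 0 < prefix_count < raw_count:
        return prefix_count, "reference_prefix"
    # Step 3: otherwise trust the dependency graph when it found anything.
    if depends_on_count > 0:
        return depends_on_count, "depends_on"
    # Step 4: fail open.
    return raw_count, "raw"
```

For the Haley production case this gives `compose_session_count(3, 2, 3)`, where all three sessions pass the depends_on check but the prefix filter still narrows the count to 2.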

Removes the temporary B6-DEBUG diagnostic warnings.

Bumps cache prefix to v7 to invalidate v6 entries cached during
the diagnostic rollout.

+2 new tests pin the composition:
- `test_prefix_refines_depends_on_when_parent_has_admin_ref` —
  THE canonical Haley production case: all 3 sessions True via
  depends_on (parent referenced by admin doc), prefix returns 2,
  composition picks 2.
- `test_depends_on_used_when_prefix_inconclusive` — when sessions
  lack reference fields entirely, prefix returns None and
  depends_on canonically filters.

Backend: 1058 → 1060 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 20:55 Inactive
`probe` is a Python runtime alias for `element` (per services/class_aliases.py).
Many datasets — Francesconi (67f723d574f5f79c6062389d) is the canonical case
with 0 probe + 606 element + 3 probeType facets — emit no literal `probe`
documents at all, so `counts.probes` rendered 0 on the snapshot tile and
contradicted the catalog's probeTypes facet.

The fallback path in `_counts_from_raw` was already in place from the prior
arc. This commit:
  - tightens the log event name to `dataset_summary.probes_alias_resolved`
    and includes both raw + aliased values for observability
  - bumps SUMMARY_KEY_PREFIX v7 → v8 to invalidate stale entries cached
    before the fallback shipped (response SHAPE unchanged; only the
    `counts.probes` value can shift)
  - adds three regression tests pinning the three branches:
    literal probe non-zero → use literal
    literal probe zero + element non-zero → use element (alias hit)
    both zero → 0
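
The three pinned branches amount to the following, sketched with a hypothetical function name and a plain dict of raw class counts (the real `_counts_from_raw` input shape is not shown here):

```python
from __future__ import annotations


def resolve_probe_count(raw_counts: dict[str, int]) -> int:
    """Use literal `probe` docs when present, else fall back to `element`."""
    literal = raw_counts.get("probe", 0)
    if literal > 0:
        return literal
    return raw_counts.get("element", 0)
```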

Cache schema bump also requires updating the v7 literal in
test_user_cache_keys_are_isolated to v8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 21:23 Inactive
…etch

`/ndiquery` may return slim `{id, datasetId}` pairs (or bare id strings
under a single-dataset scope) instead of full doc bodies, depending on
search scope. The pre-F-7 code path read `body.get("documents")` and
fed those entries straight to `_extract_numeric(doc, "data.subject.weight_grams")`
— which silently returns None for slim refs (no `data.*`). Aggregations
against datasets where ndiquery returned slim refs would report
numeric_matches=0 even when valueField was present in every body.

Fix: after ndiquery returns, classify each entry. Full bodies pass
through (no-op fast path — zero extra cloud calls). Slim refs queue
for chunked `bulk_fetch` (BULK_FETCH_MAX=500 per call), grouped by
dataset_id since bulk_fetch is per-dataset. Bare id strings under a
single-dataset scope attribute to that scope; under any other scope
they're unattributable (bulk_fetch needs a dataset_id) and dropped
with a structured warning.

Concurrency bounded by MAX_CONCURRENT_BULK_FETCH=6, matching
summary_table_service. Two structured log events for observability:
  - aggregate_documents.hydrated_via_bulk_fetch (info; per-call summary)
  - aggregate_documents.bare_ids_dropped (warning; cloud-shape anomaly)

Numeric equivalence pinned by test: same fixture run as (a) full-body
inline and (b) slim-refs-then-bulk-fetch produces byte-equal
{count, mean, median, std, min, max} per group, identical
numeric_matches, identical datasets_contributing.
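
A sketch of the classify-then-chunk step, without the cloud calls or the concurrency limit. `plan_hydration` is a hypothetical name, and treating "has a `data` key" as the marker of a full body is an assumption drawn from the description of slim refs above:

```python
from __future__ import annotations

from collections import defaultdict

BULK_FETCH_MAX = 500  # per-call limit quoted in the commit message


def plan_hydration(entries: list, scope_dataset_id: str | None = None):
    """Split ndiquery entries into full bodies, bulk_fetch batches, and drops."""
    full_bodies: list[dict] = []
    ids_by_dataset: dict[str, list[str]] = defaultdict(list)
    dropped = 0

    for entry in entries:
        if isinstance(entry, dict) and "data" in entry:
            full_bodies.append(entry)  # fast path: no extra cloud call
        elif isinstance(entry, dict) and entry.get("id") and entry.get("datasetId"):
            ids_by_dataset[entry["datasetId"]].append(entry["id"])
        elif isinstance(entry, str) and scope_dataset_id is not None:
            ids_by_dataset[scope_dataset_id].append(entry)
        else:
            dropped += 1  # unattributable bare id or unknown shape

    batches = [
        (dataset_id, ids[i : i + BULK_FETCH_MAX])
        for dataset_id, ids in ids_by_dataset.items()
        for i in range(0, len(ids), BULK_FETCH_MAX)
    ]
    return full_bodies, batches, dropped
```

Each `(dataset_id, ids)` batch would map to one `bulk_fetch` call; in the service those calls run under the `MAX_CONCURRENT_BULK_FETCH=6` bound.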

Tests (+6):
  - test_hydrates_slim_refs_via_bulk_fetch — bulk_fetch IS called when
    refs are slim
  - test_full_body_path_skips_bulk_fetch_no_op — happy path latency
    untouched
  - test_per_doc_vs_bulk_numeric_equivalence — the regression pin
  - test_hydration_chunks_at_bulk_fetch_max — 600 refs → 2 batches
    (500 + 100)
  - test_hydration_chunks_per_dataset — refs spanning 2 datasets fan
    out into 2 bulk_fetch calls
  - test_hydration_handles_bare_id_strings_under_single_dataset_scope

No cache schema bump (response shape unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 21:30 Inactive
The /tabular_query router exposes both GET and POST variants so the
cloud-app's POST proxy at /api/datasets/[id]/tabular-query can forward
its body 1:1 without GET-vs-POST translation. Both routes share the
same `_dispatch` helper and the same `TabularQueryBody` Pydantic
validator (which mirrors the GET query-param contract), so the
responses MUST be byte-identical for equivalent input.

The dispatcher refactor + body validator landed during the prior
arc — this commit pins the contract with integration tests:

Tests (+4 against the existing app_and_cloud fixture):
  - test_tabular_query_get_and_post_return_identical_shape — same
    full-param request (variableNameContains + groupBy + groupOrder)
    yields byte-equal response bodies.
  - test_tabular_query_get_and_post_handle_optional_params_identically
    — omitting optional params likewise produces equivalent output.
  - test_tabular_query_post_rejects_missing_variable_name — POST body
    validator surfaces the typed 400 VALIDATION_ERROR envelope.
  - test_tabular_query_get_rejects_missing_variable_name — GET path
    surfaces the same envelope.

The integration tests use the empty-ontology happy path
(`_install_empty_ontology_mocks`) so the assertions don't depend on
the cloud's real document corpus — both routes still exercise
`_dispatch` → `TabularQueryService.violin_groups` end-to-end, just
through the `_empty_response` branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 18, 2026 21:35 Inactive
Re-implements the S5.3 backend after a git reset during an agent
collision in the prior session discarded ~400 LOC of service code. The
full design was preserved in apps/web/docs/reviews/2026-05-19b-post-handoff-execution.md,
and this commit re-implements it from that spec.

Backend changes:
- backend/services/tabular_query_service.py
  - MAX_PAIRS = 1000 cap (matches MAX_GROUPS × MAX_VALUES_PER_GROUP scale)
  - _TREATMENT_CLASS_CHAIN, _SUBJECT_KEY, _TREATMENT_LABEL_FIELDS constants
  - _find_matching_group(..., exclude_group_idx=) — Y-side cross-table
    search forces a DIFFERENT group than X (preserves violin semantics
    when exclude_group_idx=None)
  - cross_table_pairs orchestrator + _cross_table_pairs_subject +
    _cross_table_pairs_treatment + _build_treatment_subject_map
  - 9 module-level helpers: _empty_pairs_response, _index_of_group,
    _build_subject_value_map, _build_subject_group_map,
    _columns_for_pair_group_by, _inner_join_pairs,
    _inner_join_treatment_pairs, _order_pairs_by_group,
    _pick_treatment_label_for_needle
- backend/routers/tabular_query.py
  - CrossTableQueryBody Pydantic model (xVariableContains,
    yVariableContains, joinOn, groupBy?, groupOrder?)
  - POST /api/datasets/{dataset_id}/cross-table-query handler with
    503 envelope on CloudInternalError/CloudUnreachable/CloudTimeout
    (mirrors the violin path's discipline)

Tests (backend/tests/unit/test_tabular_query_service_cross_table.py):
- 52 new unit tests covering subject-join happy path, groupBy
  resolution (X-first-then-Y), groupOrder, treatment-join with
  auto-color fallback, treatment chain walking with last-write-wins,
  flaky single_class recovery, empty-state diagnostics, MAX_PAIRS
  cap, and the 9 helper functions individually.
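
The subject join at the core of the cross-table query can be sketched as an inner join over two subject-keyed maps. The function name, the pair dict layout, and the truncate-on-overflow behaviour are assumptions; only the `MAX_PAIRS = 1000` cap comes from this commit:

```python
from __future__ import annotations

MAX_PAIRS = 1000  # cap quoted in the commit message


def inner_join_pairs(
    x_by_subject: dict[str, float],
    y_by_subject: dict[str, float],
    max_pairs: int = MAX_PAIRS,
) -> tuple[list[dict], bool]:
    """Pair X and Y values for subjects present in both maps.

    Returns (pairs, truncated); subjects missing on either side are dropped.
    """
    pairs = [
        {"subject": subject, "x": x, "y": y_by_subject[subject]}
        for subject, x in x_by_subject.items()
        if subject in y_by_subject
    ]
    return pairs[:max_pairs], len(pairs) > max_pairs
```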

Response contract matches the cloud-app's BackendCrossTableResponse
type at apps/web/lib/ndi/tools/cross-table-query.ts — cloud-app side
was already wired and waiting (handlers + scatter chart + panel
toggle all pushed prior).

CI: ruff clean, mypy --strict reports 4 PRE-EXISTING errors in
untouched files (zero new errors from this change), 1125 tests
pass (was 1060 + F-8 pin pre-arc).

Refs: 2026-05-19b-post-handoff-execution.md "S5.3 detail"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@railway-app railway-app Bot temporarily deployed to ndi-data-browser-v2 / experimental May 19, 2026 02:25 Inactive
Applies the F-1 stub diff preserved at
apps/web/docs/specs/2026-05-18-f1-stimulus-projection-stub.diff
(241 lines) which pins:
- /tables/stimulus_presentation column shape (the six STIMULUS_COLUMNS
  keys + row content from depends_on edge + presentations list)
- /tables/stimulus short-form resolves to stimulus_presentation via
  the _CLASS_ALIASES chain when the literal class returns 0 IDs
- Stream 5.8 pagination respected (page/pageSize) WITHOUT re-fanning
  the cloud (proves cache-once-slice-in-memory)

respx 0.23.1 fix: the original alias test used
`router.post("/ndiquery").mock(side_effect=_ndiquery)` with a 2-arg
callable signature. This hung indefinitely under pytest 9.x +
asyncio_mode=auto. Rewrote to use FIFO route ordering with two
chained `.respond()` calls (specific predicate via
`json__searchstructure__0__param1` first, generic fallback second)
— passes in 0.17s. Same end-state contract, no test debt.

CI: 1128 backend tests pass (1125 before, plus 3 new F-1 tests); ruff
clean. F-1 backend implementation has been live on Railway since
commit 0231851 — this commit only adds integration tests pinning the
contract.

Refs: 2026-05-19b-post-handoff-execution.md "F-1 detail"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>