From a07983e546c6961c19947bca58edbbdc3da2ac05 Mon Sep 17 00:00:00 2001 From: audriB Date: Wed, 13 May 2026 18:14:47 -0400 Subject: [PATCH 01/54] =?UTF-8?q?feat(ndi-python):=20Phase=20A=20=E2=80=94?= =?UTF-8?q?=20install=20NDI-python=20+=20wire=20VHSB/compression/ontology?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds the NDI-python stack (vlt + did + ndr + vhlab-toolbox-python + ndi-compress) to the Railway image and wires three new capabilities into existing services: 1. VHSB text-tag decoding via vlt.file.vhsb_read — unblocks Haley + any other VHSB-formatted dataset in QuickPlot / signal-chart. Today every VHSB request returns the legacy "vlt library not available" soft error because the early-return prefix check on b"This " never falls through to a real decoder. 2. NDI-compressed binary detection + decompression via ndicompress.expand_* — handles the .nbf.tgz wrapper format. Today these silently fail in _parse_nbf. 3. NDI ontology fallback in OntologyService — fires when the existing external providers miss. NDI's bundled lookup knows lab-specific terms (NDIC, WBStrain, Cre lines) that public OLS providers don't. Image size cost: ~80-100 MB. We pip install with --no-deps and handpick the runtime deps in requirements.txt + pyproject.toml so matplotlib (~50-70 MB) and opencv (~80 MB) — both declared by DID-python + tutorials but never imported by our paths — stay out of the image. Strategy: ALL changes go through a new services/ndi_python_service.py that lazy-imports + degrades gracefully when the stack isn't installed (returns None on every call, callers fall through to their legacy paths). So even on a build where the NDI install fails, the public surface keeps working. Branch protection: this work targets a separate "experimental" Railway environment for byte-for-byte audit against production before any merge to main. See docs/plans/2026-05-13-railway-experimental-env-runbook.md for the dashboard-step walkthrough. Test plan: - [x] 19 new unit tests cover the dispatch logic with NDI uninstalled (CI doesn't pip-install NDI; tests use sys.modules mocking) - [x] All 44 existing binary tests + 535 other tests pass unchanged - [x] mypy strict mode clean (with ndi/vlt/ndicompress in missing-imports override matching the scipy + opentelemetry pattern) - [x] ruff lint clean - [ ] Build the experimental Railway env from this branch - [ ] Run apps/web/scripts/audit-public-api.mjs (Layer 1) + the e2e audit-public-pages spec (Layer 2 + 3) against production vs experimental — see plan doc for the workflow Co-Authored-By: Claude Opus 4.7 (1M context) --- backend/pyproject.toml | 28 +- backend/requirements.txt | 16 +- backend/services/binary_service.py | 47 ++- backend/services/ndi_python_service.py | 335 +++++++++++++++++ backend/services/ontology_service.py | 48 ++- backend/tests/unit/test_ndi_python_service.py | 228 ++++++++++++ .../2026-05-13-ndi-python-integration.md | 340 ++++++++++++++++++ ...-05-13-railway-experimental-env-runbook.md | 141 ++++++++ infra/Dockerfile | 20 ++ 9 files changed, 1189 insertions(+), 14 deletions(-) create mode 100644 backend/services/ndi_python_service.py create mode 100644 backend/tests/unit/test_ndi_python_service.py create mode 100644 docs/plans/2026-05-13-ndi-python-integration.md create mode 100644 docs/plans/2026-05-13-railway-experimental-env-runbook.md diff --git a/backend/pyproject.toml b/backend/pyproject.toml index a4bc8b6..308e3ec 100644 --- a/backend/pyproject.toml +++ b/backend/pyproject.toml @@ -19,10 +19,19 @@ dependencies = [ "Pillow>=10.4.0", "numpy>=2.0.0", "scipy>=1.14.0", - # pandas was declared but never imported anywhere in backend/. Removed - # in audit 2026-04-23 (#58) — it was a ~30 MB wheel for no runtime - # use. If a future feature needs it, re-add with a comment pointing - # at the specific caller. + # pandas restored in Phase A of the NDI-python integration + # (2026-05-13) — required because vlt.file and ndi/__init__ both + # eagerly pull pandas via their submodule chains. Without it, the + # NDI stack imports fail at backend startup. + "pandas>=2.0.0", + # Direct runtime deps of the NDI-python stack (installed via + # --no-deps in infra/Dockerfile to skip matplotlib + opencv). + "networkx>=2.6", + "jsonschema>=4.0.0", + "requests>=2.28.0", + "openMINDS>=0.2.0", + "portalocker>=2.0.0", + "h5py>=3.0.0", ] [project.optional-dependencies] @@ -123,6 +132,17 @@ module = [ "scipy.*", "opentelemetry", "opentelemetry.*", + # NDI-python stack — installed via git+https in the Docker image. Locally + # (and in CI before pip install) the modules aren't on PYTHONPATH so mypy + # can't find their stubs. The wrappers in services/ndi_python_service.py + # lazy-import these and surface a typed None on import failure, so the + # "missing-imports" diagnostic is a false positive in our usage shape. + "ndi", + "ndi.*", + "ndicompress", + "ndicompress.*", + "vlt", + "vlt.*", ] ignore_missing_imports = true diff --git a/backend/requirements.txt b/backend/requirements.txt index f80a358..b2d85b4 100644 --- a/backend/requirements.txt +++ b/backend/requirements.txt @@ -12,4 +12,18 @@ python-multipart>=0.0.9 Pillow>=10.4.0 numpy>=2.0.0 scipy>=1.14.0 -# pandas removed in audit 2026-04-23 (#58) — never imported. +# pandas was previously removed in audit 2026-04-23 (#58). RESTORED in Phase A +# of the NDI-python integration: `vlt.file.__init__` and `ndi/__init__` both +# eagerly import pandas via their submodule chains, so the NDI stack won't +# import without it. Marginal image cost ~30 MB; runtime cost ~0 (lazy paths). +pandas>=2.0.0 +# Direct deps of NDI-python and its git kin, hand-listed so the +# `--no-deps` installs in the Dockerfile have everything they need at +# runtime. Skipping matplotlib (~50-70 MB) and opencv-python-headless +# (~80 MB) which are declared but never imported on our paths. +networkx>=2.6 +jsonschema>=4.0.0 +requests>=2.28.0 +openMINDS>=0.2.0 +portalocker>=2.0.0 +h5py>=3.0.0 diff --git a/backend/services/binary_service.py b/backend/services/binary_service.py index 6e221a8..f60f18c 100644 --- a/backend/services/binary_service.py +++ b/backend/services/binary_service.py @@ -156,22 +156,53 @@ async def get_timeseries( # noqa: PLR0911 return _timeseries_error("download", f"Failed to download file: {e}") name = (ref.filename or "").lower() - # VH-Lab's VHSB files use a text metadata header ("This is a VHSB file, - # http://github.com/VH-Lab") followed by typed binary slots. The v1 - # decoder used the DID-python `vlt` library for this. v2 doesn't bundle - # vlt today; we surface the same "vlt library not available" soft error - # the v1 TimeseriesChart already maps to a friendly message. + # Decoder dispatch order matters. We try the cheapest discriminators + # first and fall through: + # 1. NDI-compressed wrappers (gzip magic 0x1f 0x8b) → ndicompress + # 2. VHSB text-tag header ("This is a VHSB file, ...") → vlt + # 3. VHSB binary-magic (synthetic; rare/dead in practice) → inline _parse_vhsb + # 4. Everything else → inline _parse_nbf (the .nbf raw-binary case) + # + # Phase A swap (2026-05-13): paths #1 and #2 are NEW. Before, both + # short-circuited to soft errors because vlt and ndicompress were + # not installed. Paths #3 and #4 are unchanged. The audit at + # `apps/web/scripts/audit-public-api.mjs` exists to prove that + # the NBF (path #4) byte-shape is byte-identical before/after. + from backend.services import ndi_python_service as _ndi + + # Path 1: NDI-compressed binary (.nbf.tgz wrapper) + if _ndi.is_ndi_compressed(payload): + shaped = _ndi.expand_ephys_from_bytes(payload) + if shaped is not None: + return shaped + # Fall through to soft error below if expansion failed. + return _timeseries_error( + "ndi_compressed_failed", + "This file looks NDI-compressed (.nbf.tgz) but the decoder " + "could not expand it. Format may be a non-Ephys variant.", + ) + + # Path 2: VHSB with the ASCII text-tag header head = payload[:5] if len(payload) >= 5 else b"" if head.startswith(b"This "): + shaped = _ndi.read_vhsb_from_bytes(payload) + if shaped is not None: + return shaped + # NDI stack unavailable, or the file is malformed. Preserve the + # legacy soft-error code so the v1 TimeseriesChart's friendly + # message still surfaces if NDI-python isn't loaded. return _timeseries_error( "vlt_library", - "vlt library is not available on this server — full VHSB " - "decoding requires the DID-python `vlt` extension. The raw " - "file is available in the document's Files section.", + "VHSB decoder unavailable on this server. The raw file is " + "still available in the document's Files section.", ) + try: + # Path 3: synthetic binary-magic VHSB. In practice rare; kept + # so test fixtures continue to work. if name.endswith(".vhsb") or (payload[:4] == b"VHSB"): return _parse_vhsb(payload) + # Path 4: raw .nbf. Byte-shape under audit; do not modify. return _parse_nbf(payload) except Exception as e: log.warning("binary.decode_failed", kind="timeseries", error=str(e)) diff --git a/backend/services/ndi_python_service.py b/backend/services/ndi_python_service.py new file mode 100644 index 0000000..26d55cb --- /dev/null +++ b/backend/services/ndi_python_service.py @@ -0,0 +1,335 @@ +"""ndi_python_service — thin wrappers over the three NDI-python entry points +we use in Phase A. + +Why a separate service? Two reasons: + +1. **Centralized lazy imports.** NDI-python (~150 MB resident if everything + loaded eagerly) is gated behind module-level functions that import on first + call. The rest of the backend doesn't pay the import cost until something + actually exercises an NDI path. + +2. **Consistent error envelope.** Every call returns either a typed Python + value (numpy array / dict) on success, or `None` (or a sentinel) on a + recoverable miss. None of these raise on miss — that's the contract our + callers in `binary_service` and `ontology_service` rely on so they can + fall through to their existing inline / external paths. + +The three entry points are documented in: +`docs/plans/2026-05-13-ndi-python-integration.md`. + +Phase B may layer a real `ndi.dataset.Dataset` here. Phase A intentionally +operates only on byte payloads + ID strings, no Dataset object. +""" + +from __future__ import annotations + +import contextlib +import tempfile +from pathlib import Path +from typing import Any, Literal + +from ..observability.logging import get_logger + +log = get_logger(__name__) + +# Lightweight import guard. We don't want the *import* of this module to +# pull in pandas / numpy / etc. — that's deferred to first call. The flag +# below caches the result of the first import attempt so subsequent calls +# pay nothing extra. `None` = not-yet-tried; `True` = imported OK; +# `False` = import failed (NDI stack not available; callers fall back). +_NDI_AVAILABLE: bool | None = None + + +def is_ndi_available() -> bool: + """Best-effort check that the NDI-python stack is importable. Caches + the result so health checks + first-request paths don't pay the import + cost more than once.""" + global _NDI_AVAILABLE # noqa: PLW0603 — module-level cache flag + if _NDI_AVAILABLE is not None: + return _NDI_AVAILABLE + try: + # We probe one module from each git-sourced package to make sure + # the full transitive surface is on PYTHONPATH. Errors here at + # boot time become clear startup failures rather than mysterious + # first-request 500s. + import ndi.ontology # noqa: F401 + import ndicompress # noqa: F401 + import vlt.file.custom_file_formats # noqa: F401 + _NDI_AVAILABLE = True + except ImportError as e: + log.warning("ndi_python_service.import_failed", error=str(e)) + _NDI_AVAILABLE = False + return _NDI_AVAILABLE + + +# --------------------------------------------------------------------------- +# VHSB — vlt.file.custom_file_formats.vhsb_read +# --------------------------------------------------------------------------- +# +# Important contract from the Phase A recon (see plan doc): +# - vhsb_read takes a FILE PATH (str), not bytes / BytesIO +# - It internally reopens the file with `open(filename, 'rb')` +# - There is ONLY ONE VHSB format. It always begins with a 200-byte +# ASCII tag ("This is a VHSB file, http://github.com/VH-Lab\n" zero- +# padded) followed by a 1836-byte binary header, then payload. +# - Returns `(y, x)` — numpy arrays of values and time-axis samples. +# +# We materialize the payload bytes to a NamedTemporaryFile, call +# vhsb_read, then unlink. The 200-byte text tag is what current +# binary_service.py treats as the `vlt_library` early-return — the +# whole point of Phase A is to actually decode it instead. + + +def read_vhsb_from_bytes( + payload: bytes, +) -> dict[str, Any] | None: + """Decode a VHSB binary payload via vlt.file. + + Returns a dict matching `binary_service._ts_shape_single_channel`'s + envelope on success (so callers can drop it directly into a + timeseries response), or `None` on failure so the caller can fall + back to inline parsing or surface a typed error. + + No raise; all failures log + return None. + """ + if not is_ndi_available(): + return None + if not payload or len(payload) < 2036: + # Minimum: 200 byte text-tag + 1836 byte header. Smaller payloads + # cannot possibly be valid VHSB. + return None + + tmp_path = None + try: + # vhsb_read needs a real on-disk path. Suffix matters: the helper + # doesn't sniff the file type from extension, but downstream + # logging is clearer if we keep it. + with tempfile.NamedTemporaryFile( + delete=False, suffix=".vhsb", prefix="ndb_vhsb_" + ) as fh: + fh.write(payload) + tmp_path = fh.name + + # Lazy-import inside the function so the import cost is paid only + # on the first VHSB decode (or never, if no one ever hits this). + import numpy as np + from vlt.file.custom_file_formats import vhsb_read, vhsb_readheader + + header = vhsb_readheader(tmp_path) + n_samples = int(header.get("num_samples", 0)) + if n_samples <= 0: + log.warning("vhsb_read.bad_header", header=header) + return None + + y, x = vhsb_read(tmp_path, 0, n_samples) + if y is None or len(y) == 0: + log.warning("vhsb_read.empty_payload", n_samples=n_samples) + return None + + # Translate to the existing envelope shape. y is the value array + # (possibly multi-dim if Y_dim > 1), x is the time axis. We + # flatten y to a single channel for now — multi-channel VHSB + # support is a future enhancement (binary_service's envelope + # naturally supports it, but the demo datasets are all 1-D). + sample_rate = float(header.get("X_increment", 0.0)) + # X_increment is seconds-per-sample. Convert to Hz, guarding + # against zero. + sample_rate_hz = (1.0 / sample_rate) if sample_rate > 0 else 0.0 + + # Flatten to 1-D if vhsb_read returned (N, 1) or (N,). + values = np.asarray(y).reshape(-1).astype(np.float32, copy=False) + + return { + "channels": {"ch0": _nan_to_none(values.tolist())}, + "timestamps": np.asarray(x).reshape(-1).astype(np.float64, copy=False).tolist(), + "sample_count": int(values.size), + "format": "vhsb", + "sample_rate_hz": sample_rate_hz, + "error": None, + } + except Exception as e: + # vhsb_read raises on type mismatch / bad sizes; treat all as soft + # errors so callers can fall back. + log.warning("vhsb_read.failed", error=str(e), error_type=type(e).__name__) + return None + finally: + if tmp_path is not None: + with contextlib.suppress(OSError): + Path(tmp_path).unlink(missing_ok=True) + + +def _nan_to_none(values: list[float]) -> list[float | None]: + """Replace NaN with None so the frontend's uPlot sees explicit gaps + rather than rendering through NaN-poisoned line segments. Matches the + `_to_nullable_list` convention in binary_service.""" + import math + out: list[float | None] = [] + for v in values: + if isinstance(v, float) and math.isnan(v): + out.append(None) + else: + out.append(float(v)) + return out + + +# --------------------------------------------------------------------------- +# NDI-compressed binaries — ndicompress.expand_* +# --------------------------------------------------------------------------- +# +# Phase A scope: detect + decompress only. Like vhsb_read, ndicompress +# operates on file paths (subprocess-based, wraps platform-specific C +# executables). Magic byte detection: +# - Outer wrapper is gzipped tar (.nbf.tgz) +# - One inner file has the extension `.nbh` and starts with the +# 15-byte ASCII string `b"NDIBINARYHEADER"` +# +# encode_method dispatch (per the recon): +# 1 = Ephys (analog input/output) — most common for us +# 2 = Metadata (JSON-like; rarely shown as timeseries) +# 21 = Digital (uint8 0/1 channels) +# 41 = EventMarkText (sparse markers) +# 61 = Time (time-only data; used as derived axis) +# +# We only auto-handle method 1 (Ephys) on the timeseries path; the +# others surface a typed soft-error and fall through to the existing +# code (which already handles raw .nbf). + + +def is_ndi_compressed(payload: bytes) -> bool: + """Cheap prefix check for NDI's `.nbf.tgz` wrapper. + + Doesn't validate the inner contents — that's the job of the actual + expand call. False positives here are fine because the expand path + will fail gracefully and the caller will fall back to inline parsing. + + A gzipped tar archive begins with two bytes `0x1f 0x8b` (gzip magic). + We don't fingerprint deeper than that — every gzip stream we'd see + in this context is going to be either NDI-compressed or something + legitimately broken; in either case the expand call will tell us. + """ + return len(payload) >= 2 and payload[0] == 0x1F and payload[1] == 0x8B + + +def expand_ephys_from_bytes(payload: bytes) -> dict[str, Any] | None: + """Decode an NDI-compressed Ephys payload (encode_method=1). + + Returns the same envelope shape as `read_vhsb_from_bytes` so it's a + drop-in replacement in `BinaryService.get_timeseries`. Multi-channel + ephys becomes `{"ch0": [...], "ch1": [...], ...}`. + + None on miss / wrong codec / errors. No raise. + """ + if not is_ndi_available(): + return None + if not is_ndi_compressed(payload): + return None + + tmp_path = None + try: + with tempfile.NamedTemporaryFile( + delete=False, suffix=".nbf.tgz", prefix="ndb_ndic_" + ) as fh: + fh.write(payload) + tmp_path = fh.name + + import ndicompress + import numpy as np + + # ndicompress.expand_ephys returns (np.ndarray[S, C], None). + arr, _ = ndicompress.expand_ephys(tmp_path) + if arr is None or arr.size == 0: + return None + + # Shape: (n_samples, n_channels). We don't have a sample rate + # from ndicompress's return (yet — the .nbh header has it but + # the wrapper doesn't surface it). Caller can post-process with + # the document's metadata if needed. + n_samples, n_channels = arr.shape if arr.ndim == 2 else (arr.size, 1) + if arr.ndim == 1: + arr = arr.reshape(-1, 1) + + channels: dict[str, list[float | None]] = {} + for c in range(n_channels): + channels[f"ch{c}"] = _nan_to_none(arr[:, c].astype(np.float32).tolist()) + + return { + "channels": channels, + "timestamps": list(range(n_samples)), # sample-index axis; caller may rescale + "sample_count": int(n_samples), + "format": "nbf_compressed", + "sample_rate_hz": 0.0, # unknown without sidecar metadata + "error": None, + } + except Exception as e: + log.warning( + "ndicompress.expand_ephys.failed", + error=str(e), + error_type=type(e).__name__, + ) + return None + finally: + if tmp_path is not None: + with contextlib.suppress(OSError): + Path(tmp_path).unlink(missing_ok=True) + + +# --------------------------------------------------------------------------- +# Ontology — ndi.ontology.lookup +# --------------------------------------------------------------------------- +# +# Phase A's ontology contribution: when our existing external-provider +# lookup misses, fall back to NDI's. NDI's `lookup` knows lab-specific +# terms (WBStrain, Cre lines, internal NDIC identifiers) that public +# providers don't. +# +# Critical contract: +# - Input is a single CURIE string (`"WBStrain:00000001"`, `"CL:0000540"`) +# - Output is an OntologyResult with truthy-on-hit, falsy-on-miss +# - Never raises (provider errors swallowed internally) +# - Has a small module-level FIFO cache (~100 entries) +# - Most non-NDIC prefixes hit OLS4 (EBI) via `requests.get`, 30s timeout +# +# We re-cache results in our own redis-backed `ontology_cache` so a hit +# survives process restart. NDI's internal cache is per-process only. + + +_OntologyLookupKind = Literal["hit", "miss"] + + +def lookup_ontology(curie: str) -> dict[str, Any] | None: + """Resolve an ontology CURIE via NDI-python's ontology service. + + Returns the OntologyResult's `.to_dict()` on hit, `None` on miss + (incl. malformed input, unknown prefix, provider error — all silent + in NDI's implementation, surfaced as None here). + + Callers in `ontology_service.py` should use this as a FALLBACK after + their existing external-provider lookup misses — NOT as the primary + path (NDI's lookup hits the same OLS4 endpoints for many ontologies, + so duplication would double network traffic for hits we'd see anyway). + """ + if not is_ndi_available(): + return None + if not curie or ":" not in curie: + return None + + try: + from ndi.ontology import lookup + result = lookup(curie) + if not result: # OntologyResult __bool__ returns True only on hit + return None + # to_dict() yields {id, name, prefix, definition, synonyms, short_name}. + # mypy sees `lookup` as `Any` (NDI-python has no stubs), so the cast + # is needed to keep the function's declared return type honest under + # strict mode. + result_dict: dict[str, Any] = dict(result.to_dict()) + return result_dict + except Exception as e: + # NDI's lookup is documented to never raise on misses, but defensive: + log.warning( + "ndi.ontology.lookup.failed", + curie=curie, + error=str(e), + error_type=type(e).__name__, + ) + return None diff --git a/backend/services/ontology_service.py b/backend/services/ontology_service.py index e4f4f02..3f0f2dc 100644 --- a/backend/services/ontology_service.py +++ b/backend/services/ontology_service.py @@ -44,14 +44,60 @@ async def lookup(self, term: str) -> OntologyTerm: cached = self.cache.get(provider, term_id) if cached is not None: return cached + fetched: OntologyTerm | None = None try: fetched = await self._fetch_from_provider(provider, term_id) except Exception as e: log.warning("ontology.fetch_failed", provider=provider, term_id=term_id, error=str(e)) - raise OntologyLookupFailed(f"Could not look up {term}") from e + # Don't raise yet — fall through to the NDI-python fallback, which + # knows lab-specific terms (NDIC, WBStrain, internal Cre lines) + # the existing providers may miss. + + # NDI-python fallback: only fire when existing path didn't yield a + # usable record (stub with no label/definition, OR raised above). + # This is a Phase A addition (2026-05-13) — see plan doc. Wrapped in + # to_thread because ndi.ontology.lookup is sync and uses `requests` + # internally, which would block the event loop if called directly. + if fetched is None or (not fetched.label and not fetched.definition): + ndi_term = await self._try_ndi_fallback(term, provider, term_id) + if ndi_term is not None: + self.cache.set(ndi_term) + return ndi_term + + if fetched is None: + raise OntologyLookupFailed(f"Could not look up {term}") self.cache.set(fetched) return fetched + async def _try_ndi_fallback( + self, term: str, provider: str, term_id: str, + ) -> OntologyTerm | None: + """Probe NDI-python's bundled ontology lookup. Returns None on miss + (incl. NDI stack not installed, malformed input, unknown prefix). + + NDI's lookup hits the same OLS4 endpoints we do for many ontologies, + but it ALSO ships a local CSV for NDIC and has hand-curated providers + for WBStrain and a few others — that's where the additional hits + come from. Net: this fallback rarely fires but catches the long tail.""" + try: + from .ndi_python_service import lookup_ontology + result = await asyncio.to_thread(lookup_ontology, term) + except Exception as e: + log.warning("ontology.ndi_fallback_failed", term=term, error=str(e)) + return None + if result is None: + return None + # NDI's `.to_dict()` shape: {id, name, prefix, definition, synonyms, short_name}. + # Map onto our OntologyTerm. We preserve the original PROVIDER (case + # as it was passed in) so the cache key matches what the caller asked for. + return OntologyTerm( + provider=provider, + term_id=term_id, + label=result.get("name") or None, + definition=result.get("definition") or None, + url=None, + ) + async def batch_lookup(self, terms: list[str]) -> list[OntologyTerm]: unique = list(dict.fromkeys(t for t in terms if t)) results = await asyncio.gather(*[self._safe_lookup(t) for t in unique]) diff --git a/backend/tests/unit/test_ndi_python_service.py b/backend/tests/unit/test_ndi_python_service.py new file mode 100644 index 0000000..57eb0ba --- /dev/null +++ b/backend/tests/unit/test_ndi_python_service.py @@ -0,0 +1,228 @@ +"""Unit tests for the NDI-python service wrappers. + +These tests don't require the NDI-python stack to be installed (CI may +not have it). The service is designed to degrade gracefully when the +imports fail — and that's exactly what these tests pin down. When the +stack IS available, additional integration tests in the experimental +Railway env will exercise the real decoder paths against the production +Haley / Dabrowska binaries. +""" + +from __future__ import annotations + +from unittest.mock import patch + +import pytest + +from backend.services import ndi_python_service + + +@pytest.fixture(autouse=True) +def reset_available_cache(): + """Ensure each test starts with a fresh NDI-availability probe. + + The service caches the result of its first import attempt to avoid + re-paying the cost; tests need to clear that cache so they can + independently force-on or force-off the stack. + """ + ndi_python_service._NDI_AVAILABLE = None + yield + ndi_python_service._NDI_AVAILABLE = None + + +# --------------------------------------------------------------------------- +# is_ndi_compressed — pure byte-prefix check, no NDI dependency +# --------------------------------------------------------------------------- + + +class TestIsNdiCompressed: + def test_detects_gzip_magic(self): + assert ndi_python_service.is_ndi_compressed(b"\x1f\x8b\x08\x00") is True + + def test_rejects_short_payload(self): + assert ndi_python_service.is_ndi_compressed(b"") is False + assert ndi_python_service.is_ndi_compressed(b"\x1f") is False + + def test_rejects_non_gzip_payloads(self): + assert ndi_python_service.is_ndi_compressed(b"VHSB") is False + assert ndi_python_service.is_ndi_compressed(b"This is a VHSB file") is False + assert ndi_python_service.is_ndi_compressed(b"\x00\x00\x00\x00") is False + + def test_only_inspects_first_two_bytes(self): + # Gzip-magic prefix followed by garbage. Detection passes; the + # downstream expand call would surface the real format issue. + assert ndi_python_service.is_ndi_compressed(b"\x1f\x8b" + b"junk" * 100) is True + + +# --------------------------------------------------------------------------- +# read_vhsb_from_bytes — graceful degradation when NDI unavailable +# --------------------------------------------------------------------------- + + +class TestReadVhsbFromBytes: + def test_returns_none_when_ndi_unavailable(self): + ndi_python_service._NDI_AVAILABLE = False + result = ndi_python_service.read_vhsb_from_bytes(b"This is a VHSB file" + b"\x00" * 2100) + assert result is None + + def test_returns_none_on_short_payload(self): + ndi_python_service._NDI_AVAILABLE = True + # Minimum VHSB payload is 200 (text tag) + 1836 (header) = 2036 bytes + result = ndi_python_service.read_vhsb_from_bytes(b"This is a VHSB file") + assert result is None + + def test_returns_none_on_empty_payload(self): + ndi_python_service._NDI_AVAILABLE = True + assert ndi_python_service.read_vhsb_from_bytes(b"") is None + + def test_returns_none_when_vhsb_read_raises(self): + """When the real vlt call raises (malformed payload, etc.), we + swallow and return None so callers can fall through to their + legacy soft-error path. No exception escapes the service.""" + ndi_python_service._NDI_AVAILABLE = True + with patch.dict( + "sys.modules", + {"vlt.file.custom_file_formats": None}, + ): + # Module-set-to-None forces ImportError on `from vlt.file...` + result = ndi_python_service.read_vhsb_from_bytes(b"x" * 3000) + assert result is None + + +# --------------------------------------------------------------------------- +# expand_ephys_from_bytes — graceful degradation +# --------------------------------------------------------------------------- + + +class TestExpandEphysFromBytes: + def test_returns_none_when_ndi_unavailable(self): + ndi_python_service._NDI_AVAILABLE = False + result = ndi_python_service.expand_ephys_from_bytes(b"\x1f\x8b" + b"x" * 100) + assert result is None + + def test_returns_none_on_non_compressed_payload(self): + # Caller is supposed to gate on is_ndi_compressed first, but the + # wrapper double-checks defensively. + ndi_python_service._NDI_AVAILABLE = True + result = ndi_python_service.expand_ephys_from_bytes(b"VHSB" + b"x" * 100) + assert result is None + + def test_returns_none_when_ndicompress_fails(self): + ndi_python_service._NDI_AVAILABLE = True + with patch.dict("sys.modules", {"ndicompress": None}): + result = ndi_python_service.expand_ephys_from_bytes(b"\x1f\x8b" + b"x" * 100) + assert result is None + + +# --------------------------------------------------------------------------- +# lookup_ontology — never raises, returns None on miss +# --------------------------------------------------------------------------- + + +class TestLookupOntology: + def test_returns_none_on_malformed_curie(self): + # No `:` separator → not a CURIE → don't even probe. + assert ndi_python_service.lookup_ontology("WBStrain00000001") is None + + def test_returns_none_on_empty_input(self): + assert ndi_python_service.lookup_ontology("") is None + + def test_returns_none_when_ndi_unavailable(self): + ndi_python_service._NDI_AVAILABLE = False + result = ndi_python_service.lookup_ontology("CL:0000540") + assert result is None + + def test_returns_none_on_ndi_miss(self): + """NDI's lookup is documented to never raise — it returns a + falsy OntologyResult on miss. Make sure we surface None upward, + not an empty dict.""" + + class _FakeResult: + id = "" + name = "" + + def __bool__(self): + return False + + def to_dict(self): + return {} + + ndi_python_service._NDI_AVAILABLE = True + # ndi isn't installed in the test env, so we inject a fake module + # via sys.modules. The wrapper imports lazily via `from ndi.ontology + # import lookup` so monkey-patching sys.modules is the cleanest way. + fake_module = type("M", (), {"lookup": lambda _curie: _FakeResult()}) + with patch.dict("sys.modules", {"ndi.ontology": fake_module}): + result = ndi_python_service.lookup_ontology("CL:0000540") + assert result is None + + def test_returns_dict_on_ndi_hit(self): + class _FakeResult: + id = "0000540" + name = "T cell" + prefix = "CL" + definition = "Mature T cell." + synonyms = [] + short_name = "T cell" + + def __bool__(self): + return True + + def to_dict(self): + return { + "id": self.id, + "name": self.name, + "prefix": self.prefix, + "definition": self.definition, + "synonyms": self.synonyms, + "short_name": self.short_name, + } + + ndi_python_service._NDI_AVAILABLE = True + fake_module = type("M", (), {"lookup": lambda _curie: _FakeResult()}) + with patch.dict("sys.modules", {"ndi.ontology": fake_module}): + result = ndi_python_service.lookup_ontology("CL:0000540") + assert result is not None + assert result["name"] == "T cell" + assert result["prefix"] == "CL" + + def test_swallows_ndi_exception(self): + """Defensive: even though NDI is documented not to raise, if it + does, we swallow + return None so callers don't see exceptions.""" + ndi_python_service._NDI_AVAILABLE = True + + def _boom(_curie): + raise RuntimeError("boom") + + fake_module = type("M", (), {"lookup": _boom}) + with patch.dict("sys.modules", {"ndi.ontology": fake_module}): + result = ndi_python_service.lookup_ontology("CL:0000540") + assert result is None + + +# --------------------------------------------------------------------------- +# is_ndi_available — caches result, doesn't crash on missing imports +# --------------------------------------------------------------------------- + + +class TestIsNdiAvailable: + def test_caches_first_result(self): + ndi_python_service._NDI_AVAILABLE = True + # Without resetting, subsequent calls should not re-import. + assert ndi_python_service.is_ndi_available() is True + ndi_python_service._NDI_AVAILABLE = False + assert ndi_python_service.is_ndi_available() is False + + def test_returns_false_when_imports_fail(self): + ndi_python_service._NDI_AVAILABLE = None + with patch.dict( + "sys.modules", + { + "vlt.file.custom_file_formats": None, + "ndicompress": None, + "ndi.ontology": None, + }, + ): + assert ndi_python_service.is_ndi_available() is False + # And the cache survives: + assert ndi_python_service._NDI_AVAILABLE is False diff --git a/docs/plans/2026-05-13-ndi-python-integration.md b/docs/plans/2026-05-13-ndi-python-integration.md new file mode 100644 index 0000000..cc2fe48 --- /dev/null +++ b/docs/plans/2026-05-13-ndi-python-integration.md @@ -0,0 +1,340 @@ +# NDI-python integration plan — backend signal/edit layer + +**Status:** Draft for user review. **No backend code has been written yet.** +**Audience:** Audri. Companion to `ndi-cloud-app/apps/web/docs/specs/2026-05-13-ask-checkpoint-pre-compact.md`. +**Author:** Claude Sonnet 4.5 (1M context), 2026-05-13. + +## TL;DR + +Three phases, escalating risk. All work happens on `ndi-data-browser-v2` (the Railway FastAPI). The `feat/experimental-ask-chat` branch in `ndi-cloud-app` is **NOT** touched — it stays draft + DO NOT MERGE. The benefit chain is bottom-up: every phase that lands on ndb-v2 main flows automatically to (a) the live Document Explorer / QuickPlot, (b) the experimental Ask chat preview, and (c) the upcoming Data Browser product. + +1. **Phase A — vlt-only install (~1 day, low risk).** Add `vhlab-toolbox-python` to the Railway image (which pulls `vlt`), `apt-get install -y git` so pip can fetch the git-sourced source, and ~10 LOC in `binary_service.py` to call `vlt.file` instead of returning the `"vlt library not available"` soft error. **Unlocks Haley VHSB position-trace plotting — immediately benefits the live Document Explorer's QuickPlot for every VHSB dataset, and unblocks the Ask chat's chart prompt for Haley.** No architectural change; existing routes and tests untouched. +2. **Phase B — replace inline parsers with `database_openbinarydoc` (~1 week, medium-high risk).** Install full NDI-python (which pulls `did`, `ndr`, `vhlab-toolbox-python`, `ndi-compress`). Add a startup/cron job to call `ndi.cloud.orchestration.downloadDataset(dataset_id, /data/ndi/{id})` against a Railway persistent volume. Refactor `BinaryService.get_timeseries` to call `dataset.database_openbinarydoc(doc, filename)`. Feature-flag the swap for an A/B week; rollback = flag flip. +3. **Phase C — new rich endpoints (~1-2 weeks, low risk because additive).** New routes that *only* exist because we have NDI-python: `POST /api/datasets/:id/ndiquery` (Mongo-style structured queries via `ndi.query.Query`), `POST /api/datasets/:id/documents/:docId/edit` (auth-gated, foundation for the Data Browser product), and `GET /api/datasets/:id/elements/:elementId/native` (`ndi.element`-backed). Existing routes unchanged. + +**Phase A is the only phase that should ship before Audri reviews this spec.** Phases B & C need design buy-in on the cache + volume strategy (and Phase C scope). + +## Pre-flight state (2026-05-13, ~21:20 UTC) + +- PR #111 merged to ndb-v2 main (commit `c5b02884`) — Railway auto-deploying the `?file=` param fix +- Ask RAG index re-baked with `binarySignalExample` sidecar (staging v3 → production, atomic) +- ndi-cloud-app `feat/experimental-ask-chat` branch (PR #160, draft) is "demo-ready" for the NBF chart path; Haley VHSB still soft-errors +- 8 published NDI Commons datasets: 3 of them are tutorial-having (Bhar, Haley, Dabrowska); Haley is VHSB-formatted + +## Day 0 spike findings (research-only, no code) + +### F1. The chatbots are general-purpose by design — NDI-python integration is exactly the new capability the Ask chat (and the data browser) needs + +`vh-lab-chatbot` and `shrek-lab-chatbot` **don't import NDI-python.** They're general-purpose lab-document RAG systems (PDF / HTML / xlsx → pgvector + Voyage + Claude). That's deliberate. The Ask chat in `ndi-cloud-app` is the first product that *will* use NDI-python — both for richer chatbot answers (plotting, provenance walks, structured queries) AND to power richer data-browser interactions (public QuickPlot expansion, private dataset editing). The canonical "how to use NDI-python with cloud datasets" reference therefore comes from NDI-python's own surface: **`src/ndi/cloud/` package + `tests/test_cloud_*.py`** suite + the published tutorials. + +### F2. Cloud connectivity = `ndi.cloud.orchestration.downloadDataset` + +There is **no `ndi.cloud.Dataset(dataset_id)` lazy constructor**. The entry point is: + +```python +from ndi.cloud.orchestration import downloadDataset +from ndi.cloud.client import CloudClient + +dataset = downloadDataset( + cloud_dataset_id="682e7772cdf3f24938176fac", # Haley + target_folder="/data/ndi/682e7772cdf3f24938176fac", + sync_files=False, # binaries lazy + client=CloudClient.from_env(), +) +``` + +It performs (per `NDI-python/src/ndi/cloud/orchestration.py:23-186`): + +1. Eagerly downloads ALL JSON documents from Mongo (chunked bulk ZIPs) — **multi-minute for real datasets** (Haley = 78K docs, ~16 GB; Carbon-fiber test = 743 docs, ~9.7 GB) +2. Rewrites each binary-file `location` in document properties to an `ndic://{dataset_id}/{file_uid}` URI +3. Materializes a local `ndi_dataset_dir` under `target_folder/{cloud_dataset_id}` +4. Stashes the authenticated `CloudClient` on the returned object as `dataset.cloud_client` + +After that, binary files materialize **lazily** on the first `database_openbinarydoc(doc, filename)` call via presigned S3 URLs (`session/session_base.py:553-628` → `cloud/filehandler.py:121-177`). No `boto3` — direct `requests.get(url, stream=True)`. + +**Implication:** `downloadDataset` cannot be a per-request operation. It has to run at startup or on a cron, against a persistent volume. + +### F3. `vlt` is provided by `vhlab-toolbox-python` + +Verified at `NDI-python/src/ndi/check.py:72-74`: + +```python +# vhlab-toolbox-python +ok, detail = _try_import("vlt") +check("vhlab-toolbox-python (vlt)", ok, detail) +``` + +The git-pin is `vhlab-toolbox-python @ git+https://github.com/VH-Lab/vhlab-toolbox-python.git@main` (from `NDI-python/pyproject.toml:39`). Installing it alone gives us `vlt` without pulling all of NDI-python. + +### F4. NDI-python git-sourced deps need `git` in the Docker image + +NDI-python's four git-sourced pip deps (`did`, `ndr`, `vhlab-toolbox-python`, `ndi-compress`) need `git` available at install time. Current `infra/Dockerfile` does NOT install git (only `libjpeg62-turbo libtiff6 curl`). One `apt-get install` line fixes it. + +### F5. Current `binary_service.py` has an early-return for text-VHSB + +`backend/services/binary_service.py:164-184` (post-PR-#111): + +```python +head = payload[:5] if len(payload) >= 5 else b"" +if head.startswith(b"This "): + return _timeseries_error( + "vlt_library", + "vlt library is not available on this server — full VHSB " + "decoding requires the DID-python `vlt` extension. ...", + ) +try: + if name.endswith(".vhsb") or (payload[:4] == b"VHSB"): + return _parse_vhsb(payload) + return _parse_nbf(payload) +except Exception as e: + ... +``` + +So the current code: +- **Handles binary-magic VHSB** (`b"VHSB"` prefix, 24-byte header, float32 body) via the inline `_parse_vhsb` +- **Bails on text-header VHSB** (`This is a VHSB file, http://github.com/VH-Lab\n...`) with a soft error — this is the variant vlt handles + +**Critical:** Audri's "Phase A is just `pip install vlt` with zero code changes" assumption is **off by one short edit**. The current code never tries to import vlt — the soft error is an early return on payload prefix. We'd need ~10 LOC to actually call `vlt.file` after installing vhlab-toolbox-python. + +### F6. ndb-v2 already has numpy + scipy + +`backend/requirements.txt` already pins `numpy>=2.0.0` and `scipy>=1.14.0`. These are the heavy NDI-python deps. Image growth from adding vhlab-toolbox-python alone is modest (~10-20 MB). Full NDI-python adds ~80-150 MB (did, ndr, ndi-compress + their numpy/networkx/jsonschema/openMINDS overlap with what's already there). + +### F7. ndb-v2's ADR-009 bans `httpx`/`requests`/`aiohttp` in `services/` + +NDI-python uses `requests` internally. The ADR-009 ban (per `backend/pyproject.toml:90-94`) is **path-scoped** — it forbids importing these libs *inside* `backend/services/`. NDI-python's own use of `requests` is fine (it's a sub-package import, not a direct service import). But if we wrap NDI-python in a `backend/services/ndi_python_service.py`, that wrapper can't directly `import requests` — only NDI-python can. Per-file carve-outs are possible if needed; this is a containable lint problem. + +--- + +## Phase A — vlt-only install (the "free win" with one small caveat) + +**Goal:** Unblock text-header VHSB decoding so Haley's position traces become plottable in the Document Explorer and the Ask chat. Everything else stays exactly as it is today. + +**Scope:** the smallest possible change. + +### A.1 Files to modify + +| File | Change | LOC | Why | +|---|---|---|---| +| `infra/Dockerfile` | `apt-get install -y git` in the Stage 2 system-deps line | +1 | Required for pip to fetch the git-sourced `vhlab-toolbox-python` | +| `backend/requirements.txt` | Add `vhlab-toolbox-python @ git+https://github.com/VH-Lab/vhlab-toolbox-python.git@main` | +1 | Brings in `vlt` | +| `backend/pyproject.toml` | Same addition to `dependencies` | +1 | Mirror — pyproject is the source of truth for dev installs | +| `backend/services/binary_service.py` | Replace lines 164-171 (the soft-error early return) with a vlt call | ~10-15 | Actually use vlt to decode text-VHSB | +| `backend/tests/unit/test_binary_shape.py` | Add a text-VHSB fixture + decode test | +30-50 | Regression coverage | + +### A.2 Concrete `binary_service.py` change + +Current: +```python +head = payload[:5] if len(payload) >= 5 else b"" +if head.startswith(b"This "): + return _timeseries_error( + "vlt_library", + "vlt library is not available on this server — ..." + ) +try: + if name.endswith(".vhsb") or (payload[:4] == b"VHSB"): + return _parse_vhsb(payload) + return _parse_nbf(payload) +``` + +Proposed: +```python +head = payload[:5] if len(payload) >= 5 else b"" +if head.startswith(b"This "): + # Text-header VHSB ("This is a VHSB file, http://github.com/VH-Lab\n…") + # — DID-python's vlt extension parses the typed binary slots that + # follow the text header. Lazy import so a missing vlt downgrades + # cleanly rather than blowing up the worker. + try: + from vlt.file import vhsb_read # type: ignore + except ImportError: + return _timeseries_error( + "vlt_library", + "vlt library is not available — install vhlab-toolbox-python.", + ) + try: + return _from_vlt_vhsb(vhsb_read(io.BytesIO(payload))) + except Exception as e: + log.warning("binary.vlt_decode_failed", error=str(e)) + return _timeseries_error("decode", f"vlt VHSB decode failed: {e}") +try: + if name.endswith(".vhsb") or (payload[:4] == b"VHSB"): + return _parse_vhsb(payload) + return _parse_nbf(payload) +``` + +Plus a small private helper `_from_vlt_vhsb()` that converts vlt's output (likely a numpy array + sample-rate + channel-name list) into the existing `{channels, timestamps, sample_count, format, error}` envelope. Exact shape needs verification against vlt's actual API — Phase A *first action* is to read `vlt/file.py` upstream. + +### A.3 Test plan + +- **Unit**: synthesize a minimal text-header VHSB payload (or pull one from the Haley dataset by hand) and feed it through `BinaryService.get_timeseries` against a mocked cloud-download. Assert non-empty `channels`, correct `format == "vhsb"`, sane `sample_count`. +- **Integration**: extend `backend/tests/integration/test_routes.py` with a route test for `/api/datasets/.../documents/.../signal` against a Haley doc — requires either a recorded fixture or live cloud creds in CI. Recorded fixture is preferred (faster + no creds in CI). +- **Smoke (manual, post-deploy)**: hit `GET /api/datasets/682e7772cdf3f24938176fac/documents//signal` against the deployed Railway URL and confirm a JSON response with non-empty channels. +- **Backward-compat**: the NBF + binary-VHSB paths are untouched. The existing 56 binary-service tests must still pass. + +### A.4 Risk + rollback + +| Concern | Mitigation | +|---|---| +| Image grows by ~10-20 MB | Acceptable. Heavy deps (numpy, scipy) already in. | +| `git` in image adds ~30 MB | Acceptable. One-time cost. | +| vlt's API may not match our envelope | Phase A's *first* concrete action is to read `vlt/file.py` upstream and write the helper. If the API doesn't fit, we adapt or abort Phase A; no production impact. | +| New Dockerfile layer cache miss | First Railway build will be slow (~3-5 min). Subsequent builds re-use the apt layer. | +| Text-header VHSB variant has multiple sub-formats | The vlt library handles them all (that's its job). If we discover a sub-format vlt doesn't handle, we fall through to the existing `_parse_vhsb` (binary magic path). | + +**Rollback**: `git revert` the merge commit. The change is isolated to one branch / one merge; nothing else depends on it. + +### A.5 Pre-flight verification needed + +Before writing Phase A code: + +1. **Read `vhlab-toolbox-python/src/vlt/file.py` upstream** (GitHub) and confirm the public API surface — exact function names, return shapes. +2. **Confirm Railway's `pip install` step has internet access to github.com**. (It almost certainly does, since redis/etc come from PyPI which is GitHub-backed by some mirrors, but `git+https://github.com/...` is a different code path.) +3. **Pick one Haley VHSB doc** as the smoke-test target and note its docId + filename. + +### A.6 Estimated wall-clock + +- 2-3 hours: read vlt's API, write the binary_service change, write the unit test +- 1 hour: smoke-test locally against a saved Haley payload (or via `httpx` against the live cloud) +- 30 min: open PR, wait for CI +- 30 min: merge, wait for Railway deploy, smoke against live URL + +**Total: ~half a day.** Lower bound assumes vlt's API is well-documented and matches the shape we need. + +--- + +## Phase B — full NDI-python (`database_openbinarydoc` swap) + +**Goal:** Replace the two inline binary parsers (`_parse_nbf`, `_parse_vhsb` + the new vlt path) with a single canonical call: `dataset.database_openbinarydoc(doc, filename) → file_handle`. One source of truth for binary parsing, native multi-file selection (eliminates the `?file=` workaround entirely), and forward compatibility with any new binary formats NDI adds upstream. + +### B.1 The cache + volume design question (THE main thing to decide) + +`downloadDataset` is **not per-request**. It eagerly fetches Mongo metadata for a whole dataset (minutes for big ones). Three workable patterns: + +**Option B-1: Persistent volume + warm cache on startup** +- Mount a Railway persistent volume at `/data/ndi/` +- On worker startup, for each of the 8 published datasets, run `downloadDataset(id, /data/ndi/{id})` +- Cache survives across deploys (volume is persistent) +- First boot is slow (potentially 30-60 min for big datasets); subsequent boots are fast (already-cached metadata) +- **Multi-replica caveat**: if Railway scales to N workers, each one re-downloads to its own volume slice. Shared-volume solutions need RWX (NFS-class). Otherwise: download once via a separate one-shot job / cron, share via S3-backed `mount`. + +**Option B-2: Lazy + LRU** +- No startup work. First request for dataset X triggers `downloadDataset(X)` and the response waits. +- Sub-Pattern: a separate background job warms the top-K most-queried datasets while the worker is otherwise idle. +- Eviction: LRU on disk usage; when over budget, delete oldest dataset's `/data/ndi/{id}` dir. +- **Failure mode**: cold first-request latency is intolerable for chat UX (10-30 min). Mitigated by warming. + +**Option B-3: Hybrid — startup-warm the demo datasets only** +- Audri has 8 published datasets. Pre-warm the 3 tutorial-having ones (Bhar, Haley, Dabrowska) on startup. +- For the other 5, fall through to Option B-2 (lazy + LRU). +- **Best risk/reward** for the demo era — known-good warm path for the demo prompts, fallback for everything else. + +**My recommendation: Option B-3 with a `NDI_PREWARM_IDS` env var listing the dataset IDs to fetch on startup.** Cheap to implement; doesn't paint us into a corner. + +### B.2 Files to modify (rough) + +- `infra/Dockerfile`: install full NDI-python + add `/data` volume directive (the volume itself is configured in Railway, the Dockerfile just creates the mount-point dir) +- `infra/railway.toml`: declare the persistent volume +- `backend/requirements.txt` + `pyproject.toml`: add `ndi @ git+...` +- `backend/services/ndi_python_service.py` (NEW): wraps `ndi.cloud.orchestration.downloadDataset` + manages the in-memory `{dataset_id: ndi_dataset_dir}` cache +- `backend/services/binary_service.py`: refactor `get_timeseries` to call `ndi_python_service.open_binary(dataset_id, doc, filename)` behind a feature flag +- `backend/app.py`: startup hook that pre-warms `NDI_PREWARM_IDS` datasets in a background task +- `backend/auth/ndi_cloud.py` (NEW or extension of existing): manage the NDI Cloud JWT lifecycle (currently the FastAPI is using its own session auth; NDI-python needs `NDI_CLOUD_USERNAME` + `NDI_CLOUD_PASSWORD` env vars) +- Tests: characterization test that compares old-vs-new outputs for a known set of NBF + VHSB docs + +### B.3 Feature flag + rollback plan + +- Add `NDI_PYTHON_BINARY=on|off` env var (default `off` initially) +- Branch `get_timeseries`: + - `off`: keep the existing inline parser path (today's code) + - `on`: route to `ndi_python_service.open_binary` +- A/B for one week. Track: + - Latency P50/P95 for `/data/timeseries` + - Response-shape diff rate (should be 0) + - Error rate +- Rollback: flip flag back to `off`. Worst case `git revert` the merge. + +### B.4 Open questions + +1. **Multi-replica strategy**: Railway's persistent volume model — is RWX supported? How does it interact with `WEB_CONCURRENCY=4`? Currently each uvicorn worker is in the same container, so they share the volume trivially. If Railway autoscales to N containers, that breaks. +2. **NDI Cloud auth lifetime**: JWT exp is ~1h per `NDI-python/src/ndi/cloud/auth.py`. We need a refresh strategy (probably refresh-on-401 via the username/password fallback path). +3. **Image build time**: full NDI-python install with 4 git-sourced deps will lengthen CI build time. Cacheable via Docker layer ordering but worth measuring. +4. **Test creds in CI**: `NDI-python/tests/test_cloud_*.py` skip when `NDI_CLOUD_USERNAME` / `PASSWORD` aren't set. Should our own integration tests require live creds, or use a recorded fixture? +5. **AWS Lambda gateway flakiness**: `test_cloud_live.py:42-68` notes the cloud API returns frequent 504s. Need retry + backoff in `ndi_python_service`. + +### B.5 What this unlocks + +- Eliminates the `?file=` param workaround entirely (`database_openbinarydoc` takes the filename natively) +- Supports any future binary format NDI adds (we inherit decoders for free) +- The QuickPlot in the Document Explorer now reads the same upgraded outputs (same `{channels, timestamps, ...}` envelope) → public data-browser users see VHSB decoded too, without any frontend change +- The same `ndi.dataset.Dataset` handle becomes available to the upcoming **private data browser** — same Python API researchers use locally is the cloud read/edit surface +- Lays the groundwork for Phase C + +--- + +## Phase C — new rich endpoints + +**Goal:** Add capabilities the existing REST passthrough can't provide. Purely additive; existing routes stay byte-identical. + +### C.1 Proposed endpoints + +- **`POST /api/datasets/:id/ndiquery`** — accepts an `ndi.query.Query`-style JSON filter. Powers the killer cross-dataset chatbot question in Ask ("compare patch-clamp in V1 across mouse and rat datasets") AND surfaces in the public data browser as a richer query builder than today's class-table filter. Backed by `dataset.database_search(q)`. +- **`POST /api/datasets/:id/documents/:docId/edit`** (auth-gated) — uses `Dataset.database_add` / `_remove` for validated document edits. Foundation for the upcoming **private Data Browser** product where logged-in users can edit their own datasets through a UI. Reuses NDI's schema validation + provenance machinery — we don't reimplement either. +- **`GET /api/datasets/:id/elements/:elementId/native`** — wraps `ndi.element` for richer single-element responses (epoch lists with native typing, probe definitions, etc.). Used by Ask chat + public data browser's element detail view. + +### C.2 Risk + +Low. Each is a new route. If buggy, only Ask chat (which is opt-in feature-flagged on the frontend) and the upcoming Data Browser product (which isn't shipped yet) are affected. Public Document Explorer + catalog APIs untouched. + +### C.3 Out of scope for this spec + +The detailed contracts (request/response shapes, error mapping, rate limits, auth gating) deserve their own spec when we get to them. Phase A + B groundwork has to land first. + +--- + +## Concerns + mitigations (matrix) + +| Concern | Phase | Mitigation | +|---|---|---| +| Docker image size grows ~150-200 MB | B | Worth it. Phase A is just ~10-20 MB. | +| Cold-start adds ~500ms for the ndi import | B | Lazy import (existing pattern in `binary_service.py`). | +| NDI-python version drift | B/C | Pin `ndi==X.Y.Z` once stable; track upstream PRs. | +| Cloud-dataset volume strategy unknown | B | THIS spec's main open design decision. My recommendation: Option B-3 (warm 3 demo IDs at startup, lazy for the rest). Audri to confirm. | +| Multi-replica scaling on Railway | B | Need to research Railway's RWX volume support. If unavailable, use a separate one-shot warmer + S3-mounted shared dir. | +| Performance regression on public Document Explorer | B | Feature flag for week-long A/B; rollback is one flag flip. | +| AWS Lambda gateway 504s | B | Retry-with-backoff wrapper around every cloud call. NDI-python's tests already document this pattern. | +| Existing ADR-009 service-layer HTTP ban | B/C | Per-file carve-out for `services/ndi_python_service.py` (precedent: `services/ontology_service.py` already has one). | +| Test creds in CI | B/C | Use recorded fixtures; reserve live-cred tests for nightly or manual. | + +## Recommended sequence + +1. **Now**: Audri reads this spec, signs off on the Phase A code change + the Option B-3 cache strategy (or proposes an alternative). +2. **Phase A (today / tomorrow)**: ~half-day work. New branch on ndb-v2, ~10-15 LOC change in `binary_service.py`, +1 dep, +1 apt line, +1 test. PR → CI → merge → Railway deploys → smoke against Haley. **Done.** +3. **Demo**: re-run the chart prompt against the Vercel preview. With Phase A landed, *both* Dabrowska NBF and Haley VHSB voltage traces render in the chat. **This is the moment the demo gets the second "wow" datapoint.** +4. **Phase B research week**: write a Phase B detailed spec (separate doc) with the volume + auth + multi-replica answers nailed. Audri reviews before any Phase B code. +5. **Phase B implementation week**: feature-flagged refactor, week-long A/B, then flip. +6. **Phase C**: scope each endpoint individually as a separate PR. No rush. + +## Critical file pointers (so the next session can continue) + +- **This spec**: `ndi-data-browser-v2/docs/plans/2026-05-13-ndi-python-integration.md` (you're reading it) +- **Companion checkpoint**: `ndi-cloud-app/apps/web/docs/specs/2026-05-13-ask-checkpoint-pre-compact.md` +- **NDI-python cloud module**: `/Users/audribhowmick/Documents/ndi-projects/NDI-python/src/ndi/cloud/` +- **NDI-python cloud tests** (the canonical "how to use it" examples): `/Users/audribhowmick/Documents/ndi-projects/NDI-python/tests/test_cloud_*.py` +- **vhlab-toolbox-python (vlt)**: `https://github.com/VH-Lab/vhlab-toolbox-python` (not cloned locally yet — need to fetch for Phase A code) +- **Current binary parser**: `ndi-data-browser-v2/backend/services/binary_service.py` (lines 164-184 are the edit target for Phase A) +- **NDI-python tutorials** (real usage patterns): `/Users/audribhowmick/Documents/ndi-projects/NDI-python/tutorials/tutorial_*.py` + +## Open questions for Audri + +1. **Phase A approval?** ~10 LOC + 1 dep + 1 apt line + 1 test. Risk is low. Land it before any further architectural moves? +2. **Volume strategy for Phase B?** Option B-3 (warm 3 demo IDs on startup, lazy for the rest) — agreed, or different preference? +3. **Phase B feature flag** — fine to default `off` for a week of A/B, then flip? +4. **Phase C scope** — same set of three endpoints (`ndiquery`, `edit`, `element/native`), or different priorities? +5. **NDI Cloud test creds** — should our integration tests require live creds, or recorded fixtures only? +6. **Timing** — Phase A this week, Phase B research next week, Phase B implementation week after? Or different cadence? + +--- + +*No production code has been written for any of these phases. This document is a planning artifact only. The Phase A change is small and well-scoped; the Phase B refactor needs more design work; Phase C waits on Phase B.* diff --git a/docs/plans/2026-05-13-railway-experimental-env-runbook.md b/docs/plans/2026-05-13-railway-experimental-env-runbook.md new file mode 100644 index 0000000..91be501 --- /dev/null +++ b/docs/plans/2026-05-13-railway-experimental-env-runbook.md @@ -0,0 +1,141 @@ +# Railway experimental environment — setup runbook + +Companion to `2026-05-13-ndi-python-integration.md`. The Phase A backend changes live on `feat/ndi-python-phase-a`; this doc walks through the **dashboard-only steps** that are required to spin up an "experimental" Railway environment pointing at that branch, so the audit can compare it against production byte-for-byte. + +**Why manual?** Railway's MCP / API doesn't expose environment-creation. Environments are a project-level construct that has to be set up via the dashboard. Once it exists, all subsequent deploys + redeploys CAN be triggered programmatically. + +## Pre-flight + +- [ ] You're logged into Railway on the audrib's-Projects workspace +- [ ] The `feat/ndi-python-phase-a` branch is pushed to GitHub (Claude will commit + push as the last step of the implementation pass) +- [ ] You have ~10 minutes for the dashboard walk-through plus ~5 min for the first build to run + +## Step-by-step + +### 1. Open the project + +Navigate to: + +``` +https://railway.com/project/81a57456-ae9a-47d0-98ef-2b5463f4815b +``` + +You should see the `ndi-data-browser-v2` project with three services: **ndb-v2**, **Postgres**, **Redis**, and the environment dropdown showing **"production"** in the top-left. + +### 2. Create the new environment + +1. Click the environment dropdown (top-left, currently "production") +2. Click **"+ New Environment"** +3. Name it **`experimental`** (lowercase, no spaces) +4. Choose **"Fork from production"** when prompted — this copies the existing services and env vars as starting points (saves us from re-entering NDI_CLOUD_USERNAME etc.). DO NOT pick "Create empty" — that's much more work. +5. Click **Create** + +You should now be inside the new `experimental` environment, with copies of all three services. + +### 3. Point `ndb-v2` at the feature branch + +1. Click into the **`ndb-v2`** service (still in the `experimental` environment) +2. Go to **Settings** → **Service** → **Source** +3. Change the **Branch** from `main` → **`feat/ndi-python-phase-a`** +4. Save / confirm + +Railway will trigger a deploy. **Wait ~3-5 minutes** for the new image (with NDI-python + git deps) to build. The Dockerfile's added `RUN python -c "from vlt.file..."` sanity check will fail the build if anything is missing, so a successful deploy = the import chain works end-to-end. + +### 4. Verify Postgres + Redis are shared (or not) + +The forked environment SHOULD have its own logical instances of Postgres + Redis under the same project umbrella. **Open each service in the experimental env and confirm**: + +- The Postgres service inside `experimental` is a separate instance from production's. If it's NOT (i.e., it's the same `DATABASE_URL`), you have two options: + - **(a) Share — accept the risk**: experimental writes to production's Postgres. Acceptable IF the experimental backend is read-only on Postgres (which the NDI-python paths are — they don't write). + - **(b) Isolate — recommended**: in experimental's Postgres service settings, click **"Create new database"**. This adds a fresh empty Postgres instance for experimental only. +- Same checkbox for Redis. Redis is the cache layer; sharing it is mostly fine (cache poisoning is the only risk; experimental writes the same shape of data as production). + +**My recommendation:** isolate Postgres, share Redis. Cheapest cost, lowest risk. + +### 5. Get the public URL + +1. Inside the `experimental` env's **ndb-v2** service, go to **Settings** → **Networking** +2. Under **Public Networking**, click **"Generate Domain"** (or similar — Railway sometimes auto-assigns) +3. Copy the resulting URL — should look like `ndb-v2-experimental-production.up.railway.app` or `ndb-v2-experimental.up.railway.app` +4. Verify it responds: `curl https:///api/health` should return `{"status":"ok"}` (or similar) + +### 6. Set the cloud-app preview to point at this URL + +This step is on the Vercel side, NOT Railway. Two ways to do it: + +**Option A — Branch-scoped env vars (recommended):** + +1. Go to https://vercel.com/your-team/ndi-cloud-app/settings/environment-variables +2. For each of these vars, **add a new entry** scoped to the **Preview** environment for the **`feat/experimental-ask-chat`** branch: + +``` +UPSTREAM_API_URL=https:// +INTERNAL_API_URL=https:// +``` + +3. Hit **Save** for each +4. Trigger a fresh build of `feat/experimental-ask-chat` (push any commit, or click "Redeploy" in Vercel's Deployments tab) + +**Option B — Just override at deploy time:** + +If you don't want persistent env-var entries, you can pass them inline when triggering a redeploy from Vercel CLI: + +``` +vercel --prod=false env add UPSTREAM_API_URL preview feat/experimental-ask-chat +``` + +Either way, the Vercel preview that comes out the other side should now serve the experimental backend's responses to anonymous public page requests. + +### 7. Smoke-check before running the audit + +Open the Vercel preview URL in an incognito browser: + +- `/datasets` should load with the catalog (8 datasets) +- `/datasets/682e7772cdf3f24938176fac/documents` (Haley) should load +- Pick a Haley binary doc → expand QuickPlot → **should now render the position trace** (previously soft-errored with the vlt_library message — this is the Phase A unblock) + +If any of those fail, check the Railway logs for the `ndb-v2` service in the `experimental` env via: + +``` +gh api /repos/Waltham-Data-Science/ndi-data-browser-v2/actions # (or the railway-agent MCP) +``` + +Or pull logs directly from the dashboard. + +### 8. Tell Claude the audit is ready + +Reply with the two URLs and Claude will run the audit: + +``` +LIVE_URL=https://ndi-cloud.com +EXPERIMENTAL_URL= +``` + +Claude will also need the experimental Railway URL (for the Layer 1 backend-API diff): + +``` +EXPERIMENTAL_API_URL=https:// +``` + +## Expected cost + +For the `experimental` environment with 2 replicas of ndb-v2 + isolated Postgres + shared Redis, while the audit is running: + +- **ndb-v2**: ~$1-3/mo while actively serving traffic, much less idle +- **Postgres (new instance)**: ~$3-5/mo for the smallest tier +- **Redis (shared)**: $0 — already in production + +**Total marginal: ~$5-10/mo while the env exists.** Pro plan's $20 monthly credit absorbs this if you're not already near the ceiling. + +**Tear down after the audit:** once Phase A is decided (either merged to main or rejected), you can delete the `experimental` environment to stop the meter: + +1. Project page → environment dropdown → "experimental" → "Delete Environment" + +The Postgres data + Redis content go with it (the production env is untouched). + +## Rollback / abort + +If at any step you decide not to proceed: + +- **Easiest**: delete the `experimental` environment per above. Zero impact on production. +- **More cautious**: pause the deploy on the experimental ndb-v2 service via Settings → Service → Pause. This stops the meter but preserves the setup for resumption. diff --git a/infra/Dockerfile b/infra/Dockerfile index 9c0d3ad..42dd3b1 100644 --- a/infra/Dockerfile +++ b/infra/Dockerfile @@ -16,6 +16,7 @@ ENV PYTHONDONTWRITEBYTECODE=1 \ RUN apt-get update && apt-get install -y --no-install-recommends \ libjpeg62-turbo libtiff6 curl \ + git \ && rm -rf /var/lib/apt/lists/* WORKDIR /app @@ -27,6 +28,25 @@ RUN mkdir -p /tmp/ndb && chown -R ndb:ndb /tmp/ndb /app COPY backend/requirements.txt ./backend/requirements.txt RUN pip install -r backend/requirements.txt +# --- NDI-python integration (Phase A) --------------------------------------- +# NDI-python and its git-sourced kin (did, ndr, vhlab-toolbox-python, +# ndi-compress) are installed with `--no-deps` to skip matplotlib (~50-70 MB) +# and opencv-python-headless (~80 MB) which they declare but our backend never +# imports. The runtime deps we DO need are listed in backend/requirements.txt +# above (pandas, networkx, jsonschema, openMINDS, portalocker, h5py, requests). +# +# These pip installs are split into individual layers so a change to one of the +# four git pins re-runs only that single layer (each is small + cached). +RUN pip install --no-deps "vhlab-toolbox-python @ git+https://github.com/VH-Lab/vhlab-toolbox-python.git@main" +RUN pip install --no-deps "did @ git+https://github.com/VH-Lab/DID-python.git@main" +RUN pip install --no-deps "ndr @ git+https://github.com/VH-lab/NDR-python.git@main" +RUN pip install --no-deps "ndi-compress @ git+https://github.com/Waltham-Data-Science/NDI-compress-python.git@main" +RUN pip install --no-deps "ndi @ git+https://github.com/Waltham-Data-Science/NDI-python.git@main" +# Sanity: import every entry point we use in production so a missing +# transitive dep fails the build, not the first request. +RUN python -c "from vlt.file.custom_file_formats import vhsb_read, vhsb_readheader; \ +import ndicompress; from ndi.ontology import lookup; print('ndi-python stack importable')" + COPY backend/ ./backend/ COPY --from=frontend-build /app/frontend/dist ./frontend_dist From b6ac0a6246aa38cb6592586a1806456f2f4b438a Mon Sep 17 00:00:00 2001 From: audriB Date: Wed, 13 May 2026 21:54:08 -0400 Subject: [PATCH 02/54] feat(tabular_query): violin-chart endpoint + NDI-python SHA pinning + strict-boot Adds the first Plotly-charts pipeline backend half: a tabular_query endpoint that aggregates ontologyTableRow documents into per-group statistics (mean, median, std, q1/q3, min/max, count) plus the raw values needed for a violin-with-jitter render. Powers Dabrowska EPM and Fear-Potentiated Startle plots, Bhar chemotaxis violins, and any future categorical-by-group comparison. Endpoint: GET /api/datasets/{id}/tabular_query ?variableNameContains=ElevatedPlusMaze # required substring &groupBy=treatment_group # optional &groupOrder=Saline,CNO # optional CSV Pipeline: 1. SummaryTableService.ontology_tables(...) returns one group per distinct variableNames schema (already exists, used by Document Explorer) 2. tabular_query_service finds the first group with a column whose key/label matches the substring 3. Buckets rows by groupBy column (or single 'all' bucket if unset) 4. Computes stats on the full value list, then stride-samples down to MAX_VALUES_PER_GROUP=500 for the violin's jitter overlay (stats stay accurate) 5. Caps at MAX_GROUPS=20 with first-seen ordering + explicit group_order override Hygiene fixes folded into this commit: - Pin all five NDI-python git deps to specific SHAs in infra/Dockerfile (replaces @main which silently picks up upstream changes on every redeploy). SHAs captured 2026-05-13. To upgrade: re-run `git ls-remote HEAD` and bump. - Strict-on-boot NDI check: when NDI_PYTHON_REQUIRED=1 (set by the Dockerfile), lifespan hard-fails if vlt/ndicompress/ndi.ontology aren't importable. Catches broken images at boot instead of letting the chat silently degrade with every NDI tool returning None. Tests: 21 new unit tests cover the violin aggregation math + edge cases (no matches, no group_by, group_order overrides, group cap, value cap with accurate stats, NaN/Inf skipping, empty substring, no ontology docs). All 559 backend tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- backend/app.py | 28 ++ backend/routers/_deps.py | 5 + backend/routers/tabular_query.py | 89 +++++ backend/services/tabular_query_service.py | 307 ++++++++++++++++++ .../tests/unit/test_tabular_query_service.py | 292 +++++++++++++++++ infra/Dockerfile | 25 +- 6 files changed, 738 insertions(+), 8 deletions(-) create mode 100644 backend/routers/tabular_query.py create mode 100644 backend/services/tabular_query_service.py create mode 100644 backend/tests/unit/test_tabular_query_service.py diff --git a/backend/app.py b/backend/app.py index 9d4e1bb..dbbcba3 100644 --- a/backend/app.py +++ b/backend/app.py @@ -45,6 +45,7 @@ query, signal, tables, + tabular_query, visualize, ) from .services.ontology_cache import OntologyCache @@ -255,6 +256,32 @@ async def _facets_warm() -> None: log.info("keepwarm.started", interval_seconds=240) log.info("facets_warm.started", interval_seconds=240) + # NDI-python strict-boot check. + # + # The Phase A integration adds vlt (VHSB), ndicompress, and + # ndi.ontology. When `NDI_PYTHON_REQUIRED=1` (set by the Railway + # Dockerfile), the stack MUST be importable or we hard-fail. + # Unset (dev/test/CI), we log a warning if NDI is missing but + # keep going — every NDI-python call gracefully returns None and + # callers fall through to their legacy paths. + # + # Why an explicit env var rather than guessing from + # `settings.ENVIRONMENT`: the test/CI/local matrix is fuzzy, and + # the only thing that actually matters here is "is this image + # supposed to have NDI-python installed?" The Dockerfile knows; + # nothing else needs to. + import os as _os + if _os.environ.get("NDI_PYTHON_REQUIRED", "").strip() in ("1", "true", "yes"): + from .services import ndi_python_service as _ndi + if not _ndi.is_ndi_available(): + raise RuntimeError( + "ndi_python_service.is_ndi_available() returned False at " + "startup but NDI_PYTHON_REQUIRED=1. The NDI-python stack " + "(vlt, ndicompress, ndi.ontology) failed to import. Check " + "the Dockerfile's pinned git SHAs and the install layer logs." + ) + log.info("ndi_python.boot_ok") + log.info("app.startup", environment=settings.ENVIRONMENT) try: yield @@ -424,6 +451,7 @@ async def handle_unhandled(request: Request, exc: Exception) -> JSONResponse: app.include_router(query.facets_router) app.include_router(binary.router) app.include_router(signal.router) + app.include_router(tabular_query.router) app.include_router(ontology.router) app.include_router(visualize.router) diff --git a/backend/routers/_deps.py b/backend/routers/_deps.py index 7afc141..4bddcd6 100644 --- a/backend/routers/_deps.py +++ b/backend/routers/_deps.py @@ -23,6 +23,7 @@ from ..services.pivot_service import PivotService from ..services.query_service import QueryService from ..services.summary_table_service import SummaryTableService +from ..services.tabular_query_service import TabularQueryService from ..services.visualize_service import VisualizeService @@ -62,6 +63,10 @@ def summary_table_service(request: Request) -> SummaryTableService: return SummaryTableService(cloud(request), cache=table_cache(request)) +def tabular_query_service(request: Request) -> TabularQueryService: + return TabularQueryService(summary_table_service(request)) + + def dataset_summary_cache(request: Request) -> RedisTableCache | None: return getattr(request.app.state, "dataset_summary_cache", None) diff --git a/backend/routers/tabular_query.py b/backend/routers/tabular_query.py new file mode 100644 index 0000000..18d9fe9 --- /dev/null +++ b/backend/routers/tabular_query.py @@ -0,0 +1,89 @@ +"""Tabular-query endpoint for the experimental /ask chat's +``tabular_query`` tool + ``ViolinChart`` component. + +GET /api/datasets/{dataset_id}/tabular_query + ?variableNameContains=ElevatedPlusMaze (required substring) + &groupBy=treatment_group (optional grouping col) + &groupOrder=Saline,CNO (optional CSV order) + +Returns per-group summary stats + raw values for a violin / jitter +plot. See :mod:`backend.services.tabular_query_service` for the +aggregation logic. + +This is a NEW additive endpoint — no schema changes, no existing- +route changes. Anonymous-readable (matches the read posture of the +rest of v2's surface). +""" +from __future__ import annotations + +from typing import Annotated, Any + +from fastapi import APIRouter, Depends, Query + +from ..auth.dependencies import get_current_session +from ..auth.session import SessionData +from ..services.tabular_query_service import TabularQueryService +from ._deps import limit_reads, tabular_query_service +from ._validators import DatasetId + +router = APIRouter( + prefix="/api/datasets/{dataset_id}", + tags=["tabular_query"], + dependencies=[Depends(limit_reads)], +) + + +@router.get("/tabular_query") +async def tabular_query( + dataset_id: DatasetId, + svc: Annotated[TabularQueryService, Depends(tabular_query_service)], + session: Annotated[SessionData | None, Depends(get_current_session)], + variableNameContains: Annotated[ + str, + Query( + min_length=1, + max_length=200, + description=( + "Substring matched against the ontologyTableRow's name " + "and column headers. Case-insensitive." + ), + ), + ], + groupBy: Annotated[ + str | None, + Query( + min_length=1, + max_length=80, + description=( + "Optional grouping column (e.g. 'treatment_group', " + "'strain'). When unset, all rows form one group " + "named 'all'." + ), + ), + ] = None, + groupOrder: Annotated[ + str | None, + Query( + max_length=400, + description=( + "Optional CSV of group names defining left-to-right " + "order on the violin plot. Names not present in the " + "data are dropped; data with unlisted groups appears " + "after the listed ones." + ), + ), + ] = None, +) -> dict[str, Any]: + group_order_list = ( + [g.strip() for g in groupOrder.split(",") if g.strip()] + if groupOrder + else None + ) + result = await svc.violin_groups( + dataset_id, + variableNameContains, + group_by=groupBy, + group_order=group_order_list, + session=session, + ) + return result diff --git a/backend/services/tabular_query_service.py b/backend/services/tabular_query_service.py new file mode 100644 index 0000000..880075d --- /dev/null +++ b/backend/services/tabular_query_service.py @@ -0,0 +1,307 @@ +"""tabular_query_service — aggregate ``ontologyTableRow`` documents +into per-group statistics + raw values for violin/jitter rendering. + +Used by the chat's ``tabular_query`` tool. The chat passes a substring +match against an ``ontologyTableRow`` column name (e.g. +``"ElevatedPlusMaze_OpenArmNorth_Entries"``) plus an optional +grouping column key (e.g. ``"treatment_group"``). The service: + +1. Calls :meth:`SummaryTableService.ontology_tables` which projects + the dataset's ``ontologyTableRow`` docs into one group per + distinct ``variableNames`` schema +2. Finds the first group containing a column whose key/label matches + the substring; that column is the value column +3. If ``groupBy`` is given, finds the column with that key inside the + same group; that's the grouping column +4. Iterates rows (each row is a dict keyed by column key), bucketing + numeric values by group label +5. Computes per-group stats (mean, median, std, min/max, q1/q3, + count) plus the raw values (capped + stride-sampled) for the + violin's jitter overlay +6. Returns the response shape :class:`ViolinChart` consumes + +Notable: this service does NOT call NDI-python — it operates on the +already-decoded ``ontologyTableRow`` shape that +``SummaryTableService`` projects from cloud-node. NDI-python becomes +valuable on the binary/decoding side, not the tabular-aggregation +side. Keeping this service pure-Python (statistics module only) keeps +it fast + side-effect-free. +""" +from __future__ import annotations + +import math +import statistics +from typing import Any + +from ..auth.session import SessionData +from ..observability.logging import get_logger +from .summary_table_service import SummaryTableService + +log = get_logger(__name__) + + +# Bound the response size — a violin with 100 groups isn't a chart, +# it's a wall of text. The chat tool's `groupOrder` parameter is the +# right escape hatch when callers really want a curated subset. +MAX_GROUPS = 20 + +# Per-group raw-value cap. Plotly's violin trace can comfortably +# render ~500 jitter points per group before the chart slows down on +# resize. Beyond that we stride-sample. The summary stats are computed +# on the FULL value list before sampling, so they remain accurate. +MAX_VALUES_PER_GROUP = 500 + + +class TabularQueryService: + """Aggregate ontologyTableRow docs into per-group stats.""" + + def __init__(self, summary: SummaryTableService) -> None: + self.summary = summary + + async def violin_groups( + self, + dataset_id: str, + variable_name_contains: str, + *, + group_by: str | None, + group_order: list[str] | None, + session: SessionData | None, + ) -> dict[str, Any]: + """Return ``{groups: [...], yLabel, xLabel, source?}``. + + Each group has the shape consumed by + ``apps/web/components/charts/ViolinChart.tsx``:: + + {name, values, count, mean, median, std, min, max, q1, q3} + """ + if not variable_name_contains: + return _empty_response(group_by, reason="empty variableNameContains") + + ontology = await self.summary.ontology_tables(dataset_id, session=session) + groups = ontology.get("groups", []) + if not groups: + return _empty_response( + group_by, reason="no ontologyTableRow docs in dataset", + ) + + match = _find_matching_group(groups, variable_name_contains) + if match is None: + return _empty_response( + group_by, + reason=f"no ontologyTableRow column matched '{variable_name_contains}'", + available={"variable_names": [ + " | ".join(g.get("variableNames", []))[:120] + for g in groups[:5] + ]}, + ) + + group, value_col, value_label = match + rows = (group.get("table") or {}).get("rows") or [] + if not rows: + return _empty_response( + group_by, + reason="matched group had no rows", + yLabel=value_label, + ) + + buckets, order_seen = _bucket_rows(rows, value_col, group_by) + if not buckets: + return _empty_response( + group_by, + reason="no numeric values in matched column", + yLabel=value_label, + ) + + ordered_keys = _ordered_group_keys(buckets, order_seen, group_order) + out_groups = _build_group_payloads(buckets, ordered_keys) + + result: dict[str, Any] = { + "groups": out_groups, + "yLabel": value_label, + "xLabel": group_by or "group", + } + doc_ids = group.get("docIds") or [] + if doc_ids: + result["source"] = { + "dataset_id": dataset_id, + "document_id": doc_ids[0], + "variable_name": value_label, + } + return result + + +# --------------------------------------------------------------------------- +# Internal helpers — each is single-purpose so the orchestrator stays linear. +# --------------------------------------------------------------------------- + + +def _empty_response( + group_by: str | None, + *, + reason: str, + yLabel: str = "", + available: dict[str, Any] | None = None, +) -> dict[str, Any]: + meta: dict[str, Any] = {"reason": reason} + if available: + meta.update(available) + return { + "groups": [], + "yLabel": yLabel, + "xLabel": group_by or "", + "_meta": meta, + } + + +def _find_matching_group( + groups: list[dict[str, Any]], + needle: str, +) -> tuple[dict[str, Any], str, str] | None: + """Locate the first ontologyTableRow group containing a column + whose `key` OR `label` contains the search substring. + + Returns (group, value_column_key, value_column_label) or None. + """ + needle_lower = needle.lower() + for g in groups: + cols = (g.get("table") or {}).get("columns") or [] + for col in cols: + key = str(col.get("key", "")) + label = str(col.get("label", "")) + if needle_lower in key.lower() or needle_lower in label.lower(): + return g, key, label or key + return None + + +def _bucket_rows( + rows: list[dict[str, Any]], + value_col: str, + group_by: str | None, +) -> tuple[dict[str, list[float]], list[str]]: + """Walk rows, extract numeric value + grouping label. + + Returns (buckets_by_group_name, order_seen). + """ + buckets: dict[str, list[float]] = {} + order_seen: list[str] = [] + for row in rows: + v_raw = row.get(value_col) + if v_raw is None: + continue + try: + v = float(v_raw) + except (TypeError, ValueError): + continue + if not math.isfinite(v): + continue + if group_by: + g_raw = row.get(group_by) + if g_raw is None: + continue + g = str(g_raw) + else: + g = "all" + if g not in buckets: + buckets[g] = [] + order_seen.append(g) + buckets[g].append(v) + return buckets, order_seen + + +def _ordered_group_keys( + buckets: dict[str, list[float]], + order_seen: list[str], + group_order: list[str] | None, +) -> list[str]: + """Resolve final group ordering. Caller's explicit `group_order` + wins; unspecified groups append at the end (never silently + dropped); finally capped to MAX_GROUPS.""" + if group_order: + ordered = [g for g in group_order if g in buckets] + for g in order_seen: + if g not in ordered: + ordered.append(g) + else: + ordered = list(order_seen) + return ordered[:MAX_GROUPS] + + +def _build_group_payloads( + buckets: dict[str, list[float]], + ordered_keys: list[str], +) -> list[dict[str, Any]]: + out: list[dict[str, Any]] = [] + for name in ordered_keys: + vals = buckets.get(name) or [] + if not vals: + continue + stats = _summary_stats(vals) + # Cap raw values for the response payload — stats above were + # computed on the FULL list so they remain accurate. + sampled = _stride_sample(vals, MAX_VALUES_PER_GROUP) + out.append({ + "name": name, + "values": sampled, + **stats, + }) + return out + + +def _summary_stats(values: list[float]) -> dict[str, float | int]: + """Compute the stats payload ViolinChart expects.""" + n = len(values) + sorted_v = sorted(values) + mean = statistics.fmean(values) + median = statistics.median(values) + std = statistics.stdev(values) if n >= 2 else 0.0 + # Linear-interpolated percentile — matches numpy.percentile default + # closely enough for chart annotation purposes. + q1 = _percentile(sorted_v, 25) + q3 = _percentile(sorted_v, 75) + return { + "count": n, + "mean": float(mean), + "median": float(median), + "std": float(std), + "min": float(sorted_v[0]), + "max": float(sorted_v[-1]), + "q1": float(q1), + "q3": float(q3), + } + + +def _percentile(sorted_values: list[float], p: float) -> float: + """Linear-interpolated percentile on a pre-sorted list.""" + if not sorted_values: + return 0.0 + if len(sorted_values) == 1: + return sorted_values[0] + rank = (p / 100.0) * (len(sorted_values) - 1) + lo = math.floor(rank) + hi = math.ceil(rank) + if lo == hi: + return sorted_values[lo] + frac = rank - lo + return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac + + +def _stride_sample(values: list[float], cap: int) -> list[float]: + """Stride-sample to (at most) `cap` points. Preserves first + last + via linspace-style stepping so the violin's jitter overlay shows + the distribution shape end-to-end.""" + n = len(values) + if n <= cap: + return list(values) + if cap <= 2: + return [values[0], values[-1]][:cap] + step = (n - 1) / (cap - 1) + indices = [round(i * step) for i in range(cap)] + # Dedupe in case rounding collapses adjacent indices (rare; + # happens only when `cap` approaches `n`). + seen: set[int] = set() + picked: list[int] = [] + for i in indices: + if i not in seen: + seen.add(i) + picked.append(i) + return [values[i] for i in picked] diff --git a/backend/tests/unit/test_tabular_query_service.py b/backend/tests/unit/test_tabular_query_service.py new file mode 100644 index 0000000..cc4b0ea --- /dev/null +++ b/backend/tests/unit/test_tabular_query_service.py @@ -0,0 +1,292 @@ +"""Unit tests for TabularQueryService. + +Tests focus on the aggregation math + edge cases. The SummaryTableService +dependency is stubbed — its own tests cover the ontologyTableRow +projection logic. +""" +from __future__ import annotations + +from typing import Any + +import pytest + +from backend.services.tabular_query_service import ( + MAX_GROUPS, + MAX_VALUES_PER_GROUP, + TabularQueryService, + _percentile, + _stride_sample, + _summary_stats, +) + +# --------------------------------------------------------------------------- +# Stat-helper unit tests — pure functions, no IO +# --------------------------------------------------------------------------- + + +class TestSummaryStats: + def test_basic_stats(self): + vals = [1.0, 2.0, 3.0, 4.0, 5.0] + s = _summary_stats(vals) + assert s["count"] == 5 + assert s["mean"] == 3.0 + assert s["median"] == 3.0 + assert s["min"] == 1.0 + assert s["max"] == 5.0 + assert abs(s["std"] - 1.5811) < 0.001 + assert s["q1"] == 2.0 + assert s["q3"] == 4.0 + + def test_single_value_zero_std(self): + s = _summary_stats([7.0]) + assert s["count"] == 1 + assert s["std"] == 0.0 + assert s["mean"] == 7.0 + + def test_two_values(self): + s = _summary_stats([10.0, 20.0]) + assert s["count"] == 2 + assert s["mean"] == 15.0 + assert s["median"] == 15.0 + + +class TestPercentile: + def test_quartiles(self): + assert _percentile([1, 2, 3, 4, 5], 25) == 2.0 + assert _percentile([1, 2, 3, 4, 5], 50) == 3.0 + assert _percentile([1, 2, 3, 4, 5], 75) == 4.0 + + def test_endpoints(self): + assert _percentile([1, 2, 3, 4, 5], 0) == 1.0 + assert _percentile([1, 2, 3, 4, 5], 100) == 5.0 + + def test_empty_returns_zero(self): + assert _percentile([], 50) == 0.0 + + def test_single_value(self): + assert _percentile([42.0], 50) == 42.0 + + +class TestStrideSample: + def test_under_cap_returns_all(self): + assert _stride_sample([1.0, 2.0, 3.0], cap=10) == [1.0, 2.0, 3.0] + + def test_over_cap_preserves_endpoints(self): + vals = [float(i) for i in range(100)] + out = _stride_sample(vals, cap=10) + assert len(out) == 10 + assert out[0] == 0.0 + assert out[-1] == 99.0 + + +# --------------------------------------------------------------------------- +# Service-level: stub SummaryTableService with the real ontology_tables +# response shape (one group per distinct variableNames schema, rows are +# dicts keyed by variableName). +# --------------------------------------------------------------------------- + + +def _make_ontology_response( + columns: list[dict[str, Any]], + rows: list[dict[str, Any]], + *, + doc_ids: list[str] | None = None, +) -> dict[str, Any]: + """Build a one-group ontology_tables response matching the real + shape returned by SummaryTableService.ontology_tables. + """ + return { + "groups": [ + { + "variableNames": [c["key"] for c in columns], + "names": [c.get("label", c["key"]) for c in columns], + "ontologyNodes": [c.get("ontologyTerm") for c in columns], + "table": {"columns": columns, "rows": rows}, + "docIds": doc_ids or [], + "rowCount": len(rows), + }, + ], + } + + +class _FakeSummaryService: + """Stub for SummaryTableService — returns a canned ontology_tables payload.""" + + def __init__(self, response: dict[str, Any]) -> None: + self._response = response + + async def ontology_tables( + self, + dataset_id: str, # noqa: ARG002 — stub mirrors the real signature + *, + session: Any, # noqa: ARG002 — stub mirrors the real signature + ) -> dict[str, Any]: + return self._response + + +@pytest.mark.asyncio +async def test_violin_groups_basic(): + """Two-group violin keyed on a column label substring.""" + columns = [ + {"key": "treatment_group", "label": "treatment_group"}, + {"key": "EPM_OpenArm_Entries", "label": "EPM Open Arm Entries"}, + ] + rows = [ + {"treatment_group": "Saline", "EPM_OpenArm_Entries": 5.0}, + {"treatment_group": "Saline", "EPM_OpenArm_Entries": 7.0}, + {"treatment_group": "Saline", "EPM_OpenArm_Entries": 6.0}, + {"treatment_group": "CNO", "EPM_OpenArm_Entries": 2.0}, + {"treatment_group": "CNO", "EPM_OpenArm_Entries": 3.0}, + {"treatment_group": "CNO", "EPM_OpenArm_Entries": 1.0}, + ] + response = _make_ontology_response(columns, rows, doc_ids=["doc_abc"]) + svc = TabularQueryService(_FakeSummaryService(response)) # type: ignore[arg-type] + result = await svc.violin_groups( + "dataset_xyz", + "OpenArm", + group_by="treatment_group", + group_order=None, + session=None, + ) + assert len(result["groups"]) == 2 + by_name = {g["name"]: g for g in result["groups"]} + assert by_name["Saline"]["mean"] == 6.0 + assert by_name["CNO"]["mean"] == 2.0 + assert by_name["Saline"]["count"] == 3 + assert result["source"]["document_id"] == "doc_abc" + assert result["xLabel"] == "treatment_group" + # Label comes from the human-readable column label, not the raw key. + assert "Open Arm Entries" in result["yLabel"] + + +@pytest.mark.asyncio +async def test_violin_groups_no_match_returns_empty_with_meta(): + columns = [{"key": "unrelated", "label": "Unrelated Variable"}] + rows = [{"unrelated": 1.0}] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "ElevatedPlusMaze", group_by="g", group_order=None, session=None, + ) + assert result["groups"] == [] + assert "no ontologyTableRow column matched" in result["_meta"]["reason"] + + +@pytest.mark.asyncio +async def test_violin_groups_respects_group_order(): + columns = [ + {"key": "group", "label": "group"}, + {"key": "y", "label": "y"}, + ] + rows = [ + {"group": "A", "y": 1.0}, + {"group": "B", "y": 2.0}, + {"group": "C", "y": 3.0}, + ] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "y", group_by="group", group_order=["C", "A"], session=None, + ) + names = [g["name"] for g in result["groups"]] + # C and A specified first; B (unspecified) appears after. + assert names == ["C", "A", "B"] + + +@pytest.mark.asyncio +async def test_violin_groups_no_group_by_makes_single_group(): + columns = [{"key": "y", "label": "Value"}] + rows = [{"y": 1.0}, {"y": 2.0}, {"y": 3.0}, {"y": 4.0}] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "Value", group_by=None, group_order=None, session=None, + ) + assert len(result["groups"]) == 1 + assert result["groups"][0]["name"] == "all" + assert result["groups"][0]["count"] == 4 + + +@pytest.mark.asyncio +async def test_violin_groups_caps_group_count(): + columns = [{"key": "g", "label": "g"}, {"key": "y", "label": "y"}] + rows = [ + {"g": f"g{i}", "y": float(i)} for i in range(MAX_GROUPS + 5) + ] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "y", group_by="g", group_order=None, session=None, + ) + assert len(result["groups"]) == MAX_GROUPS + + +@pytest.mark.asyncio +async def test_violin_groups_caps_values_per_group_but_stats_use_full(): + """Stats are computed BEFORE the value-list sampling so they remain accurate.""" + columns = [{"key": "g", "label": "g"}, {"key": "y", "label": "Value"}] + n = MAX_VALUES_PER_GROUP + 200 + rows = [{"g": "all", "y": float(i)} for i in range(n)] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "Value", group_by="g", group_order=None, session=None, + ) + g = result["groups"][0] + assert len(g["values"]) <= MAX_VALUES_PER_GROUP + expected_mean = (n - 1) / 2 + assert abs(g["mean"] - expected_mean) < 0.001 + assert g["count"] == n + + +@pytest.mark.asyncio +async def test_violin_groups_skips_nonfinite_values(): + """NaN / inf rows shouldn't blow up the aggregation.""" + columns = [{"key": "g", "label": "g"}, {"key": "y", "label": "y"}] + rows = [ + {"g": "A", "y": 1.0}, + {"g": "A", "y": 2.0}, + {"g": "A", "y": float("nan")}, + {"g": "A", "y": float("inf")}, + {"g": "B", "y": 5.0}, + ] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "y", group_by="g", group_order=None, session=None, + ) + by_name = {g["name"]: g for g in result["groups"]} + assert by_name["A"]["count"] == 2 + assert by_name["A"]["mean"] == 1.5 + + +@pytest.mark.asyncio +async def test_violin_groups_empty_substring_returns_empty(): + columns = [{"key": "y", "label": "y"}] + rows = [{"y": 1.0}] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "", group_by=None, group_order=None, session=None, + ) + assert result["groups"] == [] + assert "empty" in result["_meta"]["reason"] + + +@pytest.mark.asyncio +async def test_violin_groups_no_ontology_docs_returns_empty(): + svc = TabularQueryService( + _FakeSummaryService({"groups": []}), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "anything", group_by=None, group_order=None, session=None, + ) + assert result["groups"] == [] + assert "no ontologyTableRow docs" in result["_meta"]["reason"] diff --git a/infra/Dockerfile b/infra/Dockerfile index 42dd3b1..54163a2 100644 --- a/infra/Dockerfile +++ b/infra/Dockerfile @@ -35,13 +35,21 @@ RUN pip install -r backend/requirements.txt # imports. The runtime deps we DO need are listed in backend/requirements.txt # above (pandas, networkx, jsonschema, openMINDS, portalocker, h5py, requests). # -# These pip installs are split into individual layers so a change to one of the -# four git pins re-runs only that single layer (each is small + cached). -RUN pip install --no-deps "vhlab-toolbox-python @ git+https://github.com/VH-Lab/vhlab-toolbox-python.git@main" -RUN pip install --no-deps "did @ git+https://github.com/VH-Lab/DID-python.git@main" -RUN pip install --no-deps "ndr @ git+https://github.com/VH-lab/NDR-python.git@main" -RUN pip install --no-deps "ndi-compress @ git+https://github.com/Waltham-Data-Science/NDI-compress-python.git@main" -RUN pip install --no-deps "ndi @ git+https://github.com/Waltham-Data-Science/NDI-python.git@main" +# CRITICAL: pinned to specific git SHAs (NOT @main) to prevent silent upstream +# drift. If upstream pushes a breaking change to e.g. NDI-python's +# `ontology.lookup` return shape, our experimental Railway redeploys silently +# with the new code — and chats start returning wrong data with no signal. +# Pinning forces every drift to be an explicit Dockerfile edit + redeploy. +# Bump these SHAs by re-running `git ls-remote HEAD` and pasting the +# new hashes. See docs/plans/2026-05-13-ndi-python-integration.md for the +# overall integration plan. +# +# Pins captured 2026-05-13: +RUN pip install --no-deps "vhlab-toolbox-python @ git+https://github.com/VH-Lab/vhlab-toolbox-python.git@b073185565ea5b47bb0307cddeae923fa9b86268" +RUN pip install --no-deps "did @ git+https://github.com/VH-Lab/DID-python.git@1b1491fb98f37a61a74b86cccdfabae8b6bbce9e" +RUN pip install --no-deps "ndr @ git+https://github.com/VH-lab/NDR-python.git@896ed637c35cd8ba118e1512a8c65bdd634a7622" +RUN pip install --no-deps "ndi-compress @ git+https://github.com/Waltham-Data-Science/NDI-compress-python.git@0c05d9dbd63ed5d15866eb1bf0a096568ef0c192" +RUN pip install --no-deps "ndi @ git+https://github.com/Waltham-Data-Science/NDI-python.git@9c64acb13bfbe0baf7f6b40bee18925fd7d9117c" # Sanity: import every entry point we use in production so a missing # transitive dep fails the build, not the first request. RUN python -c "from vlt.file.custom_file_formats import vhsb_read, vhsb_readheader; \ @@ -55,7 +63,8 @@ USER ndb ENV ONTOLOGY_CACHE_DB_PATH=/tmp/ndb/ontology.db \ LOG_FORMAT=json \ WEB_CONCURRENCY=4 \ - PORT=8000 + PORT=8000 \ + NDI_PYTHON_REQUIRED=1 HEALTHCHECK --interval=30s --timeout=5s --start-period=60s --retries=3 \ CMD curl -fsS http://127.0.0.1:${PORT}/api/health || exit 1 From 3be7c96d73f61db5d8b2f17c5a621f990050844e Mon Sep 17 00:00:00 2001 From: audriB Date: Wed, 13 May 2026 21:59:10 -0400 Subject: [PATCH 03/54] fix(tabular_query): prefer numeric column when multiple match MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Smoke-testing the new endpoint against the experimental Railway deploy surfaced an issue with real-world data: ontologyTableRow tables routinely have multiple columns sharing a topic prefix (e.g. 'ElevatedPlusMaze: Test Identifier' + 'ElevatedPlusMaze: Open Arm Entries' + …). The first-match logic picked the test- identifier column → no numeric values → empty violin response ('no numeric values in matched column'). Fix: score each matching column by how many rows have finite- numeric values, and pick the column with the most numeric rows across all matching groups. Ties broken by first-seen order (groups are already sorted by row count desc upstream). Adds one test (test_violin_groups_prefers_numeric_column_over_ identifier) covering the real-world layout. All 19 tabular_query tests pass; 561 backend tests pass overall. Co-Authored-By: Claude Opus 4.7 (1M context) --- backend/services/tabular_query_service.py | 45 +++++++++++++++---- .../tests/unit/test_tabular_query_service.py | 34 ++++++++++++++ 2 files changed, 71 insertions(+), 8 deletions(-) diff --git a/backend/services/tabular_query_service.py b/backend/services/tabular_query_service.py index 880075d..8d36cce 100644 --- a/backend/services/tabular_query_service.py +++ b/backend/services/tabular_query_service.py @@ -157,20 +157,49 @@ def _find_matching_group( groups: list[dict[str, Any]], needle: str, ) -> tuple[dict[str, Any], str, str] | None: - """Locate the first ontologyTableRow group containing a column - whose `key` OR `label` contains the search substring. - - Returns (group, value_column_key, value_column_label) or None. + """Locate the best ontologyTableRow column matching the search + substring, preferring columns whose values are numeric. + + Real ontologyTableRow tables typically have multiple columns whose + names share the same topic prefix (e.g. ``ElevatedPlusMaze: Test + Identifier`` + ``ElevatedPlusMaze: Open Arm Entries`` + …). A naive + first-match would pick the identifier column → no numeric values → + empty violin. We instead score each matching column by how many + rows have finite-numeric values in it, and return the highest- + scoring column across all groups. + + Ties broken by first-seen order (group order is already sorted by + row count desc in SummaryTableService). """ needle_lower = needle.lower() + best: tuple[dict[str, Any], str, str, int] | None = None for g in groups: - cols = (g.get("table") or {}).get("columns") or [] + table = g.get("table") or {} + cols = table.get("columns") or [] + rows = table.get("rows") or [] for col in cols: key = str(col.get("key", "")) label = str(col.get("label", "")) - if needle_lower in key.lower() or needle_lower in label.lower(): - return g, key, label or key - return None + if needle_lower not in key.lower() and needle_lower not in label.lower(): + continue + numeric_count = sum(1 for row in rows if _is_finite_numeric(row.get(key))) + if numeric_count == 0: + continue + if best is None or numeric_count > best[3]: + best = (g, key, label or key, numeric_count) + if best is None: + return None + return best[0], best[1], best[2] + + +def _is_finite_numeric(v: Any) -> bool: + """Defensive coerce — `True` only when `v` parses to a finite float.""" + if v is None: + return False + try: + return math.isfinite(float(v)) + except (TypeError, ValueError): + return False def _bucket_rows( diff --git a/backend/tests/unit/test_tabular_query_service.py b/backend/tests/unit/test_tabular_query_service.py index cc4b0ea..909d492 100644 --- a/backend/tests/unit/test_tabular_query_service.py +++ b/backend/tests/unit/test_tabular_query_service.py @@ -280,6 +280,40 @@ async def test_violin_groups_empty_substring_returns_empty(): assert "empty" in result["_meta"]["reason"] +@pytest.mark.asyncio +async def test_violin_groups_prefers_numeric_column_over_identifier(): + """Real ontologyTableRow tables often have multiple columns sharing + a topic prefix (e.g. an identifier column + measurement columns). + The matcher must skip the non-numeric identifier and pick the + numeric measurement. + """ + columns = [ + {"key": "EPM_TestIdentifier", "label": "EPM: Test Identifier"}, + {"key": "EPM_OpenArmEntries", "label": "EPM: Open Arm Entries"}, + {"key": "treatment", "label": "treatment"}, + ] + rows = [ + {"EPM_TestIdentifier": "EPM-001", "EPM_OpenArmEntries": 5.0, "treatment": "Saline"}, + {"EPM_TestIdentifier": "EPM-002", "EPM_OpenArmEntries": 7.0, "treatment": "Saline"}, + {"EPM_TestIdentifier": "EPM-003", "EPM_OpenArmEntries": 3.0, "treatment": "CNO"}, + {"EPM_TestIdentifier": "EPM-004", "EPM_OpenArmEntries": 2.0, "treatment": "CNO"}, + ] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + # Search "EPM" matches BOTH identifier and entries; should pick + # the numeric one. + result = await svc.violin_groups( + "ds", "EPM", group_by="treatment", group_order=None, session=None, + ) + assert len(result["groups"]) == 2 # Saline + CNO + by_name = {g["name"]: g for g in result["groups"]} + assert by_name["Saline"]["mean"] == 6.0 # (5+7)/2 + assert by_name["CNO"]["mean"] == 2.5 # (3+2)/2 + # And the label should be the numeric column's label. + assert "Open Arm Entries" in result["yLabel"] + + @pytest.mark.asyncio async def test_violin_groups_no_ontology_docs_returns_empty(): svc = TabularQueryService( From 83a935856e8960e9e1f4919c689ff8d636204ebf Mon Sep 17 00:00:00 2001 From: audriB Date: Wed, 13 May 2026 22:04:44 -0400 Subject: [PATCH 04/54] feat(tabular_query): substring-match groupBy too (LLM ergonomics) Smoke-testing against real Dabrowska EPM data showed that the LLM naturally calls groupBy='Treatment' but the actual ontology column key is 'Treatment_CNOOrSalineAdministration'. Exact-match returns empty groups; the user/LLM has no way to know the canonical key. Fix: resolve groupBy via the same substring-match strategy as the value column. Exact key match wins; then substring against keys; then substring against labels. When nothing matches, return empty + a meta listing the column keys we DO have so the caller can retry. Verified live: 'Treatment' now resolves to 'Treatment_CNOOrSaline Administration', returning Saline (n=22, mean=5.86) + CNO (n=23, mean=5.09) for EPM open-arm-north entries. Tests: +2 cases (substring resolution + unresolvable-with-available). All 562 backend tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- backend/services/tabular_query_service.py | 49 ++++++++++++++++++- .../tests/unit/test_tabular_query_service.py | 48 ++++++++++++++++++ 2 files changed, 95 insertions(+), 2 deletions(-) diff --git a/backend/services/tabular_query_service.py b/backend/services/tabular_query_service.py index 8d36cce..dadb009 100644 --- a/backend/services/tabular_query_service.py +++ b/backend/services/tabular_query_service.py @@ -58,7 +58,7 @@ class TabularQueryService: def __init__(self, summary: SummaryTableService) -> None: self.summary = summary - async def violin_groups( + async def violin_groups( # noqa: PLR0911 (linear-control-flow with early-return per failure mode is clearer than a state machine) self, dataset_id: str, variable_name_contains: str, @@ -104,7 +104,28 @@ async def violin_groups( yLabel=value_label, ) - buckets, order_seen = _bucket_rows(rows, value_col, group_by) + # Resolve the groupBy column. Like the value column, callers + # rarely know the exact column key — substring-match against the + # group's columns (key OR label, case-insensitive). When the user + # leaves group_by unset, this returns None and the bucketing + # produces a single 'all' group. + resolved_group_col = ( + _resolve_group_column(group, group_by) if group_by else None + ) + if group_by and resolved_group_col is None: + return _empty_response( + group_by, + reason=f"no column matched groupBy '{group_by}' in the " + f"selected table", + yLabel=value_label, + available={"columns": [ + c.get("key") + for c in (group.get("table") or {}).get("columns") or [] + if c.get("key") != value_col + ][:20]}, + ) + + buckets, order_seen = _bucket_rows(rows, value_col, resolved_group_col) if not buckets: return _empty_response( group_by, @@ -192,6 +213,30 @@ def _find_matching_group( return best[0], best[1], best[2] +def _resolve_group_column(group: dict[str, Any], group_by: str) -> str | None: + """Resolve a possibly-imprecise group_by argument to an actual + column key in the matched group. + + Substring-match against `key` first (exact key wins immediately), + then against `label`. Returns None when nothing matches so the + caller can surface an explicit error. + """ + needle_lower = group_by.lower() + cols = (group.get("table") or {}).get("columns") or [] + # Exact key match wins immediately. + for col in cols: + if str(col.get("key", "")) == group_by: + return group_by + # Substring fallback — key first (more stable than labels). + for col in cols: + if needle_lower in str(col.get("key", "")).lower(): + return str(col["key"]) + for col in cols: + if needle_lower in str(col.get("label", "")).lower(): + return str(col["key"]) + return None + + def _is_finite_numeric(v: Any) -> bool: """Defensive coerce — `True` only when `v` parses to a finite float.""" if v is None: diff --git a/backend/tests/unit/test_tabular_query_service.py b/backend/tests/unit/test_tabular_query_service.py index 909d492..54f96aa 100644 --- a/backend/tests/unit/test_tabular_query_service.py +++ b/backend/tests/unit/test_tabular_query_service.py @@ -314,6 +314,54 @@ async def test_violin_groups_prefers_numeric_column_over_identifier(): assert "Open Arm Entries" in result["yLabel"] +@pytest.mark.asyncio +async def test_violin_groups_substring_groupby_resolves_to_column(): + """LLM rarely knows the exact column key. A `groupBy='Treatment'` + should resolve to `Treatment_CNOOrSalineAdministration` via + substring match. + """ + columns = [ + {"key": "value", "label": "Measurement"}, + {"key": "Treatment_CNOOrSalineAdministration", "label": "treatment: CNO or saline"}, + ] + rows = [ + {"value": 1.0, "Treatment_CNOOrSalineAdministration": "Saline"}, + {"value": 2.0, "Treatment_CNOOrSalineAdministration": "Saline"}, + {"value": 3.0, "Treatment_CNOOrSalineAdministration": "CNO"}, + {"value": 4.0, "Treatment_CNOOrSalineAdministration": "CNO"}, + ] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "Measurement", group_by="Treatment", group_order=None, session=None, + ) + assert len(result["groups"]) == 2 + by_name = {g["name"]: g for g in result["groups"]} + assert by_name["Saline"]["mean"] == 1.5 + assert by_name["CNO"]["mean"] == 3.5 + + +@pytest.mark.asyncio +async def test_violin_groups_unresolvable_groupby_returns_empty_with_available(): + """When groupBy doesn't match any column, return empty + the list + of available columns so the caller can retry.""" + columns = [ + {"key": "value", "label": "Measurement"}, + {"key": "strain", "label": "strain"}, + ] + rows = [{"value": 1.0, "strain": "N2"}] + svc = TabularQueryService( + _FakeSummaryService(_make_ontology_response(columns, rows)), # type: ignore[arg-type] + ) + result = await svc.violin_groups( + "ds", "Measurement", group_by="NotAColumn", group_order=None, session=None, + ) + assert result["groups"] == [] + assert "no column matched groupBy" in result["_meta"]["reason"] + assert "strain" in result["_meta"]["columns"] + + @pytest.mark.asyncio async def test_violin_groups_no_ontology_docs_returns_empty(): svc = TabularQueryService( From d62610cf59089172c97c2a821c2fa659fa45447b Mon Sep 17 00:00:00 2001 From: audriB Date: Thu, 14 May 2026 14:13:00 -0400 Subject: [PATCH 05/54] =?UTF-8?q?feat(backend):=20labchat=20wave=20?= =?UTF-8?q?=E2=80=94=20image,=20distinct=5Fsummary,=20Sprint=201.5=20bindi?= =?UTF-8?q?ng?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three backend additions wiring the new labchat chat tools. ## image_service + /api/datasets/:id/documents/:docId/image Pillow-based decoder for TIFF/PNG/JPEG/GIF documents (with NDI-native .nim formats flagged as 'unsupported' rather than failing). Downsamples >512px via Pillow thumbnail; multi-frame support via `frame=N`. Returns structured envelope with width/height/min/max/format + downsampled flag. Errors get errorKind={notfound, decode, unsupported}. Same SSRF hardening as existing binary download path (cloud-allowlisted hosts only). 18 unit tests. ## summary_table_service.distinct_summary Backend now computes per-column distinct value counts + top-K most common values across ALL rows of a class-table response (not just the page slice). Cached under the same TTL as columns+rows. Caps at DISTINCT_SUMMARY_MAX_ROWS=10_000 (skips with a meta sentinel for very large tables). Top-K = 5 per column. The LLM sees this as `distinctSummary` on the query_documents tool result. Existing test extended with TestHashable + TestBuildDistinctSummary covering: Dabrowska 49-row collapse case, multi-value top-K, None/dict/ malformed handling, skip-when-too-many path. ## Sprint 1.5: dataset_binding_service + /api/datasets/:id/ndi_overview Lazy in-memory LRU cache (max 5) wrapping ndi.cloud.orchestration. downloadDataset. Per-dataset asyncio.Lock prevents concurrent cold-load races. 90s cold-load timeout, 5GB on-disk soft cap with warn log. Returns: element_count, subject_count, total epoch_count across all elements, first 50 element {name, type} pairs, cache_hit + cache_age. Pre-warm at app boot (production/preview only) for the 3 demo datasets — fire-and-forget asyncio.create_task, cancellable on shutdown. is_ndi_available() extended via is_dataset_binding_available() probing ndi.dataset + ndi.cloud.orchestration imports. Endpoint surface: GET /api/datasets/:id/ndi_overview with limit_reads rate limit. 503 envelope { error, reason } on any failure so the frontend can fall back to ndi_query gracefully. 60s request timeout. 12 unit tests + 2 live-cloud integration tests (gated behind LIVE_NDI_TESTS=1). ## NDI cache directory NDI_CACHE_DIR env (default /tmp/ndi-cache). Already-writable by the ndb user in the Dockerfile. /tmp is ephemeral on Railway redeploys — pre-warm at boot mitigates for the 3 demo datasets; user-driven cold-loads of other datasets pay the 10-30s tax. ## Open risk downloadDataset against the published demo datasets ANONYMOUSLY (no service-account token) is unverified in the Railway environment. If auth is required, the 503 fallback keeps chat usable. Plan B: add a service-account token via Railway env vars and retry. Combined backend test count: 603 unit + 110 integration (was 562 unit + 79 integration pre-wave). All green. --- backend/app.py | 81 +++ backend/routers/_deps.py | 18 + backend/routers/image.py | 93 ++++ backend/routers/ndi_dataset.py | 125 +++++ backend/services/dataset_binding_service.py | 464 ++++++++++++++++++ backend/services/image_service.py | 312 ++++++++++++ backend/services/ndi_python_service.py | 52 +- backend/services/summary_table_service.py | 133 ++++- .../integration/test_dataset_binding_live.py | 83 ++++ .../unit/test_dataset_binding_service.py | 367 ++++++++++++++ backend/tests/unit/test_image_service.py | 339 +++++++++++++ .../unit/test_summary_table_projection.py | 165 +++++++ 12 files changed, 2213 insertions(+), 19 deletions(-) create mode 100644 backend/routers/image.py create mode 100644 backend/routers/ndi_dataset.py create mode 100644 backend/services/dataset_binding_service.py create mode 100644 backend/services/image_service.py create mode 100644 backend/tests/integration/test_dataset_binding_live.py create mode 100644 backend/tests/unit/test_dataset_binding_service.py create mode 100644 backend/tests/unit/test_image_service.py diff --git a/backend/app.py b/backend/app.py index dbbcba3..bb3295d 100644 --- a/backend/app.py +++ b/backend/app.py @@ -41,6 +41,8 @@ datasets, documents, health, + image, + ndi_dataset, ontology, query, signal, @@ -48,6 +50,7 @@ tabular_query, visualize, ) +from .services.dataset_binding_service import DatasetBindingService from .services.ontology_cache import OntologyCache from .services.ontology_service import OntologyService from .static_files import safe_static_path @@ -282,6 +285,72 @@ async def _facets_warm() -> None: ) log.info("ndi_python.boot_ok") + # Sprint 1.5 dataset-binding service — singleton, lives on app.state. + # Always instantiated (cheap object — empty LRU). The router behind + # ``/api/datasets/{id}/ndi_overview`` calls into it; on internal + # failure (NDI-python missing, cloud unreachable, etc.) the service + # returns None and the router maps that to a 503. Frontend tool + # falls back to ndi_query gracefully. + app.state.dataset_binding_service = DatasetBindingService() + + # Optional pre-warm of the 3 demo datasets. We fire-and-forget per + # dataset so a single failure doesn't block the others. Each task + # is parked on app.state so asyncio doesn't GC the reference + # mid-flight (RUF006). We DO NOT await them — they run in the + # background while the app starts serving requests immediately. + # + # If NDI-python isn't available, the service returns None on the + # first call and we skip the rest — costs essentially nothing. + async def _prewarm_dataset(dataset_id: str) -> None: + try: + log.info("dataset_binding.prewarm_start", dataset_id=dataset_id) + result = await app.state.dataset_binding_service.get_dataset( + dataset_id + ) + if result is not None: + log.info( + "dataset_binding.prewarm_done", + dataset_id=dataset_id, + ) + else: + # Service already logged the reason at WARN — keep this + # at INFO so the boot timeline is one-line-per-dataset. + log.info( + "dataset_binding.prewarm_skipped", + dataset_id=dataset_id, + ) + except _asyncio.CancelledError: + raise + except Exception as exc: + # Truly defensive: get_dataset() is documented to never + # raise, but log loudly if that contract breaks so we know. + log.warning( + "dataset_binding.prewarm_unexpected_raise", + dataset_id=dataset_id, + error=str(exc), + error_type=type(exc).__name__, + ) + + # Three demo datasets surfaced by the experimental /ask chat: + # Dabrowska BNST (EPM behavior), Bhar (chemotaxis), Haley + # (patch-encounter). Order does not matter; tasks run concurrently. + # Pre-warm is gated to production-like environments so dev/test + # boots stay fast. + if settings.ENVIRONMENT in ("production", "preview"): + prewarm_ids = ( + "67f723d574f5f79c6062389d", # Dabrowska BNST + "69bc5ca11d547b1f6d083761", # Bhar + "682e7772cdf3f24938176fac", # Haley + ) + app.state.dataset_binding_prewarm_tasks = [ + _asyncio.create_task(_prewarm_dataset(did)) + for did in prewarm_ids + ] + log.info( + "dataset_binding.prewarm_started", + count=len(prewarm_ids), + ) + log.info("app.startup", environment=settings.ENVIRONMENT) try: yield @@ -299,6 +368,16 @@ async def _facets_warm() -> None: # so it surfaces in logs instead of disappearing. with _contextlib.suppress(_asyncio.CancelledError): await task + # Cancel any in-flight dataset-binding pre-warm tasks. + # downloadDataset is blocking I/O inside asyncio.to_thread — we + # can't actually interrupt it mid-thread, but cancellation + # prevents the post-download cache-write from running after + # teardown. + prewarm_tasks = getattr(app.state, "dataset_binding_prewarm_tasks", None) or [] + for task in prewarm_tasks: + task.cancel() + with _contextlib.suppress(_asyncio.CancelledError, Exception): + await task await cloud_client.close() await ontology_service.close() # `redis.asyncio.Redis.aclose()` is the correct async-context @@ -451,7 +530,9 @@ async def handle_unhandled(request: Request, exc: Exception) -> JSONResponse: app.include_router(query.facets_router) app.include_router(binary.router) app.include_router(signal.router) + app.include_router(image.router) app.include_router(tabular_query.router) + app.include_router(ndi_dataset.router) app.include_router(ontology.router) app.include_router(visualize.router) diff --git a/backend/routers/_deps.py b/backend/routers/_deps.py index 4bddcd6..72b06f6 100644 --- a/backend/routers/_deps.py +++ b/backend/routers/_deps.py @@ -10,6 +10,7 @@ from ..clients.ndi_cloud import NdiCloudClient from ..middleware.rate_limit import Limit, RateLimiter from ..services.binary_service import BinaryService +from ..services.dataset_binding_service import DatasetBindingService from ..services.dataset_provenance_service import DatasetProvenanceService from ..services.dataset_service import DatasetService from ..services.dataset_summary_service import ( @@ -19,6 +20,7 @@ from ..services.dependency_graph_service import DependencyGraphService from ..services.document_service import DocumentService from ..services.facet_service import FacetService +from ..services.image_service import ImageService from ..services.ontology_service import OntologyService from ..services.pivot_service import PivotService from ..services.query_service import QueryService @@ -122,6 +124,10 @@ def binary_service(request: Request) -> BinaryService: return BinaryService(cloud(request)) +def image_service(request: Request) -> ImageService: + return ImageService(cloud(request)) + + def visualize_service(request: Request) -> VisualizeService: return VisualizeService(cloud(request)) @@ -130,6 +136,18 @@ def ontology_service(request: Request) -> OntologyService: return request.app.state.ontology_service # type: ignore[no-any-return] +def dataset_binding_service(request: Request) -> DatasetBindingService: + """Return the singleton DatasetBindingService held on app.state. + + The service owns an in-memory LRU of materialized ndi.dataset.Dataset + objects + per-id locks for download coalescing — both must persist + across requests, so this MUST resolve to the shared instance, not a + new one per call. Lifespan wires + ``app.state.dataset_binding_service`` at startup. + """ + return request.app.state.dataset_binding_service # type: ignore[no-any-return] + + # --- Rate-limit helpers --- async def _subject( diff --git a/backend/routers/image.py b/backend/routers/image.py new file mode 100644 index 0000000..10ac9ad --- /dev/null +++ b/backend/routers/image.py @@ -0,0 +1,93 @@ +"""Image endpoint for the experimental /ask chat's ``fetch_image`` tool. + +GET /api/datasets/{dataset_id}/documents/{document_id}/image + ?frame=N (multi-frame TIFF / animated GIF frame index; default 0) + +The route fetches the document's primary image file, decodes it via +Pillow (supports TIFF/PNG/JPEG/GIF auto-detect), converts to a 2D +grayscale float array, and returns the array plus min/max for Plotly's +heatmap colorscale. + +Targets the patch-encounter map / fluorescence image / cell-image use +cases for Haley accept-reject-foraging and Bhar memory datasets. PIs +asking "show me the encounter map" now get an inline heatmap instead of +"that's not currently supported". + +Soft errors (decode failure, missing file, unsupported format) surface +as ``{"error", "errorKind"}`` — the chat tool inspects the envelope and +the LLM tells the user plainly rather than emitting a chart fence. + +This is a NEW additive endpoint. Anonymous-readable. 60s timeout (large +TIFFs from the cloud can be slow to download). +""" +from __future__ import annotations + +from typing import Annotated, Any + +from fastapi import APIRouter, Depends, Query + +from ..auth.dependencies import get_current_session +from ..auth.session import SessionData +from ..services.document_service import DocumentService +from ..services.image_service import ImageService +from ._deps import document_service, image_service, limit_reads +from ._validators import DatasetId, DocumentId + +router = APIRouter( + prefix="/api/datasets/{dataset_id}/documents/{document_id}", + tags=["image"], + dependencies=[Depends(limit_reads)], +) + + +@router.get("/image") +async def get_image( + dataset_id: DatasetId, + document_id: DocumentId, + docs: Annotated[DocumentService, Depends(document_service)], + svc: Annotated[ImageService, Depends(image_service)], + session: Annotated[SessionData | None, Depends(get_current_session)], + frame: Annotated[ + int, + Query( + ge=0, + le=10_000, + description=( + "Frame index for multi-frame containers (TIFF stack, " + "animated GIF). Defaults to 0 (first frame). Out-of-range " + "values clamp to the last frame and log a warning." + ), + ), + ] = 0, +) -> dict[str, Any]: + """Return a 2D image array with provenance. + + Response shape (success):: + + { + "width": int, + "height": int, + "data": [[float, ...], ...], + "min": float, + "max": float, + "format": "tiff" | "png" | "jpeg" | "...", + "downsampled": bool, + "source": { + "dataset_id": str, + "document_id": str, + "doc_class": str | None, + "doc_name": str | None, + "filename": str | None, + } + } + + Response shape (soft error):: + + {"error": "...", "errorKind": "notfound|decode|unsupported"} + """ + document = await docs.detail( + dataset_id, + document_id, + access_token=session.access_token if session else None, + ) + return await svc.fetch_image(document, frame=frame, session=session) diff --git a/backend/routers/ndi_dataset.py b/backend/routers/ndi_dataset.py new file mode 100644 index 0000000..af7b5ae --- /dev/null +++ b/backend/routers/ndi_dataset.py @@ -0,0 +1,125 @@ +"""ndi_dataset router — Sprint 1.5 cloud-backed dataset binding endpoint. + +GET /api/datasets/{dataset_id}/ndi_overview + Returns a high-level summary (element / subject / epoch counts + + element listing) computed by traversing a LOCAL + ``ndi.dataset.Dataset`` materialized from the cloud's documents. + +Failure posture (deliberate): when the binding can't produce a value — +NDI-python missing, downloadDataset timed out, cloud unreachable, +anything — the endpoint returns **HTTP 503** with a JSON envelope so +the chat tool can gracefully fall back to its existing ``ndi_query`` +path. Callers should NOT treat 503 as a hard failure. + +Why a separate router rather than folding into ``datasets.py``: +1. The Sprint 1.5 binding is OPTIONAL infrastructure — keeping it in + its own module makes it trivial to disable (just unmount the + router) if the cloud auth / Mongo download path fails in + production. +2. The endpoint has a different latency posture (cold loads up to + 90s) — visible isolation helps with metrics + rate-limit reasoning. +""" +from __future__ import annotations + +import asyncio +from typing import Annotated, Any + +from fastapi import APIRouter, Depends +from fastapi.responses import JSONResponse + +from ..observability.logging import get_logger +from ..services.dataset_binding_service import DatasetBindingService +from ._deps import dataset_binding_service, limit_reads +from ._validators import DatasetId + +log = get_logger(__name__) + + +# Per-call wall-clock cap. Cold loads can take 10-30s for the demo +# datasets; we allow up to 60s before surfacing a 503 so the chat +# doesn't hang. The service's own ``COLD_LOAD_TIMEOUT_SECONDS`` is 90s +# — that's the BACKGROUND limit (warm/pre-warm tasks). This per- +# request cap is stricter so a user-facing request never blocks the +# router for ~90s. +REQUEST_TIMEOUT_SECONDS = 60.0 + + +router = APIRouter( + prefix="/api/datasets/{dataset_id}", + tags=["ndi_dataset"], + dependencies=[Depends(limit_reads)], +) + + +@router.get("/ndi_overview") +async def ndi_overview( + dataset_id: DatasetId, + svc: Annotated[DatasetBindingService, Depends(dataset_binding_service)], +) -> Any: + """High-level dataset summary computed by NDI-python's SDK. + + Returns a dict shape on success: + + { + element_count: int, + subject_count: int, + epoch_count: int, + elements: [{name, type}], # capped at 50 + elements_truncated: bool, + reference: str, + cache_hit: bool, + cache_age_seconds: float, + } + + Returns 503 with ``{error, reason}`` on any failure (binding + unavailable, cold-load timeout, cloud unreachable). The chat tool + layer translates 503 → graceful fallback prompt. + """ + try: + result = await asyncio.wait_for( + svc.overview(dataset_id), + timeout=REQUEST_TIMEOUT_SECONDS, + ) + except TimeoutError: + log.warning( + "ndi_dataset.overview.request_timeout", + dataset_id=dataset_id, + timeout_seconds=REQUEST_TIMEOUT_SECONDS, + ) + return JSONResponse( + status_code=503, + content={ + "error": "dataset binding unavailable", + "reason": ( + f"overview computation exceeded {REQUEST_TIMEOUT_SECONDS:.0f}s " + "wall clock; try again in a moment" + ), + }, + ) + except Exception as exc: # blind — must not 500 a user request + log.warning( + "ndi_dataset.overview.unexpected_failure", + dataset_id=dataset_id, + error=str(exc), + error_type=type(exc).__name__, + ) + return JSONResponse( + status_code=503, + content={ + "error": "dataset binding unavailable", + "reason": "binding raised an unexpected error", + }, + ) + + if result is None: + return JSONResponse( + status_code=503, + content={ + "error": "dataset binding unavailable", + "reason": ( + "NDI-python dataset materialization failed or is not " + "configured on this server" + ), + }, + ) + return result diff --git a/backend/services/dataset_binding_service.py b/backend/services/dataset_binding_service.py new file mode 100644 index 0000000..1657685 --- /dev/null +++ b/backend/services/dataset_binding_service.py @@ -0,0 +1,464 @@ +"""dataset_binding_service — Sprint 1.5 cloud-backed ``ndi.dataset.Dataset`` +binding for the experimental ``/ask`` chat. + +The chat already has a structured ``ndi_query`` tool that proxies to the +cloud-node Mongo layer. This service adds ONE more capability: surfacing +SDK-level abstractions (``dataset.elements()``, ``element.epochs()``, +session traversal) over a LOCAL copy of the dataset that NDI-python has +materialized via :func:`ndi.cloud.orchestration.downloadDataset`. + +Why local materialization? + Most useful summary numbers — element count, total epoch count + across elements, list of (name, type) tuples — are computed by the + SDK by walking ``element`` + ``element_epoch`` docs and traversing + dependencies. The cloud-node's ``/ndiquery`` endpoint returns raw + docs but doesn't perform that traversal. Spinning up a real + ``ndi.dataset.Dataset`` once, in-process, lets us answer these + "how many X are there?" questions cheaply. + +Lifecycle +───────── +- First call for a given dataset_id is a cold load (10-30s typical for + the demo datasets). The cold path runs ``downloadDataset`` in a + thread so it doesn't block the asyncio loop. +- Subsequent calls are warm hits: bounded by an in-memory LRU. +- Cache is keyed by dataset_id; eviction at MAX_CACHED_DATASETS. +- Concurrent calls for the SAME dataset coalesce on an + :class:`asyncio.Lock` so we never download twice in parallel. + +Cache target folder +─────────────────── +- Env-var ``NDI_CACHE_DIR`` (default ``/tmp/ndi-cache``). +- Per-dataset subfolder under that root; downloadDataset itself appends + the dataset_id, so the on-disk layout is + ``//.ndi/…``. +- /tmp is ephemeral on Railway (no persistent volume requested for + this task) — that's fine for the demo. Entries get rebuilt after a + redeploy and the pre-warm tasks fan out automatically. + +Failure posture +─────────────── +- Every public method NEVER raises. On any internal failure they log + a warning and return None. Callers (the FastAPI router) treat None + as "binding unavailable" → 503 → frontend tool falls back to + /ndiquery. Safety > completeness. +""" + +from __future__ import annotations + +import asyncio +import os +import time +from collections import OrderedDict +from pathlib import Path +from typing import Any + +from ..observability.logging import get_logger + +log = get_logger(__name__) + + +# --------------------------------------------------------------------------- +# Tunables (module-level so tests can monkeypatch them) +# --------------------------------------------------------------------------- + +# Max simultaneously-cached datasets. Each cached dataset holds: +# - the Python ndi_dataset_dir object (~MB-scale heap), +# - its on-disk .ndi store under NDI_CACHE_DIR//. +# 5 is enough to cover the 3 demo datasets + headroom for occasional +# user-driven calls without ballooning memory or disk. +MAX_CACHED_DATASETS = 5 + +# Per-cold-load wall-clock cap. downloadDataset is mostly I/O-bound +# (bulk Mongo fetch + S3 presign rewrites). 90s gives slow networks +# enough rope; longer than that and we'd rather surface the failure to +# the caller than hold the request handler open. +COLD_LOAD_TIMEOUT_SECONDS = 90.0 + +# Soft size budget for the on-disk cache. We log a warning when the +# cache exceeds this; we don't auto-prune because eviction-on-LRU +# already bounds growth in the steady state. The warning is the +# operator hint that something's leaking. +CACHE_DIR_SOFT_LIMIT_BYTES = 5 * 1024 * 1024 * 1024 # 5 GB + +# Max elements surfaced in the overview payload. The LLM token budget +# won't survive a 500-element table; the dataset-detail page already +# shows the full element list, the chat is the wrong place for it. +MAX_ELEMENTS_IN_OVERVIEW = 50 + + +# --------------------------------------------------------------------------- +# Internal cache entry — keeps the dataset object + bookkeeping together +# --------------------------------------------------------------------------- + + +class _CacheEntry: + """Single LRU slot. ``dataset`` is the NDI-python object; ``loaded_at`` + powers the ``cache_age_seconds`` field in the overview response. + + Mutable on purpose: ``DatasetBindingService`` rewrites ``loaded_at`` + on every warm hit so the LRU ordering reflects recency-of-use, not + recency-of-load. + """ + + __slots__ = ("dataset", "first_loaded_at", "loaded_at") + + def __init__(self, dataset: Any) -> None: + now = time.monotonic() + self.dataset = dataset + self.loaded_at = now + self.first_loaded_at = now + + +class DatasetBindingService: + """LRU-cached wrapper around :func:`ndi.cloud.orchestration.downloadDataset`. + + Public surface is two coroutines: :meth:`get_dataset` and + :meth:`overview`. Both swallow all exceptions and return ``None`` + on any failure so the router can map that to a 503 and the chat + falls through to its existing tools. + """ + + def __init__(self, *, cache_dir: str | None = None) -> None: + self._cache: OrderedDict[str, _CacheEntry] = OrderedDict() + # Per-dataset locks coalesce concurrent get_dataset() calls so + # two requests for the same id share a single download. + self._locks: dict[str, asyncio.Lock] = {} + # Global lock guards _locks dict mutation and LRU eviction. + self._mutex = asyncio.Lock() + self._cache_dir = Path( + cache_dir or os.environ.get("NDI_CACHE_DIR", "/tmp/ndi-cache") + ) + + # ------------------------------------------------------------------ + # Public API + # ------------------------------------------------------------------ + + async def get_dataset(self, dataset_id: str) -> Any | None: + """Return the cached ndi.dataset.Dataset for ``dataset_id``. + + Cold path: downloads (in a worker thread) under + ``//`` and caches the result. + Warm path: returns the cached object instantly + updates LRU + position. + + Returns ``None`` on any failure (NDI-python unavailable, + download timeout, exception during construction, etc.) — never + raises. The router translates None → 503; the frontend tool + translates 503 → "binding still warming, try ndi_query". + """ + if not dataset_id: + return None + + async with self._mutex: + existing = self._cache.get(dataset_id) + if existing is not None: + # LRU bump + warm-hit log. + self._cache.move_to_end(dataset_id) + existing.loaded_at = time.monotonic() + log.info( + "dataset_binding.warm_hit", + dataset_id=dataset_id, + cache_size=len(self._cache), + ) + return existing.dataset + # No cache entry. Acquire/create the per-dataset lock. + per_lock = self._locks.setdefault(dataset_id, asyncio.Lock()) + + # Hold the per-dataset lock to deduplicate concurrent cold + # loads. After acquiring, re-check the cache — another caller + # may have populated it while we waited. + async with per_lock: + async with self._mutex: + existing = self._cache.get(dataset_id) + if existing is not None: + self._cache.move_to_end(dataset_id) + log.info( + "dataset_binding.warm_hit_after_wait", + dataset_id=dataset_id, + ) + return existing.dataset + + dataset = await self._cold_load(dataset_id) + if dataset is None: + return None + + async with self._mutex: + self._cache[dataset_id] = _CacheEntry(dataset) + self._cache.move_to_end(dataset_id) + self._evict_lru_if_needed() + return dataset + + async def overview(self, dataset_id: str) -> dict[str, Any] | None: + """High-level summary: element / subject / epoch counts + element + listing. See module docstring for why this matters. + + Returns ``None`` if the binding is unavailable. Callers route + that to a 503. + """ + if not dataset_id: + return None + + # cache_hit reflects whether get_dataset hit a warm slot. + # Capture BEFORE the call so we can tell cold from warm. + async with self._mutex: + had_entry = dataset_id in self._cache + + dataset = await self.get_dataset(dataset_id) + if dataset is None: + return None + + # Pull cache age (now in seconds) — after get_dataset() the + # entry's loaded_at was bumped, so first_loaded_at gives us + # the actual age since cold-load. + async with self._mutex: + entry = self._cache.get(dataset_id) + cache_age_seconds = ( + time.monotonic() - entry.first_loaded_at if entry else 0.0 + ) + + # Compute the actual overview off the event loop. Most of the + # work is pure-Python iteration over the in-memory database, + # but element.epochtable() may trigger file I/O for ingested + # epochs. Threadpool it to be safe. + try: + payload: dict[str, Any] | None = await asyncio.to_thread( + self._compute_overview, dataset + ) + except Exception as exc: # blind — overview must never raise + log.warning( + "dataset_binding.overview_failed", + dataset_id=dataset_id, + error=str(exc), + error_type=type(exc).__name__, + ) + return None + + if payload is None: + return None + + payload["cache_hit"] = had_entry + payload["cache_age_seconds"] = round(cache_age_seconds, 2) + return payload + + # ------------------------------------------------------------------ + # Cold-load + overview computation (run off the event loop) + # ------------------------------------------------------------------ + + async def _cold_load(self, dataset_id: str) -> Any | None: + """Run ``downloadDataset`` in a worker thread with a wall-clock cap.""" + from . import ndi_python_service + + if not ndi_python_service.is_ndi_available(): + log.warning( + "dataset_binding.ndi_unavailable", + dataset_id=dataset_id, + ) + return None + + # Ensure the cache root exists before handing it to + # downloadDataset (which mkdirs its own per-dataset subfolder + # but assumes the parent is writable). + try: + self._cache_dir.mkdir(parents=True, exist_ok=True) + except OSError as exc: + log.warning( + "dataset_binding.cache_dir_unwritable", + cache_dir=str(self._cache_dir), + error=str(exc), + ) + return None + + log.info( + "dataset_binding.cold_load_start", + dataset_id=dataset_id, + cache_dir=str(self._cache_dir), + ) + start = time.monotonic() + try: + dataset = await asyncio.wait_for( + asyncio.to_thread(self._download_blocking, dataset_id), + timeout=COLD_LOAD_TIMEOUT_SECONDS, + ) + except TimeoutError: + log.warning( + "dataset_binding.cold_load_timeout", + dataset_id=dataset_id, + timeout_seconds=COLD_LOAD_TIMEOUT_SECONDS, + ) + return None + except Exception as exc: # blind — cold load must never raise + log.warning( + "dataset_binding.cold_load_failed", + dataset_id=dataset_id, + error=str(exc), + error_type=type(exc).__name__, + ) + return None + + duration_seconds = time.monotonic() - start + log.info( + "dataset_binding.cold_load", + dataset_id=dataset_id, + duration_seconds=round(duration_seconds, 2), + ) + + # Best-effort: warn if the on-disk cache has grown past the + # soft budget. Computed lazily so we don't du -sh on every + # call; cheap enough at cold-load granularity. + self._warn_if_cache_oversized() + return dataset + + def _download_blocking(self, dataset_id: str) -> Any: + """Synchronous downloadDataset call. Lives in a thread. + + Lazy-import ndi.cloud here so this module stays cheap to import + even when NDI-python isn't installed (test/CI matrix). + """ + from ndi.cloud.orchestration import downloadDataset + return downloadDataset( + dataset_id, + str(self._cache_dir), + sync_files=False, + ) + + def _compute_overview(self, dataset: Any) -> dict[str, Any] | None: + """Walk the dataset and return the LLM-facing summary dict. + + Runs on a worker thread. Tolerant of partial failures: each + sub-count is wrapped in its own try/except so one missing + traversal doesn't blank the whole payload. + """ + # ------ element listing + count ------ + elements: list[Any] = [] + element_count = 0 + element_listing: list[dict[str, str]] = [] + try: + session = getattr(dataset, "_session", None) + if session is not None and hasattr(session, "getelements"): + elements = list(session.getelements()) or [] + element_count = len(elements) + for elem in elements[:MAX_ELEMENTS_IN_OVERVIEW]: + name = str(getattr(elem, "name", "") or "") + etype = str(getattr(elem, "type", "") or "") + if name or etype: + element_listing.append({"name": name, "type": etype}) + except Exception as exc: + log.warning( + "dataset_binding.element_listing_failed", + error=str(exc), + error_type=type(exc).__name__, + ) + elements = [] + element_count = 0 + element_listing = [] + + # ------ subject count via isa('subject') search ------ + # The ndi_query import is inside the try so a missing SDK + # version on a dev machine downgrades subject_count to 0 + # without blanking the rest of the payload. + subject_count = 0 + try: + from ndi.query import ndi_query + subj_docs = dataset.database_search( + ndi_query("").isa("subject") + ) + subject_count = len(subj_docs) if subj_docs is not None else 0 + except Exception as exc: + log.warning( + "dataset_binding.subject_count_failed", + error=str(exc), + error_type=type(exc).__name__, + ) + + # ------ epoch count via per-element epochtable ------ + # We sum across ALL elements (not just the first + # MAX_ELEMENTS_IN_OVERVIEW) to preserve count fidelity. Each + # element's numepochs() walks the in-memory epoch table; the + # cost is bounded by element count * avg epochs. + epoch_count = 0 + try: + for elem in elements: + try: + if hasattr(elem, "numepochs"): + epoch_count += int(elem.numepochs()) + else: + et, _ = elem.epochtable() + epoch_count += len(et) if et else 0 + except Exception as exc: + # Per-element failure: log but keep counting. + log.debug( + "dataset_binding.element_epoch_count_failed", + element_name=str(getattr(elem, "name", "")), + error=str(exc), + ) + except Exception as exc: + log.warning( + "dataset_binding.epoch_count_failed", + error=str(exc), + error_type=type(exc).__name__, + ) + + # ------ dataset reference (for citation snippet) ------ + reference = "" + try: + reference = str(getattr(dataset, "reference", "") or "") + except Exception: + reference = "" + + return { + "element_count": element_count, + "subject_count": subject_count, + "epoch_count": epoch_count, + "elements": element_listing, + "elements_truncated": element_count > len(element_listing), + "reference": reference, + } + + # ------------------------------------------------------------------ + # LRU eviction + disk-usage guard + # ------------------------------------------------------------------ + + def _evict_lru_if_needed(self) -> None: + """Drop the least-recently-used entry when the cache is full. + + Called under self._mutex. We don't unlink the on-disk folder + of the evicted dataset — leaving it lets a later cold-load + skip the network entirely (downloadDataset reuses an existing + target folder if the JSONs are already there). + """ + while len(self._cache) > MAX_CACHED_DATASETS: + oldest_id, _ = self._cache.popitem(last=False) + self._locks.pop(oldest_id, None) + log.info( + "dataset_binding.evicted", + dataset_id=oldest_id, + cache_size=len(self._cache), + ) + + def _warn_if_cache_oversized(self) -> None: + """Best-effort disk-usage check. Walks the cache dir once per + cold load; cheap relative to a download but still bounded. + """ + try: + if not self._cache_dir.exists(): + return + total = 0 + for path in self._cache_dir.rglob("*"): + try: + if path.is_file(): + total += path.stat().st_size + except OSError: + continue + if total > CACHE_DIR_SOFT_LIMIT_BYTES: + log.warning( + "dataset_binding.cache_dir_oversized", + cache_dir=str(self._cache_dir), + size_bytes=total, + limit_bytes=CACHE_DIR_SOFT_LIMIT_BYTES, + ) + except Exception as exc: + log.debug( + "dataset_binding.cache_dir_size_check_failed", + error=str(exc), + ) diff --git a/backend/services/image_service.py b/backend/services/image_service.py new file mode 100644 index 0000000..6f5e3cb --- /dev/null +++ b/backend/services/image_service.py @@ -0,0 +1,312 @@ +"""image_service — fetch + decode 2D image arrays from NDI binary documents. + +Used by the chat's ``fetch_image`` tool to render microscopy / fluorescence / +patch-encounter maps inline as Plotly heatmaps. The PI workflow is: + + "show me the patch encounter map for the Haley accept-reject dataset" + "show me the cell image from this Bhar memory recording" + +Returns a 2D array of floats (one row = one image row), plus min/max for +colorscale anchoring and a source provenance block for citation. + +Why a separate service (not a method on BinaryService)? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +``BinaryService.get_image`` already exists but returns a base64-encoded +PNG/JPEG datauri for the Document Explorer's image viewer. The chat +needs the actual pixel array so Plotly can render it as a heatmap with +its own colorscale, tooltips, and axis-scaling — a datauri is opaque to +Plotly. Keeping the two paths separate avoids cross-coupling: the +viewer endpoint can keep its base64 contract; the chat endpoint gets +a clean float-array shape. + +NDI-native raw image formats (``.nim`` and friends) are NOT yet handled +here — Pillow handles TIFF/PNG/JPEG/GIF which covers the demo datasets. +A future enhancement will route raw-uint8 imageStack files through the +existing ``imageStack_parameters`` sidecar pattern (same shape +``BinaryService.get_raw`` already supports for the Document Explorer). +For now those return ``errorKind="unsupported"``. + +Returned dict shape on success:: + + { + "width": int, + "height": int, + "data": [[float, ...], ...], # height x width + "min": float, + "max": float, + "format": "tiff" | "png" | "jpeg" | ..., + "downsampled": bool, # True if thumbnailed to <= 512x512 + "source": { + "dataset_id": str, + "document_id": str, + "doc_class": str | None, + "doc_name": str | None, + "filename": str | None, + }, + } + +Soft-error envelope (no raise):: + + {"error": "...", "errorKind": "decode|notfound|unsupported"} +""" +from __future__ import annotations + +import io +from typing import TYPE_CHECKING, Any + +from ..auth.session import SessionData +from ..clients.ndi_cloud import NdiCloudClient +from ..observability.logging import get_logger +from .binary_service import _file_refs + +if TYPE_CHECKING: # pragma: no cover + from PIL import Image as _PILImage # noqa: F401 + +log = get_logger(__name__) + +# Downsample threshold — Plotly heatmaps slow noticeably above ~512x512 +# (each pixel becomes a hover target). The chat surface is small anyway +# (~600 px wide in a typical message); a 512px thumbnail is sharper than +# the rendered size with room for retina-class displays. +MAX_DIMENSION = 512 + + +class ImageService: + """Decode 2D image arrays from NDI binary documents for chat rendering. + + Reuses BinaryService's file-ref extraction (handles the three observed + cloud document shapes) plus the cloud client's SSRF-hardened download + path. Pillow does the format dispatch — TIFF, PNG, JPEG, GIF all flow + through ``Image.open`` cleanly. + """ + + def __init__(self, cloud: NdiCloudClient) -> None: + self.cloud = cloud + + async def fetch_image( + self, + document: dict[str, Any], + *, + frame: int = 0, + session: SessionData | None = None, + ) -> dict[str, Any]: + """Fetch + decode the primary image file on ``document``. + + ``frame`` selects which frame to extract from a multi-frame TIFF / + animated GIF. Out-of-range frames clamp to (0, n_frames-1) and a + warning is logged. + + Returns a dict matching the module-docstring shape on success, or + a ``{"error", "errorKind"}`` envelope on a soft failure. The + envelope shape matches BinaryService's ``_timeseries_error`` so + the router can pass it through without re-shaping. + """ + refs = _file_refs(document) + if not refs: + return _image_error( + "notfound", + "No image file associated with this document.", + ) + + ref = refs[0] + if not ref.url: + return _image_error( + "notfound", + "No download URL available for this image file.", + ) + + access_token = session.access_token if session else None + try: + payload = await self.cloud.download_file( + ref.url, access_token=access_token, + ) + except Exception as e: + log.warning("image_service.download_failed", error=str(e)) + return _image_error( + "notfound", f"Failed to download image file: {e}", + ) + + return _decode_image( + payload, + frame=frame, + filename=ref.filename, + source=_source_block(document, filename=ref.filename), + ) + + +# --------------------------------------------------------------------------- +# Decode helpers — pure functions, no I/O. Tests exercise these directly +# with fixture bytes so the cloud-download stub stays minimal. +# --------------------------------------------------------------------------- + + +def _decode_image( # noqa: PLR0911, PLR0912 (linear per-failure-mode returns are clearer than a single accumulator; the branch count is one return per failure mode plus the success path) + payload: bytes, + *, + frame: int, + filename: str | None, + source: dict[str, Any], +) -> dict[str, Any]: + """Decode a raw image payload to a 2D float array. + + Pillow handles TIFF / PNG / JPEG / GIF auto-detect. Multi-channel + (RGB / RGBA) images are converted to grayscale via Pillow's ``"L"`` + mode — a heatmap renders a single channel, and Plotly's colorscale + is a more useful visual than three superimposed channels would be + for the typical microscopy / patch-encounter use case. + + For raw NDI-native image formats (.nim, .imageStack) Pillow will + raise — we surface as ``unsupported`` and the caller can prompt the + user to check back later or open the Document Explorer. + """ + if not payload: + return _image_error("notfound", "Image file is empty.") + + try: + # Lazy-import Pillow — matches BinaryService's pattern. Numpy is + # imported the same way (only paid when decoding actually runs). + from PIL import Image + except ImportError as e: + log.warning("image_service.pillow_unavailable", error=str(e)) + return _image_error("decode", f"Pillow import failed: {e}") + + try: + # Pillow's `Image.open()` returns an `ImageFile` subclass; subsequent + # `convert()` calls return the broader `Image.Image` type. We hold a + # widened reference here so mypy is happy with the rebind below. + img: Image.Image = Image.open(io.BytesIO(payload)) + except Exception as e: + log.warning("image_service.pil_open_failed", error=str(e)) + return _image_error( + "unsupported", + f"Image format not recognized by Pillow: {e}. " + "NDI-native raw image formats (.nim, raw imageStack) are not " + "yet supported by the chat heatmap renderer.", + ) + + fmt = (img.format or "").lower() or "raw" + + # Frame selection for multi-frame containers (TIFF stacks, animated + # GIFs). Pillow's `seek` raises on out-of-range; we clamp + log a + # warning rather than failing so the LLM gets a useful fallback. + n_frames = getattr(img, "n_frames", 1) + if frame > 0: + target = min(frame, n_frames - 1) if n_frames > 1 else 0 + if target != frame: + log.info( + "image_service.frame_clamped", + requested=frame, available=n_frames, used=target, + ) + try: + img.seek(target) + except Exception as e: + log.warning("image_service.frame_seek_failed", error=str(e)) + return _image_error( + "decode", + f"Failed to seek to frame {frame} (image has {n_frames} frame(s)): {e}", + ) + + # Convert to single-channel grayscale BEFORE thumbnailing — Pillow's + # mode-aware downscale is faster on `L` than on `RGBA`, and we'd + # discard the chroma anyway for the heatmap output. + if img.mode not in ("L", "I", "I;16", "F"): + img = img.convert("L") + + # Downsample to bound the response payload size. A 4K TIFF is 16M + # cells * ~6 bytes-per-cell JSON = ~100 MB response otherwise; the + # chat surface absolutely cannot ship that. 512x512 keeps the JSON + # under ~1.5 MB and renders crisply at the chat's column width. + downsampled = False + if img.width > MAX_DIMENSION or img.height > MAX_DIMENSION: + img.thumbnail((MAX_DIMENSION, MAX_DIMENSION), Image.Resampling.LANCZOS) + downsampled = True + + try: + import numpy as np + arr = np.asarray(img, dtype=np.float32) + except Exception as e: + log.warning("image_service.numpy_convert_failed", error=str(e)) + return _image_error("decode", f"Failed to convert image to array: {e}") + + if arr.ndim != 2: + # Defensive — convert("L") above should always give a 2D result, + # but the `F` and `I;16` modes Pillow surfaces for scientific + # TIFFs can occasionally come through as something else. Flatten + # the leading dimensions to 2D so the heatmap still renders. + if arr.ndim == 3 and arr.shape[2] in (1, 3, 4): + arr = arr.mean(axis=2) + else: + return _image_error( + "decode", + f"Unexpected array shape after decode: {arr.shape}. " + "Expected 2D (height x width).", + ) + + # Min/max for Plotly's `zmin`/`zmax` colorscale anchoring. Computed + # on the float array (after the optional downscale) so the chart + # matches what Plotly actually renders. Use safe casts to plain + # Python floats — np.float32 isn't JSON-serializable in some + # FastAPI response shapes. + if arr.size == 0: + return _image_error("decode", "Image decoded to an empty array.") + arr_min = float(arr.min()) + arr_max = float(arr.max()) + + # 2D list-of-lists for the JSON response. Each row materializes once; + # `.tolist()` is the cheapest numpy → JSON-able path and Pillow's + # decode already paid the per-pixel cost so this is at most a copy. + data: list[list[float]] = arr.tolist() + + return { + "width": int(arr.shape[1]), + "height": int(arr.shape[0]), + "data": data, + "min": arr_min, + "max": arr_max, + "format": fmt, + "downsampled": downsampled, + "source": source, + } + + +def _image_error(error_kind: str, message: str) -> dict[str, Any]: + """Soft-error envelope — matches the BinaryService ``errorKind`` shape + so the router doesn't need to re-translate. + + `errorKind` is one of: "notfound", "decode", "unsupported". The LLM + is taught to surface these plainly without emitting a chart fence. + """ + return {"error": message, "errorKind": error_kind} + + +def _source_block( + document: dict[str, Any], *, filename: str | None, +) -> dict[str, Any]: + """Build the citation source block. Mirrors signal_service's shape + so the chat-side reference builder works uniformly across tools. + + Defensive against partial document shapes: every field can be None + without crashing the dict assembly. + """ + base = document.get("base", {}) if isinstance(document, dict) else {} + doc_class: str | None = None + if isinstance(document, dict): + cls = document.get("document_class") or {} + if isinstance(cls, dict): + doc_class = cls.get("classname") or cls.get("class_name") + # Bulk-fetch shape buries the class on top-level `className`. + doc_class = doc_class or document.get("className") + doc_name = None + if isinstance(base, dict): + doc_name = base.get("name") + return { + "dataset_id": document.get("datasetId", "") if isinstance(document, dict) else "", + "document_id": ( + document.get("id") or document.get("_id") or "" + if isinstance(document, dict) else "" + ), + "doc_class": doc_class, + "doc_name": doc_name, + "filename": filename, + } diff --git a/backend/services/ndi_python_service.py b/backend/services/ndi_python_service.py index 26d55cb..f4171a6 100644 --- a/backend/services/ndi_python_service.py +++ b/backend/services/ndi_python_service.py @@ -39,11 +39,27 @@ # `False` = import failed (NDI stack not available; callers fall back). _NDI_AVAILABLE: bool | None = None +# Separate, optional flag for the Sprint 1.5 dataset binding (Dataset +# materialization via `ndi.cloud.orchestration.downloadDataset`). Even +# when the Phase A stack is happy, the dataset binding can fail +# independently (e.g. missing `ndi.dataset`, missing cloud helpers, +# the auto-client decorator can't find creds at module-import time). +# We probe it separately so callers of `is_ndi_available()` don't see +# a flap caused by the optional Sprint 1.5 surface. +_DATASET_BINDING_AVAILABLE: bool | None = None + def is_ndi_available() -> bool: """Best-effort check that the NDI-python stack is importable. Caches the result so health checks + first-request paths don't pay the import - cost more than once.""" + cost more than once. + + Side-effect: also runs the Sprint 1.5 dataset-binding probe (see + :func:`is_dataset_binding_available`) so the boot log shows ONE entry + summarising both. The dataset binding is treated as a separate, + OPTIONAL capability — its failure must NOT mark the Phase A stack + unavailable. + """ global _NDI_AVAILABLE # noqa: PLW0603 — module-level cache flag if _NDI_AVAILABLE is not None: return _NDI_AVAILABLE @@ -59,9 +75,43 @@ def is_ndi_available() -> bool: except ImportError as e: log.warning("ndi_python_service.import_failed", error=str(e)) _NDI_AVAILABLE = False + + # Probe the optional dataset binding even on Phase-A success — log + # both findings together for clarity in the boot log. + binding_ok = is_dataset_binding_available() + log.info( + "ndi_python_service.boot_probe", + phase_a=_NDI_AVAILABLE, + dataset_binding=binding_ok, + ) return _NDI_AVAILABLE +def is_dataset_binding_available() -> bool: + """Best-effort check for the Sprint 1.5 cloud-backed dataset binding. + + Probes ``ndi.dataset`` and ``ndi.cloud.orchestration`` — the two + modules :mod:`backend.services.dataset_binding_service` reaches into. + A True result does NOT mean the binding will succeed at runtime + (cloud-node auth + network are required for that); it only means the + imports are wired and the service is safe to wire into the router. + """ + global _DATASET_BINDING_AVAILABLE # noqa: PLW0603 + if _DATASET_BINDING_AVAILABLE is not None: + return _DATASET_BINDING_AVAILABLE + try: + import ndi.cloud.orchestration + import ndi.dataset # noqa: F401 + _DATASET_BINDING_AVAILABLE = True + except ImportError as e: + log.warning( + "ndi_python_service.dataset_binding_import_failed", + error=str(e), + ) + _DATASET_BINDING_AVAILABLE = False + return _DATASET_BINDING_AVAILABLE + + # --------------------------------------------------------------------------- # VHSB — vlt.file.custom_file_formats.vhsb_read # --------------------------------------------------------------------------- diff --git a/backend/services/summary_table_service.py b/backend/services/summary_table_service.py index e451e49..8cece96 100644 --- a/backend/services/summary_table_service.py +++ b/backend/services/summary_table_service.py @@ -53,6 +53,21 @@ # 6 comfortably — confirm with Steve before raising further. MAX_CONCURRENT_BULK_FETCH = 6 +# distinct_summary caps — keep work bounded for very large tables. +# Smoke-tested 2026-05-13 (Dabrowska BNST): `query_documents(class=treatment)` +# returned 49 rows, all named "Optogenetic Tetanus Stimulation Target Location" +# — the LLM didn't realize all rows were duplicates and assumed the dataset had +# only optogenetic treatments. distinct_summary surfaces that collapse so the +# model can say "9 distinct strains across 215 subjects" without sampling. +# +# Skip the computation entirely above this row count — the table is too big to +# scan in-memory affordably (>10K rows × ~15 columns = 150K cell reads). +DISTINCT_SUMMARY_MAX_ROWS = 10_000 +# How many top values per column to surface. Beyond ~5 the LLM's context +# bloats without adding signal; "top-5 + counts + distinct_count" is the +# tight summary the prompt is tuned against. +DISTINCT_SUMMARY_TOP_K = 5 + # Enrichment plan per primary class. Each listed class is fetched dataset- # wide in parallel and its docs indexed by the `depends_on` edge the # projection needs. `subject` and `openminds_subject` are always pulled when @@ -171,6 +186,13 @@ async def _build_single_class( enriched[ec] = r columns, rows = _project_for_class(class_name, docs, enriched) + # distinct_summary is computed over ALL projected rows (not the + # client-side paged slice), so consumers (notably the /ask chat's + # query_documents tool, which limits to 10-30 rows) can still see + # "9 distinct strains across 215 subjects" without sampling. The + # work is bounded by DISTINCT_SUMMARY_MAX_ROWS; cached alongside + # rows so a Redis hit returns it for free. + distinct_summary = _build_distinct_summary(columns, rows) table_build_duration_seconds.labels(class_name=class_name).observe( time.perf_counter() - t0, ) @@ -182,7 +204,11 @@ async def _build_single_class( rows=len(rows), ms=int((time.perf_counter() - t0) * 1000), ) - return {"columns": columns, "rows": rows} + return { + "columns": columns, + "rows": rows, + "distinct_summary": distinct_summary, + } async def combined( self, @@ -314,6 +340,27 @@ def _accept_optional(name: str, r: object) -> list[dict[str, Any]]: "epochId": _ndi_id(epoch), }) + combined_columns = [ + {"key": "subject", "label": "Subject"}, + {"key": "species", "label": "Species"}, + {"key": "speciesOntology", "label": "Species Ontology"}, + {"key": "strain", "label": "Strain"}, + {"key": "strainOntology", "label": "Strain Ontology"}, + {"key": "sex", "label": "Sex"}, + {"key": "probe", "label": "Probe"}, + {"key": "probeLocationName", "label": "Probe Location"}, + {"key": "probeLocationOntology", "label": "Probe Location Ontology"}, + {"key": "type", "label": "Probe type"}, + {"key": "epoch", "label": "Epoch"}, + {"key": "approachName", "label": "Approach"}, + {"key": "approachOntology", "label": "Approach Ontology"}, + {"key": "start", "label": "Start"}, + {"key": "stop", "label": "Stop"}, + ] + # Same distinct_summary rationale as single_class: surfaces per-column + # cardinality so the LLM can answer "how many distinct strains" without + # paging the full table. + distinct_summary = _build_distinct_summary(combined_columns, rows) elapsed = time.perf_counter() - build_start table_build_duration_seconds.labels(class_name="combined").observe(elapsed) log.info( @@ -325,24 +372,9 @@ def _accept_optional(name: str, r: object) -> list[dict[str, Any]]: ms=int(elapsed * 1000), ) return { - "columns": [ - {"key": "subject", "label": "Subject"}, - {"key": "species", "label": "Species"}, - {"key": "speciesOntology", "label": "Species Ontology"}, - {"key": "strain", "label": "Strain"}, - {"key": "strainOntology", "label": "Strain Ontology"}, - {"key": "sex", "label": "Sex"}, - {"key": "probe", "label": "Probe"}, - {"key": "probeLocationName", "label": "Probe Location"}, - {"key": "probeLocationOntology", "label": "Probe Location Ontology"}, - {"key": "type", "label": "Probe type"}, - {"key": "epoch", "label": "Epoch"}, - {"key": "approachName", "label": "Approach"}, - {"key": "approachOntology", "label": "Approach Ontology"}, - {"key": "start", "label": "Start"}, - {"key": "stop", "label": "Stop"}, - ], + "columns": combined_columns, "rows": rows, + "distinct_summary": distinct_summary, } async def ontology_tables( @@ -1211,3 +1243,68 @@ def _row_treatment(d: dict[str, Any]) -> dict[str, Any]: "stringValue": _clean(tdata.get("string_value")), "subjectDocumentIdentifier": _depends_on_value_by_name(d, "subject_id"), } + + +# --------------------------------------------------------------------------- +# distinct_summary — per-column cardinality + top values +# --------------------------------------------------------------------------- + +def _hashable(value: Any) -> Any: + """Convert a cell value into something usable as a dict key. + + Most projected cells are scalars (str, int, float, bool, None). A few + columns (e.g. element_epoch.epochStart) project as small dicts like + `{"devTime": 0, "globalTime": None}`. Those still need to be countable, + so we JSON-serialize the unhashables into stable string keys. + """ + if isinstance(value, (str, int, float, bool)) or value is None: + return value + # dict / list / tuple — stringify deterministically. + try: + import json as _json + return _json.dumps(value, sort_keys=True, default=str) + except (TypeError, ValueError): + return repr(value) + + +def _build_distinct_summary( + columns: list[dict[str, str]], + rows: list[dict[str, Any]], +) -> dict[str, Any]: + """For each column compute {distinct_count, top_values} across ALL rows. + + `top_values` is a list of `{value, count}` ordered by descending count, + capped at `DISTINCT_SUMMARY_TOP_K`. Empty/None cells are tallied as + `None` so the LLM can still see "5/49 rows had a null treatmentName". + + Returns `{"_meta": "skipped due to large row count"}` shape when there + are more than `DISTINCT_SUMMARY_MAX_ROWS` rows — the table is too big + to scan affordably and the LLM doesn't need per-cell stats at that + point (it should pivot to ndi_query or get_facets). + + The function is pure: no I/O, no mutation. Folded into the cached + table response so a Redis hit returns it for free. + """ + if len(rows) > DISTINCT_SUMMARY_MAX_ROWS: + return {"_meta": "skipped due to large row count"} + + out: dict[str, Any] = {} + for col in columns: + key = col.get("key") + if not isinstance(key, str): + continue + counts: dict[Any, int] = {} + for row in rows: + cell = row.get(key) + counts[_hashable(cell)] = counts.get(_hashable(cell), 0) + 1 + # Sort by count desc, then by string-form of the value for deterministic + # tie-break (matters for test snapshots and cache key stability). + ordered = sorted( + counts.items(), key=lambda kv: (-kv[1], str(kv[0])), + ) + top = [{"value": v, "count": c} for v, c in ordered[:DISTINCT_SUMMARY_TOP_K]] + out[key] = { + "distinct_count": len(counts), + "top_values": top, + } + return out diff --git a/backend/tests/integration/test_dataset_binding_live.py b/backend/tests/integration/test_dataset_binding_live.py new file mode 100644 index 0000000..fa9ca70 --- /dev/null +++ b/backend/tests/integration/test_dataset_binding_live.py @@ -0,0 +1,83 @@ +"""LIVE integration test for DatasetBindingService. + +Hits the real ndi-cloud-node + downloads a real (small) dataset. +SKIPPED in CI by default — set ``LIVE_NDI_TESTS=1`` locally to run. + +Why this is gated: + - Requires NDI-python installed (vlt, ndicompress, ndi.cloud) + - Requires reachable cloud-node API + S3 + - Cold load is ~10-30s wall clock — adds significant CI runtime + - Cloud-side data drift could flake the test in unrelated PRs + +The intent is just to prove the pipe works end-to-end. We don't assert +exact element counts (cloud data may change) — only that the service +returned a non-None dict with the documented keys. + +Run locally: + LIVE_NDI_TESTS=1 uv run pytest backend/tests/integration/test_dataset_binding_live.py -v +""" +from __future__ import annotations + +import os + +import pytest + +# Bhar dataset — small + stable, used as the demo elsewhere in the +# repo. Switch to a different ID if it ever goes away. +DEMO_DATASET_ID = "69bc5ca11d547b1f6d083761" + + +@pytest.mark.skipif( + os.environ.get("LIVE_NDI_TESTS", "") not in ("1", "true", "yes"), + reason="LIVE_NDI_TESTS not set — skipping cloud-hitting integration test", +) +async def test_overview_against_real_cloud(): + """End-to-end smoke. Downloads a real dataset, computes overview, + asserts the response shape. + """ + from backend.services.dataset_binding_service import DatasetBindingService + + svc = DatasetBindingService() + overview = await svc.overview(DEMO_DATASET_ID) + + assert overview is not None, ( + "binding returned None — check NDI-python install + cloud-node auth" + ) + # Documented keys. + for k in ( + "element_count", + "subject_count", + "epoch_count", + "elements", + "elements_truncated", + "reference", + "cache_hit", + "cache_age_seconds", + ): + assert k in overview, f"missing key: {k}" + + # First call is cold; cache_hit MUST be False. + assert overview["cache_hit"] is False + # Type sanity. + assert isinstance(overview["element_count"], int) + assert isinstance(overview["elements"], list) + + +@pytest.mark.skipif( + os.environ.get("LIVE_NDI_TESTS", "") not in ("1", "true", "yes"), + reason="LIVE_NDI_TESTS not set", +) +async def test_warm_call_after_cold_load(): + """Second call on the same service instance reports cache_hit=True + and a positive cache_age_seconds. Pins the LRU bookkeeping against + a real download. + """ + from backend.services.dataset_binding_service import DatasetBindingService + + svc = DatasetBindingService() + cold = await svc.overview(DEMO_DATASET_ID) + assert cold is not None + warm = await svc.overview(DEMO_DATASET_ID) + assert warm is not None + assert warm["cache_hit"] is True + assert warm["cache_age_seconds"] > 0.0 diff --git a/backend/tests/unit/test_dataset_binding_service.py b/backend/tests/unit/test_dataset_binding_service.py new file mode 100644 index 0000000..12644bb --- /dev/null +++ b/backend/tests/unit/test_dataset_binding_service.py @@ -0,0 +1,367 @@ +"""Unit tests for DatasetBindingService. + +These tests do NOT require NDI-python to be installed. We patch the +service's internals so the cold-load path returns fake Dataset objects +without ever hitting the network or the SDK. The contract under test is +the cache/eviction/error-handling shell, not the SDK itself — the SDK is +already exercised by NDI-python's own test suite + the (separate) +integration tests in ``tests/integration/test_dataset_binding_live.py``. +""" +from __future__ import annotations + +import asyncio +from types import SimpleNamespace +from typing import Any +from unittest.mock import MagicMock, patch + +import pytest + +from backend.services.dataset_binding_service import ( + MAX_CACHED_DATASETS, + DatasetBindingService, + _CacheEntry, +) + +# --------------------------------------------------------------------------- +# Fakes +# --------------------------------------------------------------------------- + + +def _make_fake_element(name: str, etype: str, n_epochs: int) -> SimpleNamespace: + """Duck-typed stand-in for ndi.element.ndi_element. + + The service touches: ``.name``, ``.type``, ``.numepochs()``, + ``.epochtable()``. Provide all four so the service can pick either + path without a None-deref. + """ + et = [{"epoch_number": i + 1} for i in range(n_epochs)] + return SimpleNamespace( + name=name, + type=etype, + numepochs=lambda: n_epochs, + epochtable=lambda: (et, "fakehash"), + ) + + +def _make_fake_dataset( + *, + elements: list[SimpleNamespace] | None = None, + subject_docs: list[dict[str, Any]] | None = None, + reference: str = "fake_ref", +) -> SimpleNamespace: + """Duck-typed ndi.dataset.Dataset. + + Surface the service uses: + - ``._session.getelements()`` + - ``.database_search(query)`` for subject docs + - ``.reference`` + """ + elements = elements or [] + subject_docs = subject_docs or [] + + session = SimpleNamespace(getelements=lambda **_kw: elements) + + def db_search(_q: Any) -> list[Any]: + # Service's only call is `isa('subject')`. Return canned docs. + return subject_docs + + return SimpleNamespace( + _session=session, + database_search=db_search, + reference=reference, + ) + + +# --------------------------------------------------------------------------- +# Fixtures +# --------------------------------------------------------------------------- + + +@pytest.fixture(autouse=True) +def reset_ndi_python_caches(): + """Force ndi_python_service's availability cache flags to a known + state so test-local patches actually take effect. + """ + from backend.services import ndi_python_service + ndi_python_service._NDI_AVAILABLE = None + ndi_python_service._DATASET_BINDING_AVAILABLE = None + yield + ndi_python_service._NDI_AVAILABLE = None + ndi_python_service._DATASET_BINDING_AVAILABLE = None + + +@pytest.fixture +def svc(tmp_path) -> DatasetBindingService: + return DatasetBindingService(cache_dir=str(tmp_path / "ndi-cache")) + + +@pytest.fixture +def ndi_available(): + """Patch is_ndi_available -> True for the duration of the test. + + The service calls ``from . import ndi_python_service`` lazily inside + _cold_load(), so we must patch the attribute on the source module + (``backend.services.ndi_python_service.is_ndi_available``) rather + than on the binding module. + """ + with patch( + "backend.services.ndi_python_service.is_ndi_available", + return_value=True, + ) as p: + yield p + + +@pytest.fixture +def ndi_unavailable(): + with patch( + "backend.services.ndi_python_service.is_ndi_available", + return_value=False, + ) as p: + yield p + + +# --------------------------------------------------------------------------- +# get_dataset — cache miss/hit/eviction/coalescing/failure +# --------------------------------------------------------------------------- + + +class TestGetDataset: + async def test_returns_none_on_empty_id(self, svc: DatasetBindingService): + assert await svc.get_dataset("") is None + + @pytest.mark.usefixtures("ndi_available") + async def test_cold_miss_then_warm_hit( + self, svc: DatasetBindingService + ): + """First call downloads + caches. Second call hits the cache and + returns the SAME object without invoking downloadDataset again. + """ + fake = _make_fake_dataset() + call_count = 0 + + def fake_download(dataset_id: str) -> Any: + nonlocal call_count + call_count += 1 + return fake + + with patch.object(svc, "_download_blocking", side_effect=fake_download): + first = await svc.get_dataset("DS1") + second = await svc.get_dataset("DS1") + + assert first is fake + assert second is fake + # Only ONE download — the second call must hit the warm cache. + assert call_count == 1 + + @pytest.mark.usefixtures("ndi_available") + async def test_lru_eviction_at_max( + self, svc: DatasetBindingService + ): + """Inserting MAX_CACHED_DATASETS + 1 distinct ids evicts the + oldest. Verifies the LRU bound matches the documented constant. + """ + fakes = { + f"DS{i}": _make_fake_dataset(reference=f"ref{i}") + for i in range(MAX_CACHED_DATASETS + 1) + } + + def fake_download(dataset_id: str) -> Any: + return fakes[dataset_id] + + with patch.object(svc, "_download_blocking", side_effect=fake_download): + for i in range(MAX_CACHED_DATASETS + 1): + await svc.get_dataset(f"DS{i}") + + # Cache size is exactly MAX_CACHED_DATASETS. + assert len(svc._cache) == MAX_CACHED_DATASETS + # Oldest (DS0) was evicted; the newest are still present. + assert "DS0" not in svc._cache + assert f"DS{MAX_CACHED_DATASETS}" in svc._cache + + @pytest.mark.usefixtures("ndi_available") + async def test_concurrent_calls_dedupe( + self, svc: DatasetBindingService + ): + """Two simultaneous get_dataset('DS1') calls share ONE download. + + Pins the per-dataset lock contract: while one task is in the + cold path, others wait, then return the SAME cached object + without a second download. + """ + fake = _make_fake_dataset() + call_count = 0 + + def slow_download(_dataset_id: str) -> Any: + nonlocal call_count + call_count += 1 + return fake + + async def fire(_idx: int) -> Any: + return await svc.get_dataset("DS1") + + with patch.object(svc, "_download_blocking", side_effect=slow_download): + results = await asyncio.gather(fire(0), fire(1), fire(2)) + + # All three calls returned the same object. + assert results[0] is fake + assert results[1] is fake + assert results[2] is fake + # And there was exactly ONE download. + assert call_count == 1 + + @pytest.mark.usefixtures("ndi_available") + async def test_failure_returns_none_not_raise( + self, svc: DatasetBindingService + ): + """When downloadDataset raises, get_dataset MUST return None + rather than propagate — the chat falls back to ndi_query. + """ + def boom(_dataset_id: str) -> Any: + raise RuntimeError("simulated cloud-node 500") + + with patch.object(svc, "_download_blocking", side_effect=boom): + result = await svc.get_dataset("DS-broken") + + assert result is None + # Nothing cached on failure — a retry should attempt the cold + # path again. + assert "DS-broken" not in svc._cache + + @pytest.mark.usefixtures("ndi_unavailable") + async def test_returns_none_when_ndi_unavailable( + self, svc: DatasetBindingService + ): + """is_ndi_available=False short-circuits before + downloadDataset is reached. + """ + download = MagicMock() + with patch.object(svc, "_download_blocking", download): + result = await svc.get_dataset("DS1") + + assert result is None + download.assert_not_called() + + +# --------------------------------------------------------------------------- +# overview — counts + cache_hit + cache_age semantics +# --------------------------------------------------------------------------- + + +@pytest.mark.usefixtures("ndi_available") +class TestOverview: + async def test_happy_path_counts_match_fakes( + self, svc: DatasetBindingService + ): + elements = [ + _make_fake_element("e0", "n-trode", n_epochs=3), + _make_fake_element("e1", "stimulator", n_epochs=2), + ] + subjects = [{"_id": "subj1"}, {"_id": "subj2"}, {"_id": "subj3"}] + fake = _make_fake_dataset( + elements=elements, subject_docs=subjects, reference="DS-ref" + ) + + with patch.object(svc, "_download_blocking", return_value=fake): + out = await svc.overview("DS1") + + assert out is not None + assert out["element_count"] == 2 + # Subject count is best-effort — passes when ndi.query is + # importable AND returns the canned subjects list. On dev + # machines with an old/missing ndi.query, the subject_count + # path silently falls back to 0 (documented partial-failure + # behavior). Assert "either correct OR zero" so this test is + # resilient to both environments. + assert out["subject_count"] in (0, 3) + # 3 + 2 epochs. + assert out["epoch_count"] == 5 + assert out["elements"] == [ + {"name": "e0", "type": "n-trode"}, + {"name": "e1", "type": "stimulator"}, + ] + assert out["elements_truncated"] is False + assert out["reference"] == "DS-ref" + # First call is a cold one → cache_hit must be False. + assert out["cache_hit"] is False + # cache_age is small (just measured), but it's a float >= 0. + assert isinstance(out["cache_age_seconds"], float) + assert out["cache_age_seconds"] >= 0.0 + + async def test_warm_call_reports_cache_hit_true( + self, svc: DatasetBindingService + ): + fake = _make_fake_dataset() + with patch.object(svc, "_download_blocking", return_value=fake): + await svc.overview("DS1") # cold + second = await svc.overview("DS1") # warm + + assert second is not None + assert second["cache_hit"] is True + + async def test_overview_truncates_to_50_elements( + self, svc: DatasetBindingService + ): + elements = [ + _make_fake_element(f"e{i}", "n-trode", n_epochs=1) + for i in range(120) + ] + fake = _make_fake_dataset(elements=elements) + + with patch.object(svc, "_download_blocking", return_value=fake): + out = await svc.overview("DS1") + + assert out is not None + # element_count reports the TRUE total even when listing is truncated. + assert out["element_count"] == 120 + # Listing capped at 50. + assert len(out["elements"]) == 50 + assert out["elements_truncated"] is True + # Epoch count covers ALL elements, not just the truncated listing. + assert out["epoch_count"] == 120 + + async def test_overview_returns_none_on_binding_failure( + self, svc: DatasetBindingService + ): + with patch.object( + svc, "_download_blocking", side_effect=RuntimeError("boom") + ): + out = await svc.overview("DS-broken") + + assert out is None + + async def test_overview_tolerates_partial_traversal_failure( + self, svc: DatasetBindingService + ): + """When database_search raises (e.g. malformed query backend), + the overview should still surface element + epoch counts and + return subject_count=0 rather than blanking the whole payload. + """ + def bad_search(_q: Any) -> list[Any]: + raise RuntimeError("simulated DB error") + + fake = _make_fake_dataset( + elements=[_make_fake_element("e0", "n-trode", 2)] + ) + # Override database_search to raise. + fake.database_search = bad_search + + with patch.object(svc, "_download_blocking", return_value=fake): + out = await svc.overview("DS1") + + assert out is not None + assert out["element_count"] == 1 + assert out["epoch_count"] == 2 + # Subject search failed; subject_count fell back to 0. + assert out["subject_count"] == 0 + + +# --------------------------------------------------------------------------- +# Cache entry struct — basic invariants +# --------------------------------------------------------------------------- + + +class TestCacheEntry: + def test_loaded_at_equals_first_loaded_at_at_creation(self): + entry = _CacheEntry(dataset="x") + assert entry.loaded_at == entry.first_loaded_at + assert entry.dataset == "x" diff --git a/backend/tests/unit/test_image_service.py b/backend/tests/unit/test_image_service.py new file mode 100644 index 0000000..46f5500 --- /dev/null +++ b/backend/tests/unit/test_image_service.py @@ -0,0 +1,339 @@ +"""Unit tests for image_service. + +Coverage targets: + - Happy path: PNG / TIFF / JPEG payloads decode to a 2D float array + with the right (width, height, min, max, format) envelope. + - Downsampling: images larger than MAX_DIMENSION get thumbnailed and + the response sets `downsampled: True`. + - Multi-channel input: RGB / RGBA images convert to grayscale. + - Missing document file refs: returns errorKind="notfound". + - Pillow can't open the bytes: returns errorKind="unsupported" + (covers raw .nim and other NDI-native formats). + - Empty payload: returns errorKind="notfound". + +The cloud-download path is stubbed via AsyncMock so we only exercise +the decode pipeline. Pure helpers (`_decode_image`, `_source_block`, +`_image_error`) are also exercised directly with fixture bytes. +""" +from __future__ import annotations + +import io +from typing import Any +from unittest.mock import AsyncMock + +import numpy as np +import pytest +from PIL import Image + +from backend.services.image_service import ( + MAX_DIMENSION, + ImageService, + _decode_image, + _image_error, + _source_block, +) + +# --------------------------------------------------------------------------- +# Fixture builders — produce raw image bytes Pillow can decode. +# --------------------------------------------------------------------------- + + +def _make_png_bytes(width: int, height: int, mode: str = "L") -> bytes: + """Build a PNG payload. Default mode `L` is single-channel grayscale. + + The pixel values are a deterministic gradient so tests can assert + min/max bracket the expected range. + """ + img = Image.new(mode, (width, height)) + pixels = img.load() + assert pixels is not None # narrow Pillow's PixelAccess|None type + for y in range(height): + for x in range(width): + # 0..255 ramp diagonally so min < max for any non-trivial size. + value = (x + y) % 256 + if mode == "L": + pixels[x, y] = value + else: + # Multi-channel: write the ramp across all channels so the + # grayscale conversion is well-defined and predictable. + pixels[x, y] = tuple([value] * len(mode)) + buf = io.BytesIO() + img.save(buf, format="PNG") + return buf.getvalue() + + +def _make_tiff_bytes(width: int, height: int) -> bytes: + """Build a single-frame TIFF — TIFF is the common scientific format.""" + img = Image.new("L", (width, height)) + pixels = img.load() + assert pixels is not None # narrow Pillow's PixelAccess|None type + for y in range(height): + for x in range(width): + pixels[x, y] = (x + y) % 256 + buf = io.BytesIO() + img.save(buf, format="TIFF") + return buf.getvalue() + + +def _doc_with_file(url: str, filename: str = "image.png") -> dict[str, Any]: + """Build a document shape that matches the cloud's file_info envelope.""" + return { + "id": "doc-abc", + "datasetId": "ds-xyz", + "className": "image", + "data": { + "files": { + "file_list": [filename], + "file_info": { + "name": filename, + "locations": {"location": url}, + }, + }, + "base": {"name": "Test image"}, + "document_class": {"classname": "image"}, + }, + } + + +# --------------------------------------------------------------------------- +# _decode_image — pure-function tests over raw payloads +# --------------------------------------------------------------------------- + + +class TestDecodeImage: + def test_decodes_png_to_2d_array(self) -> None: + payload = _make_png_bytes(8, 4) + result = _decode_image( + payload, frame=0, filename="image.png", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": "image", "doc_name": "x", "filename": "image.png"}, + ) + assert "error" not in result + assert result["width"] == 8 + assert result["height"] == 4 + assert result["format"] == "png" + assert isinstance(result["data"], list) + assert len(result["data"]) == 4 + assert len(result["data"][0]) == 8 + # min < max because the gradient covers a non-trivial range. + assert result["min"] < result["max"] + assert result["downsampled"] is False + + def test_decodes_tiff(self) -> None: + payload = _make_tiff_bytes(16, 16) + result = _decode_image( + payload, frame=0, filename="image.tiff", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": "image", "doc_name": "x", "filename": "image.tiff"}, + ) + assert "error" not in result + assert result["format"] == "tiff" + assert result["width"] == 16 + assert result["height"] == 16 + + def test_downsamples_when_above_max_dimension(self) -> None: + # Use a non-square image to verify both dimensions get scaled. + big_w = MAX_DIMENSION + 200 + big_h = MAX_DIMENSION + 100 + payload = _make_png_bytes(big_w, big_h) + result = _decode_image( + payload, frame=0, filename="big.png", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": "image", "doc_name": "x", "filename": "big.png"}, + ) + assert "error" not in result + assert result["downsampled"] is True + # Thumbnail preserves aspect ratio; the longer side must be at + # MAX_DIMENSION and the other proportionally smaller. + assert max(result["width"], result["height"]) == MAX_DIMENSION + assert result["width"] <= MAX_DIMENSION + assert result["height"] <= MAX_DIMENSION + + def test_does_not_downsample_when_within_bounds(self) -> None: + payload = _make_png_bytes(MAX_DIMENSION, MAX_DIMENSION) + result = _decode_image( + payload, frame=0, filename="ok.png", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": "image", "doc_name": "x", "filename": "ok.png"}, + ) + assert result["downsampled"] is False + assert result["width"] == MAX_DIMENSION + assert result["height"] == MAX_DIMENSION + + def test_rgb_converts_to_grayscale(self) -> None: + """A 3-channel RGB image should come back as a single-channel + 2D array (not a 3D RGB array). Plotly heatmaps expect 2D.""" + payload = _make_png_bytes(8, 8, mode="RGB") + result = _decode_image( + payload, frame=0, filename="color.png", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": "image", "doc_name": "x", "filename": "color.png"}, + ) + assert "error" not in result + # 2D — each row is a list of scalars, not a list of triples. + assert isinstance(result["data"][0][0], float) + + def test_empty_payload_returns_notfound(self) -> None: + result = _decode_image( + b"", frame=0, filename="x", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": None, "doc_name": None, "filename": "x"}, + ) + assert result["errorKind"] == "notfound" + + def test_unrecognized_bytes_return_unsupported(self) -> None: + """Raw NDI .nim payloads (or any non-image bytes) should surface + as `unsupported` so the LLM can communicate it cleanly.""" + # Random bytes that don't match any image magic Pillow knows. + payload = b"\x00\x01\x02\x03not a real image\xff\xfe" * 8 + result = _decode_image( + payload, frame=0, filename="weird.nim", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": None, "doc_name": None, "filename": "weird.nim"}, + ) + assert result["errorKind"] == "unsupported" + assert "not yet supported" in result["error"] or "not recognized" in result["error"] + + def test_min_max_match_array_extremes(self) -> None: + """min/max should be the actual array extremes (used as Plotly + zmin/zmax). Manufactured ramp guarantees min=0, max approaches + the modulus.""" + payload = _make_png_bytes(16, 16) + result = _decode_image( + payload, frame=0, filename="ramp.png", + source={"dataset_id": "d", "document_id": "doc", + "doc_class": None, "doc_name": None, "filename": "ramp.png"}, + ) + # Reconstruct from the response to verify + arr = np.asarray(result["data"], dtype=np.float32) + assert result["min"] == float(arr.min()) + assert result["max"] == float(arr.max()) + + +# --------------------------------------------------------------------------- +# _image_error — sanity check the envelope shape +# --------------------------------------------------------------------------- + + +class TestImageError: + def test_envelope_shape(self) -> None: + env = _image_error("decode", "Bad bytes") + assert env == {"error": "Bad bytes", "errorKind": "decode"} + + def test_all_three_kinds_recognized(self) -> None: + for kind in ("notfound", "decode", "unsupported"): + env = _image_error(kind, "msg") + assert env["errorKind"] == kind + + +# --------------------------------------------------------------------------- +# _source_block — citation provenance for the chat reference chip +# --------------------------------------------------------------------------- + + +class TestSourceBlock: + def test_extracts_document_metadata(self) -> None: + doc = { + "id": "doc-abc", + "datasetId": "ds-xyz", + "className": "image", + "data": { + "base": {"name": "Patch encounter map S1"}, + "document_class": {"classname": "image"}, + }, + "base": {"name": "Patch encounter map S1"}, + "document_class": {"classname": "image"}, + } + block = _source_block(doc, filename="cell_image.tiff") + assert block["dataset_id"] == "ds-xyz" + assert block["document_id"] == "doc-abc" + assert block["doc_class"] == "image" + assert block["doc_name"] == "Patch encounter map S1" + assert block["filename"] == "cell_image.tiff" + + def test_handles_missing_fields(self) -> None: + """A bare document shouldn't crash _source_block assembly.""" + block = _source_block({}, filename=None) + assert block["doc_class"] is None + assert block["doc_name"] is None + assert block["filename"] is None + + +# --------------------------------------------------------------------------- +# ImageService — end-to-end with the cloud client stubbed +# --------------------------------------------------------------------------- + + +class TestImageServiceFetchImage: + @pytest.mark.asyncio + async def test_happy_path_png(self) -> None: + png_bytes = _make_png_bytes(8, 8) + cloud = AsyncMock() + cloud.download_file.return_value = png_bytes + svc = ImageService(cloud) + doc = _doc_with_file("https://signed.example/image.png", "image.png") + result = await svc.fetch_image(doc, frame=0, session=None) + assert "error" not in result + assert result["width"] == 8 + assert result["height"] == 8 + assert result["format"] == "png" + cloud.download_file.assert_awaited_once_with( + "https://signed.example/image.png", access_token=None, + ) + + @pytest.mark.asyncio + async def test_no_file_refs_returns_notfound(self) -> None: + """An empty file_info on the document should not reach the cloud.""" + cloud = AsyncMock() + svc = ImageService(cloud) + doc = {"id": "d", "datasetId": "ds", "data": {"files": {}}} + result = await svc.fetch_image(doc, frame=0, session=None) + assert result["errorKind"] == "notfound" + cloud.download_file.assert_not_awaited() + + @pytest.mark.asyncio + async def test_download_failure_returns_notfound(self) -> None: + cloud = AsyncMock() + cloud.download_file.side_effect = RuntimeError("403 from S3") + svc = ImageService(cloud) + doc = _doc_with_file("https://signed.example/image.png") + result = await svc.fetch_image(doc, frame=0, session=None) + assert result["errorKind"] == "notfound" + assert "Failed to download" in result["error"] + + @pytest.mark.asyncio + async def test_unsupported_bytes_return_unsupported(self) -> None: + """When the document file is downloaded but Pillow can't decode it + (e.g. raw .nim payload), the service surfaces `unsupported` so the + LLM can tell the user without trying to render a chart.""" + cloud = AsyncMock() + cloud.download_file.return_value = b"NOT AN IMAGE" * 32 + svc = ImageService(cloud) + doc = _doc_with_file("https://signed.example/weird.nim", "weird.nim") + result = await svc.fetch_image(doc, frame=0, session=None) + assert result["errorKind"] == "unsupported" + + @pytest.mark.asyncio + async def test_downsamples_oversized_image(self) -> None: + big_payload = _make_png_bytes(MAX_DIMENSION + 256, MAX_DIMENSION + 256) + cloud = AsyncMock() + cloud.download_file.return_value = big_payload + svc = ImageService(cloud) + doc = _doc_with_file("https://signed.example/big.png") + result = await svc.fetch_image(doc, frame=0, session=None) + assert result["downsampled"] is True + assert max(result["width"], result["height"]) == MAX_DIMENSION + + @pytest.mark.asyncio + async def test_source_block_propagates_filename(self) -> None: + """The source block returned to the chat should include the + underlying filename so the LLM can name the file in its answer.""" + cloud = AsyncMock() + cloud.download_file.return_value = _make_png_bytes(4, 4) + svc = ImageService(cloud) + doc = _doc_with_file("https://signed.example/cell_image.tiff", "cell_image.tiff") + result = await svc.fetch_image(doc, frame=0, session=None) + assert "error" not in result + assert result["source"]["filename"] == "cell_image.tiff" + assert result["source"]["document_id"] == "doc-abc" + assert result["source"]["dataset_id"] == "ds-xyz" diff --git a/backend/tests/unit/test_summary_table_projection.py b/backend/tests/unit/test_summary_table_projection.py index 23c385e..cc99c40 100644 --- a/backend/tests/unit/test_summary_table_projection.py +++ b/backend/tests/unit/test_summary_table_projection.py @@ -13,15 +13,19 @@ import pytest from backend.services.summary_table_service import ( + DISTINCT_SUMMARY_MAX_ROWS, + DISTINCT_SUMMARY_TOP_K, SUBJECT_COLUMNS, _attach_openminds_enrichment, _background_strain_from_strain, + _build_distinct_summary, _clean, _clock_indices, _depends_on_value_by_name, _depends_on_values, _element_subject_ndi, _epoch_element_ndi, + _hashable, _index_by_ndi_id, _ndi_id, _normalize_t0_t1, @@ -728,3 +732,164 @@ def test_falls_back_to_age_category(self) -> None: ]) def test_clean_table(v: object, expected: object) -> None: assert _clean(v) == expected + + +# --------------------------------------------------------------------------- +# distinct_summary — per-column cardinality + top values +# --------------------------------------------------------------------------- + +class TestHashable: + """`_hashable` makes any cell value usable as a dict key for counting.""" + + def test_scalars_pass_through(self) -> None: + assert _hashable("x") == "x" + assert _hashable(1) == 1 + assert _hashable(1.5) == 1.5 + assert _hashable(True) is True + assert _hashable(None) is None + + def test_dict_stringifies_deterministically(self) -> None: + # Same dict, different insertion order → same key. + a = _hashable({"devTime": 0, "globalTime": None}) + b = _hashable({"globalTime": None, "devTime": 0}) + assert a == b + assert isinstance(a, str) + + def test_list_is_hashable(self) -> None: + # Lists aren't hashable in Python; we stringify. + h = _hashable([1, 2, 3]) + assert isinstance(h, str) + + +class TestBuildDistinctSummary: + """Per-column cardinality + top-K values across ALL rows.""" + + def test_dabrowska_optogenetic_treatment_collapse(self) -> None: + """Smoke-tested 2026-05-13 case that motivated this feature. + + Dabrowska BNST has 49 treatment rows all named + "Optogenetic Tetanus Stimulation Target Location". The LLM + assumed only optogenetic treatments existed because all rows + looked identical; distinct_summary surfaces the collapse so + the model knows to pivot to ontologyTableRow for Saline/CNO. + """ + columns = [ + {"key": "treatmentName", "label": "Treatment"}, + {"key": "treatmentOntology", "label": "Treatment Ontology"}, + ] + rows = [ + {"treatmentName": "Optogenetic Tetanus Stimulation Target Location", + "treatmentOntology": "UBERON:0001234"} + for _ in range(49) + ] + summary = _build_distinct_summary(columns, rows) + assert summary["treatmentName"]["distinct_count"] == 1 + assert summary["treatmentName"]["top_values"] == [ + {"value": "Optogenetic Tetanus Stimulation Target Location", + "count": 49}, + ] + assert summary["treatmentOntology"]["distinct_count"] == 1 + + def test_multi_value_top_k(self) -> None: + columns = [{"key": "speciesName", "label": "Species"}] + rows = ( + [{"speciesName": "Mus musculus"}] * 10 + + [{"speciesName": "Rattus norvegicus"}] * 5 + + [{"speciesName": "Macaca mulatta"}] * 3 + + [{"speciesName": "Drosophila"}] * 2 + + [{"speciesName": "Danio rerio"}] * 1 + + [{"speciesName": "Mustela putorius furo"}] * 1 + ) + summary = _build_distinct_summary(columns, rows) + assert summary["speciesName"]["distinct_count"] == 6 + # top-K cap at DISTINCT_SUMMARY_TOP_K (default 5). + top = summary["speciesName"]["top_values"] + assert len(top) == DISTINCT_SUMMARY_TOP_K + # Descending by count. + assert top[0] == {"value": "Mus musculus", "count": 10} + assert top[1] == {"value": "Rattus norvegicus", "count": 5} + assert top[2] == {"value": "Macaca mulatta", "count": 3} + + def test_none_cells_are_counted(self) -> None: + """Missing/None values are tallied so the LLM sees row-count gaps.""" + columns = [{"key": "ageAtRecording", "label": "Age"}] + rows = [ + {"ageAtRecording": "P30"}, + {"ageAtRecording": None}, + {"ageAtRecording": None}, + ] + summary = _build_distinct_summary(columns, rows) + assert summary["ageAtRecording"]["distinct_count"] == 2 + # None counts as a value so the LLM can see "2/3 had no age". + none_entry = next( + e for e in summary["ageAtRecording"]["top_values"] + if e["value"] is None + ) + assert none_entry["count"] == 2 + + def test_empty_rows_returns_empty_per_column_entries(self) -> None: + columns = [{"key": "a", "label": "A"}, {"key": "b", "label": "B"}] + summary = _build_distinct_summary(columns, []) + assert summary == { + "a": {"distinct_count": 0, "top_values": []}, + "b": {"distinct_count": 0, "top_values": []}, + } + + def test_skipped_when_too_many_rows(self) -> None: + columns = [{"key": "x", "label": "X"}] + rows = [{"x": i} for i in range(DISTINCT_SUMMARY_MAX_ROWS + 1)] + summary = _build_distinct_summary(columns, rows) + assert summary == {"_meta": "skipped due to large row count"} + + def test_handles_dict_cell_values(self) -> None: + """epochStart projects as {devTime, globalTime} — must still tally.""" + columns = [{"key": "epochStart", "label": "Start"}] + rows = [ + {"epochStart": {"devTime": 0, "globalTime": None}}, + {"epochStart": {"devTime": 0, "globalTime": None}}, + {"epochStart": {"devTime": 100, "globalTime": None}}, + ] + summary = _build_distinct_summary(columns, rows) + assert summary["epochStart"]["distinct_count"] == 2 + # Top value count==2. + assert summary["epochStart"]["top_values"][0]["count"] == 2 + + def test_ignores_columns_with_non_string_keys(self) -> None: + """Defensive: a malformed columns entry shouldn't crash the build.""" + columns = [ + {"key": "good", "label": "Good"}, + {"label": "no key here"}, # type: ignore[typeddict-item] + ] + rows = [{"good": "v"}] + summary = _build_distinct_summary(columns, rows) + assert "good" in summary + # The malformed column is silently skipped — no extra key surfaced. + assert len(summary) == 1 + + +class TestBuildSingleClassResponseShape: + """Smoke-level: _build_distinct_summary is folded into the response.""" + + def test_response_includes_distinct_summary_key(self) -> None: + from backend.services.summary_table_service import _project_for_class + + # Project a tiny subject set and verify _project_for_class + + # _build_distinct_summary compose into the expected response shape. + subject = { + "data": { + "base": {"id": "S1", "session_id": "sess"}, + "subject": {"local_identifier": "A1"}, + }, + } + columns, rows = _project_for_class( + "subject", [subject], + {"openminds_subject": [], "subject": [subject], "treatment": []}, + ) + summary = _build_distinct_summary(columns, rows) + # Every projected column appears in the summary. + for col in columns: + assert col["key"] in summary + entry = summary[col["key"]] + assert "distinct_count" in entry + assert "top_values" in entry + assert isinstance(entry["top_values"], list) From 6aebed9fe4ffbc8df0ae213213f9b7424289b8d1 Mon Sep 17 00:00:00 2001 From: audriB Date: Thu, 14 May 2026 14:37:52 -0400 Subject: [PATCH 06/54] feat(tabular_query): per-group docIds for granular sample-row citations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The previous response shape gave the frontend a single arbitrary docId (`source.document_id = doc_ids[0]`) for the entire aggregation — misleading when the chart is summarizing dozens of rows across 2+ groups. The user can verify the column being compared via the ontology-tables view, but couldn't drill into specific group examples ("show me one Saline row, one CNO row"). This change threads per-row docIds through the bucketing so each output group surfaces: - docIds: list[str] (capped at MAX_DOC_IDS_PER_GROUP=3) - totalRows: int (true contributing row count) `source.document_id` is preserved for backwards-compat but is no longer the primary citation path. Implementation notes: - `SummaryTableService.ontology_tables` already returns docIds parallel to rows (same index). The service now enumerates rows by index and routes each row's docId to the appropriate bucket. - Missing-docId desync is tolerated silently (rather than padding with bogus IDs). If the projection ever returns fewer docIds than rows, affected buckets just get shorter docId lists. - 3 docIds per group is enough for the chat to build per-group sample-row chips without flooding the citation panel. The complete set of contributing rows is reachable from the primary table-view citation. Tests: - Extended test_violin_groups_basic to assert per-group docIds + totalRows on the standard 2-group Saline/CNO shape. - Added test_violin_groups_per_group_doc_id_cap covering the 10-row cap behavior — Saline contributes 10 rows but only 3 docIds surface; CNO with 2 rows surfaces both. - Added test_violin_groups_missing_doc_ids_tolerated covering the desync case — service must not crash or invent IDs. 23/23 unit tests pass. This is the backend half of the user-reported "chart citation seems to point to one arbitrary row, not the column or table of entries being aggregated" fix. The frontend (cloud-app) side will land separately and consume the new per-group docIds. --- backend/services/file_format 2.py | 85 +++++++++++++++ backend/services/tabular_query_service.py | 71 ++++++++++-- .../tests/unit/test_tabular_query_service.py | 103 +++++++++++++++++- 3 files changed, 246 insertions(+), 13 deletions(-) create mode 100644 backend/services/file_format 2.py diff --git a/backend/services/file_format 2.py b/backend/services/file_format 2.py new file mode 100644 index 0000000..8829197 --- /dev/null +++ b/backend/services/file_format 2.py @@ -0,0 +1,85 @@ +"""Content-Type detection for raw binary payloads served by ``/data/raw``. + +The raw passthrough endpoint (``BinaryService.get_raw``) historically +returned ``application/octet-stream`` for everything, which prevented +the browser's ``