perf(summary): use dataset-record fields when /document-class-counts times out by audriB · Pull Request #104 · Waltham-Data-Science/ndi-data-browser-v2

audriB · 2026-04-27T15:31:43Z

Why

Smoke-test pass after PRs #101/#102/#103 found that for the largest production datasets (Jess Haley 78k docs, Sophie Griswold 101k docs) the `/document-class-counts` endpoint still exhausts its 20s stage-1 deadline on every synthesis attempt. The cloud's Mongo aggregation on `className` without proper indexes simply doesn't fit in the budget.

Net effect: every cron warm cycle landed a degraded summary with all zero counts AND null per-class facts (because `counts.subjects=0` short-circuited stage 2's openminds_subject fanout). The DatasetSummaryCard rendered "0 sessions, 0 subjects, …, Not applicable" everywhere, even though the dataset record itself reports `numberOfSubjects=1656` and `documentCount=78687`.

The frontend band-aid (Waltham-Data-Science/ndi-cloud-app#91) papers over this on the rendering side by enriching the degraded summary with record fields. This PR is the real fix at the backend: when stage-1 counts times out, synthesize a counts envelope from the dataset record's raw fields and gate stage 2 from those fields directly. Stage 2 still attempts per-class fanouts; the species/brainRegions/strains/sexes ndiqueries succeed in isolation (only `/document-class-counts` is pathologically slow on these datasets).
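The fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `counts_from_record`, `should_run_subject_fanout`, and `_as_count` are hypothetical names. Only the record fields `numberOfSubjects` and `documentCount` and the shape of the counts envelope come from the description.

```python
from typing import Any, Mapping, Optional

# Keys of the counts envelope, as shown in the before/after block.
COUNT_FIELDS = ("sessions", "subjects", "probes", "elements", "epochs", "totalDocuments")


def _as_count(value: Any) -> int:
    """Return value if it is a non-negative int, otherwise 0 (bool is rejected)."""
    if isinstance(value, bool) or not isinstance(value, int):
        return 0
    return max(value, 0)


def counts_from_record(record: Optional[Mapping[str, Any]]) -> dict:
    """Hypothetical fallback: build a counts envelope from raw record fields."""
    record = record or {}
    counts = {field: 0 for field in COUNT_FIELDS}
    counts["subjects"] = _as_count(record.get("numberOfSubjects"))
    counts["totalDocuments"] = _as_count(record.get("documentCount"))
    return counts


def should_run_subject_fanout(counts: Mapping[str, int]) -> bool:
    """Gate the stage-2 subject fanout on the (possibly synthesized) subject count."""
    return counts.get("subjects", 0) > 0


# Values reported by the Jess Haley dataset record.
fallback = counts_from_record({"numberOfSubjects": 1656, "documentCount": 78687})
print(fallback, should_run_subject_fanout(fallback))
```

With the record values above, the synthesized `subjects` count is non-zero, so the gate no longer short-circuits stage 2 even though the class-counts call failed.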

Before vs after

```
Before:
counts: { sessions: 0, subjects: 0, probes: 0, elements: 0,
epochs: 0, totalDocuments: 0 }
species: null ← stage 2 short-circuited
brainRegions: null ← stage 2 short-circuited
extractionWarnings: ["class counts query failed: ..."]

After (Jess Haley):
counts: { sessions: 0, subjects: 1656, probes: 0, elements: 0,
epochs: 0, totalDocuments: 78687 }
species: [Caenorhabditis elegans, Escherichia coli] ← REAL stage-2 data
brainRegions: [whole nervous system] ← REAL stage-2 data
strains/sexes: real data when available
extractionWarnings: ["class counts query failed: ..."]
```

Worst-case timing budget

| Stage | Time |
| --- | --- |
| stage 1 (counts + dataset metadata, parallel) | 20s (counts hits deadline; metadata succeeds fast) |
| stage 2 (3 per-class ndiqueries in parallel) | 25s (each bounded by per-class deadline) |
| ontology resolution | ~2s |
| total | ~47s (well under Railway's 88s ceiling) |
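The budget adds up because calls within a stage run in parallel, so each stage costs at most its own deadline rather than the sum of its calls. A minimal `asyncio` sketch of that pattern follows. The helper names (`with_deadline`, `run_stage`, `_fake_call`) are hypothetical and the demo uses a scaled-down deadline; only the 20s/25s/~2s/88s figures come from the table.

```python
import asyncio
from typing import Any, Awaitable, Dict

STAGE_1_DEADLINE_S = 20.0   # counts + dataset metadata, in parallel
STAGE_2_DEADLINE_S = 25.0   # per-class queries, in parallel
ONTOLOGY_BUDGET_S = 2.0     # approximate
RAILWAY_CEILING_S = 88.0

# Stages run one after another, so the worst case is the sum of the stage deadlines.
WORST_CASE_S = STAGE_1_DEADLINE_S + STAGE_2_DEADLINE_S + ONTOLOGY_BUDGET_S


async def with_deadline(call: Awaitable[Any], deadline_s: float) -> Any:
    """Await one call; return None instead of raising if it exceeds the deadline."""
    try:
        return await asyncio.wait_for(call, timeout=deadline_s)
    except asyncio.TimeoutError:
        return None


async def run_stage(calls: Dict[str, Awaitable[Any]], deadline_s: float) -> Dict[str, Any]:
    """Run all calls of a stage concurrently, each bounded by the same deadline."""
    results = await asyncio.gather(*(with_deadline(c, deadline_s) for c in calls.values()))
    return dict(zip(calls, results))


async def _fake_call(delay_s: float, value: Any) -> Any:
    """Stand-in for a cloud call (assumption: the real calls are awaitable)."""
    await asyncio.sleep(delay_s)
    return value


async def _demo() -> Dict[str, Any]:
    # Scaled-down stage 1: the counts call is slow, the record call is fast.
    calls = {
        "counts": _fake_call(5.0, {"subjects": 1}),
        "record": _fake_call(0.0, {"numberOfSubjects": 1656}),
    }
    return await run_stage(calls, deadline_s=0.05)


stage_1 = asyncio.run(_demo())
print(stage_1, WORST_CASE_S)
```

In the demo the slow call is cancelled at the deadline and reported as `None`, while the fast call's result is kept. That is the situation in which the record-field fallback applies.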

Caveat

Per-class counts (sessions/probes/elements/epochs) stay 0 when stage-1 counts times out, because the dataset record doesn't expose these. They display as 0 with the "X warnings" tooltip explaining the underlying cause. A future iteration could compute these from the `/dataset/:id/documents?class=...&pageSize=1` response envelopes (4 extra calls per degraded synthesis), but that isn't worth it for fields the user rarely notices, compared with the now-restored species/brainRegions facts.
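For reference, the deferred iteration might look something like the sketch below. Everything here is an assumption except the URL shape and the envelope's `total` field (the latter is named in the commit message). `counts_from_envelopes`, `envelope_url`, and the injected `fetch_envelope` callable are hypothetical, and the class names in the demo mapping are placeholders rather than confirmed NDI class names.

```python
from typing import Any, Callable, Dict, Mapping
from urllib.parse import quote, urlencode


def envelope_url(dataset_id: str, class_name: str) -> str:
    """Build the one-row page URL whose envelope would carry the class total."""
    query = urlencode({"class": class_name, "pageSize": 1})
    return f"/dataset/{quote(dataset_id, safe='')}/documents?{query}"


def counts_from_envelopes(
    dataset_id: str,
    class_by_field: Mapping[str, str],
    fetch_envelope: Callable[[str], Mapping[str, Any]],
) -> Dict[str, int]:
    """One call per count field; any failure or odd payload degrades to 0."""
    counts: Dict[str, int] = {}
    for field, class_name in class_by_field.items():
        try:
            total = fetch_envelope(envelope_url(dataset_id, class_name)).get("total")
        except Exception:
            total = None  # slow or failing endpoint: keep the field at 0
        is_count = isinstance(total, int) and not isinstance(total, bool) and total >= 0
        counts[field] = total if is_count else 0
    return counts


def _fake_fetch(url: str) -> Mapping[str, Any]:
    """Stand-in for the cloud client: one class answers, the other times out."""
    if "class=element" in url:
        return {"total": 42}
    raise TimeoutError("simulated slow endpoint")


demo_counts = counts_from_envelopes(
    "abc123",
    {"elements": "element", "epochs": "placeholder_epoch_class"},  # placeholder mapping
    _fake_fetch,
)
print(demo_counts)
```

Injecting the fetcher keeps the sketch independent of any particular HTTP client and makes the degraded path easy to exercise in a test.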

Coverage

- `test_stage_1_counts_timeout_still_runs_stage_2_via_record_fields`: end-to-end pin asserting stage 2 attempts and succeeds despite counts timeout
- `test_safe_record_int_handles_all_input_shapes`: defensive helper accepts any input shape and degrades to 0
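To make "accepts any input shape" concrete, here is a hedged sketch of such a helper and the shapes listed in the commit message (None, dict-with-null, dict-with-string, negative int, missing key). The real helper's name, signature, and exact rules are not shown in this PR description. `safe_record_int(record, key)` is an assumed shape, and treating numeric strings as 0 is a choice made for this sketch only.

```python
from typing import Any


def safe_record_int(record: Any, key: str) -> int:
    """Hypothetical helper: read record[key] as a non-negative int, else 0."""
    if not isinstance(record, dict):
        return 0  # None, list, string, or any other non-dict record
    value = record.get(key)
    if isinstance(value, bool) or not isinstance(value, int):
        return 0  # null, string, float, nested dict, or missing key
    return value if value >= 0 else 0


# (input record, expected result) pairs covering the listed shapes.
CASES = [
    (None, 0),                              # no record at all
    ({"numberOfSubjects": None}, 0),        # dict with null
    ({"numberOfSubjects": "1656"}, 0),      # dict with string (sketch choice: 0)
    ({"numberOfSubjects": -5}, 0),          # negative int
    ({}, 0),                                # missing key
    ({"numberOfSubjects": 1656}, 1656),     # well-formed value passes through
]

results = [safe_record_int(record, "numberOfSubjects") for record, _ in CASES]
print(results)
```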

Test plan

- `pytest backend/tests`: 557 passed, 1 skipped (opentelemetry)
- `ruff check backend/`: clean
- `mypy backend/`: 56 source files, no issues
- Smoke-test on Railway post-deploy: confirm Jess Haley summary returns species + brainRegions populated despite counts-timeout warning

🤖 Generated with Claude Code

…times out

Smoke-test pass after PRs #101/#102/#103 found that for the largest production datasets (Jess Haley 78k docs, Sophie Griswold 101k docs) the /document-class-counts endpoint still exhausts its 20s deadline on every synthesis attempt: the cloud's Mongo aggregation on className without proper indexes simply doesn't fit in the budget.

Net effect: every cron warm cycle landed a DEGRADED summary with all zero counts AND null per-class facts (because counts.subjects=0 short-circuited stage 2's openminds_subject fanout). The DatasetSummaryCard rendered "0 sessions, 0 subjects, ..., Not applicable" everywhere, even though the dataset record itself reports `numberOfSubjects=1656` and `documentCount=78687`.

The frontend band-aid (Waltham-Data-Science/ndi-cloud-app#91) hides this UX-side by enriching the degraded summary with record fields client-side. This PR is the real fix at the BACKEND: when stage-1 counts times out, synthesize a counts envelope from the dataset record's `numberOfSubjects` + `documentCount` and gate stage 2 from those record fields directly. Stage 2 still attempts the per-class fanouts (openminds_subject for species/strains/sexes, probe_location for brainRegions, element for probeTypes), bounded by their existing 25s per-class deadlines.

## Worst-case timing

  stage 1:  20s (counts + dataset record in parallel; dataset succeeds fast, counts hits the deadline)
  stage 2:  25s (3 classes in parallel, each bounded by per-class deadline; ndiquery for openminds_subject can succeed even when class-counts is slow because they hit different cloud endpoints)
  ontology: ~2s
  total:    ~47s (well under Railway's 88s ceiling)

## What the user sees, before vs after

  Before:
    counts: { sessions: 0, subjects: 0, probes: 0, elements: 0, epochs: 0, totalDocuments: 0 }
    species: null
    brainRegions: null
    extractionWarnings: ["class counts query failed: ..."]

  After (Jess Haley):
    counts: { sessions: 0, subjects: 1656, probes: 0, elements: 0, epochs: 0, totalDocuments: 78687 }
    species: [Caenorhabditis elegans, Escherichia coli] ← FROM cloud
    brainRegions: [whole nervous system] ← FROM cloud
    strains/sexes: real data when available
    extractionWarnings: ["class counts query failed: ..."]

↑ subjects + totalDocuments come from dataset record; species/brainRegions/strains/sexes come from REAL stage-2 ndiquery+bulk_fetch (the per-class queries succeed in isolation; only /document-class-counts is pathologically slow).

## Caveat: per-class counts (sessions/probes/elements/epochs) stay 0

The dataset record doesn't expose these fields, so when stage-1 counts times out we can't populate them. They display as 0 with the "X warnings" tooltip explaining the underlying cause. A future iteration could optionally compute these from `/dataset/:id/documents?class=...&pageSize=1` reading the response envelope's `total` field, but that's 4 extra cloud calls per degraded synthesis, which is not worth it for fields the user rarely notices vs the now-restored species/brainRegions facts.

## Coverage

- test_stage_1_counts_timeout_still_runs_stage_2_via_record_fields: end-to-end pin asserting that with counts timed out + record fields populated, stage 2 attempts and succeeds, species comes back populated.
- test_safe_record_int_handles_all_input_shapes: defensive helper accepts any input shape (None, dict-with-null, dict-with-string, negative int, missing key) and degrades to 0.

## Pairs with frontend

Combined with ndi-cloud-app#91 (record-fallback enrichment) + ndi-cloud-app#92 (progressive document loading), the user-perceived loading experience for large datasets is now:

• Hero band renders instantly (raw record fields)
• Summary card renders with real subjects + totalDocuments + species + brainRegions within ~25-45s of cold synthesis
• Document Explorer shows first 50 rows immediately, more as user scrolls
• Subsequent viewers within 24h get sub-second cache-hit responses (PR #103 differential TTL holds full successes longer)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

audriB wants to merge 1 commit into `main` from `perf/use-record-fields-on-counts-timeout`
