perf(summary): use dataset-record fields when /document-class-counts times out #104
Open
audriB wants to merge 1 commit into
…times out

Smoke-test pass after PRs #101/#102/#103 found that for the largest production datasets (Jess Haley 78k docs, Sophie Griswold 101k docs) the /document-class-counts endpoint still exhausts its 20s deadline on every synthesis attempt: the cloud's Mongo aggregation on className without proper indexes simply doesn't fit in the budget.

Net effect: every cron warm cycle landed a DEGRADED summary with all zero counts AND null per-class facts (because counts.subjects=0 short-circuited stage 2's openminds_subject fanout). The DatasetSummaryCard rendered "0 sessions, 0 subjects, ..., Not applicable" everywhere, even though the dataset record itself reports `numberOfSubjects=1656` and `documentCount=78687`.

The frontend band-aid (Waltham-Data-Science/ndi-cloud-app#91) hides this UX-side by enriching the degraded summary with record fields client-side. This PR is the real fix at the BACKEND: when stage-1 counts times out, synthesize a counts envelope from the dataset record's `numberOfSubjects` + `documentCount` and gate stage 2 from those record fields directly. Stage 2 still attempts the per-class fanouts (openminds_subject for species/strains/sexes, probe_location for brainRegions, element for probeTypes), bounded by their existing 25s per-class deadlines.

## Worst-case timing

    stage 1:  20s  (counts + dataset record in parallel; dataset succeeds fast, counts hits the deadline)
    stage 2:  25s  (3 classes in parallel, each bounded by per-class deadline; ndiquery for openminds_subject can succeed even when class-counts is slow because they hit different cloud endpoints)
    ontology: ~2s
    total:    ~47s (well under Railway's 88s ceiling)

## What the user sees, before vs after

Before:

    counts: { sessions: 0, subjects: 0, probes: 0, elements: 0, epochs: 0, totalDocuments: 0 }
    species: null
    brainRegions: null
    extractionWarnings: ["class counts query failed: ..."]

After (Jess Haley):

    counts: { sessions: 0, subjects: 1656, probes: 0, elements: 0, epochs: 0, totalDocuments: 78687 }
    species: [Caenorhabditis elegans, Escherichia coli]   ← FROM cloud
    brainRegions: [whole nervous system]                  ← FROM cloud
    strains/sexes: real data when available
    extractionWarnings: ["class counts query failed: ..."]

↑ subjects + totalDocuments come from dataset record; species/brainRegions/strains/sexes come from REAL stage-2 ndiquery+bulk_fetch (the per-class queries succeed in isolation; only /document-class-counts is pathologically slow).

## Caveat: per-class counts (sessions/probes/elements/epochs) stay 0

The dataset record doesn't expose these fields, so when stage-1 counts times out we can't populate them. They display as 0 with the "X warnings" tooltip explaining the underlying cause. A future iteration could optionally compute these from `/dataset/:id/documents?class=...&pageSize=1` reading the response envelope's `total` field, but that's 4 extra cloud calls per degraded synthesis, which is not worth it for fields the user rarely notices vs the now-restored species/brainRegions facts.

## Coverage

- test_stage_1_counts_timeout_still_runs_stage_2_via_record_fields: end-to-end pin asserting that with counts timed out + record fields populated, stage 2 attempts and succeeds, species comes back populated.
- test_safe_record_int_handles_all_input_shapes: defensive helper accepts any input shape (None, dict-with-null, dict-with-string, negative int, missing key) and degrades to 0.

## Pairs with frontend

Combined with ndi-cloud-app#91 (record-fallback enrichment) + ndi-cloud-app#92 (progressive document loading), the user-perceived loading experience for large datasets is now:

• Hero band renders instantly (raw record fields)
• Summary card renders with real subjects + totalDocuments + species + brainRegions within ~25-45s of cold synthesis
• Document Explorer shows first 50 rows immediately, more as user scrolls
• Subsequent viewers within 24h get sub-second cache-hit responses (PR #103 differential TTL holds full successes longer)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
Smoke-test pass after PRs #101/#102/#103 found that for the largest production datasets (Jess Haley 78k docs, Sophie Griswold 101k docs) the `/document-class-counts` endpoint still exhausts its 20s stage-1 deadline on every synthesis attempt. The cloud's Mongo aggregation on `className` without proper indexes simply doesn't fit in the budget.
Net effect: every cron warm cycle landed a degraded summary with all zero counts AND null per-class facts (because `counts.subjects=0` short-circuited stage 2's openminds_subject fanout). The DatasetSummaryCard rendered "0 sessions, 0 subjects, …, Not applicable" everywhere, even though the dataset record itself reports `numberOfSubjects=1656` and `documentCount=78687`.
The frontend band-aid (Waltham-Data-Science/ndi-cloud-app#91) papers over this on the rendering side by enriching the degraded summary with record fields. This PR is the real fix at the backend: when stage-1 counts times out, synthesize a counts envelope from the dataset record's raw fields and gate stage 2 from those fields directly. Stage 2 still attempts per-class fanouts; the species/brainRegions/strains/sexes ndiqueries succeed in isolation (only `/document-class-counts` is pathologically slow on these datasets).
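The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `synthesize_counts_from_record` and `should_run_subject_fanout` are hypothetical, and only the record fields `numberOfSubjects` and `documentCount` and the counts keys are taken from the description.

```python
from typing import Any, Optional

# Keys of the counts envelope as shown in the before/after output.
COUNT_KEYS = ("sessions", "subjects", "probes", "elements", "epochs", "totalDocuments")


def _non_negative_int(value: Any) -> int:
    """Return value if it is a non-negative int (bools excluded), else 0."""
    if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
        return value
    return 0


def synthesize_counts_from_record(record: Optional[dict]) -> dict:
    """Build a counts envelope from the dataset record after a stage-1 counts timeout.

    Hypothetical helper: only subjects and totalDocuments can be recovered,
    because the record does not expose the other per-class counts.
    """
    counts = dict.fromkeys(COUNT_KEYS, 0)
    if isinstance(record, dict):
        counts["subjects"] = _non_negative_int(record.get("numberOfSubjects"))
        counts["totalDocuments"] = _non_negative_int(record.get("documentCount"))
    return counts


def should_run_subject_fanout(counts: dict) -> bool:
    """Stage-2 gate (assumed shape): skip the openminds_subject query when no subjects are expected."""
    return counts.get("subjects", 0) > 0


record = {"numberOfSubjects": 1656, "documentCount": 78687}
counts = synthesize_counts_from_record(record)
print(counts)
print(should_run_subject_fanout(counts))
```

With the record values quoted above, the gate opens because `subjects` is 1656 rather than the 0 that a failed counts call used to leave behind.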
Before vs after
```
Before:
  counts: { sessions: 0, subjects: 0, probes: 0, elements: 0,
            epochs: 0, totalDocuments: 0 }
  species: null                                        ← stage 2 short-circuited
  brainRegions: null                                   ← stage 2 short-circuited
  extractionWarnings: ["class counts query failed: ..."]

After (Jess Haley):
  counts: { sessions: 0, subjects: 1656, probes: 0, elements: 0,
            epochs: 0, totalDocuments: 78687 }
  species: [Caenorhabditis elegans, Escherichia coli]  ← REAL stage-2 data
  brainRegions: [whole nervous system]                 ← REAL stage-2 data
  strains/sexes: real data when available
  extractionWarnings: ["class counts query failed: ..."]
```
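Worst-case timing budget
- Stage 1: 20s (counts and dataset record run in parallel; the record returns quickly, counts hits the deadline)
- Stage 2: 25s (3 classes in parallel, each bounded by its per-class deadline)
- Ontology: ~2s
- Total: ~47s, well under Railway's 88s ceiling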
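To make the budget concrete: per the commit message, stage 1 is bounded by a 20s deadline, stage 2 by 25s per-class deadlines run in parallel, and ontology lookup takes about 2s, for roughly 47s against Railway's 88s ceiling. The sketch below shows one way such a parallel, deadline-bounded stage could look with `asyncio`. The helper names (`with_deadline`, `run_stage_1`) and the stand-in coroutines are assumptions for illustration, and the demo deadline is scaled down so it finishes quickly.

```python
import asyncio

# Deadlines in seconds, as stated in the PR description.
STAGE_1_DEADLINE = 20.0
STAGE_2_DEADLINE = 25.0
ONTOLOGY_ESTIMATE = 2.0
RAILWAY_CEILING = 88.0

# Calls inside a stage run in parallel, so each stage costs at most one deadline.
WORST_CASE_TOTAL = STAGE_1_DEADLINE + STAGE_2_DEADLINE + ONTOLOGY_ESTIMATE
assert WORST_CASE_TOTAL < RAILWAY_CEILING


async def with_deadline(coro, deadline):
    """Await coro, returning None instead of raising if the deadline elapses."""
    try:
        return await asyncio.wait_for(coro, timeout=deadline)
    except asyncio.TimeoutError:
        return None


async def run_stage_1(fetch_counts, fetch_record, deadline=STAGE_1_DEADLINE):
    """Run the counts call and the dataset-record call concurrently."""
    return await asyncio.gather(
        with_deadline(fetch_counts(), deadline),
        with_deadline(fetch_record(), deadline),
    )


# Stand-ins for the cloud calls (hypothetical), scaled down for the demo.
async def slow_counts():
    await asyncio.sleep(1.0)  # exceeds the demo deadline below
    return {"subjects": 1656}


async def fast_record():
    await asyncio.sleep(0.01)
    return {"numberOfSubjects": 1656, "documentCount": 78687}


counts, record = asyncio.run(run_stage_1(slow_counts, fast_record, deadline=0.1))
print(counts, record)
```

In this demo the counts call times out and comes back as `None` while the record arrives intact, which is the situation the record-field fallback is meant to handle.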
Caveat
Per-class counts (sessions/probes/elements/epochs) stay 0 when stage-1 counts times out — the dataset record doesn't expose these. They display as 0 with the "X warnings" tooltip explaining the underlying cause. A future iteration could compute these from `/dataset/:id/documents?class=...&pageSize=1` envelopes (4 extra calls per degraded synthesis) — not worth it for fields the user rarely notices vs the now-restored species/brainRegions facts.
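For reference, the deferred approach could look something like the sketch below. It is not part of this PR. The `fetch_json` callable, the function name, and the class-name strings are assumptions; only the `/dataset/:id/documents?class=...&pageSize=1` path and the envelope's `total` field come from the description.

```python
from typing import Callable, Dict, Iterable
from urllib.parse import quote

# Hypothetical class names; the description only names the count labels.
PER_CLASS = ("session", "probe", "element", "epoch")


def count_documents_by_class(
    fetch_json: Callable[[str], dict],
    dataset_id: str,
    class_names: Iterable[str] = PER_CLASS,
) -> Dict[str, int]:
    """Read each class's size from the `total` field of a one-row page.

    Any failure (exception, missing or malformed total) degrades to 0.
    """
    counts: Dict[str, int] = {}
    for name in class_names:
        path = f"/dataset/{quote(dataset_id)}/documents?class={quote(name)}&pageSize=1"
        try:
            total = fetch_json(path).get("total")
        except Exception:
            total = None
        valid = isinstance(total, int) and not isinstance(total, bool) and total >= 0
        counts[name] = total if valid else 0
    return counts


def fake_fetch(path: str) -> dict:
    """Stand-in for the cloud client, used only for this demo."""
    name = path.split("class=")[1].split("&")[0]
    if name == "epoch":
        raise TimeoutError("simulated slow endpoint")
    return {"total": {"session": 3, "probe": 12}.get(name), "documents": []}


result = count_documents_by_class(fake_fetch, "abc123")
print(result)
```

The loop issues one request per class, which is where the "4 extra calls per degraded synthesis" cost comes from.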
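Coverage
- `test_stage_1_counts_timeout_still_runs_stage_2_via_record_fields`: end-to-end pin asserting that, with counts timed out and record fields populated, stage 2 is attempted, succeeds, and returns populated species.
- `test_safe_record_int_handles_all_input_shapes`: the defensive helper accepts any input shape (None, dict-with-null, dict-with-string, negative int, missing key) and degrades to 0.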
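The commit message describes a defensive integer helper, exercised by `test_safe_record_int_handles_all_input_shapes`, that tolerates None, a null value, a string value, a negative int, and a missing key, degrading to 0 in each case. A minimal sketch of such a helper is shown below. The name `safe_record_int`, its signature, and the choice to treat every string as invalid are assumptions rather than the PR's actual implementation.

```python
from typing import Any


def safe_record_int(record: Any, key: str) -> int:
    """Return record[key] when it is a non-negative int; otherwise 0.

    Hypothetical signature. Strings are rejected rather than parsed,
    which is one reading of "degrades to 0" in the commit message.
    """
    if not isinstance(record, dict):
        return 0
    value = record.get(key)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return 0
    return value if value >= 0 else 0


shapes = [
    None,                          # no record at all
    {"numberOfSubjects": None},    # dict with null
    {"numberOfSubjects": "1656"},  # dict with string
    {"numberOfSubjects": -3},      # negative int
    {},                            # missing key
]
print([safe_record_int(shape, "numberOfSubjects") for shape in shapes])
print(safe_record_int({"numberOfSubjects": 1656}, "numberOfSubjects"))
```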
Test plan
🤖 Generated with Claude Code