
Add data-reuse summaries for datasetPapers #16

Draft
jesshaley wants to merge 11 commits into dataset-brainstorm from claude/summarize-dataset-papers-iyiD5

jesshaley (Collaborator) commented Apr 23, 2026

Summary

Adds one Markdown summary per paper in datasetPapers/ under a new datasetPapers/summaries/ directory. Each summary is scoped to information required for someone else to reuse the data: subjects, treatments, stimuli, recording/acquisition, derived variables, data/code availability, and notes on which DID/NDI document types the paper stress-tests.
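Here is a minimal sketch of how the per-summary scope could be checked mechanically. It assumes each topic above appears as a Markdown heading whose text matches an entry in REQUIRED_SECTIONS. The heading names and the check_summaries helper are hypothetical and are not part of this PR.

```python
import re
from pathlib import Path

# Hypothetical heading names: the PR lists these topics but not their exact heading text.
REQUIRED_SECTIONS = [
    "Subjects",
    "Treatments",
    "Stimuli",
    "Recording/acquisition",
    "Derived variables",
    "Data/code availability",
    "Relevance for DID/NDI schema design",
]

# Any ATX heading level (# to ######); trailing whitespace is not captured.
HEADING = re.compile(r"^#{1,6}\s+(.+?)\s*$", re.MULTILINE)


def missing_sections(markdown_text: str) -> list[str]:
    """Return required sections that do not appear as a Markdown heading (case-insensitive)."""
    headings = {m.group(1).lower() for m in HEADING.finditer(markdown_text)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


def check_summaries(directory: str = "datasetPapers/summaries") -> dict[str, list[str]]:
    """Map each summary file to its missing sections; an empty list means it is complete."""
    return {
        path.name: missing_sections(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.md"))
    }
```

Running check_summaries() from the repo root would flag any summary that drifted from the agreed scope before review against the source PDFs.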

Summaries added

  • 11494.full.pdf → 11494_van_hooser_2013_tree_shrew_V1.md: LGN→V1 receptive-field transformation in tree shrew (in vivo ephys, carbon fiber / tetrode, drifting gratings + bars).
  • CELREP115768_grabs 1..1.pdf → CELREP115768_francesconi_2025_BNST_AVP_OT.md: BNST AVP/OT, rat slice ephys + optogenetics + FPS/EPM behavior. Data on NDI-Cloud (DOI 10.63884/ndic.2025.jyxfer8m); code on Zenodo.
  • Construction and Implementation...pdf → JoVE_reikersdorfer_2022_carbon_fiber_MEA.md: JoVE carbon-fiber MEA fabrication / implantation protocol (mouse chronic + ferret acute).
  • Extracellular vesicles...pdf → bhar_2025_celegans_EV_LTAM.md: C. elegans / C. briggsae cross-species LTAM transfer via extracellular vesicles; behavior + LC-MS + imaging.
  • elife-103191-v1.pdf → eLife_103191_haley_2025_celegans_foraging.md: C. elegans accept–reject patch foraging. Data on NDI-Cloud (DOI 10.63884/ndic.2025.pb77mj2s); code on GitHub.
  • elife-106513-v2.pdf → eLife_106513_griswold_2025_ferret_premature_vision.md: Ferret V1 after premature eye opening. Data on NDI-Cloud (DOI 10.63884/ndic.2025.28xb47y1).
  • elife-45968-v2.pdf → eLife_45968_mukherjee_2019_GC_optogenetic_taste.md: Rat GC opto-trode + EMG + IOC with ArchT perturbation timing.

Cross-paper schema observations

Themes that surface across the seven summaries and that the current DID/NDI schemas may not fully support:

  • Multi-species subjects: mouse, rat, ferret, tree shrew, mouse + ferret in one protocol, and C. elegans / C. briggsae populations (a group-level subject rather than an individual).
  • Probe-construction provenance: fiber diameter, parylene coat, gold electroplating parameters, bundle geometry, impedance history.
  • Factorial / covaried stimulus sets: e.g., orientation × SF × contrast with a shared blank control.
  • Optogenetic perturbation: wavelength, power at the tip, pulse pattern, burst structure, onset relative to the event.
  • Multi-stream sessions: simultaneous single-unit + EMG + intra-oral taste + laser in one trial.
  • Data-access provenance: three papers (Francesconi, Haley, Griswold) share data on NDI-Cloud with DOIs; one paper uses institutional request-only access (Brandeis LTS).
  • Per-session environment covariates: room temperature, humidity, animal age, bacterial growth time.
  • Analysis/model documents: Bayesian fits with priors + R̂, mixed-effects models with fixed/random structure, HMM change-point models.
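To make the factorial-stimulus theme concrete, here is a hypothetical sketch of an orientation × SF × contrast grid with a single shared blank condition. The StimulusCondition class, the factor levels and the units are illustrative assumptions and are not values taken from any of the papers.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class StimulusCondition:
    """One cell of a factorial stimulus design; all-None factors mark the shared blank."""
    orientation_deg: Optional[float]
    sf_cpd: Optional[float]    # spatial frequency, cycles per degree (assumed unit)
    contrast: Optional[float]  # contrast in [0, 1] (assumed scale)

    @property
    def is_blank(self) -> bool:
        return self.orientation_deg is None and self.sf_cpd is None and self.contrast is None


def factorial_design(orientations, sfs, contrasts, include_blank=True):
    """Expand the full orientation x SF x contrast grid, then append one shared blank."""
    conditions = [StimulusCondition(o, s, c) for o, s, c in product(orientations, sfs, contrasts)]
    if include_blank:
        conditions.append(StimulusCondition(None, None, None))
    return conditions


def conditions_at(conditions, **levels):
    """Select conditions matching fixed factor levels; the blank is kept as the shared control."""
    return [
        c for c in conditions
        if c.is_blank or all(getattr(c, name) == value for name, value in levels.items())
    ]
```

For example, factorial_design([0, 45, 90, 135], [0.05, 0.2], [0.5, 1.0]) yields 16 grid cells plus one blank. The factor levels plus the blank flag are enough to regenerate every condition. That is one argument for a dedicated factor_design document rather than a flat list of stimulus records.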

Test plan

  • Review each summary against the source PDF for accuracy.
  • Use the "Relevance for DID/NDI schema design" sections to scope concrete schema updates on dataset-brainstorm.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

jesshaley changed the title from "Add data-reuse summaries for datasetPapers (in progress)" to "Add data-reuse summaries for datasetPapers" on Apr 23, 2026
claude added 5 commits April 23, 2026 16:33
Summarises common structural themes across the 7 datasetPapers,
proposes a minimal set of ~13 NDI document types, and outlines
a three-layer validation strategy (structural / class-hierarchical
/ ontology-backed), along with its main tradeoff and open questions.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
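One way to picture the three validation layers is shown below, as a sketch only. The field tables, the class hierarchy and the species vocabulary are hypothetical stand-ins. The hierarchy borrows the later idea of treatment_event as a specialisation of epoch. A real ontology-backed layer would query an ontology service rather than a Python set.

```python
# Layer 1 (structural): required fields and accepted Python types per class (hypothetical).
FIELDS = {
    "base": {"id": str},
    "subject": {"species": str},
    "epoch": {"t0": (int, float), "t1": (int, float)},
    "treatment_event": {"protocol_id": str},
}

# Layer 2 (class-hierarchical): each class names its parent; fields are inherited.
PARENT = {"base": None, "subject": "base", "epoch": "base", "treatment_event": "epoch"}

# Layer 3 (ontology-backed): a stand-in term set, not a real ontology lookup.
ONTOLOGY = {"species": {"Mus musculus", "Rattus norvegicus",
                        "Caenorhabditis elegans", "Caenorhabditis briggsae"}}


def lineage(doc_class):
    """Return [doc_class, parent, ..., 'base'] for a known class."""
    chain = []
    while doc_class is not None:
        chain.append(doc_class)
        doc_class = PARENT[doc_class]
    return chain


def _type_name(ftype):
    return ftype.__name__ if isinstance(ftype, type) else " or ".join(t.__name__ for t in ftype)


def validate(doc):
    """Return error strings; an empty list means the document passed all three layers."""
    doc_class = doc.get("class")
    if doc_class not in PARENT:
        return [f"unknown class {doc_class!r}"]
    errors = []
    for cls in lineage(doc_class):               # layer 2: walk the inheritance chain
        for name, ftype in FIELDS[cls].items():  # layer 1: presence and type of each field
            if name not in doc:
                errors.append(f"missing field {name!r} (required by {cls})")
            elif not isinstance(doc[name], ftype):
                errors.append(f"field {name!r} should be {_type_name(ftype)}")
    for name, allowed in ONTOLOGY.items():       # layer 3: controlled vocabulary
        if name in doc and doc[name] not in allowed:
            errors.append(f"term {doc[name]!r} not allowed for {name!r}")
    return errors
```

The layers are cheap-to-expensive in this order. Structural checks need only the document itself, hierarchical checks need the class table, and ontology checks need an external vocabulary.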
Adds five edge cases that strain the core proposal:
1. Plate-as-measured-object (Bhar, Haley) - proposes new 'substrate'
   document type and time-bounded 'placement_on' relationship
2. Cross-species exchange (Bhar) - species-agnostic substrate + per-species
   subject_group
3. Longitudinal reuse across many sessions (Reikersdorfer 11-month
   chronic recordings) - persistent placement, days_since_implant
4. Multi-stage cyclic protocols with rest periods (Bhar training)
5. Data shared across papers (Mukherjee re-analysis of Sadacca 2016)

Bumps core document count from 13 to 14 with the addition of 'substrate'
and adds two new open design questions.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
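Here is a minimal sketch of a time-bounded placement_on relationship, assuming one shared clock in seconds. The Placement class and its field names are hypothetical. The same record covers a worm group on a plate (edge cases 1 and 2) and a persistent chronic implant (edge case 3), where an open end time means the placement never ended.

```python
from dataclasses import dataclass
from typing import Optional

SECONDS_PER_DAY = 86_400.0


@dataclass(frozen=True)
class Placement:
    """Hypothetical time-bounded placement_on link: `item` sits on `target` during [start, end)."""
    item: str                    # e.g. a subject group, or an electrode array
    target: str                  # e.g. a plate (substrate), or a subject
    start: float                 # seconds on one shared clock (assumed)
    end: Optional[float] = None  # None = placement still in effect (persistent)

    def active_at(self, t: float) -> bool:
        return self.start <= t and (self.end is None or t < self.end)


def occupants(placements, target, t):
    """Items on `target` at time t; the substrate itself carries no species."""
    return sorted(p.item for p in placements if p.target == target and p.active_at(t))


def days_since_placement(placement, t):
    """Days elapsed since the placement began, e.g. days_since_implant for a chronic array."""
    return (t - placement.start) / SECONDS_PER_DAY
```

Because the substrate is species-agnostic, a cross-species exchange is simply two subject groups with overlapping placements on the same plate.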
Splits the single 'treatment' document type into a protocol/event pair:
- treatment_protocol: reusable, versioned recipe (no timestamps); composes
  via depends_on so outer paradigms can reference inner cycles
- treatment_event: actual execution with clock times + optional deviations;
  naturally a specialisation of epoch

Includes a worked example of Bhar's 5-cycle IAA+heat training and shows
how the pattern covers water restriction, in-vitro drug baths, premature
exposure, acclimation, and post-op drug schedules across the other papers.
Also deduplicates the 'Open questions' list and updates type counts.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
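The protocol/event split can be sketched as a recipe tree that is expanded into timed events only at execution time. The class names, the execute function and the durations below are hypothetical placeholders and are not the timings used by Bhar et al. The point is that the 5-cycle paradigm references the inner cycle through depends_on instead of repeating it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentProtocol:
    """Reusable recipe with no timestamps. A leaf has a duration; a composite runs its
    depends_on sub-protocols in order, `repeats` times."""
    name: str
    duration_s: float = 0.0
    depends_on: tuple = ()
    repeats: int = 1


@dataclass(frozen=True)
class TreatmentEvent:
    """One actual execution of a leaf protocol on a clock (deviations omitted here)."""
    protocol: str
    start_s: float
    end_s: float


def execute(protocol, start_s=0.0):
    """Expand a protocol into timed leaf events; returns (events, end time)."""
    events, t = [], start_s
    for _ in range(protocol.repeats):
        if not protocol.depends_on:
            events.append(TreatmentEvent(protocol.name, t, t + protocol.duration_s))
            t += protocol.duration_s
            continue
        for sub in protocol.depends_on:
            sub_events, t = execute(sub, t)
            events.extend(sub_events)
    return events, t


# Placeholder durations, not values from the paper.
pairing = TreatmentProtocol("iaa_heat_pairing", duration_s=600.0)
rest = TreatmentProtocol("rest", duration_s=1800.0)
cycle = TreatmentProtocol("training_cycle", depends_on=(pairing, rest))
paradigm = TreatmentProtocol("training_paradigm", depends_on=(cycle,), repeats=5)
```

With these placeholders, execute(paradigm) produces 10 events ending at 12000 s. A real treatment_event would also carry wall-clock times and any recorded deviations from the recipe.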
Inspecting all 9 openMINDS module repos directly shows that most of the
document types this doc was proposing already exist in openMINDS with
matching semantics:

- core:Subject + core:SubjectState (with descendedFrom state transitions)
- core:SubjectGroup + core:SubjectGroupState (answers the populational-
  subject question directly)
- core:Protocol + core:ProtocolExecution (the treatment_protocol /
  treatment_event split, already in place)
- core:Device + core:DeviceUsage (the apparatus / placement split)
- ephys:ElectrodeArray + ElectrodeArrayUsage, PipetteUsage, Channel,
  Recording, RecordingActivity
- stimulation:Stimulus + StimulationActivity
- specimenPrep:TissueSampleSlicing, CranialWindowPreparation,
  DevicePlacement
- sands:CoordinatePoint, AnatomicalTargetPosition, CommonCoordinateFramework
- 113 controlledTerms categories
- Digital identifiers (DOI, RRID, ORCID, etc.) as first-class docs

Also notes the PROV-style "everything is an Activity with typed inputs
and outputs" pattern, which gives a provenance DAG for free.

Revises the minimal NDI document set from ~15 types to ~10 NDI-native
types that cover only what openMINDS does not: sample-aligned epochs,
spike-sorted units, fit/tuning-curve/HMM outputs, syncgraph, syncrule
mapping, factor_design, and the did_uid scheme. Adds a reader's note
at the top redirecting to this analysis. Proposes concrete next step:
build openMINDS-backed example instances for one paper (Griswold 2025)
before writing any more NDI schemas.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
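The following hypothetical sketch illustrates why treating everything as an Activity with typed inputs and outputs yields a provenance DAG. It walks backwards from a derived entity to everything that entity came from. The activity records and IDs are invented for illustration and only loosely borrow openMINDS type names.

```python
from collections import deque

# Hypothetical activity records: each activity consumes and produces entity IDs.
ACTIVITIES = {
    "slicing_1": {"type": "TissueSampleSlicing", "inputs": ["rat_07"], "outputs": ["slice_07a"]},
    "recording_1": {"type": "RecordingActivity", "inputs": ["slice_07a", "pipette_3"],
                    "outputs": ["rec_07a"]},
    "sorting_1": {"type": "spike_sorting", "inputs": ["rec_07a"], "outputs": ["units_07a"]},
}


def upstream(entity, activities):
    """Breadth-first walk from `entity` back through the activities that produced it.
    Returns (activity IDs, entity IDs) that `entity` depends on."""
    produced_by = {out: aid for aid, act in activities.items() for out in act["outputs"]}
    seen_acts, seen_ents = set(), set()
    queue = deque([entity])
    while queue:
        aid = produced_by.get(queue.popleft())
        if aid is None or aid in seen_acts:
            continue  # a raw input (no producing activity) or an already-visited activity
        seen_acts.add(aid)
        for inp in activities[aid]["inputs"]:
            if inp not in seen_ents:
                seen_ents.add(inp)
                queue.append(inp)
    return seen_acts, seen_ents
```

No dedicated lineage document is needed. The graph falls out of the inputs/outputs fields that each activity already records.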
Full hand-authored metadata + data-description document set for the
Griswold & Van Hooser 2025 eLife paper. Mixes openMINDS types for the
subject/protocol/device/stimulus layer (38 documents) with NDI-native
types for the gaps openMINDS does not cover (18 documents: session,
factor_design, recording, spikesort_output, tuning curves, fits,
selectivity indices, epoch, mixed-effects analysis).

Companion files:
- README.md: how to read the example, file inventory, scope
- QUERIES.md: 5 worked queries traced through the document graph
- FINDINGS.md: 9 findings about what is awkward, missing, or works

Three of five queries expose friction (multi-hop state-chain traversal,
missing analysis provenance, factorial-stimulus modelling gap). Findings
map onto concrete recommendations for the schema and for consumer-
library helpers. Sets up the companion Bhar 2025 stress-test example.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
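A small consumer-side helper could reduce the multi-hop state-chain friction. This sketch assumes states are stored as a dict keyed by ID, each with an optional descendedFrom field. That layout is hypothetical and is not the openMINDS serialization. The helper raises if the chain loops back on itself.

```python
def state_chain(state_id, states):
    """Follow descendedFrom links from `state_id` back to the earliest state.
    `states` maps a state ID to a dict with an optional 'descendedFrom' ID."""
    chain, seen = [], set()
    current = state_id
    while current is not None:
        if current in seen:
            raise ValueError(f"descendedFrom cycle at {current!r}")
        seen.add(current)
        chain.append(current)
        current = states[current].get("descendedFrom")
    return chain


def nearest_state(state_id, states, predicate):
    """Return the closest state on the chain (starting with `state_id` itself) that
    satisfies `predicate`, or None if no state does."""
    for sid in state_chain(state_id, states):
        if predicate(states[sid]):
            return sid
    return None
```

With this helper, a query such as "was this recorded animal's state preceded by premature eye opening?" becomes one call per recording instead of a hand-written traversal.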

2 participants