Add data-reuse summaries for datasetPapers by jesshaley · Pull Request #16 · Waltham-Data-Science/DID-schema

jesshaley · 2026-04-23T16:14:56Z

Summary

Adds one Markdown summary per paper in datasetPapers/ under a new datasetPapers/summaries/ directory. Each summary is scoped to information required for someone else to reuse the data: subjects, treatments, stimuli, recording/acquisition, derived variables, data/code availability, and notes on which DID/NDI document types the paper stress-tests.

Summaries added

11494.full.pdf → 11494_van_hooser_2013_tree_shrew_V1.md — LGN→V1 receptive-field transformation in tree shrew (in vivo ephys, carbon fiber / tetrode, drifting gratings + bars).
CELREP115768_grabs 1..1.pdf → CELREP115768_francesconi_2025_BNST_AVP_OT.md — BNST AVP/OT, rat slice ephys + optogenetics + FPS/EPM behavior. Data on NDI-Cloud (DOI 10.63884/ndic.2025.jyxfer8m); code on Zenodo.
Construction and Implementation...pdf → JoVE_reikersdorfer_2022_carbon_fiber_MEA.md — JoVE carbon-fiber MEA fabrication / implantation protocol (mouse chronic + ferret acute).
Extracellular vesicles...pdf → bhar_2025_celegans_EV_LTAM.md — C. elegans / C. briggsae cross-species LTAM transfer via extracellular vesicles; behavior + LC-MS + imaging.
elife-103191-v1.pdf → eLife_103191_haley_2025_celegans_foraging.md — C. elegans accept–reject patch foraging. Data on NDI-Cloud (DOI 10.63884/ndic.2025.pb77mj2s); code on GitHub.
elife-106513-v2.pdf → eLife_106513_griswold_2025_ferret_premature_vision.md — Ferret V1 after premature eye opening. Data on NDI-Cloud (DOI 10.63884/ndic.2025.28xb47y1).
elife-45968-v2.pdf → eLife_45968_mukherjee_2019_GC_optogenetic_taste.md — Rat GC opto-trode + EMG + IOC with ArchT perturbation timing.
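
The PDF-to-summary mapping above could be checked mechanically during review. A minimal sketch, assuming the file names are copied verbatim from the list (elided names are left as written) and that `missing_summaries` is a hypothetical helper, not something in this repository:

```python
from pathlib import Path

# Source PDF name -> summary name, copied from the list above.
# Elided names ("...") are kept exactly as they appear in the PR description.
SUMMARIES = {
    "11494.full.pdf": "11494_van_hooser_2013_tree_shrew_V1.md",
    "CELREP115768_grabs 1..1.pdf": "CELREP115768_francesconi_2025_BNST_AVP_OT.md",
    "Construction and Implementation...pdf": "JoVE_reikersdorfer_2022_carbon_fiber_MEA.md",
    "Extracellular vesicles...pdf": "bhar_2025_celegans_EV_LTAM.md",
    "elife-103191-v1.pdf": "eLife_103191_haley_2025_celegans_foraging.md",
    "elife-106513-v2.pdf": "eLife_106513_griswold_2025_ferret_premature_vision.md",
    "elife-45968-v2.pdf": "eLife_45968_mukherjee_2019_GC_optogenetic_taste.md",
}


def missing_summaries(summary_dir: Path) -> list[str]:
    """Return expected summary file names that are absent from summary_dir."""
    present = {p.name for p in summary_dir.glob("*.md")}
    return sorted(name for name in SUMMARIES.values() if name not in present)
```

Calling `missing_summaries(Path("datasetPapers/summaries"))` from the repository root would return an empty list if every summary listed above is present.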

Cross-paper schema observations

Themes surfaced across the seven summaries that the current DID/NDI schemas may not fully support:

Multi-species subject — mouse, rat, ferret, tree shrew, mouse + ferret in one protocol, and C. elegans / C. briggsae populations (group-level subject rather than individual).
Probe-construction provenance — fiber diameter, parylene coat, gold electroplating parameters, bundle geometry, impedance history.
Factorial / covaried stimulus sets — e.g., orientation × SF × contrast with shared blank control.
Optogenetic perturbation — wavelength, power at tip, pulse pattern, burst structure, onset-relative-to-event.
Multi-stream sessions — simultaneous single-unit + EMG + intra-oral taste + laser in one trial.
Data-access provenance — multiple papers use NDI-Cloud with DOIs; one paper uses institutional request-only access (Brandeis LTS).
Per-session environment covariates — room temperature, humidity, animal age, bacterial growth time.
Analysis/model documents — Bayesian fits with priors + R̂, mixed-effects models with fixed/random structure, HMM change-point models.
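
To make the factorial-stimulus theme above concrete, here is a minimal sketch of an orientation × SF × contrast set with one shared blank control. The field names and factor levels are illustrative assumptions, not values taken from any of the papers or from an existing schema:

```python
from itertools import product


def factorial_stimuli(orientations_deg, spatial_freqs_cpd, contrasts):
    """Cross every factor level, then append a single shared blank control."""
    conditions = [
        {"orientation_deg": o, "sf_cpd": sf, "contrast": c, "is_blank": False}
        for o, sf, c in product(orientations_deg, spatial_freqs_cpd, contrasts)
    ]
    # One blank serves as the control for every cell of the design.
    conditions.append(
        {"orientation_deg": None, "sf_cpd": None, "contrast": 0.0, "is_blank": True}
    )
    return conditions


# Hypothetical levels: 2 orientations x 2 spatial frequencies x 2 contrasts.
stims = factorial_stimuli([0, 90], [0.1, 0.2], [0.5, 1.0])
print(len(stims))  # 2 * 2 * 2 crossed conditions + 1 blank = 9
```

The point for schema design is that the blank belongs to the design as a whole rather than to any one factor combination, which is hard to express if each stimulus is stored as an independent flat record.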

Test plan

Review each summary against the source PDF for accuracy.
Use the "Relevance for DID/NDI schema design" sections to scope concrete schema updates on dataset-brainstorm.

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

Summaries cover subjects, procedures, recording/acquisition, key variables, and metadata needed to reuse the data, plus notes on DID/NDI schema relevance. Remaining five papers still to summarize. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

All seven datasetPapers summaries are now complete. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

Summarises common structural themes across the 7 datasetPapers, proposes a minimal set of ~13 NDI document types, and outlines a three-layer validation strategy (structural / class-hierarchical / ontology-backed) with a main tradeoff and open questions. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
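
The three validation layers named in this commit can be sketched as one function that runs them in order. This is only an illustration: the lookup tables, field names, and the `validate` helper are hypothetical stand-ins, not the DID/NDI schema or any existing validator.

```python
# All three tables are hypothetical stand-ins, not the real DID/NDI schema.
REQUIRED_FIELDS = {"subject": {"id": str, "species": str}}
SUPERCLASS = {"subject": "base", "base": None}
ALLOWED_TERMS = {"species": {"NCBITaxon:10116", "NCBITaxon:6239"}}  # rat, C. elegans


def validate(doc: dict) -> list[str]:
    """Run the three layers in order and collect every problem found."""
    errors = []
    cls = doc.get("class")

    # Layer 1, structural: required fields exist and have the right type.
    for name, expected in REQUIRED_FIELDS.get(cls, {}).items():
        if not isinstance(doc.get(name), expected):
            errors.append(f"structural: {name!r} missing or not {expected.__name__}")

    # Layer 2, class-hierarchical: the class must reach the root "base".
    node = cls
    while node is not None and node != "base":
        node = SUPERCLASS.get(node)
    if node != "base":
        errors.append(f"hierarchy: {cls!r} does not descend from 'base'")

    # Layer 3, ontology-backed: coded fields must use an allowed term.
    for name, allowed in ALLOWED_TERMS.items():
        if name in doc and doc[name] not in allowed:
            errors.append(f"ontology: {doc[name]!r} is not an allowed {name} term")

    return errors
```

A document such as `{"class": "subject", "id": "s1", "species": "rat"}` passes the first two layers and fails only the ontology layer, which shows why the layers are worth keeping separate.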

Adds five edge cases that strain the core proposal:
1. Plate-as-measured-object (Bhar, Haley) - proposes new 'substrate' document type and time-bounded 'placement_on' relationship
2. Cross-species exchange (Bhar) - species-agnostic substrate + per-species subject_group
3. Longitudinal reuse across many sessions (Reikersdorfer 11-month chronic recordings) - persistent placement, days_since_implant
4. Multi-stage cyclic protocols with rest periods (Bhar training)
5. Data shared across papers (Mukherjee re-analysis of Sadacca 2016)

Bumps core document count from 13 to 14 with the addition of 'substrate' and adds two new open design questions.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
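
A time-bounded 'placement_on' relationship with a derived days_since_implant value might look like the following. This is a sketch under assumptions: the class name, field names, and dates are hypothetical, and an open-ended `end` is used here to stand for a persistent placement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PlacementOn:
    """Hypothetical link between an item and the thing it is placed on."""
    item_id: str
    substrate_id: str
    start: date
    end: Optional[date] = None  # None means the placement is still in effect

    def active_on(self, day: date) -> bool:
        """True if the placement covers the given day (bounds inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)

    def days_since_start(self, day: date) -> int:
        """Elapsed whole days, e.g. days_since_implant for a chronic probe."""
        return (day - self.start).days


# Hypothetical chronic implant with no removal date recorded.
implant = PlacementOn("probe_1", "mouse_1", start=date(2021, 1, 10))
session_day = date(2021, 12, 10)
print(implant.active_on(session_day), implant.days_since_start(session_day))
```

Deriving days_since_implant from the placement rather than storing it per session keeps many sessions consistent with a single implant date.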

Splits the single 'treatment' document type into a protocol/event pair:
- treatment_protocol: reusable, versioned recipe (no timestamps); composes via depends_on so outer paradigms can reference inner cycles
- treatment_event: actual execution with clock times + optional deviations; naturally a specialisation of epoch

Includes a worked example of Bhar's 5-cycle IAA+heat training and shows how the pattern covers water restriction, in-vitro drug baths, premature exposure, acclimation, and post-op drug schedules across the other papers.

Also deduplicates the 'Open questions' list and updates type counts.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
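
The protocol/event split can be sketched with two small classes. The class names mirror the document types named in this commit, but every field beyond `depends_on` and `deviations`, and all example values, are assumptions for illustration; the actual parameters of the Bhar training are not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class TreatmentProtocol:
    """Reusable, versioned recipe: carries no timestamps."""
    name: str
    version: str
    depends_on: tuple[TreatmentProtocol, ...] = ()

    def flatten(self) -> list[str]:
        """List inner protocols before the protocol that depends on them."""
        names: list[str] = []
        for inner in self.depends_on:
            names.extend(inner.flatten())
        names.append(self.name)
        return names


@dataclass
class TreatmentEvent:
    """One actual execution of a protocol, with clock times."""
    protocol: TreatmentProtocol
    start: datetime
    end: datetime
    deviations: list[str] = field(default_factory=list)


# Hypothetical names: an outer paradigm that references an inner cycle.
cycle = TreatmentProtocol("iaa_heat_cycle", "1")
training = TreatmentProtocol("five_cycle_training", "1", depends_on=(cycle,))
event = TreatmentEvent(cycle, datetime(2025, 1, 1, 9, 0), datetime(2025, 1, 1, 10, 0))
print(training.flatten())  # ['iaa_heat_cycle', 'five_cycle_training']
```

Because the protocol is frozen and timestamp-free, the same `cycle` object can back many events, each recording its own times and deviations.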

After inspecting all 9 openMINDS module repos directly, most of the document types this doc was proposing already exist in openMINDS with matching semantics:
- core:Subject + core:SubjectState (with descendedFrom state transitions)
- core:SubjectGroup + core:SubjectGroupState (answers the populational-subject question directly)
- core:Protocol + core:ProtocolExecution (the treatment_protocol / treatment_event split, already in place)
- core:Device + core:DeviceUsage (the apparatus / placement split)
- ephys:ElectrodeArray + ElectrodeArrayUsage, PipetteUsage, Channel, Recording, RecordingActivity
- stimulation:Stimulus + StimulationActivity
- specimenPrep:TissueSampleSlicing, CranialWindowPreparation, DevicePlacement
- sands:CoordinatePoint, AnatomicalTargetPosition, CommonCoordinateFramework
- 113 controlledTerms categories
- Digital identifiers (DOI, RRID, ORCID, etc.) as first-class docs

Also notes the PROV-style "everything is an Activity with typed inputs and outputs" pattern, which gives a provenance DAG for free.

Revises the minimal NDI document set from ~15 types to ~10 NDI-native types that cover only what openMINDS does not: sample-aligned epochs, spike-sorted units, fit/tuning-curve/HMM outputs, syncgraph, syncrule mapping, factor_design, and the did_uid scheme.

Adds a reader's note at the top redirecting to this analysis. Proposes concrete next step: build openMINDS-backed example instances for one paper (Griswold 2025) before writing any more NDI schemas.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
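
The "everything is an Activity with typed inputs and outputs" pattern yields a provenance DAG because each output can be traced back through the activity that produced it. A minimal sketch, assuming a simplified `Activity` record with plain string identifiers; this is not the openMINDS data model, and the example ids are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Simplified stand-in for a PROV-style activity."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def ancestors(doc_id: str, activities: list[Activity]) -> set[str]:
    """Return every document that doc_id was directly or indirectly derived from."""
    produced_by = {out: act for act in activities for out in act.outputs}
    found: set[str] = set()
    stack = [doc_id]
    while stack:
        act = produced_by.get(stack.pop())
        if act is None:
            continue  # a root document: nothing recorded as producing it
        for src in act.inputs:
            if src not in found:
                found.add(src)
                stack.append(src)
    return found


# Hypothetical two-step chain: slicing, then recording from the slice.
acts = [
    Activity("slicing", inputs=("subject_state_1",), outputs=("tissue_slice",)),
    Activity("recording", inputs=("tissue_slice", "pipette"), outputs=("recording_1",)),
]
print(sorted(ancestors("recording_1", acts)))
```

No separate provenance table is needed: the DAG is implied by the inputs and outputs already stored on each activity.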

Full hand-authored metadata + data-description document set for the Griswold & Van Hooser 2025 eLife paper. Mixes openMINDS types for the subject/protocol/device/stimulus layer (38 documents) with NDI-native types for the gaps openMINDS does not cover (18 documents: session, factor_design, recording, spikesort_output, tuning curves, fits, selectivity indices, epoch, mixed-effects analysis).

Companion files:
- README.md: how to read the example, file inventory, scope
- QUERIES.md: 5 worked queries traced through the document graph
- FINDINGS.md: 9 findings about what is awkward, missing, or works

Three of five queries expose friction (multi-hop state-chain traversal, missing analysis provenance, factorial-stimulus modelling gap). Findings map onto concrete recommendations for the schema and for consumer-library helpers.

Sets up the companion Bhar 2025 stress-test example.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
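
The multi-hop state-chain friction mentioned in this commit is the kind of thing a consumer-library helper could hide. A minimal sketch, assuming states are held in a flat dict keyed by id with an optional `descendedFrom` entry; real openMINDS instances are shaped differently, and the state ids below are hypothetical:

```python
def state_chain(state_id: str, states: dict[str, dict]) -> list[str]:
    """Follow descendedFrom links from state_id back to the earliest state."""
    chain: list[str] = []
    seen: set[str] = set()
    current = state_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in state chain at {current!r}")
        seen.add(current)
        chain.append(current)
        current = states[current].get("descendedFrom")
    return chain


# Hypothetical three-state history for one subject.
states = {
    "state_initial": {},
    "state_treated": {"descendedFrom": "state_initial"},
    "state_recorded": {"descendedFrom": "state_treated"},
}
print(state_chain("state_recorded", states))
```

With a helper like this, a query that needs the subject's earliest state becomes one call instead of a hand-written loop over links.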

claude added 6 commits April 23, 2026 16:14

Add data-reuse summary for carbon fiber MEA JoVE protocol

19cb9c9

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

Add data-reuse summary for Bhar 2025 C. elegans EV LTAM paper

758d63d

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

Add data-reuse summary for Haley 2024 C. elegans foraging eLife paper

861df16

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

Add data-reuse summary for Griswold 2025 ferret premature vision paper

faf9c8b

https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

Add data-reuse summary for Mukherjee 2019 GC optogenetic taste paper

f92df41

All seven datasetPapers summaries are now complete. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6

jesshaley changed the title ~~Add data-reuse summaries for datasetPapers (in progress)~~ Add data-reuse summaries for datasetPapers Apr 23, 2026

claude added 5 commits April 23, 2026 16:33

jesshaley wants to merge 11 commits into dataset-brainstorm from claude/summarize-dataset-papers-iyiD5

2 participants
