Add data-reuse summaries for datasetPapers #16
Draft
jesshaley wants to merge 11 commits into
Summaries cover subjects, procedures, recording/acquisition, key variables, and metadata needed to reuse the data, plus notes on DID/NDI schema relevance. Remaining five papers still to summarize. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
All seven datasetPapers summaries are now complete. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
Summarises common structural themes across the 7 datasetPapers, proposes a minimal set of ~13 NDI document types, and outlines a three-layer validation strategy (structural / class-hierarchical / ontology-backed) with a main tradeoff and open questions. https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
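As a rough illustration of the three-layer validation idea, the sketch below checks a document structurally (required fields and their types), class-hierarchically (required fields inherited along a parent chain), and against a stand-in ontology lookup. Everything here is hypothetical: the `SCHEMAS` table, the field names, and the `EX:` term identifiers are placeholders for illustration, not the actual DID/NDI schema.

```python
from typing import Any, Optional

# Hypothetical class hierarchy: each document type names its parent
# and the required fields it adds on top of that parent.
SCHEMAS = {
    "base": {"parent": None, "required": {"id": str}},
    "subject": {"parent": "base", "required": {"species": str}},
}

# Placeholder for an ontology lookup; a real check would query a term service.
KNOWN_SPECIES_TERMS = {"EX:0001", "EX:0002"}


def required_fields(doc_type: Optional[str]) -> dict[str, type]:
    """Class-hierarchical layer: collect required fields up the parent chain."""
    fields: dict[str, type] = {}
    while doc_type is not None:
        schema = SCHEMAS[doc_type]
        fields.update(schema["required"])
        doc_type = schema["parent"]
    return fields


def validate(doc: dict[str, Any]) -> list[str]:
    """Return a list of error strings; an empty list means the document passed."""
    doc_type = doc.get("type")
    if doc_type not in SCHEMAS:
        return [f"unknown type: {doc_type!r}"]
    errors = []
    # Structural layer, applied to the fields gathered from the hierarchy.
    for name, expected in required_fields(doc_type).items():
        if name not in doc:
            errors.append(f"missing field: {name}")
        elif not isinstance(doc[name], expected):
            errors.append(f"wrong type for {name}")
    # Ontology-backed layer: the value must be a known term.
    species = doc.get("species")
    if doc_type == "subject" and isinstance(species, str):
        if species not in KNOWN_SPECIES_TERMS:
            errors.append(f"unknown ontology term: {species}")
    return errors
```

Calling `validate({"type": "subject", "id": "s1", "species": "EX:0001"})` returns an empty list, while a document with no `id` and an unrecognised term reports both problems. Keeping the layers in one pass is only a convenience of the sketch; they could equally run as separate stages.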
Adds five edge cases that strain the core proposal:
1. Plate-as-measured-object (Bhar, Haley) - proposes new 'substrate' document type and time-bounded 'placement_on' relationship
2. Cross-species exchange (Bhar) - species-agnostic substrate + per-species subject_group
3. Longitudinal reuse across many sessions (Reikersdorfer 11-month chronic recordings) - persistent placement, days_since_implant
4. Multi-stage cyclic protocols with rest periods (Bhar training)
5. Data shared across papers (Mukherjee re-analysis of Sadacca 2016)
Bumps core document count from 13 to 14 with the addition of 'substrate' and adds two new open design questions.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
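One way to picture the time-bounded 'placement_on' relationship is as an interval record linking a subject group to a substrate. The sketch below is an assumption-laden illustration: the class name, field names, time units, and example identifiers are all hypothetical and not taken from the proposal itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlacementOn:
    """Hypothetical time-bounded link between a subject group and a substrate."""
    subject_group: str
    substrate: str
    start_s: float           # seconds from session start (assumed unit)
    end_s: Optional[float]   # None means the placement is still in effect

    def active_at(self, t: float) -> bool:
        # Half-open interval [start_s, end_s): the end time itself is excluded.
        return self.start_s <= t and (self.end_s is None or t < self.end_s)


def groups_on(placements, substrate, t):
    """Return the sorted subject groups placed on `substrate` at time `t`."""
    return sorted(
        p.subject_group
        for p in placements
        if p.substrate == substrate and p.active_at(t)
    )


# Made-up example: two groups occupy the same plate one after the other.
placements = [
    PlacementOn("group_A", "plate_1", 0.0, 3600.0),
    PlacementOn("group_B", "plate_1", 3600.0, None),
]
```

With these made-up records, `groups_on(placements, "plate_1", 1800.0)` gives `["group_A"]` and `groups_on(placements, "plate_1", 3600.0)` gives `["group_B"]`. An open-ended `end_s` is one possible way to express the persistent placement mentioned for chronic recordings.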
Splits the single 'treatment' document type into a protocol/event pair:
- treatment_protocol: reusable, versioned recipe (no timestamps); composes via depends_on so outer paradigms can reference inner cycles
- treatment_event: actual execution with clock times + optional deviations; naturally a specialisation of epoch
Includes a worked example of Bhar's 5-cycle IAA+heat training and shows how the pattern covers water restriction, in-vitro drug baths, premature exposure, acclimation, and post-op drug schedules across the other papers.
Also deduplicates the 'Open questions' list and updates type counts.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
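The protocol/event split can be sketched with two small record types: a timestamp-free recipe that composes through `depends_on`, and an execution record that points at a recipe and carries clock times. This is a minimal illustration only. The class layout, the protocol names, the version strings, the dates, and the deviation text are all invented for the example and do not reproduce the worked example in the commit.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class TreatmentProtocol:
    """Reusable, versioned recipe: deliberately carries no timestamps."""
    name: str
    version: str
    depends_on: tuple["TreatmentProtocol", ...] = ()


@dataclass
class TreatmentEvent:
    """One actual execution of a protocol, with clock times and deviations."""
    protocol: TreatmentProtocol
    start: datetime
    end: datetime
    deviations: list[str] = field(default_factory=list)


def leaf_steps(protocol: TreatmentProtocol) -> list[str]:
    """Flatten a composed protocol into its innermost steps, in order."""
    if not protocol.depends_on:
        return [protocol.name]
    steps: list[str] = []
    for inner in protocol.depends_on:
        steps.extend(leaf_steps(inner))
    return steps


# Hypothetical composition: an outer paradigm that references one inner cycle five times.
cycle = TreatmentProtocol("iaa_heat_cycle", "1.0")
training = TreatmentProtocol("five_cycle_training", "1.0", depends_on=(cycle,) * 5)

# Hypothetical execution record; the times and the deviation are made up.
event = TreatmentEvent(
    protocol=training,
    start=datetime(2025, 1, 1, 9, 0),
    end=datetime(2025, 1, 1, 12, 0),
    deviations=["cycle 3 started late"],
)
```

Here `leaf_steps(training)` returns the inner cycle name five times, which shows how an outer paradigm can be expanded without copying the inner recipe. Rest periods between cycles could be modelled as further inner protocols in the same tuple.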
Direct inspection of all 9 openMINDS module repos shows that most of the document types this doc was proposing already exist in openMINDS with matching semantics:
- core:Subject + core:SubjectState (with descendedFrom state transitions)
- core:SubjectGroup + core:SubjectGroupState (answers the populational-subject question directly)
- core:Protocol + core:ProtocolExecution (the treatment_protocol / treatment_event split, already in place)
- core:Device + core:DeviceUsage (the apparatus / placement split)
- ephys:ElectrodeArray + ElectrodeArrayUsage, PipetteUsage, Channel, Recording, RecordingActivity
- stimulation:Stimulus + StimulationActivity
- specimenPrep:TissueSampleSlicing, CranialWindowPreparation, DevicePlacement
- sands:CoordinatePoint, AnatomicalTargetPosition, CommonCoordinateFramework
- 113 controlledTerms categories
- Digital identifiers (DOI, RRID, ORCID, etc.) as first-class docs
Also notes the PROV-style "everything is an Activity with typed inputs and outputs" pattern, which gives a provenance DAG for free.
Revises the minimal NDI document set from ~15 types to ~10 NDI-native types that cover only what openMINDS does not: sample-aligned epochs, spike-sorted units, fit/tuning-curve/HMM outputs, syncgraph, syncrule mapping, factor_design, and the did_uid scheme.
Adds a reader's note at the top redirecting to this analysis.
Proposes a concrete next step: build openMINDS-backed example instances for one paper (Griswold 2025) before writing any more NDI schemas.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
Full hand-authored metadata + data-description document set for the Griswold & Van Hooser 2025 eLife paper. Mixes openMINDS types for the subject/protocol/device/stimulus layer (38 documents) with NDI-native types for the gaps openMINDS does not cover (18 documents: session, factor_design, recording, spikesort_output, tuning curves, fits, selectivity indices, epoch, mixed-effects analysis).
Companion files:
- README.md: how to read the example, file inventory, scope
- QUERIES.md: 5 worked queries traced through the document graph
- FINDINGS.md: 9 findings about what is awkward, missing, or works
Three of five queries expose friction (multi-hop state-chain traversal, missing analysis provenance, factorial-stimulus modelling gap). Findings map onto concrete recommendations for the schema and for consumer-library helpers.
Sets up the companion Bhar 2025 stress-test example.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6
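The multi-hop state-chain traversal mentioned above is the kind of friction a small consumer-library helper could absorb. The sketch below walks `descendedFrom` links from a given state back to its earliest ancestor. It assumes, purely for illustration, that documents are plain dicts keyed by id with at most one `descendedFrom` parent. The real openMINDS instances are JSON-LD and may be shaped differently, and the state ids shown are invented.

```python
def state_chain(docs, state_id):
    """Return state ids from `state_id` back to its earliest ancestor.

    `docs` is assumed to map each id to a dict with an optional
    'descendedFrom' key holding the parent id (or None).
    """
    chain = []
    seen = set()
    current = state_id
    while current is not None:
        if current in seen:
            # Guard against malformed data that loops back on itself.
            raise ValueError(f"cycle in state chain at {current}")
        seen.add(current)
        chain.append(current)
        current = docs[current].get("descendedFrom")
    return chain


# Hypothetical three-state chain for one subject.
docs = {
    "state_baseline": {"descendedFrom": None},
    "state_treated": {"descendedFrom": "state_baseline"},
    "state_recorded": {"descendedFrom": "state_treated"},
}
```

With this made-up data, `state_chain(docs, "state_recorded")` returns `["state_recorded", "state_treated", "state_baseline"]`, so a query that needs the subject's original state takes one call instead of a hand-written hop per transition.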
Summary
Adds one Markdown summary per paper in datasetPapers/ under a new datasetPapers/summaries/ directory. Each summary is scoped to information required for someone else to reuse the data: subjects, treatments, stimuli, recording/acquisition, derived variables, data/code availability, and notes on which DID/NDI document types the paper stress-tests.
Summaries added
- 11494.full.pdf → 11494_van_hooser_2013_tree_shrew_V1.md: LGN→V1 receptive-field transformation in tree shrew (in vivo ephys, carbon fiber / tetrode, drifting gratings + bars).
- CELREP115768_grabs 1..1.pdf → CELREP115768_francesconi_2025_BNST_AVP_OT.md: BNST AVP/OT, rat slice ephys + optogenetics + FPS/EPM behavior. Data on NDI-Cloud (DOI 10.63884/ndic.2025.jyxfer8m); code on Zenodo.
- Construction and Implementation...pdf → JoVE_reikersdorfer_2022_carbon_fiber_MEA.md: JoVE carbon-fiber MEA fabrication / implantation protocol (mouse chronic + ferret acute).
- Extracellular vesicles...pdf → bhar_2025_celegans_EV_LTAM.md: C. elegans / C. briggsae cross-species LTAM transfer via extracellular vesicles; behavior + LC-MS + imaging.
- elife-103191-v1.pdf → eLife_103191_haley_2025_celegans_foraging.md: C. elegans accept–reject patch foraging. Data on NDI-Cloud (DOI 10.63884/ndic.2025.pb77mj2s); code on GitHub.
- elife-106513-v2.pdf → eLife_106513_griswold_2025_ferret_premature_vision.md: Ferret V1 after premature eye opening. Data on NDI-Cloud (DOI 10.63884/ndic.2025.28xb47y1).
- elife-45968-v2.pdf → eLife_45968_mukherjee_2019_GC_optogenetic_taste.md: Rat GC opto-trode + EMG + IOC with ArchT perturbation timing.
Cross-paper schema observations
Themes surfaced across the seven summaries that the current DID/NDI schemas may not fully support:
- subject: mouse, rat, ferret, tree shrew, mouse + ferret in one protocol, and C. elegans / C. briggsae populations (group-level subject rather than individual).
Test plan
dataset-brainstorm.
https://claude.ai/code/session_01VZLnPtLsHG3Jbg42uf2Xw6