Align with BIDS nomenclature and resync from schema (#21)#27
Conversation
Use BIDS terms consistently and treat schema.json as the single source
of truth for entity/suffix/extension/datatype lists.
Nomenclature:
- Rename data_type column/key to datatype (BIDS is one word).
- Distinguish entity key (sub, schema .name) from entity name (subject,
the column header); drop the "display name" misnomer in code and docs.
- Custom-entity JSON fields are now key/name/pattern (was
name/display_name/pattern); internal arrays CUSTOM_ENTITY_{KEYS,NAMES}.
- Fix docstring examples that passed entity keys ("sub,ses") which
silently fail the column matcher.
Schema resync:
- generate_entity_patterns.sh now emits every machine-generated block
(entity patterns, key/name order, suffixes, extensions, datatype regex).
- Add 4 missing entities in canonical order: template, cohort, atlas, scale.
- Add missing datatype emg (emg files previously parsed as datatype=NA).
- Add missing suffixes: description, emg, physioevents.
Verified: bash -n + shellcheck clean, ./test_libBIDS.sh 8/8, datatype=emg
resolves in a clean parse, custom entity key/name produces the right column.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR standardizes BIDS terminology throughout libBIDS.sh: TSV columns use entity names (e.g., ChangesBIDS nomenclature and entity expansion
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR aligns libBIDS.sh with BIDS schema nomenclature and treats schema.json as the single source of truth for schema-derived lists (entities, suffixes, extensions, datatype regex). It also updates docs and the custom-entity JSON template to match the renamed concepts.
Changes:
- Rename the table column/key
data_type→datatype, and standardize docs/examples around BIDS entity keys vs names. - Resync schema-derived lists: add missing entities (
tpl,cohort,atlas,scale), datatype (emg), and suffixes. - Expand
generate_entity_patterns.shto emit all schema-derived blocks needed to updatelibBIDS.shin one regeneration.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| README.md | Updates user-facing documentation/examples to use BIDS-correct terminology and datatype. |
| libBIDS.sh | Updates parsing/output schema (datatype, entity-name headers), adds new entities/suffixes, and renames custom-entity JSON fields/arrays. |
| generate_entity_patterns.sh | Regenerates all schema-derived blocks (entity patterns/orders, suffixes/extensions alternations, datatype regex). |
| custom/custom_entities.json.tpl | Updates template fields to key/name/pattern. |
| AGENTS.md | Updates repository guidance/docs to reflect renamed columns and corrected BIDS terminology. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # ./generate_entity_patterns.sh -> datatype regex | ||
| arr[datatype]=$(grep -o -E "(anat|beh|dwi|eeg|emg|fmap|func|ieeg|meg|micr|motion|mrs|nirs|perf|pet|phenotype)" <<<"$(dirname "${path}")" | head -1 || echo "NA") | ||
| arr[derivatives]=$(grep -o 'derivatives/.*/' <<<"${path}" | awk -F/ '{print $2}' || echo "NA") |
| # Suffixes from schema.json | ||
| # jq -r .objects.suffixes.[].value schema.json | paste -s -d'|' | ||
| local suffixes="_@(2PE|ADC|BF|Chimap|CARS|CONF|DIC|DF|FA|FLAIR|FLASH|FLUO|IRT1|M0map|MEGRE|MESE|MP2RAGE|MPE|MPM|MTR|MTRmap|MTS|MTVmap|MTsat|MWFmap|NLO|OCT|PC|PD|PDT2|PDmap|PDw|PLI|R1map|R2map|R2starmap|RB1COR|RB1map|S0map|SEM|SPIM|SR|T1map|T1rho|T1w|T2map|T2star|T2starmap|T2starw|T2w|TB1AFI|TB1DAM|TB1EPI|TB1RFM|TB1SRGE|TB1TFL|TB1map|TEM|UNIT1|VFA|angio|asl|aslcontext|asllabeling|beh|blood|bold|cbv|channels|colFA|coordsystem|defacemask|descriptions|dseg|dwi|eeg|electrodes|epi|events|expADC|fieldmap|headshape|XPCT|ieeg|inplaneT1|inplaneT2|m0scan|magnitude|magnitude1|magnitude2|markers|mask|meg|motion|mrsi|mrsref|nirs|noRF|optodes|pet|phase|phase1|phase2|phasediff|photo|physio|probseg|sbref|scans|sessions|stim|svs|trace|uCT|unloc)" | ||
| local suffixes="_@(2PE|ADC|BF|Chimap|CARS|CONF|DIC|DF|FA|FLAIR|FLASH|FLUO|IRT1|M0map|MEGRE|MESE|MP2RAGE|MPE|MPM|MTR|MTRmap|MTS|MTVmap|MTsat|MWFmap|NLO|OCT|PC|PD|PDT2|PDmap|PDw|PLI|R1map|R2map|R2starmap|RB1COR|RB1map|S0map|SEM|SPIM|SR|T1map|T1rho|T1w|T2map|T2star|T2starmap|T2starw|T2w|TB1AFI|TB1DAM|TB1EPI|TB1RFM|TB1SRGE|TB1TFL|TB1map|TEM|UNIT1|VFA|angio|asl|aslcontext|asllabeling|beh|blood|bold|cbv|channels|colFA|coordsystem|defacemask|description|descriptions|dseg|dwi|eeg|electrodes|emg|epi|events|expADC|fieldmap|headshape|XPCT|ieeg|inplaneT1|inplaneT2|m0scan|magnitude|magnitude1|magnitude2|markers|mask|meg|motion|mrsi|mrsref|nirs|noRF|optodes|pet|phase|phase1|phase2|phasediff|photo|physio|physioevents|probseg|sbref|scans|sessions|stim|svs|trace|uCT|unloc)" |
| printf "derivatives\tdatatype\t%s\tsuffix\textension\tpath\n" "${entities_name_order}" | ||
| for file in "${files[@]}"; do | ||
| declare -A file_info | ||
| _libBIDSsh_parse_filename "${file}" file_info | ||
| for key in derivatives data_type ${entities_order} suffix extension path; do | ||
| for key in derivatives datatype ${entities_order} suffix extension path; do |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
libBIDS.sh (1)
376-379:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMake custom-entity loading truly optional when
jqis unavailable.At Line 447,
libBIDSsh_parse_bids_to_tablealways calls_libBIDSsh_load_custom_entities, but Lines 376-379 immediately return failure ifjqis missing. Underset -e, this breaks core parsing even when no custom JSON is used.As a result, optional custom-entity support becomes a hard runtime dependency.
Suggested fix
_libBIDSsh_load_custom_entities() { @@ - if ! command -v jq >/dev/null 2>&1; then - echo "Error: jq is required for custom entity support" >&2 - return 1 - fi @@ if ((${`#json_files`[@]} == 0)); then return 0 fi if ! command -v jq >/dev/null 2>&1; then echo "Error: jq is required for custom entity support" >&2 return 1 fiBased on learnings: "The library must have zero-dependency core functionality, with
jqoptional for JSON features and custom entity support".Also applies to: 447-447
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libBIDS.sh` around lines 376 - 379, The current jq check exits early causing core parsing to fail; instead make jq optional by changing the check at the top to not return a failure: detect jq and set a flag (e.g. LIBBIDS_HAS_JQ=1 or 0) rather than "return 1", and update _libBIDSsh_load_custom_entities and libBIDSsh_parse_bids_to_table so they first test whether custom entity JSON is actually requested and whether LIBBIDS_HAS_JQ=1 before attempting JSON work; if jq is missing simply skip loading custom entities (or emit a non-fatal warning) and return success so core parsing continues without jq.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@libBIDS.sh`:
- Line 251: The arr[datatype] assignment may match substrings inside longer
folder names; update the logic that computes arr[datatype] (the line setting
arr[datatype] from dirname "${path}") to only match whole path components
instead of arbitrary substrings—e.g., change the grep/regex to look for
component boundaries (use (^|/) and (/|$) around the token list) or split
dirname "${path}" into path segments and test each segment exactly against the
allowed datatypes (anat|beh|...|phenotype), then set arr[datatype] to the first
exact match or "NA".
- Around line 416-424: The loop is accepting literal "null" strings from jq and
silencing JSON parse errors; update the jq pipeline or the loop to reject
entries with missing fields and surface parse failures: change the jq call to
filter out nulls (e.g. jq -r '.entities[] | select(.key!=null and .name!=null
and .pattern!=null) | "\(.key);\(.name);\(.pattern)"' "$json_file" without
redirecting stderr) or add an explicit check inside the while loop to skip
entries where key/name/pattern are empty or equal to "null" before assigning to
CUSTOM_ENTITIES, CUSTOM_ENTITY_KEYS, and CUSTOM_ENTITY_NAMES so malformed
entities are rejected and JSON errors are not hidden.
In `@README.md`:
- Around line 55-58: Add a small reference table to README.md near the TSV
terminology paragraph that maps BIDS entity key (short token, e.g. "sub"),
entity name (full name, e.g. "subject"), display-name (user-facing label if
different), and an example usage (e.g. filename token vs TSV header) so readers
can clearly see key/name/display-name distinctions; update the paragraph that
mentions "entity keys" and "entity names" to reference this table and ensure the
table columns are: key | name | display-name | example usage.
---
Outside diff comments:
In `@libBIDS.sh`:
- Around line 376-379: The current jq check exits early causing core parsing to
fail; instead make jq optional by changing the check at the top to not return a
failure: detect jq and set a flag (e.g. LIBBIDS_HAS_JQ=1 or 0) rather than
"return 1", and update _libBIDSsh_load_custom_entities and
libBIDSsh_parse_bids_to_table so they first test whether custom entity JSON is
actually requested and whether LIBBIDS_HAS_JQ=1 before attempting JSON work; if
jq is missing simply skip loading custom entities (or emit a non-fatal warning)
and return success so core parsing continues without jq.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b619f1f1-2b31-466e-9c8c-cf4ed641483d
📒 Files selected for processing (5)
AGENTS.mdREADME.mdcustom/custom_entities.json.tplgenerate_entity_patterns.shlibBIDS.sh
| # jq -r .objects.datatypes.[].value schema.json | paste -s -d'|' | ||
| arr[data_type]=$(grep -o -E "(anat|beh|dwi|eeg|fmap|func|ieeg|meg|micr|motion|mrs|perf|pet|phenotype|nirs)" <<<"$(dirname "${path}")" | head -1 || echo "NA") | ||
| # ./generate_entity_patterns.sh -> datatype regex | ||
| arr[datatype]=$(grep -o -E "(anat|beh|dwi|eeg|emg|fmap|func|ieeg|meg|micr|motion|mrs|nirs|perf|pet|phenotype)" <<<"$(dirname "${path}")" | head -1 || echo "NA") |
There was a problem hiding this comment.
Datatype inference at Line 251 can produce false positives from substring matches.
The current grep matches anywhere in the directory string, so non-datatype folder names containing tokens like func can incorrectly set arr[datatype], corrupting emitted table values.
Suggested fix
- arr[datatype]=$(grep -o -E "(anat|beh|dwi|eeg|emg|fmap|func|ieeg|meg|micr|motion|mrs|nirs|perf|pet|phenotype)" <<<"$(dirname "${path}")" | head -1 || echo "NA")
+ arr[datatype]="$(
+ awk -F'/' '
+ {
+ for (i = 1; i <= NF; i++) {
+ if ($i ~ /^(anat|beh|dwi|eeg|emg|fmap|func|ieeg|meg|micr|motion|mrs|nirs|perf|pet|phenotype)$/) {
+ print $i; found=1; exit
+ }
+ }
+ }
+ END { if (!found) print "NA" }
+ ' <<<"$(dirname "${path}")"
+ )"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@libBIDS.sh` at line 251, The arr[datatype] assignment may match substrings
inside longer folder names; update the logic that computes arr[datatype] (the
line setting arr[datatype] from dirname "${path}") to only match whole path
components instead of arbitrary substrings—e.g., change the grep/regex to look
for component boundaries (use (^|/) and (/|$) around the token list) or split
dirname "${path}" into path segments and test each segment exactly against the
allowed datatypes (anat|beh|...|phenotype), then set arr[datatype] to the first
exact match or "NA".
| while IFS=';' read -r key name pattern; do | ||
| if [[ -n "$key" && -n "$name" && -n "$pattern" ]]; then | ||
| CUSTOM_ENTITIES["$key"]="$pattern" | ||
| CUSTOM_ENTITY_KEYS+=("$key") | ||
| CUSTOM_ENTITY_NAMES+=("$name") | ||
| CUSTOM_ENTITY_DISPLAY_NAMES+=("$display_name") | ||
| fi | ||
| done < <( | ||
| jq -r '.entities[] | "\(.name);\(.display_name);\(.pattern)"' \ | ||
| jq -r '.entities[] | "\(.key);\(.name);\(.pattern)"' \ | ||
| "$json_file" 2>/dev/null |
There was a problem hiding this comment.
Reject malformed custom entities instead of accepting null fields.
At Line 423, missing JSON fields become literal null values with jq -r; Lines 417-420 then accept them and may register CUSTOM_ENTITIES["null"], causing hard-to-debug header/pattern corruption. 2>/dev/null also hides parse failures.
Suggested fix
- while IFS=';' read -r key name pattern; do
- if [[ -n "$key" && -n "$name" && -n "$pattern" ]]; then
+ while IFS=';' read -r key name pattern; do
+ if [[ -n "$key" && -n "$name" && -n "$pattern" \
+ && "$key" != "null" && "$name" != "null" && "$pattern" != "null" ]]; then
CUSTOM_ENTITIES["$key"]="$pattern"
CUSTOM_ENTITY_KEYS+=("$key")
CUSTOM_ENTITY_NAMES+=("$name")
+ else
+ echo "Error: Invalid custom entity in '$json_file' (expected key/name/pattern)" >&2
+ return 1
fi
done < <(
- jq -r '.entities[] | "\(.key);\(.name);\(.pattern)"' \
- "$json_file" 2>/dev/null
+ jq -r '.entities[] | "\(.key);\(.name);\(.pattern)"' "$json_file"
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while IFS=';' read -r key name pattern; do | |
| if [[ -n "$key" && -n "$name" && -n "$pattern" ]]; then | |
| CUSTOM_ENTITIES["$key"]="$pattern" | |
| CUSTOM_ENTITY_KEYS+=("$key") | |
| CUSTOM_ENTITY_NAMES+=("$name") | |
| CUSTOM_ENTITY_DISPLAY_NAMES+=("$display_name") | |
| fi | |
| done < <( | |
| jq -r '.entities[] | "\(.name);\(.display_name);\(.pattern)"' \ | |
| jq -r '.entities[] | "\(.key);\(.name);\(.pattern)"' \ | |
| "$json_file" 2>/dev/null | |
| while IFS=';' read -r key name pattern; do | |
| if [[ -n "$key" && -n "$name" && -n "$pattern" \ | |
| && "$key" != "null" && "$name" != "null" && "$pattern" != "null" ]]; then | |
| CUSTOM_ENTITIES["$key"]="$pattern" | |
| CUSTOM_ENTITY_KEYS+=("$key") | |
| CUSTOM_ENTITY_NAMES+=("$name") | |
| else | |
| echo "Error: Invalid custom entity in '$json_file' (expected key/name/pattern)" >&2 | |
| return 1 | |
| fi | |
| done < <( | |
| jq -r '.entities[] | "\(.key);\(.name);\(.pattern)"' "$json_file" | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@libBIDS.sh` around lines 416 - 424, The loop is accepting literal "null"
strings from jq and silencing JSON parse errors; update the jq pipeline or the
loop to reject entries with missing fields and surface parse failures: change
the jq call to filter out nulls (e.g. jq -r '.entities[] | select(.key!=null and
.name!=null and .pattern!=null) | "\(.key);\(.name);\(.pattern)"' "$json_file"
without redirecting stderr) or add an explicit check inside the while loop to
skip entries where key/name/pattern are empty or equal to "null" before
assigning to CUSTOM_ENTITIES, CUSTOM_ENTITY_KEYS, and CUSTOM_ENTITY_NAMES so
malformed entities are rejected and JSON errors are not hidden.
| The TSV columns use the full BIDS entity **names** (e.g. `subject`), not the entity | ||
| **keys** (the short tokens like `sub` found in filenames). In BIDS schema terms the | ||
| key is the entity's `.name` field (`sub`) and the column header is the entity object | ||
| name (`subject`). |
There was a problem hiding this comment.
Add the required key/name/display-name reference table to README.
The section explains the terminology, but the guideline explicitly requires a reference table in README.md. Please add a small table (key/name/display-name/example usage) near Line 55.
As per coding guidelines: "README.md: Documentation must clearly distinguish between entity keys, entity names, and entity display names, with a reference table in README.md".
Also applies to: 67-67
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@README.md` around lines 55 - 58, Add a small reference table to README.md
near the TSV terminology paragraph that maps BIDS entity key (short token, e.g.
"sub"), entity name (full name, e.g. "subject"), display-name (user-facing label
if different), and an example usage (e.g. filename token vs TSV header) so
readers can clearly see key/name/display-name distinctions; update the paragraph
that mentions "entity keys" and "entity names" to reference this table and
ensure the table columns are: key | name | display-name | example usage.
Source: Coding guidelines
Closes #21.
Audits the library against
bids-schema/bids-specification(vendoredschema.json) and makes code and docs use BIDS-correct terms, while treatingschema.jsonas the single source of truth for the generated lists.Nomenclature
data_typecolumn/key todatatype(BIDS is consistently one word:objects.datatypes)..namesubsubject.display_nameSubjectkey/name/pattern(wasname/display_name/pattern); internal arrays renamed toCUSTOM_ENTITY_{KEYS,NAMES}. Breaking for existingcustom/*.json."sub,ses") which silently fail the column matcher.Schema resync
generate_entity_patterns.shnow emits every machine-generated block (entity patterns, key/name order, suffixes, extensions, datatype regex) so regeneration is one command.template,cohort,atlas,scale(31 → 35).emg— bug fix: emg files previously parsed asdatatype=NA.description,emg,physioevents.Breaking changes
data_type→datatypecolumn (update any scripts filtering/accessing it).name→key,display_name→name).Verification
bash -n+shellcheckclean on both scripts../test_libBIDS.sh→ 8/8 pass.datatype=emgresolves;anat/funcunchanged.key/nameproduces thebodypartcolumn (valuebp-leg).Out of scope (follow-up)
grep … | head -1 || echo "NA"in_libBIDSsh_parse_filenameis fragile underset -o pipefail(SIGPIPE → NA fallback). Pre-existing; considergrep -m1later.🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
Documentation
subjectinstead ofsub)data_typecolumn todatatypefor consistencyNew Features