Bring harden-parsing (#28) into master #29

Merged: gdevenyi merged 2 commits into master from align-bids-nomenclature on Jun 11, 2026

@gdevenyi (Member)

#28 was merged into align-bids-nomenclature after that branch had already been merged to master via #27, so the parsing-hardening commit (433229c) never reached master. This PR brings it in ahead of the release.

Contents (the #28 fix):

  • Anchored datatype/derivatives extraction in _libBIDSsh_parse_filename (fixes the substring false-match, where a path like study_func/.../anat was detected as func, and the grep | head SIGPIPE → NA fallback).
  • generate_entity_patterns.sh emits the anchored datatype regex.
  • Regression tests (suite now 9/9).
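
As a rough illustration of the second bullet, an anchored datatype regex could be assembled from a list of datatypes as sketched below. The `datatypes` array, the `join_by_pipe` helper, and the `datatype_regex` variable name are all hypothetical; the actual generate_entity_patterns.sh may build and emit its pattern differently.

```bash
#!/usr/bin/env bash
# Hypothetical datatype list; the real generator derives its own.
datatypes=(anat dwi emg fmap func)

# Join all arguments with '|' to form a regex alternation.
join_by_pipe() {
  local IFS='|'
  printf '%s' "$*"
}

alternation=$(join_by_pipe "${datatypes[@]}")

# Anchor the match: start of string or '/', one datatype, end of string.
datatype_regex="(^|/)(${alternation})\$"

# Emit an assignment line that a generated block could contain.
printf "datatype_regex='%s'\n" "$datatype_regex"
```

Anchoring on both sides is what makes the pattern match a whole path component rather than any substring that happens to contain a datatype name.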

🤖 Generated with Claude Code

gdevenyi and others added 2 commits June 11, 2026 12:47
_libBIDSsh_parse_filename derived the datatype and derivatives pipeline
with substring greps piped through head/awk. Two latent defects:

- Substring false-match: `grep -oE "(anat|...|func|...)"` matched the
  datatype anywhere in the path, so e.g. `study_func_proj/sub-01/anat/...`
  was mis-detected as `func` instead of `anat`. The same flaw let a
  directory like `myderivatives/` be read as a derivatives dataset.
- SIGPIPE fragility: `grep ... | head -1 || echo "NA"` can have grep
  killed by SIGPIPE when head closes the pipe; under `set -o pipefail`
  the pipeline then reports failure and silently falls back to `NA`.
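
Both defects can be reproduced in isolation with a sketch like the one below. The datatype alternation and the example filename are assumptions, and `yes` stands in for a grep that is still writing when head exits, since the real race depends on timing and input size.

```bash
#!/usr/bin/env bash
# Hypothetical datatype alternation; the real list lives in libBIDS.sh.
datatypes='anat|dwi|fmap|func'
path='study_func_proj/sub-01/anat/sub-01_T1w.nii.gz'

# Defect 1: an unanchored grep reports the first match anywhere in the path.
naive_datatype=$(printf '%s\n' "$path" | grep -oE "(${datatypes})" | head -n 1)
echo "naive datatype: ${naive_datatype}"   # func, although the parent dir is anat

# Defect 2: under pipefail, a producer that is still writing when head exits
# fails the whole pipeline, even though head produced the wanted line.
set -o pipefail
status=0
yes anat | head -n 1 >/dev/null || status=$?
echo "pipeline status: ${status}"          # non-zero; typically 141 (128 + SIGPIPE)
```

Any `|| echo "NA"` attached to such a pipeline fires on that non-zero status, which is how a valid match can end up replaced or polluted by the fallback.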

Replace both with anchored bash regex on the path (no subshells, no
pipes): datatype is the file's immediate parent directory matched as a
whole component `(^|/)<datatype>$`; the derivatives pipeline is the
component right after `(^|/)derivatives/`. generate_entity_patterns.sh
now emits the anchored datatype regex to keep the generated block in
sync.
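
A minimal sketch of the anchored approach follows. It is not the code in libBIDS.sh: the function names, the datatype list, and the choice to print results are assumptions made for illustration. Because these helpers print, a caller that captures them with `$(...)` does spawn a subshell, unlike the in-line matching the commit describes.

```bash
#!/usr/bin/env bash
# Hypothetical datatype list; the generated block in libBIDS.sh is authoritative.
datatype_alternation='anat|dwi|emg|fmap|func'

# Datatype: the file's immediate parent directory, matched as a whole component.
extract_datatype() {
  local path=$1
  # A bare filename has no parent directory to inspect.
  if [[ "$path" != */* ]]; then printf 'NA\n'; return; fi
  local dir=${path%/*}
  local re="(^|/)(${datatype_alternation})\$"
  if [[ "$dir" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    printf 'NA\n'
  fi
}

# Derivatives pipeline: the component right after a whole 'derivatives' directory.
extract_derivatives_pipeline() {
  local path=$1
  local re='(^|/)derivatives/([^/]+)'
  if [[ "$path" =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[2]}"
  else
    printf 'NA\n'
  fi
}

extract_datatype 'study_func_proj/sub-01/anat/sub-01_T1w.nii.gz'
extract_derivatives_pipeline 'myderivatives/sub-01/anat/sub-01_T1w.nii.gz'
```

The first call prints `anat` because only the parent directory is tested, and the second prints `NA` because `myderivatives` is not preceded by the start of the string or a `/`.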

Add regression tests covering the substring false-match, emg detection,
missing-datatype NA, derivatives extraction, and the `myderivatives`
non-match. Suite: 9/9 pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Harden datatype/derivatives extraction in filename parser
Copilot AI review requested due to automatic review settings June 11, 2026 17:15

@gdevenyi gdevenyi merged commit 44fa25f into master Jun 11, 2026
1 of 2 checks passed
@gdevenyi gdevenyi deleted the align-bids-nomenclature branch June 11, 2026 17:15

Copilot AI left a comment

Pull request overview

This PR brings the parsing-hardening fix from #28 into master ahead of release, improving _libBIDSsh_parse_filename correctness and robustness when extracting datatype and derivatives from paths.

Changes:

  • Reworked datatype extraction to be anchored to the file’s immediate parent directory (whole path component match), avoiding substring false-matches.
  • Reworked derivatives extraction to use an anchored bash regex (no grep|head pipelines), avoiding SIGPIPE/pipefail fragility.
  • Added regression tests covering datatype and derivatives extraction edge cases, and updated the generator script to emit the anchored datatype regex.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| libBIDS.sh | Replaces substring/pipe-based extraction with anchored bash-regex parsing for datatype and derivatives. |
| generate_entity_patterns.sh | Updates emitted datatype regex to the anchored form used by the parser. |
| test_libBIDS.sh | Adds a focused regression test for anchored datatype extraction and derivatives pipeline detection. |
