Bring harden-parsing (#28) into master#29
Conversation
_libBIDSsh_parse_filename derived the datatype and derivatives pipeline with substring greps piped through head/awk. Two latent defects: - Substring false-match: `grep -oE "(anat|...|func|...)"` matched the datatype anywhere in the path, so e.g. `study_func_proj/sub-01/anat/...` was mis-detected as `func` instead of `anat`. The same flaw let a directory like `myderivatives/` be read as a derivatives dataset. - SIGPIPE fragility: `grep ... | head -1 || echo "NA"` can have grep killed by SIGPIPE when head closes the pipe; under `set -o pipefail` the pipeline then reports failure and silently falls back to `NA`. Replace both with anchored bash regex on the path (no subshells, no pipes): datatype is the file's immediate parent directory matched as a whole component `(^|/)<datatype>$`; the derivatives pipeline is the component right after `(^|/)derivatives/`. generate_entity_patterns.sh now emits the anchored datatype regex to keep the generated block in sync. Add regression tests covering the substring false-match, emg detection, missing-datatype NA, derivatives extraction, and the `myderivatives` non-match. Suite: 9/9 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Harden datatype/derivatives extraction in filename parser
|
Warning Review limit reached
More reviews will be available in 26 minutes and 49 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR brings the parsing-hardening fix from #28 into master ahead of release, improving _libBIDSsh_parse_filename correctness and robustness when extracting datatype and derivatives from paths.
Changes:
- Reworked
datatypeextraction to be anchored to the file’s immediate parent directory (whole path component match), avoiding substring false-matches. - Reworked
derivativesextraction to use an anchored bash regex (nogrep|headpipelines), avoiding SIGPIPE/pipefailfragility. - Added regression tests covering datatype and derivatives extraction edge cases, and updated the generator script to emit the anchored datatype regex.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
libBIDS.sh |
Replaces substring/pipe-based extraction with anchored bash-regex parsing for datatype and derivatives. |
generate_entity_patterns.sh |
Updates emitted datatype regex to the anchored form used by the parser. |
test_libBIDS.sh |
Adds a focused regression test for anchored datatype extraction and derivatives pipeline detection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
#28 was merged into
align-bids-nomenclatureafter that branch had already merged to master via #27, so the parsing-hardening commit (433229c) never reached master. This PR brings it in ahead of the release.Contents (the #28 fix):
_libBIDSsh_parse_filename(fixes thestudy_func/.../anat→funcsubstring false-match and thegrep | headSIGPIPE →NAfallback).generate_entity_patterns.shemits the anchored datatype regex.🤖 Generated with Claude Code