Add Frictionless Data Table Schema for libBIDS.sh output#26

Open
gdevenyi wants to merge 3 commits into master from add-frictionless-table-schema

Conversation

gdevenyi (Member) commented Mar 20, 2026

Define a comprehensive table_schema.json following the Frictionless Data Table Schema specification, enriched with metadata from the official BIDS schema (bids-standard/bids-schema). Includes titles, descriptions, format annotations (label/index), and constraints (enum, pattern, required, unique) for all 36 output fields.

Fixes #14

Summary by CodeRabbit

  • Chores
    • Added a new tabular data schema to standardize BIDS-like metadata for derivatives and raw files.
    • Provides built-in validation: required fields, allowed suffixes/extensions, label and numeric format checks, unique path primary key, and missing-value handling to improve data integrity and interoperability.

Define a comprehensive table_schema.json following the Frictionless Data
Table Schema specification, enriched with metadata from the official BIDS
schema (bids-standard/bids-schema). Includes titles, descriptions, format
annotations (label/index), and constraints (enum, pattern, required, unique)
for all 36 output fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai (Bot, Contributor) commented Mar 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a3e70de4-7446-4dce-a4e6-0935d9f69efb

📥 Commits

Reviewing files that changed from the base of the PR and between eda36da and fcfe745.

📒 Files selected for processing (1)
  • table_schema.json
✅ Files skipped from review due to trivial changes (1)
  • table_schema.json

📝 Walkthrough

Added table_schema.json, a declarative tabular-data-schema JSON file that sets primaryKey: "path", missingValues: ["NA", ""], and declares 30+ validated fields (enums, regexes, numeric constraints) for BIDS-like tabular metadata including suffix, extension, entity labels, and index fields.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Table Schema Definition: table_schema.json | New JSON schema (profile: "tabular-data-schema") added. Declares path as the primary key, missingValues: ["NA", ""], required fields data_type, suffix, extension, and ~30+ fields with types and validations (regex for entity labels, enums for categorical fields, numeric minimums for index fields). |
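
The primary-key rule summarized above can be spot-checked without a full validator. The sketch below is only an illustration: `duplicate_paths` is a hypothetical helper, and it assumes the table is tab-separated with a header row that contains a `path` column, which this thread does not confirm.

```shell
# Hypothetical helper: print each "path" value that occurs more than once.
# Assumption: tab-separated input whose first row is a header naming "path".
duplicate_paths() {
  awk -F '\t' '
    NR == 1 {
      # Locate the "path" column by name rather than by position.
      for (i = 1; i <= NF; i++) if ($i == "path") col = i
      next
    }
    # Report a value once, on its second occurrence.
    col && seen[$col]++ == 1 { print $col }
  ' "$1"
}

# Usage (the file name is a placeholder):
#   duplicate_paths output.tsv
```

An empty result means no repeated `path` values were found under these assumptions; a Frictionless-aware validator remains the authoritative check.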

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 I hopped a schema through the glade,
Rows in order, columns arrayed,
Paths are loyal, tokens bright,
Regex guards them through the night,
A rabbit stamps the table "made."

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a Frictionless Data Table Schema for libBIDS.sh output. |
| Linked Issues check | ✅ Passed | The PR addresses issue #14 by providing the table format specification as a Frictionless Data Table Schema with comprehensive field definitions and validation rules. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope: the PR adds only table_schema.json with declarative schema definitions directly relevant to formalizing libBIDS.sh output structure per issue #14. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
table_schema.json (2)

8-14: Consider adding pattern constraint for derivatives.

The derivatives field currently has no pattern constraint. For consistency with other label-type fields and to ensure valid pipeline names, consider adding an alphanumeric pattern constraint.

📋 Proposed enhancement
 {
   "name": "derivatives",
   "title": "Derivative Pipeline",
   "type": "string",
-  "description": "Pipeline name extracted from the derivatives/ folder path. NA if the file is not in a derivatives directory."
+  "description": "Pipeline name extracted from the derivatives/ folder path. NA if the file is not in a derivatives directory.",
+  "constraints": {
+    "pattern": "^[a-zA-Z0-9]+$"
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@table_schema.json` around lines 8 - 14, The "derivatives" field in the schema
has no pattern constraint; update the field object for "derivatives" (inside the
"fields" array) to include a "pattern" property that enforces valid pipeline
names (e.g., an alphanumeric pattern such as "^[A-Za-z0-9_-]+$" or your
preferred variant) and optionally add a "patternMessage" or "error" description
to explain invalid values; ensure the "type": "string" remains and that the new
pattern aligns with other label-type fields in the schema.
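
The two patterns mentioned in this comment are not equivalent: the proposed diff allows only letters and digits, while the agent prompt also allows underscores and hyphens. The sketch below compares them on a few made-up pipeline names; `matches_pattern` is a hypothetical helper, and `grep -E` only approximates how a Frictionless validator would apply the `pattern` constraint.

```shell
# Hypothetical helper: succeed when value $2 matches extended regex $1.
matches_pattern() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

strict='^[a-zA-Z0-9]+$'      # pattern from the proposed diff
relaxed='^[A-Za-z0-9_-]+$'   # variant suggested in the agent prompt

# The pipeline names below are invented for illustration.
for name in fmriprep freesurfer-7 my_pipeline; do
  for pat in "$strict" "$relaxed"; do
    if matches_pattern "$pat" "$name"; then
      echo "$name: accepted by $pat"
    else
      echo "$name: rejected by $pat"
    fi
  done
done
```

Names containing a hyphen or underscore pass only the relaxed pattern, so the choice between the two determines which derivative folder names validate.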

45-45: Replace non-standard format values with custom properties or document the extension.

The schema uses "format": "label" (25 fields) and "format": "index" (6 fields) throughout, but these are not standard Frictionless Data Table Schema format values. The official specification defines formats like default, email, uri, binary, and uuid. While non-standard formats may be ignored by Frictionless validators rather than causing errors, using custom properties (e.g., "bidsFormat": "label") or explicitly documenting this extension would improve clarity and standards compliance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@table_schema.json` at line 45, The schema uses non-standard "format" values
("label"/"index") across many field definitions; replace those custom usages
with a namespaced custom property (e.g., "bidsFormat": "label" or "x-format":
"label") wherever the "format" key is currently set, and update any code that
reads these fields to look for the new property instead of "format";
alternatively, if you must keep "format", add a top-level schema extension
comment or metadata entry documenting this non-standard extension so
validators/users know "label" and "index" are custom conventions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@table_schema.json`:
- Around line 358-376: The suffix enum in the JSON schema is missing three valid
BIDS suffixes; update the "enum" array (the suffix enumeration block shown) to
include "description", "emg", and "physioevents" so it matches the official BIDS
suffixes (referenced in src/schema/objects/suffixes.yaml); ensure you add the
three strings to the existing list alongside entries like "descriptions" and
"physio".


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 18ccbbc4-7844-4932-a3bb-5dd0b923485a

📥 Commits

Reviewing files that changed from the base of the PR and between ad2a0f7 and b19176e.

📒 Files selected for processing (1)
  • table_schema.json

Comment thread table_schema.json
gdevenyi and others added 2 commits March 19, 2026 23:41
… derivatives

- Add three missing BIDS suffixes: description, emg, physioevents
- Add pattern constraint for derivatives field (alphanumeric, hyphens, underscores)
- Rename non-standard "format" to "bidsType" for Frictionless spec compliance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix extension enum: remove leading dots to match actual output
  (code uses ${filename#*.} which produces "nii.gz" not ".nii.gz")
- Change index fields (run, echo, flip, inversion, split, chunk)
  from type string+pattern to type integer+minimum, matching BIDS
  semantics (missingValues handles NA→null before type casting)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
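
The two fixes in this commit message can be illustrated in shell. The expansion `${filename#*.}` is quoted from the message itself; the function names `bids_extension` and `parse_index`, and the digits-only check, are assumptions made for this sketch rather than code from libBIDS.sh or the schema.

```shell
# Strip the shortest prefix ending in a dot, i.e. keep everything after the
# first dot. A name without any dot is returned unchanged.
bids_extension() {
  printf '%s\n' "${1#*.}"
}

# Hypothetical helper mirroring "missing values first, then type check":
# NA or empty becomes an empty line (standing in for null); anything else
# must consist of digits only.
parse_index() {
  case "$1" in
    NA|'')    printf '\n' ;;
    *[!0-9]*) echo "invalid index: $1" >&2; return 1 ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

bids_extension sub-01_T1w.nii.gz   # → nii.gz
parse_index 2                      # → 2
```

Because the expansion removes the dot itself, enum entries without a leading dot match what the code emits, which is the point of the first fix.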

Development

Successfully merging this pull request may close these issues.

Work on spec describing the table format

1 participant