Fix Frictionless data package validation (FK to geography-keys CSV; enable CI validation) #332

Open
ywkim312 wants to merge 1 commit into main from 311-frictionless-validation-breaks-on-data-packages-due-to-shapefile-foreign-keys

@ywkim312 ywkim312 commented Mar 24, 2026

Closes #311

Summary

  • Resolves the foreign key validation failures that occurred when CSV resources referenced shapefiles. In frictionless-py a FileResource has no row_stream, so it cannot be read as the target of a foreign key; see Frictionless validation breaks on data packages due to shapefile foreign keys #311.
  • Adds a tabular *-geography-keys.csv resource per geography (from the same get_blank_dataframe() as the main tables) and points foreignKeys at that CSV instead of the shapefile.
  • Removes --skip-validation from the Create Data Packages workflow and drops the DSuite2023-only --skip-foreign-keys workaround.
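
To illustrate the second bullet, here is a minimal sketch of what a foreign key to a tabular keys resource looks like, plus a plain-Python version of the lookup a foreign key check performs. The resource name county-2018, the field names GEOID and TotPop, and the sample values are all hypothetical; the actual descriptor is whatever create-data-package writes. The point is only that the referenced resource has to be readable row by row, which a CSV is and a shapefile FileResource is not.

```python
import csv
import io

# Hypothetical descriptor fragment: resource and field names are illustrative,
# not copied from the generated data-package.json.
main_resource = {
    "name": "county-2018",
    "path": "data/county-2018.csv",
    "schema": {
        "fields": [
            {"name": "GEOID", "type": "string"},
            {"name": "TotPop", "type": "integer"},
        ],
        "foreignKeys": [
            {
                "fields": ["GEOID"],
                # Points at the tabular keys CSV instead of a geo-* shapefile.
                "reference": {"resource": "county-geography-keys", "fields": ["GEOID"]},
            }
        ],
    },
}


def missing_keys(main_csv: str, keys_csv: str, field: str) -> list[str]:
    """Return values of `field` in main_csv that do not appear in keys_csv."""
    valid = {row[field] for row in csv.DictReader(io.StringIO(keys_csv))}
    return [
        row[field]
        for row in csv.DictReader(io.StringIO(main_csv))
        if row[field] not in valid
    ]


# Made-up sample data: one matching key and one orphan.
keys_csv = "GEOID\n17031\n17043\n"
main_csv = "GEOID,TotPop\n17031,100\n99999,5\n"
print(missing_keys(main_csv, keys_csv, "GEOID"))  # ['99999']
```

This is not how frictionless-py implements the check internally; it only shows why the reference side needs rows to look up.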

How to test

  1. Local package build + validation (from repo backend/ with FLASK_APP=oeps):
    pip install -e .
    flask create-data-package -c DSuite2018 --overwrite
    flask create-data-package -c DSuite2023 --overwrite

Do not pass --skip-validation or --skip-foreign-keys. Confirm that each command finishes with exit code 0 and that the log ends with a validation summary showing 0 errors (the validated tasks should include the *-geography-keys resources and the geo shapefile resources).

Optional — inspect output

Open the generated folder under backend/.temp/data-packages/ (or your --destination) and check data-package.json: main CSV foreignKeys should reference {county|state|tract|zcta}-geography-keys, not geo-* shapefile names.
Confirm data/*-geography-keys.csv files exist next to the main geography CSVs.
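
The manual inspection above could also be scripted. The following is a rough sketch using only the standard library; it assumes the descriptor follows the usual Data Package layout (a top-level resources list, each with an inline schema that may carry foreignKeys). The helper names are hypothetical and not part of the oeps code base.

```python
import json
from pathlib import Path

KEYS_SUFFIX = "-geography-keys"


def non_keys_fk_targets(descriptor: dict) -> list[tuple[str, str]]:
    """Return (resource name, FK target) pairs whose target is not a *-geography-keys resource."""
    offenders = []
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema")
        if not isinstance(schema, dict):
            # Schema may be absent or given as a path; nothing to inspect here.
            continue
        for fk in schema.get("foreignKeys", []):
            target = fk.get("reference", {}).get("resource", "")
            # An empty target is a self-reference in Table Schema, so skip it.
            if target and not target.endswith(KEYS_SUFFIX):
                offenders.append((resource.get("name", ""), target))
    return offenders


def inspect_package(package_dir) -> tuple[list[tuple[str, str]], list[str]]:
    """Load data-package.json and list offending FKs plus the keys CSVs found under data/."""
    package_dir = Path(package_dir)
    descriptor = json.loads((package_dir / "data-package.json").read_text(encoding="utf-8"))
    key_files = sorted(p.name for p in (package_dir / "data").glob("*" + KEYS_SUFFIX + ".csv"))
    return non_keys_fk_targets(descriptor), key_files
```

Calling inspect_package on a generated package folder should return an empty first list and one keys CSV per geography if the package was built as described.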

CI

After merge, or on this branch, run the Create Data Packages workflow manually (workflow_dispatch) if package_rules did not change, since the workflow does not run automatically in that case. Confirm that the job succeeds.

Note on follow-up fixes

Validation is now enforced in CI instead of skipped. The first few runs after this change may surface new failures (e.g. schema/FK mismatches, data edge cases, or frictionless upgrades) that were previously hidden. If a workflow run fails, use the job logs and the generated error-report.json in the package output (when produced) to diagnose, then open a small follow-up PR to adjust package structure or data as needed. That tightening is expected and improves long-term package quality.
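
When triaging a failed run, a short script can condense the report into one line per error. This sketch assumes error-report.json is a serialized frictionless-py report in the v5-style layout (a tasks list, each task with a name and an errors list whose items carry type and message). That layout is an assumption about this repo's output, so adjust the keys if the file differs.

```python
def summarize_report(report: dict) -> list[str]:
    """Flatten a report dict into 'task: error type: message' lines."""
    lines = []
    for task in report.get("tasks", []):
        name = task.get("name", "?")
        for error in task.get("errors", []):
            # Fall back to "code" in case the report uses an older key name.
            kind = error.get("type") or error.get("code") or "unknown"
            lines.append(f"{name}: {kind}: {error.get('message', '')}")
    return lines


# Made-up report used only to show the output shape.
example = {
    "valid": False,
    "tasks": [
        {"name": "county-2018", "errors": [{"type": "foreign-key", "message": "bad row"}]},
        {"name": "county-geography-keys", "errors": []},
    ],
}
for line in summarize_report(example):
    print(line)  # county-2018: foreign-key: bad row
```

To use it on a real file, load the JSON with json.load and pass the resulting dict to summarize_report.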


netlify bot commented Mar 24, 2026

Deploy Preview for oeps canceled.

Name Link
🔨 Latest commit f8de340
🔍 Latest deploy log https://app.netlify.com/projects/oeps/deploys/69c3037aac8299000808ef0b

@ywkim312 ywkim312 self-assigned this Mar 24, 2026
@ywkim312 ywkim312 requested a review from Makosak March 24, 2026 21:39
@ywkim312 ywkim312 requested a review from bodom0015 April 2, 2026 16:45
