
Improve metadata: add resource/field descriptions and fix year field type #14

Merged
olayway merged 3 commits into main from improve-metadata on May 19, 2026

Conversation

olayway (Contributor) commented May 19, 2026

Changes

Metadata (datapackage.json / README)

  • Added description to resources gdp and top-economies
  • Added description to all schema fields that were missing one: country, year (top-economies), Country Name, Country Code, Year (gdp)
  • Fixed top-economies.year type from integer to year (values are four-digit calendar years; Frictionless year is the most specific correct type)
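
A rough way to sanity-check the two metadata changes above. This is a sketch only: the schema excerpt and its description strings are placeholders rather than the actual datapackage.json content, and `fields_missing_description` and `is_calendar_year` are hypothetical helper names, not functions from this repository.

```python
# Hypothetical excerpt of the top-economies schema after this change.
# Description strings are placeholders, not the wording used in the PR.
TOP_ECONOMIES_SCHEMA = {
    "fields": [
        {"name": "country", "type": "string", "description": "Country name"},
        {"name": "year", "type": "year", "description": "Calendar year"},
    ]
}


def fields_missing_description(schema):
    """Return the names of schema fields with no (or an empty) description."""
    return [f["name"] for f in schema["fields"] if not f.get("description")]


def is_calendar_year(value):
    """Return True if value looks like a four-digit calendar year, e.g. "2023"."""
    text = str(value)
    return len(text) == 4 and text.isascii() and text.isdigit()
```

Running `fields_missing_description` over each resource schema would list any field still lacking a description, and `is_calendar_year` is the kind of check that motivates declaring the column as `year` instead of a plain `integer`.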

License

  • Changed declared license from ODC-PDDL-1.0 to CC-BY-4.0 to match the World Bank upstream source, which publishes under CC BY 4.0
  • Added attribution statement to README.md license section

top-economies.csv automation

  • Added generate_top_economies() to scripts/process.py: reads gdp.csv, filters regional aggregates via the WB Metadata_Country file (falls back to a hardcoded exclusion set of 49 aggregate codes when running locally without fresh cache), selects top-10 countries by latest-year GDP, writes derived file in USD trillions
  • update_datapackage() now keeps view title and resource description year range in sync with generated data on every run
  • Fixed output paths to use script_dir-relative paths so the script works correctly when run from the scripts/ directory (as CI does)
  • Regenerated data/top-economies.csv: extends coverage from 2022 to 2023, reorders by 2023 GDP rank (Germany now No. 3, ahead of Japan)
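
The selection logic described for generate_top_economies() might look roughly like the sketch below. It is not the code from scripts/process.py: the function name `top_economies`, the input column `Value`, the output column `gdp`, the three-code exclusion set, and the sample rows are all assumptions made for illustration (the PR's fallback set has 49 codes, and the sample numbers are not real GDP figures). The 2000 start year comes from the commit message.

```python
# Assumption: a tiny stand-in for the aggregate exclusion set.
AGGREGATE_CODES = {"WLD", "EUU", "HIC"}

# Illustrative rows shaped like csv.DictReader output for gdp.csv.
# The values are made up and are NOT real GDP figures.
SAMPLE_ROWS = [
    {"Country Name": "World", "Country Code": "WLD", "Year": "2023", "Value": "100e12"},
    {"Country Name": "Alpha", "Country Code": "AAA", "Year": "2022", "Value": "2e12"},
    {"Country Name": "Alpha", "Country Code": "AAA", "Year": "2023", "Value": "3e12"},
    {"Country Name": "Beta", "Country Code": "BBB", "Year": "2022", "Value": "5e12"},
    {"Country Name": "Beta", "Country Code": "BBB", "Year": "2023", "Value": "4e12"},
    {"Country Name": "Gamma", "Country Code": "GGG", "Year": "1999", "Value": "1e12"},
    {"Country Name": "Gamma", "Country Code": "GGG", "Year": "2023", "Value": "1e12"},
]


def top_economies(rows, n=10, start_year=2000, exclude=AGGREGATE_CODES):
    """Return rows for the n largest economies by latest-year GDP.

    Aggregates are dropped by country code, countries are ranked on the
    most recent year present, and GDP is converted to USD trillions.
    """
    countries = [r for r in rows if r["Country Code"] not in exclude]
    latest = max(int(r["Year"]) for r in countries)

    # Rank countries by their GDP in the latest year, largest first.
    ranked = sorted(
        (r for r in countries if int(r["Year"]) == latest),
        key=lambda r: float(r["Value"]),
        reverse=True,
    )
    rank = {r["Country Name"]: i for i, r in enumerate(ranked[:n])}

    selected = [
        {
            "country": r["Country Name"],
            "year": int(r["Year"]),
            "gdp": round(float(r["Value"]) / 1e12, 3),  # USD -> USD trillions
        }
        for r in countries
        if r["Country Name"] in rank and int(r["Year"]) >= start_year
    ]
    # Order by latest-year rank, then chronologically within each country.
    selected.sort(key=lambda r: (rank[r["country"]], r["year"]))
    return selected
```

With the sample rows, `top_economies(SAMPLE_ROWS, n=2)` keeps Beta and Alpha (in that order, since Beta leads in 2023), drops the World aggregate, and leaves Gamma out because it ranks third.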

🤖 Generated with Claude Code

olayway and others added 3 commits May 19, 2026 14:35
…type

- Add description to resources "gdp" and "top-economies"
- Add description to all fields missing one (country, year, Country Name, Country Code, Year)
- Fix top-economies.year type from integer → year (values are four-digit calendar years)
- Document in README that top-economies.csv is manually maintained and not regenerated by the automated script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
License:
- Change declared license from ODC-PDDL-1.0 to CC-BY-4.0 to match the
  World Bank upstream source, which publishes under CC BY 4.0
- Add attribution statement to README license section

top-economies.csv automation:
- Add generate_top_economies() to process.py: reads gdp.csv, filters
  regional aggregates via WB Metadata_Country file (falls back to a
  hardcoded exclusion set when running locally without fresh cache),
  selects top-10 countries by latest-year GDP, writes rows from 2000
  onward in USD trillions
- Fix output paths to use script_dir-relative paths so the script works
  correctly when run from scripts/ (as CI does)
- update_datapackage() now keeps view title and resource description year
  range in sync with the generated data
- Regenerate data/top-economies.csv: extends coverage from 2022 to 2023
  and reorders by 2023 GDP rank (Germany now #3, ahead of Japan)
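
As a sketch of the two smaller fixes in this commit, script-relative output paths and keeping the year range in sync, something along these lines would work. The helper names `data_path` and `sync_year_range` and the example title are hypothetical; only the scripts/ and data/ layout is taken from the PR.

```python
import re
from pathlib import Path


def data_path(script_file, name):
    """Resolve data/<name> relative to the script's own directory.

    Anchoring on the script location (not the current working directory)
    gives the same result whether the script is started from the repo
    root or from scripts/.
    """
    script_dir = Path(script_file).resolve().parent
    return script_dir.parent / "data" / name


def sync_year_range(text, years):
    """Replace the first 'YYYY-YYYY' range in text with the range of years."""
    new_range = f"{min(years)}-{max(years)}"
    return re.sub(r"\b\d{4}-\d{4}\b", new_range, text, count=1)
```

Inside a script, `data_path(__file__, "top-economies.csv")` would point at the data/ directory next to scripts/, and `sync_year_range` could be applied to a view title or resource description after each regeneration.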

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@olayway olayway merged commit 635b071 into main May 19, 2026
1 check passed