Improve metadata: add resource/field descriptions and fix year field type #14
Merged
…type

- Add description to resources "gdp" and "top-economies"
- Add description to all fields missing one (country, year, Country Name, Country Code, Year)
- Fix top-economies.year type from integer → year (values are four-digit calendar years)
- Document in README that top-economies.csv is manually maintained and not regenerated by the automated script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

License:
- Change declared license from ODC-PDDL-1.0 to CC-BY-4.0 to match the World Bank upstream source, which publishes under CC BY 4.0
- Add attribution statement to README license section

top-economies.csv automation:
- Add generate_top_economies() to process.py: reads gdp.csv, filters regional aggregates via WB Metadata_Country file (falls back to a hardcoded exclusion set when running locally without fresh cache), selects top-10 countries by latest-year GDP, writes rows from 2000 onward in USD trillions
- Fix output paths to use script_dir-relative paths so the script works correctly when run from scripts/ (as CI does)
- update_datapackage() now keeps view title and resource description year range in sync with the generated data
- Regenerate data/top-economies.csv: extends coverage from 2022 to 2023 and reorders by 2023 GDP rank (Germany now #3, ahead of Japan)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Changes
Metadata (datapackage.json / README)
- Add description to resources gdp and top-economies
- Add description to all schema fields that were missing one: country, year (top-economies), Country Name, Country Code, Year (gdp)
- Fix top-economies.year type from integer → year (values are four-digit calendar years; Frictionless year is the most specific correct type)

License
- Change declared license from ODC-PDDL-1.0 to CC-BY-4.0 to match the World Bank upstream source, which publishes under CC BY 4.0
- Add attribution statement to README.md license section

top-economies.csv automation
- Add generate_top_economies() to scripts/process.py: reads gdp.csv, filters regional aggregates via the WB Metadata_Country file (falls back to a hardcoded exclusion set of 49 aggregate codes when running locally without a fresh cache), selects the top-10 countries by latest-year GDP, writes the derived file in USD trillions
- update_datapackage() now keeps the view title and resource description year range in sync with the generated data on every run
- Fix output paths to use script_dir-relative paths so the script works correctly when run from the scripts/ directory (as CI does)
- Regenerate data/top-economies.csv: extends coverage from 2022 to 2023, reorders by 2023 GDP rank (Germany now no. 3, ahead of Japan)

🤖 Generated with Claude Code
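For reviewers, the selection logic described under "top-economies.csv automation" can be sketched roughly as follows. This is a minimal illustration and not the code in scripts/process.py. The function names, the `Value` column in gdp.csv, the `gdp_trillions` output column, and the five-entry aggregate set are all assumptions made for the example (the PR's fallback set has 49 codes, which are not reproduced here).

```python
import csv
from pathlib import Path

# Hypothetical subset of World Bank aggregate codes, for illustration only.
FALLBACK_AGGREGATES = {"WLD", "EUU", "HIC", "OED", "EAS"}


def top_economies(rows, aggregates=FALLBACK_AGGREGATES, top_n=10, start_year=2000):
    """Return rows for the top_n countries by latest-year GDP, in USD trillions."""
    # Drop regional and income-group aggregates so only countries are ranked.
    countries = [r for r in rows if r["Country Code"] not in aggregates]
    latest = max(int(r["Year"]) for r in countries)
    # Rank countries by GDP in the latest year, largest first.
    ranked = sorted(
        (r for r in countries if int(r["Year"]) == latest),
        key=lambda r: float(r["Value"]),  # "Value" column name is assumed
        reverse=True,
    )
    rank = {r["Country Code"]: i for i, r in enumerate(ranked[:top_n])}
    # Keep the selected countries' rows from start_year onward.
    selected = [
        r for r in countries
        if r["Country Code"] in rank and int(r["Year"]) >= start_year
    ]
    # Order by latest-year rank, then chronologically within each country.
    selected.sort(key=lambda r: (rank[r["Country Code"]], int(r["Year"])))
    return [
        {
            "country": r["Country Name"],
            "year": int(r["Year"]),
            "gdp_trillions": round(float(r["Value"]) / 1e12, 3),
        }
        for r in selected
    ]


def write_top_economies(data_dir, **kwargs):
    """Read gdp.csv from data_dir and write top-economies.csv next to it."""
    data_dir = Path(data_dir)
    with open(data_dir / "gdp.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    out = top_economies(rows, **kwargs)
    with open(data_dir / "top-economies.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["country", "year", "gdp_trillions"])
        writer.writeheader()
        writer.writerows(out)
    return out
```

A script living in scripts/ could resolve the data directory relative to its own location, for example `Path(__file__).resolve().parent.parent / "data"`, so the output lands in the same place whether it is run from the repository root or from scripts/.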