feat: unified Zenodo download — Python public script, request URL support, orchestrator#73

Open
larsvilhuber wants to merge 14 commits into development from feature/zenodo-unified-download

@larsvilhuber
Member

Summary

  • tools/download_zenodo_public.py (new): pure-Python replacement for download_zenodo_public.sh; accepts record ID, URL, or DOI; generates SHA-256/MD5/metadata manifests in generated/ matching the draft script's format; deprecated shell wrapper retained for external callers
  • tools/download_zenodo_draft.py (updated): added support for Zenodo community-request URLs (/communities/.../requests/{uuid}); resolves UUID to record ID via GET /api/requests/{uuid} before downloading
  • tools/download_zenodo.py (new): orchestrator — parses any Zenodo URL/DOI/ID, routes to the correct downloader, optionally queries Jira for the replication URL when no --zenodo-id is given; --print-id emits zenodo-NNNN for pipeline shell capture
  • bitbucket-pipelines.yml (updated): both 1-populate-from-icpsr and w-big-populate-from-icpsr now call the orchestrator; ZenodoID pipeline variable now accepts full URLs, DOIs, or community request URLs in addition to numeric IDs; URL-to-ID resolution added in parallel processing steps
  • tools/download_from_jira_url.py (updated): public Zenodo call switched from bash shell script to Python script
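
As a rough illustration of the manifest step described for tools/download_zenodo_public.py, the sketch below computes SHA-256 and MD5 digests plus byte counts for every file under a directory, in sorted path order. The function names (`file_digests`, `manifest_rows`) and the tuple layout are assumptions made for this example; the actual manifest file names and format in generated/ are not shown in this PR description.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return (sha256_hex, md5_hex) for one file, reading it in chunks."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    return sha256.hexdigest(), md5.hexdigest()


def manifest_rows(root):
    """Hypothetical helper: one (relative_path, sha256, md5, bytes) row per file.

    Rows are sorted by relative POSIX path so repeated runs give identical output.
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    rows = []
    for p in sorted(files, key=lambda q: q.relative_to(root).as_posix()):
        sha, md5 = file_digests(p)
        rows.append((p.relative_to(root).as_posix(), sha, md5, p.stat().st_size))
    return rows
```

Reading in chunks keeps memory use flat for large deposits, and a single pass over each file feeds both hash objects.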

Test Plan

  • Dry-run against a public record: python3.12 tools/download_zenodo_public.py --dry-run 10848594
  • Orchestrator URL classification covers all input types (numeric ID, /records/, /record/, DOI, /deposit/, /communities/.../requests/)
  • Orchestrator --print-id outputs zenodo-NNNN as last line for pipeline capture
  • download_zenodo_draft.py accepts a community request URL and routes to resolve_request_to_record_id() (requires valid Zenodo token)
  • Both populate pipelines: run with a numeric ZenodoID to verify projectID is set correctly
  • Both populate pipelines: run with a full URL as ZenodoID to verify URL resolution works
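
The input types listed in the classification test could be distinguished along these lines. This is a minimal sketch, not the orchestrator's code: the `classify` function, its return labels, and the regular expressions are assumptions for illustration. It assumes Zenodo DOIs take the form `10.5281/zenodo.NNNN`.

```python
import re

# Patterns are illustrative assumptions, not copied from download_zenodo.py.
DOI_RE = re.compile(r"10\.5281/zenodo\.(\d+)")
REQUEST_RE = re.compile(r"/communities/[^/]+/requests/([0-9a-fA-F-]{36})")
RECORD_RE = re.compile(r"/records?/(\d+)")
DEPOSIT_RE = re.compile(r"/deposit/(\d+)")


def classify(identifier):
    """Return a (kind, value) pair for a Zenodo ID, URL, or DOI string."""
    s = identifier.strip()
    if s.isdigit():
        return ("id", s)
    for kind, pattern in (
        ("doi", DOI_RE),
        ("request", REQUEST_RE),
        ("record", RECORD_RE),
        ("deposit", DEPOSIT_RE),
    ):
        match = pattern.search(s)
        if match:
            return (kind, match.group(1))
    return ("unknown", s)
```

For a request URL the returned value is the request UUID, which still has to be resolved to a record ID through the API before any download.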

Notes

  • The download_zenodo_draft.py CI git block still uses os.system() (pre-existing pattern); this is noted as a known issue for a follow-up
  • Community request URL resolution requires a valid ZENODO_ACCESS_TOKEN; if the token is unavailable during pre-resolution for --print-id, the orchestrator omits the --print-id output rather than printing the unresolved request UUID as if it were a record ID
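
The --print-id behaviour noted above might look roughly like the following. The function name `project_id_line`, the `kind` labels, and the injected `resolve_request` callable are hypothetical and stand in for whatever the orchestrator actually does. The only details taken from this PR are the `zenodo-NNNN` output format and the rule that nothing is printed when resolution fails.

```python
from typing import Callable, Optional


def project_id_line(
    kind: str,
    value: str,
    resolve_request: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return 'zenodo-NNNN', or None when no numeric record ID is available.

    resolve_request is a stand-in for the API lookup; it is assumed to
    return None when the token is missing or the lookup fails.
    """
    record_id = resolve_request(value) if kind == "request" else value
    if record_id is not None and str(record_id).isdigit():
        return f"zenodo-{record_id}"
    # Never fall back to the raw UUID: print nothing instead.
    return None
```

A caller would print the returned line last, and only when it is not None, so that a pipeline step capturing the final line of output never receives a UUID.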

🤖 Generated with Claude Code

larsvilhuber and others added 14 commits April 3, 2026 08:03
…eration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…om __file__

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nodo scripts

Sort metadata CSV rows in save_checksums() (download_zenodo_draft.py) to match
the sorted output produced by save_manifests() in download_zenodo_public.py.
Both scripts now write filename,bytes rows in sorted path order for reproducibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
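
The sorted-row change in the commit above can be pictured with a small sketch. The function name `write_metadata_rows` and the mapping argument are assumptions; this is not the body of save_checksums() or save_manifests(). Whether the real CSV has a header row is not stated, so none is written here.

```python
import csv
import io


def write_metadata_rows(sizes, out):
    """Write filename,bytes rows in sorted path order (illustrative only).

    sizes maps a relative file path to its size in bytes.
    """
    writer = csv.writer(out, lineterminator="\n")
    for name in sorted(sizes):
        writer.writerow([name, sizes[name]])


buf = io.StringIO()
write_metadata_rows({"b.txt": 2, "a/c.txt": 5}, buf)
```

Sorting before writing makes the file independent of directory traversal order, so two runs over the same deposit produce byte-identical metadata.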
…do downloads

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…L check, use sys.executable

- Fix 1: Remove `or identifier` fallback in --print-id so UUID is not used as record_id when resolution fails
- Fix 2: Use mod.get_access_token() in record_id_from_request() for .env file support
- Fix 3: Tighten gate check to require zenodo.org (or DOI pattern) instead of just 'zenodo' substring
- Fix 4: Replace hardcoded python3.12 with sys.executable; use _script_dir() for jira_get_info.py path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
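
Fix 3 replaces a substring test with a stricter check. One way such a gate could be written is sketched below; `looks_like_zenodo` and the exact rules (host equal to zenodo.org or a subdomain of it, or a `10.5281/zenodo.NNNN` DOI) are assumptions for illustration, not the code in this PR.

```python
import re
from urllib.parse import urlparse

ZENODO_DOI_RE = re.compile(r"\b10\.5281/zenodo\.\d+")


def looks_like_zenodo(text):
    """True for Zenodo DOIs or URLs hosted on zenodo.org (or a subdomain)."""
    s = text.strip()
    if ZENODO_DOI_RE.search(s):
        return True
    # Add a scheme if missing so urlparse puts the host in .hostname.
    parsed = urlparse(s if "://" in s else "https://" + s)
    host = parsed.hostname or ""
    return host == "zenodo.org" or host.endswith(".zenodo.org")
```

A plain `'zenodo' in url` test would accept something like `https://example.com/zenodo/1`, whereas comparing the parsed hostname rejects it.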
…ines

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… in commit step

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>