Add archive.org download support to download.py by ali5ter · Pull Request #25 · ali5ter/publication-library

ali5ter · 2026-04-13T01:23:58Z

Summary

Auto-detects the source from the URL: archive.org/details/ URLs route to the new archive.org downloader, and all other URLs use the existing World Radio History scrape path unchanged, so existing usage sees no behaviour change
New --pdf-format {text,image,both} flag (default text) selects between *_text.pdf files (which carry the ABBYY OCR layer that convert.py extracts from) and plain image PDFs
New --year-from / --year-to flags filter by the year extracted from the filename, so a year-range slice like --year-from 1974 --year-to 1989 works without a full-collection download
Adds requirements.txt listing both pymupdf and internetarchive; updates the README with archive.org usage examples and a flag reference table
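
The routing and filtering rules above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function names (detect_source, matches_format, extract_year, in_year_range), the year regex, the example filenames, and the choice to skip files with no recognisable year whenever a year bound is set are all assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def detect_source(url: str) -> str:
    """Return 'archive_org' for archive.org/details/ URLs, else 'worldradiohistory'."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in ("archive.org", "www.archive.org") and parsed.path.startswith("/details/"):
        return "archive_org"
    # Everything else falls through to the existing scrape path.
    return "worldradiohistory"


def matches_format(name: str, pdf_format: str) -> bool:
    """Apply the text/image/both choice to a single filename."""
    lower = name.lower()
    if not lower.endswith(".pdf"):
        return False
    is_text = lower.endswith("_text.pdf")
    if pdf_format == "text":
        return is_text
    if pdf_format == "image":
        return not is_text
    return True  # "both"


# Assumption: the year appears in the filename as a standalone 19xx or 20xx number.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def extract_year(name: str) -> Optional[int]:
    """Return the first four-digit year found in the filename, if any."""
    match = YEAR_RE.search(name)
    return int(match.group(0)) if match else None


def in_year_range(name: str, year_from: Optional[int] = None,
                  year_to: Optional[int] = None) -> bool:
    """Keep a file when its year lies within the inclusive bounds that were given."""
    if year_from is None and year_to is None:
        return True
    year = extract_year(name)
    if year is None:
        return False  # Assumption: files without a year are skipped when filtering.
    if year_from is not None and year < year_from:
        return False
    if year_to is not None and year > year_to:
        return False
    return True
```

Keeping the format check and the year check as separate predicates means a dry run can apply exactly the same selection as a real download and simply print the surviving names.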

Closes #4.

Test plan

python3 download.py --help shows all new flags
python3 download.py "https://archive.org/details/ElektorMagazine" --year-from 1974 --year-to 1974 --dry-run lists only 1974 _text.pdf files without downloading
python3 download.py "https://archive.org/details/ElektorMagazine" --pdf-format image --dry-run lists only plain .pdf files (no _text suffix)
python3 download.py "https://www.worldradiohistory.com/ETI_Magazine.htm" --dry-run still works as before
markdownlint README.md passes with zero warnings
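
For reference, the command-line surface exercised by the test plan could be declared along these lines. This is a hedged sketch, not the parser in download.py: the flag names come from the description above, while the positional argument name url, the help strings, and the build_parser helper are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the flags named in this PR (sketch only)."""
    parser = argparse.ArgumentParser(
        prog="download.py",
        description="Download magazine PDFs from World Radio History or archive.org.",
    )
    # Assumption: the collection URL is a single positional argument.
    parser.add_argument("url", help="collection page URL")
    parser.add_argument(
        "--pdf-format",
        choices=["text", "image", "both"],
        default="text",
        help="which PDF variant to fetch (default: text)",
    )
    parser.add_argument("--year-from", type=int, default=None,
                        help="earliest year to include")
    parser.add_argument("--year-to", type=int, default=None,
                        help="latest year to include")
    parser.add_argument("--dry-run", action="store_true",
                        help="list matching files without downloading")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting --pdf-format with choices lets argparse reject an unknown format before any network request is made.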

🤖 Generated with Claude Code

Auto-detect source from URL: archive.org/details/ routes to the new internetarchive-backed downloader; all other URLs use the existing World Radio History scrape path unchanged.

New flags: --pdf-format (text/image/both, default text), --year-from, --year-to.

Adds requirements.txt and updates README with archive.org usage examples. Closes #4.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ali5ter merged commit b8785d9 into main Apr 13, 2026
1 check passed

ali5ter deleted the feature/archive-org-download branch April 13, 2026 01:27
