Scripts and notebooks for collecting science fiction texts and metadata from various online sources.
Scrape and download texts from Project Gutenberg bookshelves. Includes a notebook for browsing bookshelves, downloading full texts, and basic cleaning.
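The download-and-clean step can be sketched as follows. This is a minimal sketch, not the notebook's code. It assumes the plain-text URL pattern `https://www.gutenberg.org/ebooks/{id}.txt.utf-8` and the usual `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` / `*** END OF ... ***` marker lines. The helper names `download_text` and `strip_gutenberg_boilerplate` are hypothetical.

```python
import re
import urllib.request

# Assumed plain-text URL pattern; Gutenberg may redirect to a cache mirror.
GUTENBERG_TXT_URL = "https://www.gutenberg.org/ebooks/{book_id}.txt.utf-8"

# Marker lines that usually bracket the licence header and footer.
START_RE = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
    re.MULTILINE | re.IGNORECASE,
)


def download_text(book_id: int) -> str:
    """Fetch the UTF-8 plain text of one book (hypothetical helper)."""
    url = GUTENBERG_TXT_URL.format(book_id=book_id)
    req = urllib.request.Request(url, headers={"User-Agent": "sf-corpus-sketch"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the body between the START and END markers, if present."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    body_start = start.end() if start else 0
    body_end = end.start() if end else len(text)
    return text[body_start:body_end].strip()
```

Texts without markers are returned unchanged apart from whitespace trimming, so the cleaner is safe to run over a mixed batch.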
Download science fiction collections from the Internet Archive using its Python API. Covers the Project Gutenberg SF collection, the Pulp Magazine Archive, and other SF holdings.
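A collection download can be sketched with the `internetarchive` package, using its `search_items` and `download` functions. This sketch may differ from the notebook. The collection identifier `pulpmagazinearchive` used below is an assumption. The helper names are also hypothetical. The package is imported inside the download function so the query builder runs without it installed.

```python
def build_collection_query(collection: str, mediatype: str = "texts") -> str:
    """Build an Internet Archive advanced-search query for one collection."""
    return f"collection:{collection} AND mediatype:{mediatype}"


def download_collection_texts(
    collection: str, destdir: str = "ia_texts", limit: int = 10
) -> list[str]:
    """Download .txt files for up to `limit` items in a collection (hypothetical helper)."""
    import internetarchive  # lazy import: only needed for network access

    identifiers = []
    for result in internetarchive.search_items(build_collection_query(collection)):
        identifiers.append(result["identifier"])
        if len(identifiers) >= limit:
            break
    for identifier in identifiers:
        # glob_pattern restricts the download to plain-text files.
        internetarchive.download(
            identifier, glob_pattern="*.txt", destdir=destdir, ignore_existing=True
        )
    return identifiers
```

Usage, assuming the collection identifier above is correct: `download_collection_texts("pulpmagazinearchive", limit=5)`.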
- Star_Trek_API.ipynb -- Query the STAPI (Star Trek API) for character and episode data. Uses R via rpy2.
- Wikidata Star Trek.ipynb -- Query Wikidata via SPARQL for Star Trek series, cast, species, and films. Uses Python/SPARQLWrapper.
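The Wikidata query pattern can be sketched as follows. It uses the standard library's `urllib` in place of SPARQLWrapper so the example is self-contained. The query is illustrative, not the notebook's. It assumes `wd:Q5398426` is the Wikidata item for "television series" and matches English labels starting with "Star Trek", which may be slow on the public endpoint. `run_sparql` and `bindings_to_rows` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query (assumption): TV series whose English label starts with "Star Trek".
STAR_TREK_SERIES_QUERY = """
SELECT ?series ?label WHERE {
  ?series wdt:P31 wd:Q5398426 ;
          rdfs:label ?label .
  FILTER(LANG(?label) = "en" && STRSTARTS(?label, "Star Trek"))
}
"""


def run_sparql(query: str) -> dict:
    """Send a query to the Wikidata Query Service and return the parsed JSON."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Wikidata asks clients to identify themselves with a User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "sf-corpus-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON result bindings into plain {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

Usage: `bindings_to_rows(run_sparql(STAR_TREK_SERIES_QUERY))` returns one dict per matching series. This works the same way for cast, species, or film queries.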
Query the NYT Books API for bestseller list data. Requires an API key (stored in a local config.py, not tracked).
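A bestseller-list request can be sketched as follows. This sketch assumes the v3 lists endpoint `https://api.nytimes.com/svc/books/v3/lists/{date}/{list_name}.json` with an `api-key` query parameter. It also assumes the response carries `results.books` entries with `rank`, `title`, and `author` fields. The list name `hardcover-fiction` and both helper names are illustrative assumptions. The key is passed in as an argument rather than read from `config.py` here.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint shape; `date` may be "current" or a YYYY-MM-DD string.
NYT_LISTS_URL = "https://api.nytimes.com/svc/books/v3/lists/{date}/{list_name}.json"


def build_list_url(api_key: str, list_name: str = "hardcover-fiction", date: str = "current") -> str:
    """Build the request URL for one bestseller list (hypothetical helper)."""
    base = NYT_LISTS_URL.format(date=date, list_name=list_name)
    return base + "?" + urllib.parse.urlencode({"api-key": api_key})


def fetch_bestsellers(api_key: str, list_name: str = "hardcover-fiction", date: str = "current") -> dict:
    """Fetch and parse one bestseller list."""
    with urllib.request.urlopen(build_list_url(api_key, list_name, date)) as resp:
        return json.load(resp)


def summarize_books(payload: dict) -> list[tuple[int, str, str]]:
    """Reduce a list response to (rank, title, author) tuples."""
    return [(b["rank"], b["title"], b["author"]) for b in payload["results"]["books"]]
```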
Requires Python 3.9+. Install dependencies with uv:
uv sync
Or with pip:
pip install -e .
The Star Trek API notebook (StarTrek/Star_Trek_API.ipynb) additionally requires R and rpy2.
The NYT Books notebook requires a local config.py with your API key (not tracked in git).
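A possible `config.py` layout and loader is sketched below. The variable name `NYT_API_KEY` is hypothetical, so use whatever name the notebook actually imports. The environment-variable fallback is an added convenience, not something the notebook is known to do. In this sketch, `config.py` would contain a single line such as `NYT_API_KEY = "your-key-here"`.

```python
import importlib
import os


def load_nyt_api_key(module_name: str = "config", attr: str = "NYT_API_KEY") -> str:
    """Return the key from config.py, falling back to an environment variable.

    Both the module attribute and the environment variable use the
    hypothetical name NYT_API_KEY by default.
    """
    key = None
    try:
        config = importlib.import_module(module_name)
        key = getattr(config, attr, None)
    except ImportError:
        pass  # no local config.py; try the environment instead
    key = key or os.environ.get(attr)
    if not key:
        raise RuntimeError(f"Set {attr} in {module_name}.py or as an environment variable")
    return key
```

Keeping `config.py` listed in `.gitignore` is what stops the key from being tracked.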