
SF-Nexus/webscraping-SF


webscraping-SF

Scripts and notebooks for collecting science fiction texts and metadata from various online sources.

Contents

gutenberg/

Scrape and download texts from Project Gutenberg bookshelves. Includes a notebook for browsing bookshelves, downloading full texts, and basic cleaning.
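The basic cleaning step usually means cutting away the license header and footer that Project Gutenberg wraps around each book. The following is a minimal sketch of that step, not the notebook's actual code. It assumes the common `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` / `*** END OF ... ***` marker lines, and it assumes the `cache/epub/<id>/pg<id>.txt` URL pattern for plain-text files. `gutenberg_text_url` and `strip_gutenberg_boilerplate` are hypothetical helper names.

```python
import re

# Assumed URL pattern for Gutenberg plain-text files; check a book's page on
# gutenberg.org if a download fails.
def gutenberg_text_url(book_id: int) -> str:
    """Return the (assumed) plain-text URL for a Gutenberg book ID."""
    return f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"

# Marker lines delimit the book body. Older files say "THIS" instead of "THE"
# and sometimes "E-BOOK", so both variants are accepted.
_START = re.compile(
    r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG E-?BOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)
_END = re.compile(
    r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG E-?BOOK.*$",
    re.IGNORECASE | re.MULTILINE,
)

def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the START and END marker lines, if both exist."""
    start = _START.search(text)
    end = _END.search(text)
    if start is None or end is None or end.start() < start.end():
        # Markers missing or out of order: return the text trimmed but otherwise unchanged.
        return text.strip()
    return text[start.end():end.start()].strip()
```

In a notebook, the downloaded text (for example via `urllib.request.urlopen(gutenberg_text_url(...))`) would be passed through `strip_gutenberg_boilerplate` before any further cleaning.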

internetarchive/

Download science fiction collections from the Internet Archive using their Python API. Covers the Project Gutenberg SF collection, Pulp Magazine Archive, and other SF holdings.
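A minimal sketch of collection-level downloading with the `internetarchive` package (`search_items` and `download`), assuming that is the Python API the scripts use. The collection identifier `pulpmagazinearchive` and the `DjVuTXT` (OCR text) format name are assumptions to verify on archive.org. `build_ia_query` and `download_collection` are hypothetical helpers.

```python
def build_ia_query(collection: str, mediatype: str = "texts") -> str:
    """Build an Internet Archive advanced-search query for one collection."""
    return f"collection:{collection} AND mediatype:{mediatype}"

def download_collection(collection: str, destdir: str = "ia_downloads",
                        formats=("DjVuTXT",), limit=None) -> list:
    """Download the given file formats for each item in a collection.

    `formats` and the collection ID are assumptions; adjust them to the
    holdings you actually want. Returns the identifiers that were fetched.
    """
    import internetarchive as ia  # imported lazily so the helper above works without it

    fetched = []
    results = ia.search_items(build_ia_query(collection), fields=["identifier"])
    for i, result in enumerate(results):
        if limit is not None and i >= limit:
            break
        identifier = result["identifier"]
        # ignore_existing skips files already on disk, so reruns resume cheaply.
        ia.download(identifier, formats=list(formats), destdir=destdir,
                    ignore_existing=True)
        fetched.append(identifier)
    return fetched
```

For a quick trial, `download_collection("pulpmagazinearchive", limit=5)` would fetch only the first five matching items.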

StarTrek/

  • Star_Trek_API.ipynb -- Query the STAPI (Star Trek API) for character and episode data. Uses R via rpy2.
  • Wikidata Star Trek.ipynb -- Query Wikidata via SPARQL for Star Trek series, cast, species, and films. Uses Python/SPARQLWrapper.
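A minimal sketch of the Wikidata side with SPARQLWrapper, not the notebook's own query. The item `Q1092` (assumed here to be the Star Trek franchise) and the property `P8345` (assumed to be "media franchise") are illustrative defaults that should be checked on wikidata.org. `build_series_query` and `run_query` are hypothetical names. Wikidata asks clients to send a descriptive User-Agent, which is passed through the `agent` argument.

```python
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical defaults: Q1092 (assumed Star Trek franchise) and P8345
# (assumed "media franchise" property). Verify both before use.
def build_series_query(franchise_qid: str = "Q1092",
                       link_property: str = "P8345",
                       limit: int = 50) -> str:
    """Return a SPARQL query listing items linked to a franchise, with English labels."""
    return f"""
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:{link_property} wd:{franchise_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
"""

def run_query(query: str,
              agent: str = "webscraping-SF example (contact: you@example.org)") -> list:
    """Run a query against Wikidata and return (item URI, label) pairs."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # lazy import

    sparql = SPARQLWrapper(WIKIDATA_ENDPOINT, agent=agent)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [(b["item"]["value"], b["itemLabel"]["value"])
            for b in results["results"]["bindings"]]
```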

nytimes_books/

Query the NYT Books API for bestseller list data. Requires an API key (stored in a local config.py, not tracked).
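A minimal sketch of one bestseller-list request using only the standard library. It assumes the Books API v3 endpoint shape `/lists/{date}/{list}.json` with the key passed as the `api-key` query parameter, and it assumes the `results.books[].rank/title/author` response fields. `bestseller_list_url` and `fetch_bestsellers` are hypothetical helpers. The key is taken as an argument so it can come from the untracked `config.py`.

```python
import json
import urllib.request
from urllib.parse import urlencode

NYT_BOOKS_BASE = "https://api.nytimes.com/svc/books/v3"

def bestseller_list_url(api_key: str, list_name: str = "hardcover-fiction",
                        date: str = "current") -> str:
    """Build the (assumed) URL for one bestseller list on a given date."""
    return f"{NYT_BOOKS_BASE}/lists/{date}/{list_name}.json?{urlencode({'api-key': api_key})}"

def fetch_bestsellers(api_key: str, list_name: str = "hardcover-fiction",
                      date: str = "current") -> list:
    """Return (rank, title, author) tuples; the response field names are assumptions."""
    url = bestseller_list_url(api_key, list_name, date)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return [(b["rank"], b["title"], b["author"]) for b in payload["results"]["books"]]
```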

Setup

Requires Python 3.9+. Install dependencies with uv:

uv sync

Or with pip:

pip install -e .

The Star Trek API notebook (StarTrek/Star_Trek_API.ipynb) additionally requires R and rpy2.

The NYT Books notebook requires a local config.py with your API key (not tracked in git).
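A minimal sketch of loading the key, assuming `config.py` defines a variable named `NYT_API_KEY` (a hypothetical name; match whatever the notebook imports). It falls back to an environment variable of the same name, which keeps the key out of git in either case.

```python
import importlib
import os

# config.py (kept out of git) is assumed to contain a line like:
#   NYT_API_KEY = "your-key-here"

def load_nyt_api_key(var_name: str = "NYT_API_KEY") -> str:
    """Return the key from config.py if present, else from the environment."""
    try:
        config = importlib.import_module("config")
        key = getattr(config, var_name, None)
        if key:
            return key
    except ImportError:
        pass  # no config.py on the path; try the environment instead
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set {var_name} in config.py or as an environment variable.")
    return key
```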

About

Scripts for scraping web archives.
