This is a work in progress, not a finished product.
A Retrieval-Augmented Generation (RAG) system for analyzing James Joyce's Finnegans Wake, currently covering Chapter I.8 (the "Anna Livia Plurabelle" chapter, FW pages 196–216).
James Joyce's Finnegans Wake is one of literature's most complex texts — every page contains dozens of languages, allusions, motifs, and textual variants, meticulously documented in the Fweet annotation database.
This project makes that annotation data queryable through natural language. It currently covers Chapter I.8 and combines two sources — Fweet and Campbell's Skeleton Key — but the architecture is designed to scale: additional chapters, 20+ books of secondary literature, and further interfaces can be added incrementally.
The system combines two complementary sources:
- Fweet annotations (fweet.org) — line-by-line annotations including the original text from the Faber & Faber edition, languages, motifs, river names, clusters, references, and textual variants
- Campbell's A Skeleton Key to Finnegans Wake — narrative commentary and interpretation
Both sources are stored in a ChromaDB vector database and queried simultaneously. Answers are generated by Azure OpenAI (GPT-4.1) with explicit citations to both sources. The interface runs locally via Streamlit and can also be deployed to the cloud.
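As a rough illustration of how passages retrieved from the two sources might be combined before generation, the sketch below merges hits and labels each one so the answer can cite it. This is a minimal sketch under stated assumptions, not the code in app.py: the `Hit` class, its field names, `build_context`, and the placeholder passages are all hypothetical, and the retrieval step itself (the ChromaDB query) is left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved passage. All field names here are hypothetical."""
    source: str      # "Fweet" or "Campbell"
    locator: str     # e.g. an FW page.line for Fweet, a chunk label for Campbell
    text: str
    distance: float  # smaller means closer to the query


def build_context(fweet_hits, campbell_hits, per_source=3):
    """Keep the closest hits from each source and label them for citation."""
    blocks = []
    for hits in (fweet_hits, campbell_hits):
        closest = sorted(hits, key=lambda h: h.distance)[:per_source]
        for h in closest:
            blocks.append(f"[{h.source} {h.locator}] {h.text}")
    return "\n".join(blocks)


# Placeholder passages, not real annotation data.
fweet = [
    Hit("Fweet", "213.11", "placeholder annotation A", 0.21),
    Hit("Fweet", "213.12", "placeholder annotation B", 0.18),
]
campbell = [Hit("Campbell", "chunk 7", "placeholder commentary", 0.30)]

context = build_context(fweet, campbell, per_source=1)
print(context)
```

Taking a fixed number of hits per source, rather than ranking all hits together, keeps one source from crowding out the other in the prompt.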
Planned additions: Further secondary literature and expanded chapter coverage.
The main interface is a Streamlit app:
streamlit run app.py
What you can do:
- Enter a FW line range (e.g. 213.11 to 213.36) to get a full line-by-line annotated commentary: every line is explained with its original Joyce text, Fweet annotation data, and Campbell's interpretation integrated inline
- Add an instruction after a dash to control the output:
  - 213.11 to 213.36 — for each line, generate a short narrative note in the style of Fweet: one plain sentence in parentheses explaining what is literally happening at this moment in the scene. Draw on both the Fweet annotations and Campbell where relevant. Do not speculate beyond what the sources reveal.
  - 213.11 to 213.36 — list all river names per line
  - 213.11 to 213.36 — identify all multilingual wordplay
  - 213.11 to 213.36 — focus on Campbell's interpretation
  - 213.11 to 213.36 — explain in German
  - 213.11 to 213.36 — explain in Chinese
  - 213.11 to 213.36 — explain in Hebrew
- Ask free-form questions: "Who are the two washerwomen?", "What is the significance of the River Liffey?"
The line range can be adjusted to any range within Chapter I.8 (pp. 196–216). You can ask anything — be creative.
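The input format described above (a page.line range, optionally followed by a dash and an instruction) could be recognized along the following lines. This is only a sketch: `RangeQuery` and `parse_query` are hypothetical names, and the actual parsing in app.py may differ. Input that does not look like a range is treated as a free-form question.

```python
import re
from typing import NamedTuple, Optional, Tuple


class RangeQuery(NamedTuple):
    start: Tuple[int, int]       # (page, line)
    end: Tuple[int, int]         # (page, line)
    instruction: Optional[str]   # text after the dash, if any


# "page.line to page.line", optionally followed by a dash and an instruction.
# The dash may be an em dash, an en dash, or a plain hyphen.
_RANGE = re.compile(
    r"^\s*(\d{3})\.(\d{1,2})\s+to\s+(\d{3})\.(\d{1,2})\s*(?:[—–-]\s*(.+))?$",
    re.DOTALL,
)


def parse_query(text: str) -> Optional[RangeQuery]:
    """Return a RangeQuery for line-range input, or None for free-form text."""
    m = _RANGE.match(text)
    if m is None:
        return None
    start = (int(m.group(1)), int(m.group(2)))
    end = (int(m.group(3)), int(m.group(4)))
    # Chapter I.8 spans FW pages 196 to 216.
    if not (196 <= start[0] <= 216 and 196 <= end[0] <= 216):
        raise ValueError("range lies outside Chapter I.8 (pp. 196-216)")
    if start > end:
        raise ValueError("range start comes after range end")
    instruction = m.group(5).strip() if m.group(5) else None
    return RangeQuery(start, end, instruction)
```

For example, `parse_query("213.11 to 213.36 — list all river names per line")` yields the two endpoints plus the instruction text, while `parse_query("Who are the two washerwomen?")` returns `None`.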
Line-by-line narrative notes in the style of Fweet (line range mode):
River names per line (line range mode):
finnegans-wake/
├── app.py # Streamlit web interface (main entry point)
├── setup_chromadb.py # Build ChromaDB from JSON sources (Render build)
├── src/ # Data pipeline scripts
│ ├── parse_fweet_html.py # Fweet HTML parser + I.8 batch runner
│ ├── chunk_campbell.py # Split Campbell text into page-level chunks
│ └── load_data.py # Load all sources into ChromaDB (scalable)
├── data/
│ ├── fweet_i8.json # Parsed Fweet annotations
│ └── campbell_chunks.json # Campbell chunks
├── render.yaml # Render deployment config
├── requirements.txt
├── .gitignore
└── README.md
git clone https://github.com/wbcpa/finnegans-wake.git
cd finnegans-wake
pip install -r requirements.txt
Create a .env file in the root directory:
AZURE_OPENAI_KEY=your_key_here
AZURE_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_DEPLOYMENT_NAME=your-deployment-name
AZURE_API_VERSION=2025-01-01-preview
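A small startup check can catch a missing or empty setting before the first request is made. The sketch below assumes the four values above are already present in the process environment (for example, exported by the shell or loaded from .env by a tool such as python-dotenv); whether app.py does this is not stated here, and `load_settings` is a hypothetical helper.

```python
import os

# The four variable names from the .env example above.
REQUIRED = (
    "AZURE_OPENAI_KEY",
    "AZURE_ENDPOINT",
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_API_VERSION",
)


def load_settings(env=os.environ):
    """Return the four Azure settings, or raise listing every missing one."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.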
Fweet HTML pages (requires Fweet account):
- Save pages 196–216 from fweet.org into data/fweet_html/
- Run:
python src/parse_fweet_html.py
Campbell text (requires the book):
- Extract text from the PDF, place campbell_fixed.txt in data/books/
- Run:
python src/chunk_campbell.py
python src/load_data.py

| Source | Coverage | Access |
|---|---|---|
| Fweet (fweet.org) | FW 196–216 (Chapter I.8) | Requires account |
| Campbell, A Skeleton Key to Finnegans Wake | Full book | Requires book |
Configured for Render via render.yaml. The build command runs setup_chromadb.py to populate ChromaDB from the committed JSON files, then starts Streamlit as the web service.
- Expand Fweet coverage beyond Chapter I.8
- Add 20+ books of secondary literature (McHugh, Glasheen, and others)
- Web-based query interface (see app.py)
- Multilingual output (German, Chinese, Hebrew, and any other language)
This project is intended purely for personal academic research and private use. Campbell's text is not included in this repository — only derived data structures generated from a legally obtained copy.
This project builds on the extraordinary annotation work collected at fweet.org.
Built in collaboration with Claude (Anthropic) — used throughout as a coding and research assistant.

