Finnegans Wake RAG System

This is a work in progress, not a finished product.

A Retrieval-Augmented Generation (RAG) system for analyzing James Joyce's Finnegans Wake, currently covering Chapter I.8 (the "Anna Livia Plurabelle" chapter, FW pages 196–216).

Motivation

James Joyce's Finnegans Wake is one of literature's most complex texts — every page contains dozens of languages, allusions, motifs, and textual variants, meticulously documented in the Fweet annotation database.

This project makes that annotation data queryable through natural language. It currently covers Chapter I.8 and combines two sources — Fweet and Campbell's Skeleton Key — but the architecture is designed to scale: additional chapters, 20+ books of secondary literature, and further interfaces can be added incrementally.

Overview

The system combines two complementary sources:

  • Fweet annotations (fweet.org) — line-by-line annotations including the original text from the Faber & Faber edition, languages, motifs, river names, clusters, references, and textual variants
  • Campbell's A Skeleton Key to Finnegans Wake — narrative commentary and interpretation

Both sources are stored in a ChromaDB vector database and queried simultaneously. Answers are generated by Azure OpenAI (GPT-4.1) with explicit citations to both sources. The interface runs locally via Streamlit and can also be deployed to the cloud.
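As a rough illustration of how results from the two sources might be combined before generation, the sketch below takes the closest hits from each source and labels them so the model can cite them. The `Hit` type, the `build_context` helper, and the source labels are hypothetical names chosen for this example, not taken from app.py, and the sample texts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    source: str      # "fweet" or "campbell" (labels assumed for illustration)
    ref: str         # e.g. an FW line such as "213.11", or a Campbell page
    text: str        # retrieved passage
    distance: float  # vector distance; smaller means more similar


def build_context(fweet_hits, campbell_hits, per_source=3):
    """Keep the closest hits from each source and label them for citation."""
    blocks = []
    for hits in (fweet_hits, campbell_hits):
        closest = sorted(hits, key=lambda h: h.distance)[:per_source]
        for h in closest:
            blocks.append(f"[{h.source} {h.ref}] {h.text}")
    return "\n".join(blocks)


if __name__ == "__main__":
    fweet = [
        Hit("fweet", "213.11", "placeholder annotation A", 0.40),
        Hit("fweet", "213.12", "placeholder annotation B", 0.10),
    ]
    campbell = [Hit("campbell", "chunk-1", "placeholder commentary", 0.20)]
    print(build_context(fweet, campbell, per_source=1))
```

Selecting hits per source, rather than from one merged ranking, keeps both Fweet and Campbell represented in the prompt even when one source consistently scores closer.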

Planned additions: Further secondary literature and expanded chapter coverage.

Web Interface

The main interface is a Streamlit app:

streamlit run app.py

What you can do:

  • Enter a FW line range (e.g. 213.11 to 213.36) to get a full line-by-line annotated commentary — every line explained with its original Joyce text, Fweet annotation data, and Campbell's interpretation integrated inline
  • Add an instruction after a dash to control the output:
    • 213.11 to 213.36 — for each line, generate a short narrative note in the style of Fweet: one plain sentence in parentheses explaining what is literally happening at this moment in the scene. Draw on both the Fweet annotations and Campbell where relevant. Do not speculate beyond what the sources reveal.
    • 213.11 to 213.36 — list all river names per line
    • 213.11 to 213.36 — identify all multilingual wordplay
    • 213.11 to 213.36 — focus on Campbell's interpretation
    • 213.11 to 213.36 — explain in German
    • 213.11 to 213.36 — explain in Chinese
    • 213.11 to 213.36 — explain in Hebrew
  • Ask free-form questions: "Who are the two washerwomen?", "What is the significance of the River Liffey?"

The line range can be adjusted to any range within Chapter I.8 (pp. 196–216). You can ask anything — be creative.
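The input conventions above (a line range, an optional instruction after a dash, or a free-form question) could be recognised along the following lines. This is a minimal sketch, not the parser used in app.py: `parse_query`, `RANGE_RE`, and the returned dictionary layout are assumptions made for illustration.

```python
import re

# "213.11 to 213.36", optionally followed by a dash (em dash, en dash, or
# hyphen) and an instruction. The exact syntax accepted by app.py may differ.
RANGE_RE = re.compile(
    r"^\s*(\d{3})\.(\d{1,2})\s+to\s+(\d{3})\.(\d{1,2})\s*"
    r"(?:[\u2014\u2013-]\s*(.*))?$",
    re.DOTALL,
)

FIRST_PAGE, LAST_PAGE = 196, 216  # Chapter I.8


def parse_query(text):
    """Classify input as a line-range request or a free-form question."""
    m = RANGE_RE.match(text)
    if m is None:
        return {"mode": "question", "question": text.strip()}
    start = (int(m.group(1)), int(m.group(2)))  # (page, line)
    end = (int(m.group(3)), int(m.group(4)))
    if not (FIRST_PAGE <= start[0] and end[0] <= LAST_PAGE and start <= end):
        raise ValueError(f"range must lie within FW {FIRST_PAGE}-{LAST_PAGE}")
    instruction = (m.group(5) or "").strip() or None
    return {"mode": "range", "start": start, "end": end,
            "instruction": instruction}


if __name__ == "__main__":
    print(parse_query("213.11 to 213.36 - list all river names per line"))
    print(parse_query("Who are the two washerwomen?"))
```

Anything that does not match the range pattern falls through to question mode, so free-form questions need no special syntax.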

Screenshots

Line-by-line narrative notes in the style of Fweet (line range mode):

[Screenshot: Line range mode]

River names per line (line range mode):

[Screenshot: River names query]

Project Structure

finnegans-wake/
├── app.py                      # Streamlit web interface (main entry point)
├── setup_chromadb.py           # Build ChromaDB from JSON sources (Render build)
├── src/                        # Data pipeline scripts
│   ├── parse_fweet_html.py     # Fweet HTML parser + I.8 batch runner
│   ├── chunk_campbell.py       # Split Campbell text into page-level chunks
│   └── load_data.py            # Load all sources into ChromaDB (scalable)
├── data/
│   ├── fweet_i8.json           # Parsed Fweet annotations
│   └── campbell_chunks.json    # Campbell chunks
├── render.yaml                 # Render deployment config
├── requirements.txt
├── .gitignore
└── README.md

Setup

1. Clone the repository

git clone https://github.com/wbcpa/finnegans-wake.git
cd finnegans-wake

2. Install dependencies

pip install -r requirements.txt

3. Configure environment variables

Create a .env file in the root directory:

AZURE_OPENAI_KEY=your_key_here
AZURE_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_DEPLOYMENT_NAME=your-deployment-name
AZURE_API_VERSION=2025-01-01-preview
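A quick way to confirm that all four variables are set before starting the app is a check like the one below. It is a sketch that assumes the .env values have already been loaded into the process environment (for example by a dotenv loader; how app.py actually loads them is not stated here). `read_azure_settings` is a hypothetical helper name.

```python
import os

# Variable names as listed in the .env example above.
REQUIRED = (
    "AZURE_OPENAI_KEY",
    "AZURE_ENDPOINT",
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_API_VERSION",
)


def read_azure_settings(env=None):
    """Return the required settings, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED}


if __name__ == "__main__":
    try:
        settings = read_azure_settings()
        print("Deployment:", settings["AZURE_DEPLOYMENT_NAME"])
    except RuntimeError as err:
        print(err)
```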

4. Prepare the data

Fweet HTML pages (requires Fweet account):

  • Save pages 196–216 from fweet.org into data/fweet_html/
  • Run: python src/parse_fweet_html.py

Campbell text (requires the book):

  • Extract the text from the PDF and save it as campbell_fixed.txt in data/books/
  • Run: python src/chunk_campbell.py

5. Load data into ChromaDB

python src/load_data.py

Data Sources

Source                                     | Coverage                 | Access
Fweet (fweet.org)                          | FW 196–216 (Chapter I.8) | Requires account
Campbell, A Skeleton Key to Finnegans Wake | Full book                | Requires book

Deployment

The project is configured for Render via render.yaml. The build command runs setup_chromadb.py to populate ChromaDB from the committed JSON files; Render then starts Streamlit as the web service.

Planned Extensions

  • Expand Fweet coverage beyond Chapter I.8
  • Add 20+ books of secondary literature (McHugh, Glasheen, and others)
  • Web-based query interface (already available, see app.py)
  • Multilingual output (German, Chinese, Hebrew, and any other language; already available by adding an instruction after the line range)

License

This project is intended purely for personal academic research and private use. Campbell's text is not included in this repository — only derived data structures generated from a legally obtained copy.

Acknowledgements

This project builds on the extraordinary annotation work collected at fweet.org.

Built in collaboration with Claude (Anthropic) — used throughout as a coding and research assistant.
