Finnegans Wake RAG System

This is a work in progress, not a finished product.

A Retrieval-Augmented Generation (RAG) system for analyzing James Joyce's Finnegans Wake, currently covering Chapter I.8 (the "Anna Livia Plurabelle" chapter, FW pages 196–216).

Motivation

James Joyce's Finnegans Wake is one of literature's most complex texts: every page layers multiple languages, allusions, motifs, and textual variants, all meticulously documented in the Fweet annotation database.

This project makes that annotation data queryable through natural language. It currently covers Chapter I.8 and combines two sources — Fweet and Campbell's Skeleton Key — but the architecture is designed to scale: additional chapters, 20+ books of secondary literature, and further interfaces can be added incrementally.

Overview

The system combines two complementary sources:

Fweet annotations (fweet.org) — line-by-line annotations including the original text from the Faber & Faber edition, languages, motifs, river names, clusters, references, and textual variants
Campbell's A Skeleton Key to Finnegans Wake — narrative commentary and interpretation

Both sources are stored in a ChromaDB vector database and queried simultaneously. Answers are generated by Azure OpenAI (GPT-4.1) with explicit citations to both sources. The interface runs locally via Streamlit and can also be deployed to the cloud.
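As a rough illustration of how hits from the two sources might be merged into one cited context before it is sent to the model, here is a minimal sketch. It is not taken from app.py: the Hit class, its field names, the build_context helper, and the sample texts are all hypothetical, and the retrieval step itself (the ChromaDB query) is left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One retrieved chunk. All field names here are hypothetical."""
    source: str      # "fweet" or "campbell"
    ref: str         # e.g. an FW line number or a Campbell page reference
    text: str
    distance: float  # smaller means closer to the query


def build_context(fweet_hits, campbell_hits, per_source=3):
    """Keep the closest hits from each source and label them for citation."""
    blocks = []
    for hits in (fweet_hits, campbell_hits):
        closest = sorted(hits, key=lambda h: h.distance)[:per_source]
        for h in closest:
            label = f"[{h.source.capitalize()} {h.ref}]"
            blocks.append(f"{label} {h.text}")
    return "\n".join(blocks)


# Placeholder data only; real hits would come from the vector database.
fweet = [
    Hit("fweet", "213.11", "placeholder annotation text A", 0.21),
    Hit("fweet", "213.12", "placeholder annotation text B", 0.10),
]
campbell = [Hit("campbell", "p. X", "placeholder commentary text", 0.30)]

context = build_context(fweet, campbell, per_source=1)
print(context)
```

Taking a fixed number of hits per source, rather than ranking everything together, is one simple way to make sure both Fweet and Campbell appear in every answer; the actual app may balance them differently.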

Planned additions: Further secondary literature and expanded chapter coverage.

Web Interface

The main interface is a Streamlit app:

streamlit run app.py

What you can do:

Enter a FW line range (e.g. 213.11 to 213.36) to get a full line-by-line annotated commentary — every line explained with its original Joyce text, Fweet annotation data, and Campbell's interpretation integrated inline
Add an instruction after a dash to control the output:
- 213.11 to 213.36 — for each line, generate a short narrative note in the style of Fweet: one plain sentence in parentheses explaining what is literally happening at this moment in the scene. Draw on both the Fweet annotations and Campbell where relevant. Do not speculate beyond what the sources reveal.
- 213.11 to 213.36 — list all river names per line
- 213.11 to 213.36 — identify all multilingual wordplay
- 213.11 to 213.36 — focus on Campbell's interpretation
- 213.11 to 213.36 — explain in German
- 213.11 to 213.36 — explain in Chinese
- 213.11 to 213.36 — explain in Hebrew
Ask free-form questions: "Who are the two washerwomen?", "What is the significance of the River Liffey?"

The line range can be adjusted to any range within Chapter I.8 (pp. 196–216). You can ask anything — be creative.
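The input forms above (a line range, an optional instruction after a dash, or a free-form question) could be told apart roughly as follows. This is a sketch under assumptions, not the parsing used in app.py: the parse_query function, the returned dictionary keys, and the choice to accept hyphens and en-dashes as well as the long dash are all hypothetical.

```python
import re

# "213.11 to 213.36", optionally followed by a dash and an instruction.
# \u2014 and \u2013 are the em-dash and en-dash; a plain hyphen also works.
RANGE_RE = re.compile(
    r"^\s*(\d{3})\.(\d{1,2})\s+to\s+(\d{3})\.(\d{1,2})\s*(?:[\u2014\u2013-]\s*(.*))?$",
    re.DOTALL,
)

FIRST_PAGE, LAST_PAGE = 196, 216  # Chapter I.8


def parse_query(query):
    """Classify user input as a line-range request or a free-form question."""
    m = RANGE_RE.match(query)
    if m is None:
        return {"mode": "question", "text": query.strip()}
    start = (int(m.group(1)), int(m.group(2)))
    end = (int(m.group(3)), int(m.group(4)))
    # Tuples compare page first, then line, so start <= end orders the range.
    if not (FIRST_PAGE <= start[0] and end[0] <= LAST_PAGE and start <= end):
        raise ValueError(f"Range must lie within FW {FIRST_PAGE}-{LAST_PAGE}")
    return {
        "mode": "range",
        "start": start,
        "end": end,
        "instruction": (m.group(5) or "").strip(),
    }


print(parse_query("213.11 to 213.36 \u2014 list all river names per line"))
print(parse_query("Who are the two washerwomen?"))
```

Anything that does not match the range pattern falls through to question mode, so a mistyped range is answered as a free-form question rather than rejected.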

Screenshots

Line-by-line narrative notes in the style of Fweet (line range mode):

River names per line (line range mode):

Project Structure

finnegans-wake/
├── app.py                      # Streamlit web interface (main entry point)
├── setup_chromadb.py           # Build ChromaDB from JSON sources (Render build)
├── src/                        # Data pipeline scripts
│   ├── parse_fweet_html.py     # Fweet HTML parser + I.8 batch runner
│   ├── chunk_campbell.py       # Split Campbell text into page-level chunks
│   └── load_data.py            # Load all sources into ChromaDB (scalable)
├── data/
│   ├── fweet_i8.json           # Parsed Fweet annotations
│   └── campbell_chunks.json    # Campbell chunks
├── render.yaml                 # Render deployment config
├── requirements.txt
├── .gitignore
└── README.md

Setup

1. Clone the repository

git clone https://github.com/wbcpa/finnegans-wake.git
cd finnegans-wake

2. Install dependencies

pip install -r requirements.txt

3. Configure environment variables

Create a .env file in the root directory:

AZURE_OPENAI_KEY=your_key_here
AZURE_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_DEPLOYMENT_NAME=your-deployment-name
AZURE_API_VERSION=2025-01-01-preview
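A small check like the one below can catch a missing variable before the first request fails. It is a sketch only: the load_azure_settings helper is hypothetical, and it assumes the values from .env have already been loaded into the process environment (for example with python-dotenv; whether app.py does this is not stated here).

```python
import os

# The four variable names listed in the .env example above.
REQUIRED_VARS = (
    "AZURE_OPENAI_KEY",
    "AZURE_ENDPOINT",
    "AZURE_DEPLOYMENT_NAME",
    "AZURE_API_VERSION",
)


def load_azure_settings(env=None):
    """Return the four settings as a dict, or raise naming what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


# Demonstration with placeholder values instead of the real environment.
demo_env = {
    "AZURE_OPENAI_KEY": "your_key_here",
    "AZURE_ENDPOINT": "https://your-resource.openai.azure.com/",
    "AZURE_DEPLOYMENT_NAME": "your-deployment-name",
    "AZURE_API_VERSION": "2025-01-01-preview",
}
settings = load_azure_settings(demo_env)
print(sorted(settings))
```

Calling load_azure_settings() with no argument reads from os.environ instead of the demonstration dictionary.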

4. Prepare the data

Fweet HTML pages (requires Fweet account):

Save pages 196–216 from fweet.org into data/fweet_html/
Run: python src/parse_fweet_html.py

Campbell text (requires the book):

Extract the text from the PDF and save it as campbell_fixed.txt in data/books/
Run: python src/chunk_campbell.py

5. Load data into ChromaDB

python src/load_data.py
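To give a sense of what loading involves, the sketch below shapes parsed entries into the parallel ids, documents, and metadatas lists that ChromaDB's collection.add accepts. The JSON schema is an assumption: the "text", "line", and "page" keys, the to_records helper, and the sample entries are hypothetical, and the real load_data.py may organize things differently.

```python
def to_records(entries, source, key):
    """Turn a list of dicts into parallel lists ready for a vector store.

    `key` names the locator field to keep as metadata (hypothetically
    "line" for Fweet entries and "page" for Campbell chunks).
    """
    ids, documents, metadatas = [], [], []
    for i, entry in enumerate(entries):
        ids.append(f"{source}-{i}")          # index keeps ids unique
        documents.append(entry["text"])      # text that gets embedded
        metadatas.append({"source": source, key: entry[key]})
    return {"ids": ids, "documents": documents, "metadatas": metadatas}


# Placeholder entries standing in for data/fweet_i8.json and
# data/campbell_chunks.json; the real files may use other field names.
fweet_entries = [
    {"line": "213.11", "text": "placeholder annotation A"},
    {"line": "213.12", "text": "placeholder annotation B"},
]
campbell_entries = [{"page": "X", "text": "placeholder commentary"}]

fweet_records = to_records(fweet_entries, "fweet", "line")
campbell_records = to_records(campbell_entries, "campbell", "page")

# With a ChromaDB collection in hand, these could then be passed as
# collection.add(**fweet_records) and collection.add(**campbell_records).
print(fweet_records["ids"], campbell_records["metadatas"])
```

Storing the source name in each record's metadata is what would later allow results to be filtered or cited per source.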

Data Sources

Source	Coverage	Access
Fweet (fweet.org)	FW 196–216 (Chapter I.8)	Requires account
Campbell, A Skeleton Key to Finnegans Wake	Full book	Requires book

Deployment

Configured for Render via render.yaml. The build command runs setup_chromadb.py to populate ChromaDB from the committed JSON files; Render then starts Streamlit as the web service.

Planned Extensions

Expand Fweet coverage beyond Chapter I.8
Add 20+ books of secondary literature (McHugh, Glasheen, and others)
Web-based query interface (already available; see app.py)
Multilingual output (German, Chinese, and Hebrew can already be requested through an instruction; any other language works the same way)

License

This project is intended purely for personal academic research and private use. Campbell's text is not included in this repository — only derived data structures generated from a legally obtained copy.

Acknowledgements

This project builds on the extraordinary annotation work collected at fweet.org.

Built in collaboration with Claude (Anthropic) — used throughout as a coding and research assistant.
