- Robust Error Handling: All commands provide clear error messages and feedback via `click.echo`, in both the REPL and CLI.
- Progress Bars Restored: Book processing and batch operations show real-time progress bars for better user feedback.
- Intelligent Batch Sizing: Embedding batch size is now memory-aware and can be overridden via config.
- Config Override: Easily override batch size and other settings in `config.yml` or via CLI commands.
- REPL Ask Command Fixes: The REPL's `ask` command now safely handles cases when no book is focused, with clear user feedback.
- Focused Book Logic: You can focus/unfocus on a specific book for targeted queries; status and feedback are shown in the REPL.
- Parallel Processing: Directory and batch operations use parallel workers for faster ingestion (configurable in `config.yml`).
- Command Aliases: Short aliases for all major commands (`q` for `ask`, `a` for `add`, `b` for `batch`, `l` for `list`, `s` for `search`, `c` for `clear`, `ll` for detailed list).
- Config Management: Use `config-show`, `config-set`, and `config-get` to view and update configuration at runtime.
- Enhanced Help & Status: REPL `help` and `status` commands show current focus, book count, and system health.
- Verbose Mode: Use `ask --verbose` for detailed context and debug info in answers.
- Graceful Fallbacks: If parallel processing fails, the system automatically falls back to sequential mode.
- Session Persistence: The REPL maintains command history and session state for a seamless workflow.
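The batch-size override mentioned above might look like this in `config.yml`. Note these key names are illustrative only — run `config-show` to see the actual keys in your install:

```yaml
# Illustrative keys only — verify with `config-show`
embedding:
  batch_size: 64        # fixed batch size instead of the memory-aware default
processing:
  parallel_workers: 4   # workers used for directory/batch ingestion
```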
A comprehensive Python application that processes ebooks using local Ollama AI models and creates a searchable knowledge base of your entire collection through advanced RAG (Retrieval Augmented Generation) techniques.
This project showcases the potential of human-AI collaboration in building sophisticated RAG systems:
- Concept & Vision: Original idea, architecture decisions, and quality control by Anthony Dawson
- Implementation & Design: Feature development, technical architecture, and system design created through collaboration with AI assistance (Claude - Anthropic)
- Development Process: Demonstrates effective patterns for human-AI collaboration in software development
This transparent approach highlights how AI tools can accelerate development while human expertise drives vision, requirements, and integration decisions.
- Intelligent Searchable Knowledge Base: Ask questions about your entire book collection using natural language
- Semantic Search: Find content by meaning and context, not just keywords
- Cross-Book Analysis: Compare themes, characters, and insights across multiple books
- AI-Powered Search Enhancement: Dynamically generates related search terms for better context retrieval
- Contextual Responses: Get detailed answers with proper citations from your specific books
- Persistent Memory: Build and maintain a growing vector database of all your processed books
- Multiple Format Support: EPUB, PDF, MOBI, AZW, AZW3, TXT, DOCX
- Local AI Processing: Uses Ollama models for complete privacy and control
- Intelligent Chunking: Smart text segmentation for optimal processing
- Batch Processing: Process entire directories of ebooks efficiently
- Flexible Output: JSON, Markdown, or plain text output formats
- Command Line Interface: Easy-to-use CLI with comprehensive configuration management
- Progress Tracking: Real-time progress bars and detailed processing statistics
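The intelligent chunking above can be sketched roughly as follows — a minimal illustration of overlapping fixed-size chunks (mirroring the `chunk_size`/`chunk_overlap` settings), not the project's actual implementation:

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks, mirroring chunk_size/chunk_overlap config."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so context isn't cut mid-sentence
        start += chunk_size - overlap
    return chunks

print(len(chunk_text("x" * 10000, chunk_size=4000, overlap=200)))  # → 3
```

The overlap means each chunk repeats the tail of the previous one, so sentences spanning a chunk boundary still appear intact in at least one chunk.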
See NEXT_FEATURES.md for detailed roadmap including:
- Interactive REPL Interface: Seamless command-line interaction without repeated CLI calls
- Auto-Discovery System: Scan directories for ebooks, build catalogs without processing
- Book-Specific Targeting: Focus conversations on specific books vs. entire library
- Advanced Context Memory: Graph-based conversation memory with Redis integration
- Adaptive AI Persona: AI personality that evolves with your reading patterns
For technical implementation details, see IMPLEMENTATION_DETAILS.md.
- Python 3.9+ installed on your system
- uv for dependency management (recommended) or pip
- Ollama installed and running
- At least one Ollama model downloaded
```bash
# Visit https://ollama.ai/ for installation instructions
# Or use package managers:

# macOS
brew install ollama

# Windows - Download from the website

# Linux
curl https://ollama.ai/install.sh | sh
```

```bash
ollama pull llama2    # Recommended
ollama pull mistral   # Alternative
ollama pull codellama # For code analysis
```

```bash
# Install uv (if not already installed)
pip install uv
```

**Option A: uv Installation (Recommended)**
```bash
# Clone the repository
git clone https://github.com/anthonypdawson/ai-ebook-processor.git
cd ai-ebook-processor

# Install dependencies with uv
uv install

# 🆕 Optional: PyTorch with CUDA 12.9 support
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
```

This installs PyTorch and related libraries with CUDA 12.9 support. If you do not have a compatible NVIDIA GPU or do not need CUDA, you can skip this step.
- `q <question>`: Quick alias for `ask`
- `a <file/dir>`: Quick alias for `add`
- `b <dirs...>`: Quick alias for `batch`
- `l`: Quick alias for `list`
- `s <query>`: Quick alias for `search`
- `c`: Quick alias for `clear`
- `ll`: Quick alias for detailed list
- Use `focus <search>` to target a specific book for queries
- Use `unfocus` to clear book focus and search all books
- Use `status` to show current focus and book info
- All commands provide clear error messages and feedback
- Progress bars are shown for book processing and batch operations
- If a command fails, a helpful message is displayed
- REPL ask command handles unfocused state gracefully
```bash
uv run python -m ai_ebook_processor --help
uv run python -m ai_ebook_processor rag add-book "path/to/book.epub"
```
**Option B: Package Installation**

```bash
# Clone the repository
git clone https://github.com/anthonypdawson/ai-ebook-processor.git
cd ai-ebook-processor

# Install the package
pip install -e .

# Use it anywhere
python -m ai_ebook_processor --help
python -m ai_ebook_processor rag add-book "path/to/book.epub"
```

**Option C: Direct Usage**

```bash
# Install dependencies only
pip install -r requirements.txt

# Run using module syntax
python -m ai_ebook_processor --help
python -m ai_ebook_processor rag add-book "path/to/book.epub"
```

For even easier usage, wrapper scripts are included that work from anywhere on your system:
Windows:

```bash
# Works from any directory - uses uv automatically if available
~/src/ai-ebook-processor/scripts/ebook-processor.bat --help
~/src/ai-ebook-processor/scripts/ebook-processor.bat rag add-book "book.epub"

# Or use the Python wrapper (cross-platform)
python ~/src/ai-ebook-processor/scripts/ebook-processor.py --help
```

Linux/Mac/Windows (with bash):

```bash
# Works from any directory - uses uv automatically if available
~/src/ai-ebook-processor/scripts/ebook-processor --help
~/src/ai-ebook-processor/scripts/ebook-processor rag add-book "book.epub"
```

Features of the wrapper scripts:
- 🌍 Work from any directory on your system
- 🎯 Automatically use uv if available, with fallback to virtual env
- 📦 Set up proper Python paths automatically
- 🔄 Intelligent environment detection
```bash
ollama serve
```

```bash
# Using uv (recommended)
uv run python -m ai_ebook_processor repl
```

REPL Session Example:

```
🤖 AI Ebook Processor REPL
Type 'help' for available commands, 'exit' to quit

[~] ebook> cd ~/Documents/Books
[Documents/Books] ebook> ls
📁 Fiction/
📁 Non-Fiction/
📚 book1.epub
📚 book2.pdf

[Documents/Books] ebook> add Fiction/
Processing 15 books... ✓ Successfully added 15/15 books

[Documents/Books] ebook> ask What are the main themes in my collection?

Answer:
──────────────────────────────────────────────────
Based on your fiction collection, the main themes include...

[Documents/Books] ebook> search "time travel"

Search results (3 found):
──────────────────────────────────────────────────
1. The Time Machine (relevance: 0.92)
   A scientist invents a machine that allows him to travel through time...
```

```bash
# Using uv (recommended)
uv run python -m ai_ebook_processor rag add-book "path/to/your/book.epub"
uv run python -m ai_ebook_processor rag add-directory "path/to/ebooks/"
uv run python -m ai_ebook_processor rag ask "What are the main themes in my collection?"
```

```bash
# Using uv
uv run python -m ai_ebook_processor process-file "path/to/your/book.epub"

# Process all ebooks in a directory
uv run python -m ai_ebook_processor process-directory "path/to/ebooks/"
```

```bash
# Show all available commands
uv run python -m ai_ebook_processor --help

# Configuration management
uv run python -m ai_ebook_processor config-show                    # Show current config
uv run python -m ai_ebook_processor config-set ollama.model llama2 # Set default model
uv run python -m ai_ebook_processor models                         # List available models

# Discover books without processing
uv run python -m ai_ebook_processor discover "path/to/ebooks/"     # Find all ebooks in directory
```

```bash
# Import books
uv run python -m ai_ebook_processor rag add-book "book.epub"       # Add single book
uv run python -m ai_ebook_processor rag add-book "book.pdf" --fast # Fast import (skip AI analysis)
uv run python -m ai_ebook_processor rag add-directory "path/"      # Add entire directory

# Query your collection
uv run python -m ai_ebook_processor rag ask "What themes appear in my books?"
uv run python -m ai_ebook_processor rag search "artificial intelligence"
uv run python -m ai_ebook_processor rag stats                      # Show database statistics
```

```bash
# Use wrapper scripts from anywhere (recommended for convenience)
~/src/ai-ebook-processor/scripts/ebook-processor rag add-book "book.epub"           # Unix/bash
~/src/ai-ebook-processor/scripts/ebook-processor.bat rag add-book "book.epub"       # Windows
python ~/src/ai-ebook-processor/scripts/ebook-processor.py rag add-book "book.epub" # Cross-platform

# Module execution (from project directory)
python -m ai_ebook_processor rag add-book "book.epub"

# After pip install -e . (from anywhere)
ebook-processor rag add-book "book.epub"
```

The REPL (Read-Eval-Print Loop) provides a seamless interactive experience for managing your ebook collection. No more typing long commands repeatedly!
```bash
# Using uv (recommended)
uv run ebook-processor repl

# Using the Python module
python -m ai_ebook_processor repl

# Using wrapper scripts
~/src/ai-ebook-processor/scripts/ebook-processor repl
```

- Session Persistence: Current directory and command history are maintained
- Tab Completion: Commands and file paths with intelligent completion
- Command History: Navigate previous commands with ↑/↓ arrows
- Directory Navigation: Built-in `cd`, `pwd`, `ls` commands
- Command Aliases: Short aliases for frequently used commands
File System Navigation:

```
[~] ebook> cd ~/Documents/Books          # Change directory
[Documents/Books] ebook> pwd             # Show current directory
[Documents/Books] ebook> ls              # List contents with ebook highlighting
📁 Fiction/
📁 Non-Fiction/
📚 book1.epub
📚 book2.pdf
```

RAG Operations:

```
# Add books (supports tab completion)
[Books] ebook> add book1.epub            # Add single book
[Books] ebook> add Fiction/              # Add entire directory
[Books] ebook> add .                     # Add all books in current directory

# Query your collection
[Books] ebook> ask What are the main themes in my collection?
[Books] ebook> q Who is the protagonist in my fantasy books?   # Short alias

# Search and discover
[Books] ebook> search "time travel"      # Search for specific content
[Books] ebook> list                      # List all books in RAG system
[Books] ebook> l                         # Short alias for list
```

Convenience Features:

```
[Books] ebook> help                      # Show all commands
[Books] ebook> clear                     # Clear screen
[Books] ebook> exit                      # Exit REPL
```

Initial Setup:
```
uv run ebook-processor repl

🤖 AI Ebook Processor REPL
Type 'help' for available commands, 'exit' to quit

[~] ebook> cd ~/Documents/Calibre Library
[Calibre Library] ebook> ls
📁 Author Name/
📁 Another Author/
...

[Calibre Library] ebook> add .
Processing 127 books... ✓ Successfully added 115/127 books

[Calibre Library] ebook> list
Books in RAG system (115 total):
──────────────────────────────────────────────────
1. The Great Gatsby
   Author: F. Scott Fitzgerald
   Chunks: 45
2. 1984
   Author: George Orwell
   Chunks: 62
...
```

Interactive Analysis:

```
[Calibre Library] ebook> ask What genres do I read most?

Answer:
──────────────────────────────────────────────────
Based on your collection, you primarily read:
1. Science Fiction (32% of collection)
2. Mystery/Thriller (28% of collection)
3. Literary Fiction (22% of collection)
...

📚 Sources:
1. Dune - Frank Herbert
2. The Girl with the Dragon Tattoo - Stieg Larsson
3. To Kill a Mockingbird - Harper Lee

[Calibre Library] ebook> search "artificial intelligence"

Search results (8 found):
──────────────────────────────────────────────────
1. Neuromancer (relevance: 0.94)
   The matrix has its roots in primitive arcade games...
2. I, Robot (relevance: 0.89)
   A robot may not injure a human being or, through inaction...
```

Save time with short command aliases:

- `q` → `ask` (query)
- `a` → `add`
- `l` → `list`
- `s` → `search`
- `c` → `clear`
- `ll` → `list` (detailed)
```bash
# Add a book to your searchable knowledge base
python -m ai_ebook_processor rag add-book "/path/to/book.epub"

# Add entire directory
python -m ai_ebook_processor rag add-directory "/path/to/ebooks" --max-files 10

# Ask questions about your collection
python -m ai_ebook_processor rag ask "What are the main themes in my books?"

# Search for specific content
python -m ai_ebook_processor rag search "artificial intelligence"
```

```python
from ai_ebook_processor.core.processor import EbookProcessorApp
from ai_ebook_processor.rag.system import EnhancedEbookProcessor

# Traditional processing
app = EbookProcessorApp(model_name="llama2")
result = app.process_single_ebook(
    "path/to/book.epub",
    processing_type="summary",
    output_dir="output"
)

# 🆕 Enhanced processing with RAG
enhanced = EnhancedEbookProcessor(model_name="llama2")

# Process and add to knowledge base
result = enhanced.process_and_store("path/to/book.epub")

# Ask questions about your collection
answer = enhanced.ask_about_collection(
    "What are the main themes across my philosophy books?"
)
print(answer)
```

- `summary`: Creates concise summaries of the content
- `analysis`: Analyzes themes, concepts, and key information
- `extraction`: Extracts important facts, data, and insights
- `questions`: Generates study questions and answers
- `critique`: Provides thoughtful critique and analysis
- `simplify`: Explains complex concepts in simpler terms
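Each mode is essentially a different prompt template applied to the text. A rough sketch of how such a mode table might work (the templates here are hypothetical, not the project's actual prompts):

```python
# Hypothetical mode-to-prompt mapping, illustrating how processing modes differ
PROMPTS = {
    "summary": "Summarize the following text concisely:\n\n{text}",
    "analysis": "Analyze the themes, concepts, and key information in:\n\n{text}",
    "extraction": "Extract important facts, data, and insights from:\n\n{text}",
    "questions": "Generate study questions and answers for:\n\n{text}",
    "critique": "Provide a thoughtful critique of:\n\n{text}",
    "simplify": "Explain the complex concepts below in simpler terms:\n\n{text}",
}

def build_prompt(mode: str, text: str) -> str:
    """Look up the template for a processing mode and fill in the text."""
    try:
        return PROMPTS[mode].format(text=text)
    except KeyError:
        raise ValueError(f"Unknown processing mode: {mode!r}") from None

print(build_prompt("summary", "Call me Ishmael."))
```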
```bash
# Show available models
python -m ai_ebook_processor models

# Discover ebooks in a directory
python -m ai_ebook_processor discover /path/to/ebooks
```
```bash
# Process a directory recursively, capping the number of files
uv run python -m ai_ebook_processor process-directory "path/to/ebooks/" \
    --recursive \
    --max-files 10
```
```bash
# Show configuration
python -m ai_ebook_processor config-show

# Set configuration values
python -m ai_ebook_processor config-set ollama.model "mistral"
python -m ai_ebook_processor config-set processing.chunk_size 5000
```

The application uses a YAML configuration file (`config.yml`) that's automatically created with defaults:
```yaml
ollama:
  model: llama2
  host: http://localhost:11434
  temperature: 0.7

processing:
  chunk_size: 4000
  chunk_overlap: 200
  output_format: markdown
  save_chunks: false
  processing_mode: summary

output:
  directory: output
  create_report: true
```

```
ai-ebook-processor/
├── ai_ebook_processor/          # Main Python package (all source code)
│   ├── cli/                     # CLI commands and REPL interface
│   │   ├── commands.py          # CLI command definitions
│   │   ├── repl.py              # Interactive REPL shell
│   │   └── ...                  # Other CLI modules
│   ├── core/                    # Core processing logic
│   │   ├── processor.py         # Main processor class
│   │   ├── pipeline.py          # Text processing pipeline
│   │   ├── parallel.py          # Parallel processing utilities
│   │   └── ...                  # Other core modules
│   ├── models/                  # Model integrations (Ollama, etc.)
│   │   └── ollama.py            # Ollama model integration
│   ├── rag/                     # RAG system and logic
│   │   ├── system.py            # RAG system core
│   │   ├── timing.py            # Timing utilities
│   │   └── ...                  # Other RAG modules
│   ├── readers/                 # Ebook format readers
│   │   └── ebook_reader.py      # EPUB/PDF/etc. reader
│   ├── utils/                   # Utility modules
│   │   ├── config.py            # Config management
│   │   ├── fast_mode.py         # Fast processing mode
│   │   └── ...                  # Other utilities
│   ├── __main__.py              # Entry point for module execution
│   └── __init__.py              # Package initializer
├── scripts/                     # CLI and convenience scripts
│   ├── ebook-processor          # Bash wrapper script
│   ├── ebook-processor.bat      # Windows batch wrapper
│   ├── ebook-processor.py       # Python wrapper script
│   └── ...                      # Other scripts
├── config/                      # Configuration files
│   └── config.yml               # Main YAML config
├── output/                      # Processed results and reports
├── ebook_db/                    # Vector database for processed books
├── pyproject.toml               # Project metadata and dependencies
├── README.md                    # Project documentation
└── NEXT_FEATURES.md             # Roadmap and upcoming features
```
| Format | Extension | Description |
|---|---|---|
| EPUB | `.epub` | Electronic publication format |
| PDF | `.pdf` | Portable Document Format |
| MOBI | `.mobi` | Amazon Kindle format |
| AZW/AZW3 | `.azw`, `.azw3` | Amazon Kindle formats |
| Plain Text | `.txt` | Plain text files |
| Word Document | `.docx` | Microsoft Word documents |
Support for Amazon Kindle formats (.mobi, .azw, .azw3) is experimental. Many MOBI files are image-based, DRM-protected, or use complex structures that may not be reliably processed.
Best Practice: For consistent results, convert Kindle files to EPUB or PDF using Calibre or similar tools before processing with AI Ebook Processor.
- DRM-protected files cannot be processed.
- Image-heavy MOBI files may yield little or no text output.
- EPUB and PDF formats are recommended for best results.
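Given those limitations, it can help to sanity-check extraction output before spending AI processing time on a file. A minimal sketch — the helper name and threshold are illustrative, not part of the project's API:

```python
def looks_extractable(text: str, min_chars: int = 500) -> bool:
    """Heuristic guard: image-heavy or DRM-protected files often yield little or no text."""
    return len(text.strip()) >= min_chars

# A nearly-empty extraction suggests an image-based or DRM-protected file.
print(looks_extractable(""))             # → False
print(looks_extractable("word " * 200))  # → True
```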
You can provide custom prompts for processing:

```python
custom_prompt = """
Please analyze this text for:
1. Main themes and concepts
2. Key arguments or plot points
3. Important quotes or data
4. Overall significance

Text: {text}
"""

result = app.process_single_ebook(
    "book.epub",
    custom_prompt=custom_prompt
)
```

Customize how text is processed:
```python
app.configure_processing(
    chunk_size=3000,          # Maximum characters per chunk
    chunk_overlap=150,        # Overlap between chunks
    output_format='markdown', # json, txt, or markdown
    save_chunks=True          # Save individual chunk results
)
```

```python
# Process only specific formats
results = app.process_directory(
    "/path/to/ebooks",
    file_extensions=['.epub', '.pdf']
)

# Process with custom configuration
app.configure_processing(chunk_size=2000, output_format='json')
results = app.process_multiple_ebooks(ebook_list)
```

The application creates structured output including:
- Individual Results: One file per processed ebook
- Processing Statistics: Detailed stats about the processing session
- Combined Report: Summary of all processed books
- Error Logs: Information about any processing failures
```json
{
  "metadata": {
    "title": "Book Title",
    "author": "Author Name",
    "format": "EPUB"
  },
  "chunk_info": {
    "total_chunks": 15,
    "successful_chunks": 15
  },
  "combined_result": "Processed content...",
  "processing_stats": {
    "processing_time": 45.2,
    "success_rate": 1.0
  }
}
```

```markdown
# Book Title

**Author:** Author Name
**Processed:** 2024-01-15T10:30:00

## Analysis Results

Processed content appears here...
```

**"No Ollama models found"**
- Make sure Ollama is running: `ollama serve`
- Install a model: `ollama pull llama2`

**"Error connecting to Ollama"**
- Check if Ollama is running on the correct host/port
- Verify the model name in configuration

**"No text content extracted"**
- File might be corrupted or DRM-protected
- Try a different file format
- Check file permissions

**Slow processing**
- Reduce chunk size in configuration
- Use a faster model (e.g., `phi` instead of `llama2`)
- Process fewer files at once
- Use smaller chunk sizes for faster processing
- Choose appropriate models for your hardware
- Process files in smaller batches for large collections
- Monitor system resources during processing
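For large collections, the "smaller batches" tip can be as simple as slicing the file list before handing it to the processor. A generic sketch — `process_multiple_ebooks` is the API shown in the Python examples above, but the batching helper itself is illustrative:

```python
from pathlib import Path

def batched(items, batch_size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# Illustrative: process a large collection 25 files at a time
ebooks = sorted(Path("path/to/ebooks").glob("**/*.epub"))
for batch in batched(ebooks, 25):
    # results = app.process_multiple_ebooks(batch)  # `app` as in the Python API example
    pass
```

Processing in bounded batches keeps memory use predictable and makes it easier to resume after a failure partway through a collection.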
- `ollama`: Ollama Python client
- `ebooklib`: EPUB file processing
- `PyMuPDF`: PDF file processing
- `python-docx`: Word document processing
- `chardet`: Character encoding detection
- `tqdm`: Progress bars
- `click`: Command line interface
- `pyyaml`: YAML configuration files
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is provided as-is for educational and personal use.
- Ollama for local AI model hosting
- ebooklib for EPUB processing
- PyMuPDF for PDF processing
- All the open-source contributors who made this possible
Note: This tool is designed for processing your own ebook collection. Respect copyright laws and DRM restrictions when using this software.