A production-ready Retrieval-Augmented Generation (RAG) system for querying Laravel documentation using local LLMs. Built with Ollama, ChromaDB, and LangChain for fast, version-aware documentation lookup.
- Local LLM Inference: Uses Ollama with Gemma 2B for privacy and speed
- Efficient Embeddings: nomic-embed-text (768 dimensions) optimized for retrieval
- Persistent Vector Store: ChromaDB with disk persistence
- Version-Aware: Support for multiple Laravel versions with metadata tracking
- Smart Chunking: Adaptive chunking with configurable size limits and overlap
- Multiple Interfaces: CLI, Interactive mode, and REST API
- Docker-Based: Fully containerized with M1 Mac support
- Production Ready: Comprehensive logging, error handling, and monitoring
- Quick Start Guide - Get up and running in 10 minutes
- API Reference - Complete REST API documentation
- Architecture - System design and components
- Deployment - Production deployment guide
- Troubleshooting - Common issues and solutions
All documentation is in the documentation/ directory.
- Docker and Docker Compose
- MacBook Pro M1 (or adjust platform in docker-compose.yml)
- 16GB RAM minimum
- 10GB free disk space
- Clone and setup:
  cd /path/to/laravel-rag
  make setup
  This will:
  - Start Docker services
  - Pull required Ollama models (gemma:2b and nomic-embed-text)
  - Initialize the system
- Extract Laravel documentation:
  make extract
  Downloads Laravel v12 docs from GitHub (default version).
- Index documentation:
  make index
  Generates embeddings and stores them in ChromaDB (~2-5 minutes for v12).
- Query documentation:
  make query Q="How do I create an Eloquent model?"
# Extract documentation
docker-compose exec rag-app python -m src.cli.main extract --version 12
# Index documentation
docker-compose exec rag-app python -m src.cli.main index
# Query (one-off)
docker-compose exec rag-app python -m src.cli.main query "What is middleware?" --show-sources
# Interactive mode
docker-compose exec rag-app python -m src.cli.main interactive
# Check system status
docker-compose exec rag-app python -m src.cli.main check
# View statistics
docker-compose exec rag-app python -m src.cli.main stats

# Setup and management
make setup # Initial setup
make start # Start services
make stop # Stop services
make restart # Restart services
make logs # View logs
make clean # Clean all data (WARNING: destructive)
# Documentation
make extract # Extract docs
make index # Index docs
make reindex # Force re-index
# Querying
make query Q="your question"
make interactive
make stats
make check
# API
make api-test # Test API endpoints

Start the API (runs automatically with docker-compose):
make start

API available at http://localhost:8000
API Documentation: http://localhost:8000/docs
Endpoints:
- POST /query - Query documentation
  curl -X POST http://localhost:8000/query \
    -H "Content-Type: application/json" \
    -d '{
      "question": "How do I use migrations?",
      "include_sources": true,
      "temperature": 0.7
    }'
- GET /search?q=query - Search without LLM generation
  curl "http://localhost:8000/search?q=eloquent&top_k=5"
- GET /stats - Vector store statistics
- GET /versions - Available Laravel versions
- GET /health - Health check
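The same query works from Python. Here is a minimal client sketch using the requests library; the payload mirrors the curl example above, and the exact response schema is documented at /docs:

```python
# Minimal sketch of a Python client for the /query endpoint.
# The payload fields mirror the curl example above.
import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={
        "question": "How do I use migrations?",
        "include_sources": True,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # inspect http://localhost:8000/docs for the response schema
```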
make interactive

Provides a conversational interface:
Question: How do I create a model?
Answer: To create an Eloquent model in Laravel...
Question: exit
Create .env from .env.example:
# Ollama Configuration
OLLAMA_HOST=http://localhost:11434
LLM_MODEL=gemma:2b
EMBEDDING_MODEL=nomic-embed-text
# ChromaDB
CHROMA_PERSIST_DIR=./chromadb
CHROMA_COLLECTION_NAME=laravel_docs
# Laravel Documentation
LARAVEL_VERSION=12
DOCS_CACHE_DIR=./docs
# RAG Settings
TOP_K=5
RESPONSE_TIMEOUT=30
LOG_LEVEL=INFO

Edit config/system.yaml for advanced settings:
- Model parameters
- Chunking strategy
- Performance tuning
- Logging configuration
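For illustration, a minimal sketch of how src/config.py might merge the .env values above with system.yaml, assuming python-dotenv and PyYAML are available (the actual implementation and YAML field names may differ):

```python
# Illustrative configuration loading: .env values via python-dotenv,
# advanced settings via PyYAML. Field names are assumptions.
import os
import yaml
from dotenv import load_dotenv

load_dotenv()  # pulls OLLAMA_HOST, LLM_MODEL, TOP_K, etc. into the environment

OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
LLM_MODEL = os.getenv("LLM_MODEL", "gemma:2b")
TOP_K = int(os.getenv("TOP_K", "5"))

with open("config/system.yaml") as fh:
    system = yaml.safe_load(fh)  # model params, chunking, tuning, logging
```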
- Memory: ~3-4GB total
  - Ollama + Gemma 2B: 2-3GB
  - nomic-embed-text: ~500MB
  - ChromaDB: ~100-200MB
- Storage:
  - Laravel v12 raw docs: ~5-10MB
  - Embeddings: ~100-200MB
  - ChromaDB overhead: ~50MB
- Query Performance:
  - Response time: <3 seconds (target)
  - Embedding generation: ~100ms per query
  - Vector search: <50ms
  - LLM generation: 1-2 seconds
- Batch Size: Adjust --batch-size for indexing based on available memory
- Top-K: Lower TOP_K for faster responses, higher for more context
- Temperature: Lower (0.3-0.5) for factual answers, higher (0.7-1.0) for more creative answers
- Context Window: Configured in system.yaml
laravel-rag/
├── docker-compose.yml # Docker orchestration
├── Dockerfile # Python app container
├── requirements.txt # Python dependencies
├── Makefile # Convenience commands
├── setup.sh # Setup script
├── .env.example # Environment template
├── config/
│ └── system.yaml # System configuration
├── src/
│ ├── config.py # Configuration management
│ ├── extraction/ # Document extraction
│ │ ├── docs_fetcher.py # Git clone & fetch
│ │ └── markdown_parser.py # H2-based chunking
│ ├── indexing/ # Embedding & storage
│ │ ├── embeddings.py # Ollama embeddings
│ │ └── vector_store.py # ChromaDB integration
│ ├── retrieval/ # RAG chain
│ │ └── rag_chain.py # LangChain RAG
│ ├── api/ # REST API
│ │ └── main.py # FastAPI application
│ ├── cli/ # CLI interface
│ │ └── main.py # Click commands
│ └── utils/
│ └── logger.py # Logging setup
├── data/ # Application data
├── chromadb/ # Vector store persistence
├── sources/ # Laravel documentation sources cache
└── logs/ # Application logs
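To make the flow through these components concrete, here is a condensed, illustrative sketch of what happens at query time. It calls Ollama's HTTP API and ChromaDB directly rather than going through the repo's LangChain chain, so treat it as a conceptual outline, not the actual implementation:

```python
# Illustrative end-to-end query flow: embed the question with Ollama,
# search the persisted ChromaDB collection, then generate an answer.
# Assumes Ollama on localhost:11434 and the vector store in ./chromadb.
import requests
import chromadb

question = "How do I create an Eloquent model?"

# 1. Embed the query with nomic-embed-text via Ollama's HTTP API.
emb = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "nomic-embed-text", "prompt": question},
).json()["embedding"]

# 2. Retrieve the top-k chunks from the persisted collection.
client = chromadb.PersistentClient(path="./chromadb")
collection = client.get_collection("laravel_docs")
hits = collection.query(query_embeddings=[emb], n_results=5)
context = "\n\n".join(hits["documents"][0])

# 3. Ask the LLM to answer using only the retrieved context.
answer = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "gemma:2b",
        "prompt": f"Answer using this Laravel documentation:\n{context}\n\nQuestion: {question}",
        "stream": False,
    },
).json()["response"]
print(answer)
```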
Index multiple Laravel versions:
# Extract and index v11
docker-compose exec rag-app python -m src.cli.main extract --version 11
docker-compose exec rag-app python -m src.cli.main index --version 11
# Query specific version
docker-compose exec rag-app python -m src.cli.main query \
"How do migrations work?" --version 11Modify src/extraction/markdown_parser.py to customize chunking strategy.
View logs:
make logs # All services
docker-compose logs -f rag-app # Application only
docker-compose logs -f ollama # Ollama only

Log files:
- Application: logs/laravel-rag.log
- Docker: docker-compose logs
Issue: Models not found
# Check Ollama
docker exec laravel-rag-ollama ollama list
# Pull manually
docker exec laravel-rag-ollama ollama pull gemma:2b
docker exec laravel-rag-ollama ollama pull nomic-embed-text

Issue: ChromaDB errors
# Clear and re-index
make clean
make setup
make extract
make index

Issue: Out of memory
- Reduce batch_size during indexing
- Check Docker resource limits
- Restart Docker
Note: Embedding warnings
You may see warnings like init: embeddings required but some input tokens were not marked as outputs -> overriding. These are harmless internal messages from the nomic-embed-text model and can be safely ignored. They do not affect embedding quality or system performance. See documentation/TROUBLESHOOTING.md for details.
docker-compose exec rag-app pytest tests/ -v

- Follow the modular structure in src/
- Use the logger: from src.utils.logger import app_logger as logger
- Update configuration in src/config.py
- Add CLI commands in src/cli/main.py (see the sketch below)
- Add API endpoints in src/api/main.py
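As an example of the Click pattern used in src/cli/main.py, here is a hypothetical command; the cli group name, the command, and its option are illustrative assumptions, not existing code:

```python
# Hypothetical Click command following the project's CLI conventions.
import click
from src.utils.logger import app_logger as logger

@click.group()
def cli():
    """Laravel RAG command-line interface."""

@cli.command()
@click.option("--version", default="12", help="Laravel version to summarize.")
def summarize(version: str):
    """Example command: print a one-line status for a docs version."""
    logger.info("Summarizing docs for Laravel v%s", version)
    click.echo(f"Laravel v{version} docs are indexed and ready to query.")

if __name__ == "__main__":
    cli()
```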
# Format code
docker-compose exec rag-app black src/
# Type checking
docker-compose exec rag-app mypy src/
# Linting
docker-compose exec rag-app flake8 src/

- API Access: Configure CORS in src/api/main.py
- Network: Adjust docker-compose.yml to use internal networks
- Authentication: Add API key middleware for production (see the sketch below)
- Rate Limiting: Enable in config/system.yaml
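One way to add that authentication layer is a FastAPI dependency. A minimal sketch follows; the X-API-Key header name and the RAG_API_KEY environment variable are assumptions, not part of the current codebase:

```python
# Minimal API-key guard sketch for a FastAPI app such as src/api/main.py.
# Header name and env variable are illustrative assumptions.
import os
from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

def require_api_key(api_key: str = Security(api_key_header)) -> str:
    # Reject requests whose key does not match the configured secret.
    if api_key != os.environ.get("RAG_API_KEY"):
        raise HTTPException(status_code=401, detail="Invalid or missing API key")
    return api_key

@app.get("/health", dependencies=[Depends(require_api_key)])
def health():
    return {"status": "ok"}
```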
- Horizontal: Run multiple API instances behind a load balancer
- Vertical: Increase Docker resource limits
- Distributed: Use a remote ChromaDB server (see the sketch below)
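Moving to a remote ChromaDB server is mostly a client-side change. An illustrative sketch, where the host and port are deployment-specific assumptions:

```python
# Connect to a remote ChromaDB server instead of local disk persistence.
import chromadb

client = chromadb.HttpClient(host="chroma.internal", port=8000)
collection = client.get_or_create_collection("laravel_docs")
print(collection.count())  # sanity check against the remote store
```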
# Backup vector store
tar -czf chromadb-backup.tar.gz chromadb/
# Restore
tar -xzf chromadb-backup.tar.gz

- Support for additional Laravel versions (11, 10)
- Integration with Livewire, Filament docs
- Query result caching
- Multi-modal support (images, diagrams)
- Conversation history
- Fine-tuned models for Laravel
- Team collaboration features
MIT
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Submit a pull request
For issues and questions:
- GitHub Issues: [Create an issue]
- Laravel Framework Team for excellent documentation
- Ollama for local LLM inference
- ChromaDB for vector storage
- LangChain for RAG orchestration