An AI-powered Retrieval Augmented Generation (RAG) system for regulatory compliance queries. Get instant, accurate answers about GDPR, SOX, HIPAA, and other regulatory frameworks with source citations.
- Intelligent Q&A: Ask natural language questions about compliance requirements
- Multi-Regulation Support: GDPR, SOX, HIPAA, PCI-DSS, ISO 27001, and more
- Source Citations: Every answer includes references to source documents
- Beautiful Web UI: Easy-to-use Gradio interface
- Document Upload: Add your own regulatory documents
- Similarity Search: Find relevant document sections without generating answers
- Query History: Track and export compliance queries for audit trails
- Multiple LLM Support: Works with OpenAI (GPT) or Ollama (local, free)
- Python: 3.11.8 (recommended) or 3.10.13+
- pip: 23.0 or higher
- Ollama (optional): For a free local LLM - download from https://ollama.ai
- OpenAI API Key (optional): For cloud-based GPT models
```bash
# Create project directory
mkdir compliance-rag-assistant
cd compliance-rag-assistant

# Create virtual environment
python -m venv venv

# Activate (Linux/Mac)
source venv/bin/activate
# Activate (Windows)
venv\Scripts\activate

# Upgrade pip
pip install --upgrade pip

# Install requirements
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit the .env file with your settings (optional)
# For Ollama (free): No changes needed
# For OpenAI: Add your API key

# Generate sample regulatory documents
python scripts/create_sample_docs.py

# Start the web UI
python -m src.ui.gradio_app
```

Open your browser to: http://localhost:7860
- Open the web interface at http://localhost:7860
- Go to the "System Setup" tab
- Select LLM provider:
  - Ollama (free, runs locally) - no API key needed
  - OpenAI (paid, cloud) - requires API key
- Enter model name:
  - For Ollama: `llama2`, `mistral`, or `mixtral`
  - For OpenAI: `gpt-3.5-turbo` or `gpt-4`
- Click "Initialize System"
- Wait for the confirmation: "System initialized successfully!"
- Go to the "Ask Questions" tab
- Type your compliance question in the text box
- Optionally filter by regulation type (GDPR, SOX, HIPAA, etc.)
- Enable "Show Sources" to see document references
- Click "Get Answer"
- View the answer with source citations!
Example Questions:
- "What are the GDPR requirements for data retention?"
- "What security controls does HIPAA require for PHI?"
- "How long do I have to notify authorities about a data breach?"
- "What are SOX 404 internal control requirements?"
- "What encryption is required for protected health information?"
- Go to the "Search Documents" tab
- Enter a search query (e.g., "encryption requirements")
- Adjust number of results
- Click "Search"
- View similar document sections with similarity scores
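The similarity scores come from nearest-neighbor search over the embedding vectors in the FAISS index. A minimal pure-Python sketch of what such a search computes (the vectors and section titles here are toy stand-ins, not the project's real embeddings):

```python
import math

# Toy 3-dimensional vectors standing in for real document embeddings.
document_vectors = {
    "encryption requirements for PHI": [0.9, 0.1, 0.3],
    "data retention periods":          [0.1, 0.8, 0.2],
    "access control policies":         [0.7, 0.2, 0.4],
}

def l2_distance(a, b):
    """Euclidean distance, as used by a flat L2 index."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query_vector, k=2):
    """Return the k nearest document sections with their distances."""
    ranked = sorted(
        document_vectors.items(),
        key=lambda item: l2_distance(query_vector, item[1]),
    )
    return [(title, round(l2_distance(query_vector, vec), 3))
            for title, vec in ranked[:k]]

# A query vector close to the "encryption" document wins the ranking.
print(search([0.85, 0.15, 0.25]))
```

A real FAISS index does the same ranking, just over thousands of high-dimensional vectors with optimized distance computation.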
- Go to the "Upload Documents" tab
- Click "Upload Documents"
- Select PDF, TXT, or DOCX files
- Click "Process Documents"
- Wait for confirmation
- Your documents are now searchable!
- Go to the "History" tab
- Click "Refresh" to see recent queries
- Click "Export" to save query history for audit purposes
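For audit purposes, an export boils down to serializing the stored query records. A sketch of the idea - the record fields and function name here are hypothetical, not the app's actual schema:

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical shape of a query-history record; the fields the app
# actually stores may differ.
history = [
    {
        "timestamp": datetime(2025, 10, 1, 9, 30, tzinfo=timezone.utc).isoformat(),
        "question": "What are GDPR data retention requirements?",
        "regulation": "GDPR",
        "num_sources": 4,
    },
]

def export_history(records):
    """Serialize query history to CSV for an audit trail."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["timestamp", "question", "regulation", "num_sources"],
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

print(export_history(history))
```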
```
compliance-rag-assistant/
│
├── src/                          # Source code
│   ├── core/                     # Core RAG functionality
│   │   ├── document_loader.py    # Load PDF, TXT, DOCX files
│   │   ├── text_processor.py     # Intelligent text chunking
│   │   ├── embeddings.py         # Embedding model management
│   │   ├── vector_store.py       # FAISS vector database
│   │   └── rag_engine.py         # Main RAG orchestrator
│   │
│   ├── models/                   # LLM provider integrations
│   │   ├── llm_factory.py        # Factory pattern for LLMs
│   │   ├── openai_provider.py    # OpenAI GPT integration
│   │   └── ollama_provider.py    # Ollama local models
│   │
│   ├── ui/                       # User interface
│   │   ├── gradio_app.py         # Gradio web application
│   │   └── components.py         # UI components & logic
│   │
│   └── utils/                    # Utilities
│       ├── config.py             # Configuration management
│       └── logger.py             # Logging utilities
│
├── configs/                      # Configuration files
│   ├── default.yaml              # Default settings
│   ├── development.yaml          # Dev environment
│   └── production.yaml           # Production settings
│
├── data/                         # Data storage
│   ├── regulatory_documents/     # Your documents go here
│   └── vector_db/                # Vector database storage
│
├── scripts/                      # Utility scripts
│   ├── create_sample_docs.py     # Generate sample documents
│   └── rebuild_vector_db.py      # Rebuild vector database
│
├── logs/                         # Application logs
│
├── requirements.txt              # Python dependencies
├── .env.example                  # Environment template
└── README.md                     # This file
```
Setup Ollama:

```bash
# 1. Download Ollama from https://ollama.ai
# 2. Pull a model
ollama pull llama2

# 3. Start Ollama (usually auto-starts)
ollama serve

# 4. In the UI, select:
#    - Provider: ollama
#    - Model: llama2
```

Supported Ollama Models:
- `llama2` - Fast, good quality
- `mistral` - Better quality, slightly slower
- `mixtral` - Best quality, requires more resources
- `codellama` - Good for technical compliance
Setup OpenAI:

```bash
# 1. Get an API key from https://platform.openai.com
# 2. Add it to the .env file
OPENAI_API_KEY=sk-your-key-here

# 3. In the UI, select:
#    - Provider: openai
#    - Model: gpt-3.5-turbo or gpt-4
#    - Enter your API key
```

Edit `configs/default.yaml`:
```yaml
rag:
  # Chunk size for document splitting
  chunk_size: 1000      # Increase for longer chunks
  chunk_overlap: 200    # Context overlap

  # Retrieval settings
  top_k: 4              # Number of sources to retrieve

  # LLM temperature
  temperature: 0.1      # Lower = more deterministic
```

Supported document formats:
- PDF (`.pdf`)
- Text files (`.txt`)
- Word documents (`.docx`)
- Markdown (`.md`)
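The `chunk_size` / `chunk_overlap` settings above describe a sliding-window split: each chunk shares its last `chunk_overlap` characters with the next one. A simplified sketch of the idea (the project's `text_processor.py` may additionally respect sentence or section boundaries):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size chunks, each overlapping the previous
    one by chunk_overlap characters so context isn't lost at boundaries."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

document = "x" * 2600  # stand-in for a 2,600-character document
chunks = chunk_text(document)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [1000, 1000, 1000]
```

With the defaults, each window advances by 800 characters, so a 2,600-character document yields three full chunks whose edges overlap by 200 characters.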
- Go to the "Upload Documents" tab
- Upload your files
- Click "Process Documents"
```bash
# Copy files to documents directory
cp your_regulation.pdf data/regulatory_documents/

# Rebuild vector database
python scripts/rebuild_vector_db.py
```

Recommended layout:

```
data/regulatory_documents/
├── gdpr/
│   ├── gdpr_full_text.pdf
│   └── gdpr_guidelines.pdf
├── sox/
│   ├── sox_section_302.pdf
│   └── sox_section_404.pdf
└── hipaa/
    ├── hipaa_privacy_rule.pdf
    └── hipaa_security_rule.pdf
```

RAG (Retrieval Augmented Generation) combines document search with AI generation:
- Load: Import your regulatory documents
- Chunk: Split into manageable pieces (chunks)
- Embed: Convert text to numerical vectors (embeddings)
- Store: Save in vector database (FAISS)
- Retrieve: Find relevant chunks for your query
- Generate: LLM creates answer using retrieved context
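The pipeline above can be sketched end to end in a few lines. Everything here is an illustrative stand-in - bag-of-words counts instead of neural embeddings, a list instead of a FAISS index, a prompt string instead of an LLM call - but the six steps map directly onto the real components:

```python
import math
from collections import Counter

def embed(text):
    """Bag-of-words 'embedding' -- the real system uses a neural encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-4: load, chunk, embed, store
chunks = [
    "GDPR requires personal data to be kept no longer than necessary.",
    "HIPAA mandates encryption of protected health information.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Step 5: retrieve the most relevant chunk for the query
query = "How long can personal data be retained under GDPR?"
best_chunk, _ = max(store, key=lambda item: cosine(embed(query), item[1]))

# Step 6: generate -- in the real system this prompt goes to the LLM
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {query}"
print(best_chunk)
```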
- Accurate: Answers based on actual documents, not memorization
- Transparent: Shows source citations
- Up-to-date: Add new regulations easily
- Private: Can run entirely locally with Ollama
- Auditable: Track what was asked and answered
```python
# test_system.py
from src.core.rag_engine import RegulatoryComplianceRAG

# Initialize
rag = RegulatoryComplianceRAG(
    llm_provider="ollama",
    model_name="llama2"
)

# Test query
result = rag.query(
    question="What are GDPR data retention requirements?",
    return_sources=True
)

print("Answer:", result['answer'])
print(f"Sources: {result['num_sources']}")
```

Run it:

```bash
python test_system.py
```

"System not initialized" error:
Solution: Go to the System Setup tab and initialize the system first.
Ollama connection errors:
Solution:

```bash
# Check if Ollama is running
ollama list

# Start Ollama
ollama serve

# Pull the model
ollama pull llama2
```

No documents in the database:
Solution:

```bash
# Create sample documents
python scripts/create_sample_docs.py

# Or upload your own via the UI
```

Missing OpenAI API key:
Solution: Add your API key to the .env file:

```bash
OPENAI_API_KEY=sk-your-actual-key-here
```

Import errors:
Solution:

```bash
# Reinstall dependencies
pip install --upgrade -r requirements.txt
```

Slow performance:
Solution:
- Use a smaller model (`llama2` instead of `mixtral`)
- Reduce `top_k` in config (fewer sources retrieved)
- Use GPU if available (requires `faiss-gpu`)
Start reading in this order:
1. `src/utils/config.py` - Configuration management
2. `src/core/document_loader.py` - How documents are loaded
3. `src/core/text_processor.py` - How text is chunked
4. `src/core/embeddings.py` - How embeddings work
5. `src/core/vector_store.py` - Vector database operations
6. `src/core/rag_engine.py` - Main orchestrator (brings it all together)
7. `src/ui/gradio_app.py` - Web interface
Embeddings: Converting text to numbers that represent meaning

```
"data privacy"         → [0.23, 0.56, 0.12, ...]
"personal information" → [0.24, 0.55, 0.13, ...]
# Similar meanings = similar vectors
```

Vector Search: Finding similar text using math

```
query = "encryption requirements"
# Finds documents about: encryption, security, data protection
```

Chunking: Splitting documents while maintaining context

```
Document (5000 words) →
  Chunk 1 (1000 chars) ─┐
  Chunk 2 (1000 chars) ─┼─ 200 char overlap
  Chunk 3 (1000 chars) ─┘
```
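"Similar meanings = similar vectors" can be checked directly by computing cosine similarity on the two example vectors above, truncated to three dimensions (the `unrelated` vector is made up for contrast):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

data_privacy = [0.23, 0.56, 0.12]
personal_information = [0.24, 0.55, 0.13]
unrelated = [0.90, 0.05, 0.40]  # hypothetical vector for an unrelated phrase

print(cosine_similarity(data_privacy, personal_information))  # very close to 1.0
print(cosine_similarity(data_privacy, unrelated))             # noticeably lower
```

Vector search is just this comparison (or an equivalent distance) applied between the query vector and every stored chunk vector.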
1. Be Specific: Ask detailed questions
   - Avoid: "Tell me about GDPR"
   - Better: "What are the GDPR requirements for data retention periods?"
2. Use Filters: Select the regulation type when you know it
   - Faster and more accurate results
3. Check Sources: Always review source citations
   - Verify the information against the original documents
4. Add More Documents: The more documents you add, the better the answers
   - Upload your company policies
   - Add regulatory updates
5. Experiment with Models:
   - Fast queries: `llama2`, `gpt-3.5-turbo`
   - Best quality: `mixtral`, `gpt-4`
This tool is for informational purposes only and does not constitute legal advice. Always consult with qualified legal professionals for compliance matters.
- With Ollama: All data stays on your machine (100% private)
- With OpenAI: Queries are sent to OpenAI's servers (read their privacy policy)
- Answers are only as good as the documents you provide
- AI can make mistakes - always verify important information
- Not a replacement for compliance officers or legal counsel
Having issues? Check:
- The troubleshooting section above
- Application logs in `logs/app.log`
- Status messages in the UI
MIT License - See LICENSE file for details
Made with ❤️ for the compliance community
Last Updated: October 2025