
πŸ›οΈ Compliance RAG Assistant

An AI-powered Retrieval Augmented Generation (RAG) system for regulatory compliance queries. Get instant, accurate answers about GDPR, SOX, HIPAA, and other regulatory frameworks with source citations.

Python 3.11+ LangChain License: MIT


🌟 Features

  • πŸ’¬ Intelligent Q&A: Ask natural language questions about compliance requirements
  • πŸ“š Multi-Regulation Support: GDPR, SOX, HIPAA, PCI-DSS, ISO 27001, and more
  • πŸ” Source Citations: Every answer includes references to source documents
  • 🌐 Beautiful Web UI: Easy-to-use Gradio interface
  • πŸ“€ Document Upload: Add your own regulatory documents
  • πŸ”Ž Similarity Search: Find relevant document sections without generating answers
  • πŸ“Š Query History: Track and export compliance queries for audit trails
  • πŸ€– Multiple LLM Support: Works with OpenAI (GPT) or Ollama (local, free)

πŸ“‹ Prerequisites

  • Python: 3.10.13 or later (3.11.8 recommended)
  • pip: 23.0 or higher
  • Ollama (optional): For a free local LLM (download from https://ollama.ai)
  • OpenAI API Key (optional): For cloud-based GPT models

⚑ Quick Start

1️⃣ Clone or Create Project Directory

# Option A: clone the repository
git clone https://github.com/Shubham219/compliance-rag-assistant.git
cd compliance-rag-assistant

# Option B: create the directory manually
mkdir compliance-rag-assistant
cd compliance-rag-assistant

2️⃣ Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate (Linux/Mac)
source venv/bin/activate

# Activate (Windows)
venv\Scripts\activate

3️⃣ Install Dependencies

# Upgrade pip
pip install --upgrade pip

# Install requirements
pip install -r requirements.txt

4️⃣ Setup Configuration

# Copy environment template
cp .env.example .env

# Edit .env file with your settings (optional)
# For Ollama (free): No changes needed
# For OpenAI: Add your API key

5️⃣ Create Sample Documents

# Generate sample regulatory documents
python scripts/create_sample_docs.py

6️⃣ Launch the Application

# Start the web UI
python -m src.ui.gradio_app

Open your browser to: http://localhost:7860


🎯 How to Use

Using the Web Interface

Step 1: Initialize the System

  1. Open the web interface at http://localhost:7860
  2. Go to "πŸš€ System Setup" tab
  3. Select LLM provider:
    • Ollama (free, runs locally) - No API key needed
    • OpenAI (paid, cloud) - Requires API key
  4. Enter model name:
    • For Ollama: llama2, mistral, mixtral
    • For OpenAI: gpt-3.5-turbo, gpt-4
  5. Click "Initialize System"
  6. Wait for confirmation: "βœ… System initialized successfully!"

Step 2: Ask Questions

  1. Go to "πŸ’¬ Ask Questions" tab
  2. Type your compliance question in the text box
  3. Optionally filter by regulation type (GDPR, SOX, HIPAA, etc.)
  4. Enable "Show Sources" to see document references
  5. Click "Get Answer"
  6. View the answer with source citations!

Example Questions:

  • "What are the GDPR requirements for data retention?"
  • "What security controls does HIPAA require for PHI?"
  • "How long do I have to notify authorities about a data breach?"
  • "What are SOX 404 internal control requirements?"
  • "What encryption is required for protected health information?"

Step 3: Search Documents (Optional)

  1. Go to "πŸ” Search Documents" tab
  2. Enter a search query (e.g., "encryption requirements")
  3. Adjust number of results
  4. Click "Search"
  5. View similar document sections with similarity scores

Step 4: Upload Your Documents

  1. Go to "πŸ“€ Upload Documents" tab
  2. Click "Upload Documents"
  3. Select PDF, TXT, or DOCX files
  4. Click "Process Documents"
  5. Wait for confirmation
  6. Your documents are now searchable!

Step 5: View History

  1. Go to "πŸ“Š History" tab
  2. Click "Refresh" to see recent queries
  3. Click "Export" to save query history for audit purposes
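The export step above can be sketched in plain Python. This is a minimal illustration, not the app's actual export code: the record keys (`timestamp`, `question`, `regulation`, `num_sources`) are assumptions about what a query-history entry might contain.

```python
import csv
import io
from datetime import datetime, timezone

def export_history(history, out):
    """Write query-history records to CSV for an audit trail.

    `history` is a list of dicts with hypothetical keys; a real export
    would match whatever the app actually logs.
    """
    writer = csv.DictWriter(
        out, fieldnames=["timestamp", "question", "regulation", "num_sources"]
    )
    writer.writeheader()
    for record in history:
        writer.writerow(record)

history = [{
    "timestamp": datetime(2025, 10, 1, tzinfo=timezone.utc).isoformat(),
    "question": "What are GDPR data retention requirements?",
    "regulation": "GDPR",
    "num_sources": 4,
}]

buf = io.StringIO()
export_history(history, buf)
print(buf.getvalue())
```

CSV is a deliberately boring format for audit exports: it opens in any spreadsheet and diffs cleanly in version control.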

πŸ—‚οΈ Project Structure

compliance-rag-assistant/
β”‚
β”œβ”€β”€ src/                          # Source code
β”‚   β”œβ”€β”€ core/                     # Core RAG functionality
β”‚   β”‚   β”œβ”€β”€ document_loader.py   # Load PDF, TXT, DOCX files
β”‚   β”‚   β”œβ”€β”€ text_processor.py    # Intelligent text chunking
β”‚   β”‚   β”œβ”€β”€ embeddings.py        # Embedding model management
β”‚   β”‚   β”œβ”€β”€ vector_store.py      # FAISS vector database
β”‚   β”‚   └── rag_engine.py        # Main RAG orchestrator
β”‚   β”‚
β”‚   β”œβ”€β”€ models/                   # LLM provider integrations
β”‚   β”‚   β”œβ”€β”€ llm_factory.py       # Factory pattern for LLMs
β”‚   β”‚   β”œβ”€β”€ openai_provider.py   # OpenAI GPT integration
β”‚   β”‚   └── ollama_provider.py   # Ollama local models
β”‚   β”‚
β”‚   β”œβ”€β”€ ui/                       # User interface
β”‚   β”‚   β”œβ”€β”€ gradio_app.py        # Gradio web application
β”‚   β”‚   └── components.py        # UI components & logic
β”‚   β”‚
β”‚   └── utils/                    # Utilities
β”‚       β”œβ”€β”€ config.py            # Configuration management
β”‚       └── logger.py            # Logging utilities
β”‚
β”œβ”€β”€ configs/                      # Configuration files
β”‚   β”œβ”€β”€ default.yaml             # Default settings
β”‚   β”œβ”€β”€ development.yaml         # Dev environment
β”‚   └── production.yaml          # Production settings
β”‚
β”œβ”€β”€ data/                         # Data storage
β”‚   β”œβ”€β”€ regulatory_documents/    # Your documents go here
β”‚   └── vector_db/               # Vector database storage
β”‚
β”œβ”€β”€ scripts/                      # Utility scripts
β”‚   β”œβ”€β”€ create_sample_docs.py   # Generate sample documents
β”‚   └── rebuild_vector_db.py    # Rebuild vector database
β”‚
β”œβ”€β”€ logs/                         # Application logs
β”‚
β”œβ”€β”€ requirements.txt              # Python dependencies
β”œβ”€β”€ .env.example                  # Environment template
└── README.md                     # This file
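The tree above mentions a factory pattern in `src/models/llm_factory.py`. The real implementation isn't shown here, but the idea can be sketched as follows; every name in this snippet (`make_ollama`, `create_llm`, the registry) is hypothetical, and the "providers" are placeholders rather than real API clients.

```python
from typing import Callable, Dict

# A provider is modeled as a callable mapping a prompt to a completion.
Provider = Callable[[str], str]

def make_ollama(model_name: str) -> Provider:
    # Placeholder: a real provider would call the Ollama HTTP API.
    return lambda prompt: f"[ollama/{model_name}] {prompt}"

def make_openai(model_name: str) -> Provider:
    # Placeholder: a real provider would call the OpenAI API.
    return lambda prompt: f"[openai/{model_name}] {prompt}"

_REGISTRY: Dict[str, Callable[[str], Provider]] = {
    "ollama": make_ollama,
    "openai": make_openai,
}

def create_llm(provider: str, model_name: str) -> Provider:
    """Factory entry point: look up the provider and construct it."""
    try:
        return _REGISTRY[provider](model_name)
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}")

llm = create_llm("ollama", "llama2")
print(llm("hello"))
```

The payoff of the factory is that the UI only ever calls `create_llm(provider, model)`; adding a new backend means registering one constructor, with no changes to calling code.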

πŸ”§ Configuration

Using Ollama (Free, Local)

Setup Ollama:

# 1. Download Ollama from https://ollama.ai

# 2. Pull a model
ollama pull llama2

# 3. Start Ollama (usually auto-starts)
ollama serve

# 4. In the UI, select:
#    - Provider: ollama
#    - Model: llama2

Supported Ollama Models:

  • llama2 - Fast, good quality
  • mistral - Better quality, slightly slower
  • mixtral - Best quality, requires more resources
  • codellama - Good for technical compliance

Using OpenAI (Paid, Cloud)

Setup OpenAI:

# 1. Get API key from https://platform.openai.com

# 2. Add to .env file
OPENAI_API_KEY=sk-your-key-here

# 3. In the UI, select:
#    - Provider: openai
#    - Model: gpt-3.5-turbo or gpt-4
#    - Enter your API key
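For reference, loading a key from `.env` is typically done with the python-dotenv package; a minimal stdlib sketch of the same idea looks like this (the parser below handles only simple `KEY=VALUE` lines and `#` comments, nothing more):

```python
import os
import tempfile
from pathlib import Path

def load_env(path):
    """Minimal .env loader: KEY=VALUE lines, '#' comments.

    Real projects usually use python-dotenv, which handles quoting,
    multiline values, and interpolation.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

# Demo with a temporary file standing in for .env
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# comment line\nOPENAI_API_KEY=sk-your-key-here\n")

os.environ.pop("OPENAI_API_KEY", None)  # ensure a clean demo
load_env(f.name)
print(os.environ["OPENAI_API_KEY"])
```

Using `setdefault` means a key already exported in the shell wins over the `.env` value, which matches the usual precedence convention.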

Adjusting Settings

Edit configs/default.yaml:

# RAG settings
rag:
  chunk_size: 1000      # Characters per chunk; increase for longer chunks
  chunk_overlap: 200    # Overlap between chunks to preserve context
  top_k: 4              # Number of sources to retrieve per query
  temperature: 0.1      # LLM temperature; lower = more deterministic

πŸ“š Adding Your Own Documents

Supported File Formats

  • PDF (.pdf)
  • Text files (.txt)
  • Word documents (.docx)
  • Markdown (.md)

Method 1: Using the Web UI

  1. Go to "πŸ“€ Upload Documents" tab
  2. Upload your files
  3. Click "Process Documents"

Method 2: Direct File Copy

# Copy files to documents directory
cp your_regulation.pdf data/regulatory_documents/

# Rebuild vector database
python scripts/rebuild_vector_db.py

Organizing Documents

data/regulatory_documents/
β”œβ”€β”€ gdpr/
β”‚   β”œβ”€β”€ gdpr_full_text.pdf
β”‚   └── gdpr_guidelines.pdf
β”œβ”€β”€ sox/
β”‚   β”œβ”€β”€ sox_section_302.pdf
β”‚   └── sox_section_404.pdf
└── hipaa/
    β”œβ”€β”€ hipaa_privacy_rule.pdf
    └── hipaa_security_rule.pdf
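With a layout like the one above, the parent folder name doubles as the regulation tag. A small sketch of collecting supported files this way (the function name and tuple shape are illustrative, not the project's actual loader API):

```python
import tempfile
from pathlib import Path

SUPPORTED = {".pdf", ".txt", ".docx", ".md"}

def collect_documents(root):
    """Recursively gather supported files under `root`.

    Returns (folder, filename) pairs, where the folder name
    (gdpr, sox, hipaa, ...) can serve as the regulation tag.
    """
    docs = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            docs.append((path.parent.name, path.name))
    return docs

# Demo on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "gdpr").mkdir()
    (Path(root) / "gdpr" / "gdpr_guidelines.pdf").touch()
    (Path(root) / "notes.xlsx").touch()  # unsupported extension, skipped
    print(collect_documents(root))  # [('gdpr', 'gdpr_guidelines.pdf')]
```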

πŸŽ“ Understanding RAG

What is RAG?

RAG (Retrieval Augmented Generation) combines document search with AI generation:

  1. πŸ“„ Load: Import your regulatory documents
  2. βœ‚οΈ Chunk: Split into manageable pieces (chunks)
  3. πŸ”’ Embed: Convert text to numerical vectors (embeddings)
  4. πŸ’Ύ Store: Save in vector database (FAISS)
  5. πŸ” Retrieve: Find relevant chunks for your query
  6. πŸ€– Generate: LLM creates answer using retrieved context
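The six steps above can be shown end to end with a deliberately tiny stand-in pipeline. This is purely illustrative: the real system uses neural embeddings and a FAISS index, whereas this sketch uses bag-of-words counts and a linear scan.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (real systems use neural models)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-4: load, chunk, embed, store
chunks = [
    "GDPR limits data retention to what is necessary",
    "HIPAA requires encryption of protected health information",
    "SOX section 404 covers internal control over financial reporting",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Step 5: retrieve the top-k chunks most similar to the query
def retrieve(query, k=2):
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Step 6: the retrieved chunks would be passed to the LLM as context
print(retrieve("data retention requirements", k=1))
```

Even this crude version shows the key property of RAG: the answer is grounded in whichever chunks scored highest, so every response can be traced back to a source passage.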

Why RAG for Compliance?

  • βœ… Accurate: Answers based on actual documents, not memorization
  • βœ… Transparent: Shows source citations
  • βœ… Up-to-date: Add new regulations easily
  • βœ… Private: Can run entirely locally with Ollama
  • βœ… Auditable: Track what was asked and answered

πŸ§ͺ Testing the System

Run the Test Script

# test_system.py
from src.core.rag_engine import RegulatoryComplianceRAG

# Initialize
rag = RegulatoryComplianceRAG(
    llm_provider="ollama",
    model_name="llama2"
)

# Test query
result = rag.query(
    question="What are GDPR data retention requirements?",
    return_sources=True
)

print("Answer:", result['answer'])
print(f"Sources: {result['num_sources']}")

Run it:

python test_system.py

πŸ› Troubleshooting

Issue: "System not initialized"

Solution: Go to System Setup tab and initialize the system first.

Issue: "Ollama connection error"

Solution:

# Check if Ollama is running
ollama list

# Start Ollama
ollama serve

# Pull the model
ollama pull llama2

Issue: "No documents found"

Solution:

# Create sample documents
python scripts/create_sample_docs.py

# Or upload your own via the UI

Issue: "OpenAI API key error"

Solution: Add your API key to .env file:

OPENAI_API_KEY=sk-your-actual-key-here

Issue: "Import errors"

Solution:

# Reinstall dependencies
pip install --upgrade -r requirements.txt

Issue: "Slow responses"

Solution:

  • Use a smaller model (llama2 instead of mixtral)
  • Reduce top_k in config (fewer sources retrieved)
  • Use GPU if available (requires faiss-gpu)

πŸ“– Learn More

Understanding the Code

Start reading in this order:

  1. src/utils/config.py - Configuration management
  2. src/core/document_loader.py - How documents are loaded
  3. src/core/text_processor.py - How text is chunked
  4. src/core/embeddings.py - How embeddings work
  5. src/core/vector_store.py - Vector database operations
  6. src/core/rag_engine.py - Main orchestrator (brings it all together)
  7. src/ui/gradio_app.py - Web interface

Key Concepts

Embeddings: Converting text to numbers that represent meaning

"data privacy" β†’ [0.23, 0.56, 0.12, ...]
"personal information" β†’ [0.24, 0.55, 0.13, ...]
# Similar meanings = similar vectors
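"Similar vectors" is usually measured with cosine similarity. Applying it to the two three-dimensional example vectors above confirms they are nearly parallel:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

data_privacy = [0.23, 0.56, 0.12]
personal_information = [0.24, 0.55, 0.13]

print(round(cosine(data_privacy, personal_information), 3))  # close to 1.0
```

A score near 1.0 means "almost the same direction", i.e. almost the same meaning; unrelated texts score near 0.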

Vector Search: Finding similar text using math

query = "encryption requirements"
# Finds documents about: encryption, security, data protection

Chunking: Splitting documents while maintaining context

Document (5000 words) β†’
    Chunk 1 (1000 chars) ─┐
    Chunk 2 (1000 chars) ─┼─ 200 char overlap
    Chunk 3 (1000 chars) β”€β”˜
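The diagram above corresponds to a simple sliding-window splitter. Note this is a naive character-based sketch matching the `chunk_size`/`chunk_overlap` settings in `configs/default.yaml`; the project's `text_processor.py` presumably splits more intelligently (at sentence or paragraph boundaries, as splitters like LangChain's do).

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size chunks, each sharing `overlap`
    characters with the previous chunk to preserve context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 2600
print([len(c) for c in chunk_text(doc)])  # [1000, 1000, 1000, 200]
```

The overlap means a sentence cut off at the end of one chunk reappears at the start of the next, so retrieval never loses context at a boundary.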

πŸ’‘ Tips for Best Results

  1. Be Specific: Ask detailed questions

    • ❌ "Tell me about GDPR"
    • βœ… "What are the GDPR requirements for data retention periods?"
  2. Use Filters: Select regulation type when you know it

    • Faster and more accurate results
  3. Check Sources: Always review source citations

    • Verify the information from original documents
  4. Add More Documents: The more documents you add, the better the answers

    • Upload your company policies
    • Add regulatory updates
  5. Experiment with Models:

    • Fast queries: llama2, gpt-3.5-turbo
    • Best quality: mixtral, gpt-4

⚠️ Important Notes

Disclaimer

This tool is for informational purposes only and does not constitute legal advice. Always consult with qualified legal professionals for compliance matters.

Data Privacy

  • With Ollama: All data stays on your machine (100% private)
  • With OpenAI: Queries are sent to OpenAI's servers (read their privacy policy)

Limitations

  • Answers are only as good as the documents you provide
  • AI can make mistakes - always verify important information
  • Not a replacement for compliance officers or legal counsel

πŸ“ž Support

Having issues? Check:

  1. The troubleshooting section above
  2. Application logs in logs/app.log
  3. Status messages in the UI

πŸ“„ License

MIT License - See LICENSE file for details


Made with ❀️ for the compliance community

Last Updated: Oct 2025
