A production-ready SQL query generator using Retrieval-Augmented Generation (RAG) with smart schema retrieval and multi-database support.

Try the live demo: https://kitan-dara06-rag-sql-srcuistreamlit-app-wfghav.streamlit.app/

- 🤖 Natural Language to SQL: Ask questions in plain English
- 🔌 Multi-Database Support: SQLite, PostgreSQL, MySQL
- 🧠 Smart Retrieval: Foreign key analysis for better context
- 🎯 High Accuracy: AST validation and query critic
- 🌐 Web UI: Streamlit interface for easy access
- 🔒 Secure: Session-isolated, AST validation, error sanitization
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your OPENAI_KEY
streamlit run app.py

Open http://localhost:8501 in your browser.
- Select your database type (SQLite/PostgreSQL/MySQL)
- Enter connection details
- Click "Connect"
- Click "Index Schema"
- Start asking questions!
- "How many users are in the database?"
- "What is the total revenue from all orders?"
- "Which customer spent the most money?"
- "Show me the top 5 products by sales"
User Question
↓
Smart Retrieval (Vector Search + Foreign Keys)
↓
SQL Generation (GPT-4o-mini)
↓
AST Validation (sqlglot)
↓
Query Execution
↓
Answer Synthesis
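
The "Smart Retrieval" step above pairs vector search with foreign keys. The project's actual retrieval code is not shown here, so the following is only a minimal sketch of one way such an expansion could work: tables returned by vector search are treated as seeds, and tables linked to them by a foreign key are pulled into the prompt context. The function name `expand_with_foreign_keys`, the `max_hops` parameter, and the sample schema are all hypothetical.

```python
from collections import deque


def expand_with_foreign_keys(seed_tables, foreign_keys, max_hops=1):
    """Return the seed tables plus tables reachable via foreign keys.

    seed_tables:  table names returned by vector search (assumed input).
    foreign_keys: iterable of (child_table, parent_table) pairs.
    max_hops:     how many foreign-key links to follow from each seed.
    """
    # Treat foreign keys as undirected links: a join can go either way.
    neighbors = {}
    for child, parent in foreign_keys:
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)

    selected = list(dict.fromkeys(seed_tables))  # de-duplicate, keep order
    seen = set(selected)
    queue = deque((table, 0) for table in selected)

    # Breadth-first walk, stopping once max_hops links have been followed.
    while queue:
        table, hops = queue.popleft()
        if hops == max_hops:
            continue
        for linked in sorted(neighbors.get(table, ())):
            if linked not in seen:
                seen.add(linked)
                selected.append(linked)
                queue.append((linked, hops + 1))
    return selected


# Hypothetical schema: order_items -> orders -> customers, order_items -> products
fks = [
    ("orders", "customers"),
    ("order_items", "orders"),
    ("order_items", "products"),
]
print(expand_with_foreign_keys(["orders"], fks))
```

With one hop, a question that matches only `orders` would also bring in `customers` and `order_items`, which is typically what a join-heavy question such as "Which customer spent the most money?" needs.
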
SQL_RAG/
├── app.py # Streamlit web UI
├── generator2.py # Main RAG agent (production)
├── generator.py # Simple RAG agent (baseline)
├── config.py # Configuration management
├── logger.py # Logging system
├── sql_rag.py # Schema extraction
├── indexer.py # Schema indexing
├── validators.py # SQL validation
├── exceptions.py # Custom exceptions
├── rate_limiter.py # API rate limiting
├── tests/ # Test files
└── requirements.txt # Dependencies
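
The tree above lists `rate_limiter.py` for API rate limiting, but its contents are not shown. As an illustration only, a token bucket is one common way to cap calls to an external API; the class name `TokenBucket` and its parameters below are hypothetical and may not match the project's implementation.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Example: at most 2 calls in a burst, refilling one call per second.
limiter = TokenBucket(rate=1.0, capacity=2)
if limiter.allow():
    print("call the model API")
else:
    print("rate limit reached, try again shortly")
```

In a multi-session app, one bucket per session would keep a single heavy user from exhausting a shared quota; whether the project does this is not stated.
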
Edit the .env file:
# OpenAI
OPENAI_KEY=your_key_here
OPENAI_MODEL=gpt-4o-mini
# Database (example for PostgreSQL)
DB_TYPE=postgresql
DB_HOST=localhost
DB_PORT=5432
DB_NAME=your_database
DB_USER=your_user
DB_PASSWORD=your_password
# ChromaDB
CHROMA_DB_PATH=./repo_db
EMBEDDING_MODEL=all-MiniLM-L6-v2
# Logging
LOG_LEVEL=INFO
LOG_FILE=sql_rag.log

from sqlalchemy import create_engine
from generator2 import run_agent
# Create database engine
engine = create_engine("postgresql://user:pass@localhost/db")
# Ask a question
result = run_agent("How many users?", engine)
if result and result['success']:
    print(result['data'])

cd tests
python test_sql_rag.py
python test_postgres.py

python indexer.py

- AST Validation: Blocks modification operations (INSERT, UPDATE, DELETE)
- Error Sanitization: Prevents information leakage
- Session Isolation: Multi-tenant capable
- Rate Limiting: Prevents API abuse
- OpenAI GPT-4o-mini: SQL generation
- ChromaDB: Vector database for schema search
- SQLAlchemy: Database abstraction
- sqlglot: SQL parsing and validation
- Streamlit: Web UI
- Sentence Transformers: Text embeddings
MIT
Contributions welcome! Please open an issue or PR.