Intelligent conversational AI with RAG-powered contextual memory
- RAG-Powered Responses — Retrieval-Augmented Generation using Pinecone vector database
- Contextual Memory — Automatically generates and stores embeddings for conversation history
- Real-time Communication — WebSocket-based messaging with Socket.IO
- Async Processing — Celery task queue for background embedding generation
- Scalable Architecture — Microservices design with Redis message queue
- Vector Search — Semantic search across conversation history using OpenAI embeddings
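The vector-search idea above can be sketched in miniature with plain cosine similarity. This is a toy stand-in for Pinecone: the 3-dimensional vectors below are made-up placeholders for the real 1536-dimensional OpenAI embeddings, and all message text is illustrative.

```python
# Toy semantic search: rank stored "embeddings" by cosine similarity
# to a query vector. A conceptual stand-in for the Pinecone query step.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k stored messages most similar to the query vector."""
    ranked = sorted(stored.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [msg for msg, _ in ranked[:k]]

# Made-up 3-d "embeddings" standing in for real 1536-d vectors
history = {
    "we deployed with docker": [0.9, 0.1, 0.0],
    "dinner was great":        [0.0, 0.2, 0.9],
    "compose starts postgres": [0.8, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], history))
```

In the real service, the query vector comes from the OpenAI embeddings API and the ranking is done by Pinecone's index rather than in Python.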
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Client    │────►│    Flask    │────►│    Redis    │────►│   Celery    │
│  WebSocket  │     │     API     │     │   Message   │     │   Worker    │
└─────────────┘     └─────────────┘     │    Queue    │     └─────────────┘
                           │            └─────────────┘            │
                           ▼                                       ▼
                    ┌─────────────┐                         ┌─────────────┐
                    │ PostgreSQL  │                         │  Pinecone   │
                    │  Database   │                         │   Vector    │
                    └─────────────┘                         │  Database   │
                           ▲                                └─────────────┘
                           │                                       ▲
                           └───────────────────┬───────────────────┘
                                               │
                                        ┌─────────────┐
                                        │   OpenAI    │
                                        │     API     │
                                        └─────────────┘
- Flask API — WebSocket server handling real-time messaging
- Celery Workers — Asynchronous AI inference and embedding generation
- PostgreSQL — Persistent storage for users, conversations, and messages
- Redis — Message queue for task distribution and WebSocket scaling
- Pinecone — Vector database for semantic search and RAG
- LangChain — Framework orchestrating LLM, embeddings, and vector retrieval
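The Flask → Redis → Celery hand-off can be illustrated with a thread-backed queue: the in-process queue plays the role of Redis, and the worker loop plays the Celery worker. Everything here (names, messages, the sentinel shutdown) is a toy sketch, not the project's actual task code.

```python
# Toy producer/consumer: the queue stands in for Redis, the thread for a
# Celery worker. The real project dispatches named Celery tasks instead.
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()  # stand-in for the Redis broker
results: list[str] = []

def worker() -> None:
    """Consume queued messages and produce a reply for each."""
    while True:
        msg = task_queue.get()
        if msg is None:          # sentinel: stop the toy worker
            break
        results.append(f"reply to: {msg}")

t = threading.Thread(target=worker)
t.start()
task_queue.put("hello MIRA")     # what the Flask API does on a new message
task_queue.put(None)
t.join()
print(results)
```

The payoff of this shape is that the WebSocket handler returns immediately while inference and embedding generation happen off the request path.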
- Docker & Docker Compose
- Python 3.11+ (for local development)
- OpenAI API key
- Pinecone API key
git clone https://github.com/krushna-b/MIRA
cd MIRA

Create .env in the root directory:
OPENAI_API_KEY=sk-proj-...
PINECONE_API_KEY=pcsk_...
SECRET_KEY=your-secret-key-here

Update backend/.env:
OPENAI_API_KEY=sk-proj-...
PINECONE_API_KEY=pcsk_...
PINECONE_INDEX_NAME=mira-embeddings
# Database
DATABASE_URL=postgresql://mira_user:mira_password@postgres:5432/mira_db
REDIS_URL=redis://redis:6379/0
# Flask
FLASK_ENV=development
SECRET_KEY=your-secret-key-here
CORS_ORIGINS=http://localhost:3000

docker-compose up --build

This will start:
- PostgreSQL (port 5432)
- Redis (port 6379)
- Flask API (port 5001)
- Celery Worker
mira/
├── backend/ # Python backend
│ ├── app/
│ │ ├── api/ # API routes and WebSocket handlers
│ │ │ ├── routes.py # REST endpoints
│ │ │ └── websocket.py # Socket.IO events
│ │ ├── models/ # SQLAlchemy models
│ │ │ ├── user.py
│ │ │ ├── conversation.py
│ │ │ └── message.py
│ │ ├── schemas/ # Pydantic schemas
│ │ ├── services/ # Business logic
│ │ │ ├── langchain_rag_service.py # RAG implementation
│ │ │ ├── pinecone_service.py # Vector operations
│ │ │ ├── embedding_service.py # OpenAI embeddings
│ │ │ └── database_service.py # Database operations
│ │ ├── tasks/ # Celery tasks
│ │ │ ├── inference_task.py # AI response generation
│ │ │ └── embedding_task.py # Background embeddings
│ │ └── utils/ # Helpers
│ ├── celery_worker.py # Celery app configuration
│ ├── requirements.txt
│ └── Dockerfile
├── docker-compose.yml
├── .env
└── README.md
1. User sends a message via WebSocket
2. Flask API validates and stores the message in PostgreSQL
3. A Celery task is queued for AI processing
4. The RAG service retrieves relevant context from Pinecone
5. The LLM generates a response using the retrieved context plus conversation history
6. The response is saved to the database and sent back via WebSocket
7. A background task generates embeddings for both messages
8. The embeddings are stored in Pinecone for future retrieval
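Step 5 above, combining retrieved context with recent history, can be sketched as plain prompt assembly. The exact prompt format lives in langchain_rag_service.py and may differ; the function name and wording below are illustrative.

```python
# Illustrative RAG prompt assembly: system message carries retrieved
# context, followed by conversation history and the new user message.
def build_prompt(
    context_snippets: list[str],
    history: list[tuple[str, str]],
    user_msg: str,
) -> list[dict]:
    """Return an OpenAI-style messages list: system, history, new message."""
    system = "Use the following retrieved context when relevant:\n" + "\n".join(
        f"- {s}" for s in context_snippets
    )
    messages = [{"role": "system", "content": system}]
    messages += [{"role": role, "content": text} for role, text in history]
    messages.append({"role": "user", "content": user_msg})
    return messages

msgs = build_prompt(
    ["user prefers concise answers"],
    [("user", "hi"), ("assistant", "hello!")],
    "what did I say earlier?",
)
print(len(msgs), msgs[0]["role"])
```

In the real flow, LangChain performs this composition and the resulting messages are sent to the chat model by the Celery worker.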
Edit backend/app/services/langchain_rag_service.py:
# Embedding model
OpenAIEmbeddings(model="text-embedding-3-small") # 1536 dimensions
# Chat model
ChatOpenAI(model="gpt-5.2", temperature=0.7)

- Backend Framework: Flask + Flask-SocketIO
- Task Queue: Celery + Redis
- Database: PostgreSQL + SQLAlchemy
- Vector Database: Pinecone
- LLM Framework: LangChain
- AI Models: OpenAI GPT-5.2, text-embedding-3-small
- Containerization: Docker
This project is licensed under the MIT License — see the LICENSE file for details.