An enterprise-grade, production-ready AI Agent system powered by Google Gemini 2.5 Pro, LangGraph, and Next.js. Features a multi-agent pipeline with Corrective RAG (CRAG) self-reflection, hybrid LLM routing, JWT/RBAC security, multi-channel gateway, and full-stack observability.
- 4-Node LangGraph Pipeline: Planner → Researcher → Reviewer → Synthesizer with conditional routing
- Corrective RAG (CRAG): Self-reflective quality grading with automatic query rewriting and retry loops (up to 3 attempts)
- Intent-Driven Routing: Planner agent classifies queries into `rag`, `web`, or `direct` intents for optimal resource utilization
- Answer Synthesis: Dedicated synthesizer agent polishes raw answers with proper formatting and artifact removal
- Data Sensitivity Routing: Confidential documents automatically routed to local Ollama (DeepSeek R1) for air-gapped processing
- Query Complexity Optimization: Simple queries → Gemini Flash (fast/cheap), complex reasoning → Gemini Pro (best quality)
- Automatic Fallback: Ollama unavailable → graceful fallback to Gemini Flash
- JWT Authentication: HS256/RS256 token validation with configurable secret
- Role-Based Access Control: Hierarchical roles (`viewer` → `engineer` → `manager` → `executive`) with document-level filtering
- Qdrant Access Tags: RBAC tags injected into vector search filters for data isolation
- Dev Mode Bypass: `AUTH_ENABLED=false` for seamless local development
- Unified Message Protocol: Standardized `UnifiedMessage`/`UnifiedResponse` across all channels
- Web Adapter: SSE streaming for the Next.js frontend
- API Adapter: Synchronous JSON for CI/CD, CLIs, and microservices
- Extensible Architecture: Abstract `ChannelAdapter` base class for adding Slack, Feishu, WeChat bots
- BM25 + Vector Hybrid Search: Sparse keyword + dense embedding retrieval via `QueryFusionRetriever` with Reciprocal Rank Fusion (RRF)
- Citation Query Engine: Traceable source documents with file names, relevance scores, and content previews
- Singleton Index Caching: `@lru_cache` eliminates per-request index rebuilds
- Multi-Turn Conversation Memory: Redis-backed session history with contextual follow-up support
- Multimodal RAG: Image upload via drag-and-drop, clipboard paste, or file picker with Gemini Vision analysis
- Anti-Arrogance Prompting: System prompts ensure internal documents are prioritized over parametric memory
- Typewriter Streaming: Character-by-character SSE streaming with blinking cursor animation
- Syntax-Highlighted Code Blocks: `react-syntax-highlighter` with oneDark theme and one-click copy
- Rich Markdown Rendering: GFM tables, inline code, links, lists with `@tailwindcss/typography`
- Session Sidebar: Create, switch, and delete conversations with real-time management
- Micro-Animations: Fade-in-up entrances, three-dot typing indicator, hover effects
- Arize Phoenix Tracing: Full LLM/Retriever/Tool call tracing with latency and token analytics
- Pipeline Performance Logging: Per-node timing instrumentation with structured logs
- Prometheus Metrics: Request latency, throughput, and error rate via `/metrics`
- Structured Logging: Loguru with `InterceptHandler` for unified routing
- 66-Test Suite: Comprehensive pytest coverage across agents, auth, router, channels, and pipeline
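
The hybrid search feature above relies on Reciprocal Rank Fusion to merge the BM25 and vector result lists. In this project the fusion is handled by `QueryFusionRetriever`; the snippet below is not that implementation, only a minimal standalone sketch of the RRF scoring rule. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1 for the top hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the sparse and dense retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists (`doc_b`, `doc_a`) rise above documents that only one retriever found, which is the behavior hybrid search is meant to deliver.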
graph TD
classDef frontend fill:#000000,stroke:#3b82f6,stroke-width:2px,color:#fff
classDef backend fill:#18181b,stroke:#a855f7,stroke-width:2px,color:#fff
classDef agent fill:#27272a,stroke:#eab308,stroke-width:2px,color:#fff
classDef tool fill:#1e1e1e,stroke:#10b981,stroke-width:2px,color:#fff
classDef storage fill:#f8fafc,stroke:#64748b,stroke-width:2px,color:#000
classDef observe fill:#1a1a2e,stroke:#e94560,stroke-width:2px,color:#fff
classDef security fill:#1a1a1a,stroke:#ef4444,stroke-width:2px,color:#fff
User((User)) -->|Query + Image| UI[Next.js 16 Frontend]
UI:::frontend -- SSE Stream --> FastAPI[FastAPI Backend]
FastAPI:::backend --> Auth{JWT/RBAC}
Auth:::security -->|Validate| Roles[Role Expansion]
Roles:::security --> Vision{Image Attached?}
Vision:::backend -->|Yes| GeminiVision[Gemini Vision]
GeminiVision:::tool --> Enriched[Enriched Query]
Vision -->|No| Enriched
Enriched --> Planner[Planner Agent]
Planner:::agent -->|Intent Classification| Router{LLM Router}
Router:::agent -->|rag| Researcher[Researcher Agent]
Router -->|web| Researcher
Router -->|direct| Researcher
Researcher:::agent --> Reviewer[Reviewer Agent]
Reviewer:::agent -->|APPROVE| Synthesizer[Synthesizer Agent]
Reviewer -->|REJECT + Rewrite| Researcher
Researcher -->|RAG| Hybrid[Hybrid Retriever]
Hybrid:::tool --> BM25[BM25 Sparse]
Hybrid --> Vector[Vector Dense]
BM25 --> RRF[RRF Fusion]
Vector --> RRF
RRF --> Qdrant[(Qdrant)]:::storage
Researcher -->|Web| Tavily[Tavily Search]:::tool
Synthesizer:::agent --> LLMRouter[Hybrid LLM Router]
LLMRouter:::tool -->|Simple| Flash[Gemini Flash]:::storage
LLMRouter -->|Complex| Pro[Gemini Pro]:::storage
LLMRouter -->|Confidential| Ollama[Local Ollama]:::storage
Synthesizer --> SSE[Typewriter SSE Stream]
SSE:::backend -->|Character-by-char| UI
FastAPI -.->|Auto-Instrument| Phoenix[Phoenix Traces]:::observe
FastAPI -.->|Metrics| Prometheus[Prometheus]:::observe
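
The Reviewer → Researcher edge in the graph is the CRAG retry loop: a rejected answer triggers a query rewrite and another research pass, up to three attempts. The sketch below shows that control flow in plain Python rather than LangGraph. The function `run_with_review` and the stub callables are hypothetical; the real pipeline expresses this as conditional edges on a `StateGraph`.

```python
MAX_ATTEMPTS = 3  # the feature list states "up to 3 attempts"


def run_with_review(query, research, review, rewrite, max_attempts=MAX_ATTEMPTS):
    """Research, review, and retry with a rewritten query on rejection.

    Returns (answer, attempts_used). If every attempt is rejected,
    the last answer is returned as a best effort.
    """
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer = research(query)
        if review(query, answer) == "APPROVE":
            return answer, attempt
        if attempt < max_attempts:
            query = rewrite(query)  # only rewrite when a retry will follow
    return answer, max_attempts


# Stub agents (assumptions): the first query finds nothing, the rewrite succeeds.
def stub_research(q):
    return "RRF merges ranked lists." if "reciprocal" in q else ""


def stub_review(q, answer):
    return "APPROVE" if answer else "REJECT"


def stub_rewrite(q):
    return q + " reciprocal rank fusion"


result = run_with_review("how does RRF work", stub_research, stub_review, stub_rewrite)
print(result)
```

Capping the loop keeps a persistently rejected query from cycling between Researcher and Reviewer indefinitely.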
enterprise-rag-agent/
├── app/
│   ├── agents/                      # Multi-Agent LangGraph Pipeline
│   │   ├── graph.py                 # LangGraph StateGraph builder (instrumented)
│   │   ├── state.py                 # AgentState TypedDict (shared state)
│   │   ├── planner.py               # Intent classifier (rag/web/direct)
│   │   ├── researcher.py            # RAG retrieval + web search + direct answer
│   │   ├── reviewer.py              # CRAG quality grading + query rewriting
│   │   └── synthesizer.py           # Answer polishing + artifact removal
│   ├── api/
│   │   ├── chat.py                  # Chat endpoint, SSE streaming, Vision
│   │   └── channels.py              # Multi-channel gateway API routes
│   ├── channels/                    # Channel Adapters
│   │   ├── gateway.py               # Unified message protocol + dispatcher
│   │   ├── web_adapter.py           # Web/SSE adapter
│   │   └── api_adapter.py           # REST API adapter
│   ├── core/
│   │   ├── auth.py                  # JWT authentication + RBAC role expansion
│   │   ├── llm_router.py            # Hybrid LLM Router (Cloud/Local routing)
│   │   └── logger.py                # Loguru structured logging
│   ├── services/
│   │   ├── memory.py                # Redis + in-memory session store
│   │   ├── vector_store.py          # Hybrid retrieval, citation engine
│   │   └── document_processor.py    # Document ingestion pipeline
│   └── main.py                      # FastAPI app, Phoenix + Prometheus init
├── frontend/
│   └── src/
│       ├── app/
│       │   ├── globals.css          # Design system, animations, typewriter cursor
│       │   ├── layout.tsx           # Root layout with metadata
│       │   └── page.tsx             # Home page
│       └── components/chat/
│           ├── ChatContainer.tsx    # Main orchestrator, streaming state machine
│           ├── ChatInput.tsx        # Text + image input with drag/drop/paste
│           ├── ChatMessage.tsx      # Markdown rendering + typewriter cursor
│           └── Sidebar.tsx          # Session list with CRUD operations
├── tests/                           # Comprehensive Test Suite (66 tests)
│   ├── conftest.py                  # Shared fixtures (MockLLM, test states)
│   ├── test_agents.py               # Planner, Reviewer, Synthesizer unit tests
│   ├── test_auth.py                 # JWT, RBAC, role hierarchy tests
│   ├── test_channels.py             # Gateway, adapter, protocol tests
│   ├── test_graph_pipeline.py       # E2E pipeline + routing tests
│   └── test_llm_router.py           # LLM routing logic tests
├── scripts/
│   ├── ingest_data.py               # Basic document ingestion
│   └── ingest_with_rbac.py          # RBAC-tagged document ingestion
├── data/                            # Source documents for RAG
├── docker-compose.yml               # Qdrant + Redis + Backend orchestration
├── Dockerfile                       # Backend container image
└── requirements.txt                 # Python dependencies
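
`app/core/auth.py` is described above as handling RBAC role expansion, with access tags injected into Qdrant search filters. The following is a rough sketch of what that could look like, assuming each role inherits access from every role below it in the hierarchy. The names `ROLE_ORDER`, `expand_role`, `access_filter`, and the payload field `access_tags` are hypothetical and not taken from the project's code.

```python
# Role hierarchy from lowest to highest privilege (order taken from the README).
ROLE_ORDER = ["viewer", "engineer", "manager", "executive"]


def expand_role(role):
    """Return the role plus every lower role it is assumed to inherit from."""
    if role not in ROLE_ORDER:
        raise ValueError(f"unknown role: {role!r}")
    return ROLE_ORDER[: ROLE_ORDER.index(role) + 1]


def access_filter(role):
    """Build a Qdrant-style filter dict matching any tag the role may read.

    The field name "access_tags" is an assumption for illustration.
    """
    return {
        "must": [
            {"key": "access_tags", "match": {"any": expand_role(role)}}
        ]
    }


print(expand_role("manager"))
print(access_filter("viewer"))
```

Applying the filter at search time means documents a user may not read are never retrieved, so they cannot leak into the prompt or the citations.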
- Docker & Docker Compose
- Node.js 18+ (for frontend)
- Python 3.11+ (for local development)
Create a .env file in the root directory:
GOOGLE_API_KEY=your_gemini_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
QDRANT_HOST=localhost
REDIS_URL=redis://localhost:6379/0
# Optional: Enable JWT auth (default: false for dev)
AUTH_ENABLED=false
JWT_SECRET=your-secret-key-here
# Optional: Enable Hybrid LLM Router
LLM_ROUTER_ENABLED=false
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:14b

docker compose up -d

This starts:
| Service | Port | Purpose |
|---|---|---|
| Qdrant | 6333 | Vector database |
| Redis | 6379 | Session persistence |
| Backend | 8000 | FastAPI API server |
# Basic ingestion
python scripts/ingest_data.py
# With RBAC access tags
python scripts/ingest_with_rbac.py

cd frontend
npm install
npm run dev

Open http://localhost:3000 and try:
| Query | Tests |
|---|---|
| "What is hybrid search and how does RRF work?" | RAG retrieval + citation |
| "What's the weather in Tokyo today?" | Tavily web search fallback |
| "What about its key advantages?" | Multi-turn memory (follow-up) |
| Upload an image + "Analyze this image" | Multimodal Vision analysis |
| "Hello, how are you?" | Direct answer (no retrieval) |
python -m pytest tests/ -v

POST /api/chat
Content-Type: application/json
{
"query": "What is hybrid search?",
"session_id": "optional-session-id",
"image_base64": "optional-base64-image"
}

# List available channels
GET /api/channels
# Send message through a channel
POST /api/channels/{channel}/message
{
"message": "What is RAG?",
"session_id": "session-123",
"user_id": "user-456",
"auth_token": "optional-jwt-token"
}
# Get LLM Router info
GET /api/channels/router/info

GET /api/sessions # List all sessions
GET /api/sessions/{id}/messages # Get session messages
DELETE /api/sessions/{id} # Delete a session

| Tool | URL | Purpose |
|---|---|---|
| Phoenix Traces | http://localhost:6006 | LLM call tracing, token costs, latency |
| Prometheus | http://localhost:8000/metrics | Request metrics, error rates |
Each graph node is instrumented with timing:
▶ [Planner] node started
✅ [Planner] node completed in 450ms | keys=['intent']
▶ [Researcher] node started
✅ [Researcher] node completed in 1200ms | keys=['raw_answer', 'sources']
▶ [Reviewer] node started
✅ [Reviewer] node completed in 380ms | keys=['review_status', 'final_answer', 'retry_count']
▶ [Synthesizer] node started
✅ [Synthesizer] node completed in 950ms | keys=['final_answer', 'llm_route_info']
Pipeline completed in 2980ms | intent=rag | review=APPROVE | answer_len=342 | route=gemini-2.5-pro
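
Per-node timing of this kind can be produced by wrapping each node function in a decorator. The sketch below is one possible approach, not the project's actual instrumentation: `timed_node` is a hypothetical name, and it uses the standard-library `logging` module in place of Loguru so that it runs without extra dependencies.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("pipeline")


def timed_node(name):
    """Wrap a graph node so its start, duration, and returned keys are logged."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(state):
            logger.info("[%s] node started", name)
            start = time.perf_counter()
            update = fn(state)
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info(
                "[%s] node completed in %.0fms | keys=%s",
                name, elapsed_ms, list(update),
            )
            return update
        return wrapper
    return decorator


@timed_node("Planner")
def planner(state):
    # Stub node: a real planner would call the LLM to classify the query.
    return {"intent": "rag"}


update = planner({"query": "What is hybrid search?"})
```

Because the wrapper returns the node's state update unchanged, it can be applied to any node without altering the graph's behavior.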
| Layer | Technology |
|---|---|
| LLM | Google Gemini 2.5 Pro / Flash |
| Local LLM | Ollama (DeepSeek R1), optional |
| Embeddings | Gemini Embedding-001 (768d) |
| Vision | Gemini 2.5 Pro (multimodal) |
| Orchestration | LangGraph (StateGraph) |
| Framework | LlamaIndex + LangChain |
| Vector DB | Qdrant |
| Cache | Redis 7 (Alpine) |
| Backend | FastAPI + Uvicorn |
| Frontend | Next.js 16 + React 19 |
| Styling | TailwindCSS 4 + Typography |
| Code Highlight | react-syntax-highlighter (Prism) |
| Web Search | Tavily API |
| Auth | PyJWT (HS256/RS256) |
| Tracing | Arize Phoenix + OpenTelemetry |
| Metrics | Prometheus + FastAPI Instrumentator |
| Logging | Loguru |
| Testing | pytest + pytest-asyncio |
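
With three models in the stack, the hybrid LLM router has to pick one per request. The rules stated in this README are: confidential data goes to local Ollama, simple queries go to Gemini Flash, complex reasoning goes to Gemini Pro, and Gemini Flash is the fallback when Ollama is unavailable. The sketch below encodes those rules; the `RouteRequest` type, the `choose_model` function, and the exact model label strings are assumptions for illustration and may differ from `app/core/llm_router.py`.

```python
from dataclasses import dataclass

# Illustrative labels; the real router may use different identifiers.
OLLAMA_MODEL = "deepseek-r1:14b"
FLASH_MODEL = "gemini-2.5-flash"
PRO_MODEL = "gemini-2.5-pro"


@dataclass
class RouteRequest:
    confidential: bool       # query touches confidential documents
    complex_reasoning: bool  # query needs multi-step reasoning


def choose_model(request, ollama_available):
    """Pick a model by sensitivity first, then by complexity."""
    if request.confidential:
        # Prefer the local model; fall back to Flash if Ollama is down.
        return OLLAMA_MODEL if ollama_available else FLASH_MODEL
    return PRO_MODEL if request.complex_reasoning else FLASH_MODEL


print(choose_model(RouteRequest(confidential=True, complex_reasoning=False), True))
print(choose_model(RouteRequest(confidential=False, complex_reasoning=True), True))
```

Checking sensitivity before complexity ensures a confidential query is never sent to Gemini Pro merely because it is also complex.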
- ✅ Multi-agent LangGraph pipeline (Planner → Researcher → Reviewer → Synthesizer)
- ✅ Corrective RAG (CRAG) with self-reflective quality grading and retry loops
- ✅ Hybrid LLM Router (Cloud/Local routing by sensitivity + complexity)
- ✅ JWT/RBAC security layer with role-based document access
- ✅ Multi-channel gateway with unified message protocol
- ✅ Typewriter SSE streaming with blinking cursor animation
- ✅ Answer synthesis with markdown artifact removal
- ✅ Pipeline observability with per-node timing instrumentation
- ✅ Comprehensive test suite (66 tests across 5 modules)
- ✅ Redis session persistence (cross-restart)
- ✅ Syntax-highlighted code blocks with copy button
- ✅ Micro-animation system (fade-in-up, typing indicator, hover effects)
- ✅ SSE state machine refactor for stream stability
- ✅ Multi-turn conversation memory with session management
- ✅ Multimodal RAG (image upload + Gemini Vision)
- ✅ BM25 + Vector hybrid retrieval with RRF fusion
- ✅ Citation Query Engine with source panel
- ✅ Sidebar session management (create/switch/delete)
- ✅ GFM Markdown rendering
- ✅ Phoenix observability + Prometheus metrics
- ✅ Singleton index caching with `@lru_cache`
- ✅ Basic ReActAgent with Gemini 2.5 Pro
- ✅ Qdrant vector search
- ✅ Tavily web search tool
- ✅ SSE streaming
- ✅ Next.js frontend
Developed with Next.js, FastAPI, LangGraph, LlamaIndex, and ❤️