hypoxic127/enterprise-rag-agent

🚀 Enterprise RAG Agent v3.0

License: MIT · Python 3.11+ · Next.js 16 · Gemini 2.5 Pro · LangGraph · Qdrant · Redis · Phoenix

An enterprise-grade, production-ready AI agent system powered by Google Gemini 2.5 Pro, LangGraph, and Next.js. It features a multi-agent pipeline with Corrective RAG (CRAG) self-reflection, hybrid LLM routing, JWT/RBAC security, a multi-channel gateway, and full-stack observability.


✨ Core Features

🤖 Multi-Agent Orchestration (v3.0)

  • 4-Node LangGraph Pipeline: Planner → Researcher → Reviewer → Synthesizer with conditional routing
  • Corrective RAG (CRAG): Self-reflective quality grading with automatic query rewriting and retry loops (up to 3 attempts)
  • Intent-Driven Routing: Planner agent classifies queries into rag, web, or direct intents for optimal resource utilization
  • Answer Synthesis: Dedicated synthesizer agent polishes raw answers with proper formatting and artifact removal
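
The control flow described above can be sketched in plain Python, without LangGraph, to make the retry loop explicit. This is a minimal sketch under assumptions: the function name `run_pipeline`, the callable signatures, and the state keys are illustrative and are not taken from the project's `graph.py` or `state.py`.

```python
from typing import Callable, TypedDict

MAX_ATTEMPTS = 3  # the feature list mentions retry loops of up to 3 attempts


class AgentState(TypedDict, total=False):
    """Hypothetical shared state; the real AgentState may differ."""
    query: str
    intent: str
    raw_answer: str
    review_status: str
    retry_count: int
    final_answer: str


def run_pipeline(
    query: str,
    planner: Callable[[str], str],
    researcher: Callable[[str, str], str],
    reviewer: Callable[[str, str], tuple[str, str]],
    synthesizer: Callable[[str], str],
) -> AgentState:
    """Planner -> Researcher -> Reviewer -> Synthesizer with a CRAG-style retry loop."""
    state: AgentState = {"query": query, "retry_count": 0}
    state["intent"] = planner(query)  # expected to be "rag", "web", or "direct"

    while True:
        state["raw_answer"] = researcher(state["query"], state["intent"])
        # The reviewer returns a verdict plus a rewritten query to use on retry.
        status, rewritten_query = reviewer(state["query"], state["raw_answer"])
        state["review_status"] = status
        state["retry_count"] += 1
        if status == "APPROVE" or state["retry_count"] >= MAX_ATTEMPTS:
            break
        state["query"] = rewritten_query  # retry research with the rewritten query

    state["final_answer"] = synthesizer(state["raw_answer"])
    return state
```

In this sketch the last answer is still synthesized when every attempt is rejected; whether the real pipeline does the same is not stated here.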

🧠 Hybrid LLM Router

  • Data Sensitivity Routing: Confidential documents automatically routed to local Ollama (DeepSeek R1) for air-gapped processing
  • Query Complexity Optimization: Simple queries → Gemini Flash (fast/cheap), complex reasoning → Gemini Pro (best quality)
  • Automatic Fallback: Ollama unavailable → graceful fallback to Gemini Flash
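
The three routing rules above could be expressed roughly as follows. This is a sketch, not the contents of `llm_router.py`: the model labels, the `is_complex` heuristic, and its keyword list are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteDecision:
    model: str   # illustrative label, not necessarily the real model identifier
    reason: str


def is_complex(query: str, word_threshold: int = 25) -> bool:
    """Crude complexity heuristic (assumption): long queries or reasoning keywords."""
    keywords = ("compare", "analyze", "explain why", "step by step", "trade-off")
    lowered = query.lower()
    return len(lowered.split()) > word_threshold or any(k in lowered for k in keywords)


def route(query: str, *, confidential: bool, ollama_available: bool) -> RouteDecision:
    # Rule 1: confidential data goes to the local model when it is reachable.
    if confidential:
        if ollama_available:
            return RouteDecision("ollama/deepseek-r1", "confidential data stays local")
        # Rule 3: fallback described in the feature list.
        return RouteDecision("gemini-flash", "Ollama unavailable, falling back")
    # Rule 2: pick the cloud model by query complexity.
    if is_complex(query):
        return RouteDecision("gemini-pro", "complex reasoning")
    return RouteDecision("gemini-flash", "simple query")
```

Usage: `route("Compare BM25 and dense retrieval", confidential=False, ollama_available=True)` returns a decision whose `model` is `"gemini-pro"`. Note that the fallback branch sends the request to a cloud model, so whether confidential documents may take that path is a deployment policy decision.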

🔐 JWT/RBAC Security Layer

  • JWT Authentication: HS256/RS256 token validation with configurable secret
  • Role-Based Access Control: Hierarchical roles (viewer → engineer → manager → executive) with document-level filtering
  • Qdrant Access Tags: RBAC tags injected into vector search filters for data isolation
  • Dev Mode Bypass: AUTH_ENABLED=false for seamless local development
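
One way to read "hierarchical roles" and "access tags injected into vector search filters" is sketched below. The payload field name `access_tags` and both function names are assumptions; the dictionary follows the general shape of a Qdrant filter with a `match`/`any` condition, but the project's actual filter may be built differently.

```python
ROLE_HIERARCHY = ["viewer", "engineer", "manager", "executive"]  # lowest to highest


def expand_roles(role: str) -> list[str]:
    """Return the given role plus every role below it in the hierarchy."""
    if role not in ROLE_HIERARCHY:
        raise ValueError(f"unknown role: {role!r}")
    return ROLE_HIERARCHY[: ROLE_HIERARCHY.index(role) + 1]


def build_access_filter(role: str, field: str = "access_tags") -> dict:
    """Build a Qdrant-style filter that only matches documents tagged for an allowed role.

    `field` is a hypothetical payload key used for illustration.
    """
    return {"must": [{"key": field, "match": {"any": expand_roles(role)}}]}
```

For example, `expand_roles("manager")` yields `["viewer", "engineer", "manager"]`, so a manager can retrieve documents tagged for any of those three roles but not those tagged only for `executive`.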

🌐 Multi-Channel Gateway

  • Unified Message Protocol: Standardized UnifiedMessage/UnifiedResponse across all channels
  • Web Adapter: SSE streaming for the Next.js frontend
  • API Adapter: Synchronous JSON for CI/CD, CLIs, and microservices
  • Extensible Architecture: Abstract ChannelAdapter base class for adding Slack, Feishu, WeChat bots
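
To illustrate how a new channel might plug into the gateway, here is a hedged sketch of an adapter built on an abstract base class. Only the names `UnifiedMessage`, `UnifiedResponse`, and `ChannelAdapter` come from this README; their fields, the `parse`/`render` methods, and `SlackAdapter` are hypothetical, and the Slack payload is simplified.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class UnifiedMessage:
    channel: str
    text: str
    session_id: str
    user_id: str = "anonymous"
    metadata: dict = field(default_factory=dict)


@dataclass
class UnifiedResponse:
    text: str
    sources: list = field(default_factory=list)


class ChannelAdapter(ABC):
    """Translates between a channel's native payloads and the unified protocol."""
    name: str

    @abstractmethod
    def parse(self, payload: dict) -> UnifiedMessage:
        """Convert an incoming channel payload into a UnifiedMessage."""

    @abstractmethod
    def render(self, response: UnifiedResponse) -> dict:
        """Convert a UnifiedResponse into the channel's outgoing payload."""


class SlackAdapter(ChannelAdapter):
    """Hypothetical adapter; assumes a simplified Slack message event."""
    name = "slack"

    def parse(self, payload: dict) -> UnifiedMessage:
        event = payload["event"]
        return UnifiedMessage(
            channel=self.name,
            text=event["text"],
            session_id=event["channel"],  # one conversation per Slack channel
            user_id=event["user"],
        )

    def render(self, response: UnifiedResponse) -> dict:
        return {"text": response.text}
```

Keeping the agent pipeline dependent only on the unified types means a new channel needs just these two translation methods.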

🔍 Advanced Retrieval

  • BM25 + Vector Hybrid Search: Sparse keyword + dense embedding retrieval via QueryFusionRetriever with Reciprocal Rank Fusion (RRF)
  • Citation Query Engine: Traceable source documents with file names, relevance scores, and content previews
  • Singleton Index Caching: @lru_cache eliminates per-request index rebuilds
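
For readers unfamiliar with Reciprocal Rank Fusion, the following standalone sketch shows the scoring rule: each document receives the sum of 1 / (k + rank) over every ranked list it appears in. It is independent of LlamaIndex's QueryFusionRetriever, and the constant `k = 60` is a commonly used default assumed here, not a value taken from this project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list sorted by RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["a", "b", "c"]    # sparse keyword ranking (made-up IDs)
vector_hits = ["b", "c", "d"]  # dense embedding ranking (made-up IDs)
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b" and "c") outrank "a", even though "a" was the top BM25 hit, because RRF rewards agreement between retrievers.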

🧠 AI & LLM Capabilities

  • Multi-Turn Conversation Memory: Redis-backed session history with contextual follow-up support
  • Multimodal RAG: Image upload via drag-and-drop, clipboard paste, or file picker with Gemini Vision analysis
  • Anti-Arrogance Prompting: System prompts ensure internal documents are prioritized over parametric memory
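
As an illustration of the session history behind multi-turn memory, here is a minimal in-memory store. It is a sketch only: the class name, method names, and the `max_messages` cap are assumptions, and the project's `memory.py` (which also supports Redis) may look quite different.

```python
from collections import defaultdict, deque


class InMemorySessionStore:
    """Keeps the most recent messages per session; older ones are dropped."""

    def __init__(self, max_messages: int = 20) -> None:
        self._max = max_messages
        self._sessions: dict[str, deque] = defaultdict(lambda: deque(maxlen=self._max))

    def append(self, session_id: str, role: str, content: str) -> None:
        self._sessions[session_id].append({"role": role, "content": content})

    def history(self, session_id: str) -> list[dict]:
        # .get() avoids creating an empty session as a side effect of reading.
        return list(self._sessions.get(session_id, ()))

    def delete(self, session_id: str) -> bool:
        return self._sessions.pop(session_id, None) is not None

    def list_sessions(self) -> list[str]:
        return sorted(self._sessions)
```

Passing `history(session_id)` to the model alongside a follow-up such as "What about its key advantages?" is what lets the pronoun resolve against earlier turns.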

🎨 Premium Frontend

  • Typewriter Streaming: Character-by-character SSE streaming with blinking cursor animation
  • Syntax-Highlighted Code Blocks: react-syntax-highlighter with oneDark theme and one-click copy
  • Rich Markdown Rendering: GFM tables, inline code, links, lists with @tailwindcss/typography
  • Session Sidebar: Create, switch, and delete conversations with real-time management
  • Micro-Animations: Fade-in-up entrances, three-dot typing indicator, hover effects

⚙️ Engineering & Observability

  • Arize Phoenix Tracing: Full LLM/Retriever/Tool call tracing with latency and token analytics
  • Pipeline Performance Logging: Per-node timing instrumentation with structured logs
  • Prometheus Metrics: Request latency, throughput, and error rate via /metrics
  • Structured Logging: Loguru with InterceptHandler for unified routing
  • 66-Test Suite: Comprehensive pytest coverage across agents, auth, router, channels, and pipeline

🏗️ System Architecture

graph TD
    classDef frontend fill:#000000,stroke:#3b82f6,stroke-width:2px,color:#fff
    classDef backend fill:#18181b,stroke:#a855f7,stroke-width:2px,color:#fff
    classDef agent fill:#27272a,stroke:#eab308,stroke-width:2px,color:#fff
    classDef tool fill:#1e1e1e,stroke:#10b981,stroke-width:2px,color:#fff
    classDef storage fill:#f8fafc,stroke:#64748b,stroke-width:2px,color:#000
    classDef observe fill:#1a1a2e,stroke:#e94560,stroke-width:2px,color:#fff
    classDef security fill:#1a1a1a,stroke:#ef4444,stroke-width:2px,color:#fff

    User((User)) -->|Query + Image| UI[Next.js 16 Frontend]
    UI:::frontend -- SSE Stream --> FastAPI[FastAPI Backend]
    FastAPI:::backend --> Auth{JWT/RBAC}
    Auth:::security -->|Validate| Roles[Role Expansion]
    Roles:::security --> Vision{Image Attached?}
    Vision:::backend -->|Yes| GeminiVision[Gemini Vision]
    GeminiVision:::tool --> Enriched[Enriched Query]
    Vision -->|No| Enriched

    Enriched --> Planner[🎯 Planner Agent]
    Planner:::agent -->|Intent Classification| Router{LLM Router}
    Router:::agent -->|rag| Researcher[🔍 Researcher Agent]
    Router -->|web| Researcher
    Router -->|direct| Researcher

    Researcher:::agent --> Reviewer[🔎 Reviewer Agent]
    Reviewer:::agent -->|APPROVE| Synthesizer[✨ Synthesizer Agent]
    Reviewer -->|REJECT + Rewrite| Researcher

    Researcher -->|RAG| Hybrid[Hybrid Retriever]
    Hybrid:::tool --> BM25[BM25 Sparse]
    Hybrid --> Vector[Vector Dense]
    BM25 --> RRF[RRF Fusion]
    Vector --> RRF
    RRF --> Qdrant[(Qdrant)]:::storage

    Researcher -->|Web| Tavily[Tavily Search]:::tool

    Synthesizer:::agent --> LLMRouter[Hybrid LLM Router]
    LLMRouter:::tool -->|Simple| Flash[Gemini Flash]:::storage
    LLMRouter -->|Complex| Pro[Gemini Pro]:::storage
    LLMRouter -->|Confidential| Ollama[Local Ollama]:::storage

    Synthesizer --> SSE[Typewriter SSE Stream]
    SSE:::backend -->|Character-by-char| UI

    FastAPI -.->|Auto-Instrument| Phoenix[Phoenix Traces]:::observe
    FastAPI -.->|Metrics| Prometheus[Prometheus]:::observe

📁 Project Structure

enterprise-rag-agent/
├── app/
│   ├── agents/                     # Multi-Agent LangGraph Pipeline
│   │   ├── graph.py                # LangGraph StateGraph builder (instrumented)
│   │   ├── state.py                # AgentState TypedDict (shared state)
│   │   ├── planner.py              # Intent classifier (rag/web/direct)
│   │   ├── researcher.py           # RAG retrieval + web search + direct answer
│   │   ├── reviewer.py             # CRAG quality grading + query rewriting
│   │   └── synthesizer.py          # Answer polishing + artifact removal
│   ├── api/
│   │   ├── chat.py                 # Chat endpoint, SSE streaming, Vision
│   │   └── channels.py             # Multi-channel gateway API routes
│   ├── channels/                   # Channel Adapters
│   │   ├── gateway.py              # Unified message protocol + dispatcher
│   │   ├── web_adapter.py          # Web/SSE adapter
│   │   └── api_adapter.py          # REST API adapter
│   ├── core/
│   │   ├── auth.py                 # JWT authentication + RBAC role expansion
│   │   ├── llm_router.py           # Hybrid LLM Router (Cloud/Local routing)
│   │   └── logger.py               # Loguru structured logging
│   ├── services/
│   │   ├── memory.py               # Redis + in-memory session store
│   │   ├── vector_store.py         # Hybrid retrieval, citation engine
│   │   └── document_processor.py   # Document ingestion pipeline
│   └── main.py                     # FastAPI app, Phoenix + Prometheus init
├── frontend/
│   └── src/
│       ├── app/
│       │   ├── globals.css         # Design system, animations, typewriter cursor
│       │   ├── layout.tsx          # Root layout with metadata
│       │   └── page.tsx            # Home page
│       └── components/chat/
│           ├── ChatContainer.tsx    # Main orchestrator, streaming state machine
│           ├── ChatInput.tsx        # Text + image input with drag/drop/paste
│           ├── ChatMessage.tsx      # Markdown rendering + typewriter cursor
│           └── Sidebar.tsx          # Session list with CRUD operations
├── tests/                          # Comprehensive Test Suite (66 tests)
│   ├── conftest.py                 # Shared fixtures (MockLLM, test states)
│   ├── test_agents.py              # Planner, Reviewer, Synthesizer unit tests
│   ├── test_auth.py                # JWT, RBAC, role hierarchy tests
│   ├── test_channels.py            # Gateway, adapter, protocol tests
│   ├── test_graph_pipeline.py      # E2E pipeline + routing tests
│   └── test_llm_router.py          # LLM routing logic tests
├── scripts/
│   ├── ingest_data.py              # Basic document ingestion
│   └── ingest_with_rbac.py         # RBAC-tagged document ingestion
├── data/                           # Source documents for RAG
├── docker-compose.yml              # Qdrant + Redis + Backend orchestration
├── Dockerfile                      # Backend container image
└── requirements.txt                # Python dependencies

🚀 Quick Start

1. Prerequisites

  • Docker & Docker Compose
  • Node.js 20.9+ (for frontend; the minimum version supported by Next.js 16)
  • Python 3.11+ (for local development)

2. Environment Configuration

Create a .env file in the root directory:

GOOGLE_API_KEY=your_gemini_api_key_here
TAVILY_API_KEY=your_tavily_api_key_here
QDRANT_HOST=localhost
REDIS_URL=redis://localhost:6379/0

# Optional: Enable JWT auth (default: false for dev)
AUTH_ENABLED=false
JWT_SECRET=your-secret-key-here

# Optional: Enable Hybrid LLM Router
LLM_ROUTER_ENABLED=false
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:14b

3. Start Infrastructure

docker compose up -d

This starts:

| Service | Port | Purpose |
|---------|------|---------|
| Qdrant | 6333 | Vector database |
| Redis | 6379 | Session persistence |
| Backend | 8000 | FastAPI API server |

4. Data Ingestion

# Basic ingestion
python scripts/ingest_data.py

# With RBAC access tags
python scripts/ingest_with_rbac.py

5. Start Frontend

cd frontend
npm install
npm run dev

6. Experience the Agent

Open http://localhost:3000 and try:

| Query | Tests |
|-------|-------|
| "What is hybrid search and how does RRF work?" | RAG retrieval + citation |
| "What's the weather in Tokyo today?" | Tavily web search fallback |
| "What about its key advantages?" | Multi-turn memory (follow-up) |
| 📎 Upload an image + "Analyze this image" | Multimodal Vision analysis |
| "Hello, how are you?" | Direct answer (no retrieval) |

7. Run Tests

python -m pytest tests/ -v

🔌 API Reference

Chat (SSE Streaming)

POST /api/chat
Content-Type: application/json

{
  "query": "What is hybrid search?",
  "session_id": "optional-session-id",
  "image_base64": "optional-base64-image"
}
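
Because the chat endpoint streams its response over SSE, a client has to reassemble events from `data:` lines. The helper below is a minimal sketch of that parsing, following the general Server-Sent Events framing. The content of each event (plain text chunks, JSON, or something else) is not documented in this README, so the function simply yields the raw data strings; `iter_sse_data` is a hypothetical name.

```python
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is part of the framing.
            buffer.append(value[1:] if value.startswith(" ") else value)
    if buffer:
        # Lenient choice: flush a trailing event that lacks a final blank line.
        yield "\n".join(buffer)
```

A client could feed this function the decoded lines of the HTTP response body and append each yielded chunk to the message being displayed.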

Multi-Channel Gateway

# List available channels
GET /api/channels

# Send message through a channel
POST /api/channels/{channel}/message
{
  "message": "What is RAG?",
  "session_id": "session-123",
  "user_id": "user-456",
  "auth_token": "optional-jwt-token"
}

# Get LLM Router info
GET /api/channels/router/info

Session Management

GET  /api/sessions                      # List all sessions
GET  /api/sessions/{id}/messages        # Get session messages
DELETE /api/sessions/{id}               # Delete a session

📊 Observability

| Tool | URL | Purpose |
|------|-----|---------|
| Phoenix Traces | http://localhost:6006 | LLM call tracing, token costs, latency |
| Prometheus | http://localhost:8000/metrics | Request metrics, error rates |

Pipeline Logging

Each graph node is instrumented with timing:

▶ [Planner] node started
✓ [Planner] node completed in 450ms | keys=['intent']
▶ [Researcher] node started
✓ [Researcher] node completed in 1200ms | keys=['raw_answer', 'sources']
▶ [Reviewer] node started
✓ [Reviewer] node completed in 380ms | keys=['review_status', 'final_answer', 'retry_count']
▶ [Synthesizer] node started
✓ [Synthesizer] node completed in 950ms | keys=['final_answer', 'llm_route_info']
Pipeline completed in 2980ms | intent=rag | review=APPROVE | answer_len=342 | route=gemini-2.5-pro
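
Log lines of this shape can be produced by wrapping each node function in a timing decorator. The sketch below uses the standard `logging` module rather than Loguru to stay dependency-free, and it omits the leading status symbols. `timed_node` is a hypothetical name, and the real instrumentation in `graph.py` may work differently.

```python
import functools
import logging
import time
from typing import Callable

logger = logging.getLogger("pipeline")


def timed_node(name: str) -> Callable:
    """Decorate a node function (state dict in, state update dict out) with timing logs."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(fn)
        def wrapper(state: dict) -> dict:
            logger.info("[%s] node started", name)
            start = time.perf_counter()
            update = fn(state)
            elapsed_ms = (time.perf_counter() - start) * 1000
            # Log only the keys the node wrote, not their (possibly large) values.
            logger.info("[%s] node completed in %.0fms | keys=%s", name, elapsed_ms, list(update))
            return update
        return wrapper
    return decorator


@timed_node("Planner")
def planner_node(state: dict) -> dict:
    return {"intent": "rag"}  # stand-in for the real intent classifier
```

Logging the keys rather than the values keeps each line short while still showing which parts of the shared state a node touched.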

🔧 Tech Stack

| Layer | Technology |
|-------|------------|
| LLM | Google Gemini 2.5 Pro / Flash |
| Local LLM | Ollama (DeepSeek R1), optional |
| Embeddings | Gemini Embedding-001 (768d) |
| Vision | Gemini 2.5 Pro (multimodal) |
| Orchestration | LangGraph (StateGraph) |
| Framework | LlamaIndex + LangChain |
| Vector DB | Qdrant |
| Cache | Redis 7 (Alpine) |
| Backend | FastAPI + Uvicorn |
| Frontend | Next.js 16 + React 19 |
| Styling | TailwindCSS 4 + Typography |
| Code Highlight | react-syntax-highlighter (Prism) |
| Web Search | Tavily API |
| Auth | PyJWT (HS256/RS256) |
| Tracing | Arize Phoenix + OpenTelemetry |
| Metrics | Prometheus + FastAPI Instrumentator |
| Logging | Loguru |
| Testing | pytest + pytest-asyncio |

📋 Changelog

v3.0: Multi-Agent AI OS

  • ✅ Multi-agent LangGraph pipeline (Planner → Researcher → Reviewer → Synthesizer)
  • ✅ Corrective RAG (CRAG) with self-reflective quality grading and retry loops
  • ✅ Hybrid LLM Router (Cloud/Local routing by sensitivity + complexity)
  • ✅ JWT/RBAC security layer with role-based document access
  • ✅ Multi-channel gateway with unified message protocol
  • ✅ Typewriter SSE streaming with blinking cursor animation
  • ✅ Answer synthesis with markdown artifact removal
  • ✅ Pipeline observability with per-node timing instrumentation
  • ✅ Comprehensive test suite (66 tests across 5 modules)

v2.1: Optimization Phase

  • ✅ Redis session persistence (cross-restart)
  • ✅ Syntax-highlighted code blocks with copy button
  • ✅ Micro-animation system (fade-in-up, typing indicator, hover effects)
  • ✅ SSE state machine refactor for stream stability

v2.0: Architecture Upgrade

  • ✅ Multi-turn conversation memory with session management
  • ✅ Multimodal RAG (image upload + Gemini Vision)
  • ✅ BM25 + Vector hybrid retrieval with RRF fusion
  • ✅ Citation Query Engine with source panel
  • ✅ Sidebar session management (create/switch/delete)
  • ✅ GFM Markdown rendering
  • ✅ Phoenix observability + Prometheus metrics
  • ✅ Singleton index caching with @lru_cache

v1.0: Initial Release

  • ✅ Basic ReActAgent with Gemini 2.5 Pro
  • ✅ Qdrant vector search
  • ✅ Tavily web search tool
  • ✅ SSE streaming
  • ✅ Next.js frontend

Developed with Next.js, FastAPI, LangGraph, LlamaIndex, and ❤️
