natserract/modern-rag

Modern RAG

A new strategy for understanding your business

Development

Setup

# Create python environment
uv venv

# Sync dependencies
uv sync

# Activate environment
source .venv/bin/activate

Environment Setup

  1. Create a .env file in the project root with the required variables:
OPENAI_API_KEY=your_openai_api_key_here

Optional environment variables:

  • MILVUS_HOST (default: localhost)
  • MILVUS_PORT (default: 19530)
  • MILVUS_DB (default: None)
  • MILVUS_COLLECTION (default: documents)
  • EMBEDDING_MODEL (default: text-embedding-3-small)
  • SQLITE_DB_PATH (default: db/app.db)
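The defaults above can be mirrored in a small settings helper. This is an illustrative sketch, not the repository's actual config code:

```python
import os

def load_settings() -> dict:
    """Read the optional variables with the defaults listed above."""
    return {
        "milvus_host": os.getenv("MILVUS_HOST", "localhost"),
        "milvus_port": int(os.getenv("MILVUS_PORT", "19530")),
        "milvus_db": os.getenv("MILVUS_DB"),  # default: None
        "milvus_collection": os.getenv("MILVUS_COLLECTION", "documents"),
        "embedding_model": os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"),
        "sqlite_db_path": os.getenv("SQLITE_DB_PATH", "db/app.db"),
    }
```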

Start Services

Start Milvus vector database and dependencies with Docker Compose:

docker compose up

This starts:

  • Milvus (vector database) on port 19530
  • etcd (Milvus metadata) on port 2379
  • MinIO (Milvus storage) on ports 9000/9001

Run API Server

Using Make:

make run-api

Equivalent command:

uvicorn api.app:app --reload

API

Endpoints

Agent Endpoints

POST /agent/chat/stream - Stream chat responses using the CrewAI pipeline, with OpenAI fallback

  • Body: { messages: [{ role, content }], model?: string }
  • Returns: Server-Sent Events (SSE) stream
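The SSE stream can be consumed with any HTTP client that supports chunked responses. A minimal sketch of extracting the `data:` payloads from a stream body; the exact event payload format here is an assumption:

```python
def parse_sse_data(raw: str) -> list[str]:
    """Extract the payload of each `data:` line from a raw SSE body."""
    return [
        line[len("data:"):].strip()
        for line in raw.splitlines()
        if line.startswith("data:")
    ]

# Illustrative stream body; real payloads may be JSON-encoded deltas.
chunks = parse_sse_data("data: Hello\n\ndata: world\n\n")
print(" ".join(chunks))  # Hello world
```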

Document Management

POST /uploads - Upload and process documents (PDF, Excel, CSV)

  • Extracts text, chunks it, and embeds the chunks into Milvus
  • CSV/Excel rows loaded into SQLite tables
  • Returns: { filename, chunk_count, inserted_ids, collection, doc_id, table?, row_count? }

GET /documents - List all documents from SQLite and Milvus (merged by doc_id)

  • Returns: { documents: [...], count: number, sqlite_tables: [...] }
  • Includes metadata for all SQLite tables

DELETE /documents/{doc_id} - Delete document from both SQLite and Milvus

  • Returns: { status, doc_id, deleted_from_sqlite, deleted_from_milvus }

Search & Utilities

POST /search - Semantic search in Milvus

  • Body: { query: string, top_k?: number, score_threshold?: number, document_ids_filter?: string[] }
  • Returns: { results: [{ content, metadata }] }
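A /search request can be assembled with only the standard library. The field names follow the body shape above; the host and port are assumed from the default uvicorn setup:

```python
import json
import urllib.request

def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Construct the POST /search request described above."""
    body = json.dumps({"query": query, "top_k": top_k}).encode()
    return urllib.request.Request(
        "http://127.0.0.1:8000/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("quarterly revenue", top_k=3)
# urllib.request.urlopen(req) would send it against a running server
```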

POST /milvus/reset - Drop Milvus collection (destructive)

  • Returns: { status, dropped_collection }

GET /health - Health check

  • Returns: { status: "ok" }

UI

Environment Setup

  1. Create a .env.local file in the ui/ directory:
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000

Run UI Server

Navigate to the ui/ directory and start the development server:

cd ui
pnpm dev
# or
npm run dev

Open http://localhost:3000 to access the UI.

The UI provides:

  • Knowledge Tab: Upload documents (PDF, Excel, CSV) with drag-and-drop interface, view all uploaded documents and SQLite tables, delete documents
  • Chat Tab: Interactive chat interface with streaming responses, response duration display, and loading indicators

Workflow

This section explains the end-to-end agent and tool workflow from request to response, how components interact, and where key logic lives.

How this differs from traditional RAG

Traditional RAG relies primarily on a single vector store to retrieve passages and then draft an answer. Our approach intentionally combines three complementary stores and fuses them during reasoning:

  • Relational store (SQLite): precise, schema-aware answers via NL→SQL for counts, aggregations, and table-grounded facts. Enables verifiable numbers and constraints.
  • Vector store (Milvus): semantic recall of unstructured content (PDFs/CSVs/XLS) to capture context and evidence beyond tables.
  • ER graph (from metadata/schema.mermaid): structural context over entities/relationships to inform retrieval scope and summarization.

This multi-signal design lets the agent both “look up” exact values (SQL) and “look around” for context (vectors/graph), improving faithfulness and coverage over vanilla RAG.

Orchestration (Crew pipeline)

Implemented in api/agents/crew.py using CrewAI agents and tasks:

  1. Query understanding

    • Agent: QueryUnderstandingAgent
    • Tool: UnderstandQueryTool
    • Output: normalized plan JSON with keys like intent, entities, targets, granularity, filters, source_table, count_like.
  2. Plan validation / guard

    • Agent: ContextGuardAgent
    • Tool: EnsureValidPlanTool
    • Output: { plan_json, needs_clarification, message }. If needs_clarification is true, the pipeline may short-circuit with a clarification message.
  3. Retrieval and reasoning

    • Agent: RetrieverAgent
    • Tools invoked in this step:
      • DbIntrospectTool to fetch live SQLite schema as JSON
      • SqlQueryTool to generate and execute a safe SELECT when applicable
      • RetrieveVectorTool to gather semantic evidence from Milvus
      • RetrieveGraphTool to build an ER subgraph context from metadata/schema.mermaid
      • SynthesizeTool to merge plan, graph, and evidence into a draft
      • VerifyTool to compute confidence and support flags
      • SummarizeTool to produce final text
  4. Finalization

    • Agent: ReasoningAgent (no-op placeholder; prior step already summarizes)
    • CrewOrchestrator appends the final result to the execution log.
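The four stages above amount to the following control flow. The tool functions here are stubs with assumed signatures and canned outputs, standing in for the real CrewAI tools:

```python
# Stub tools; real implementations live in api/agents/tools.py.
def understand_query(q):      return {"intent": "count", "entities": ["orders"]}
def ensure_valid_plan(q, p):  return {"plan_json": p, "needs_clarification": False, "message": ""}
def db_introspect():          return {"tables": [{"name": "orders"}]}
def sql_query(q, p, s):       return {"sql": "SELECT COUNT(*) FROM orders", "rows": [[42]]}
def retrieve_vector(q):       return [{"content": "...", "metadata": {}}]
def retrieve_graph(p):        return {"graph": {"entities": [], "relationships": []}}
def synthesize(p, v, g):      return {"plan": p, "graph": g, "evidence": v}
def verify(d):                return {"draft": d, "confidence": 0.9, "supported": True}
def summarize(v):             return "There are 42 orders."

def run_pipeline(question: str) -> str:
    plan = understand_query(question)               # 1. query understanding
    guard = ensure_valid_plan(question, plan)       # 2. plan validation / guard
    if guard["needs_clarification"]:
        return guard["message"]                     # short-circuit with clarification
    schema = db_introspect()                        # 3. retrieval and reasoning
    sql = sql_query(question, guard["plan_json"], schema)
    docs = retrieve_vector(question)
    graph = retrieve_graph(guard["plan_json"])
    draft = synthesize(guard["plan_json"], docs, graph)
    checked = verify(draft)
    return summarize(checked)                       # 4. finalization

print(run_pipeline("How many orders are there?"))
```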

Tools: responsibilities and I/O

Defined in api/agents/tools.py with Pydantic arg schemas and return values. Each tool appends an event to an execution log.

  • understand_query (UnderstandQueryTool)

    • Input: { user_question }
    • Output: plan JSON
    • Uses: api/agents/query_understanding.py (loads metadata/schema.mermaid and metadata/terms.json)
  • ensure_valid_plan (EnsureValidPlanTool)

    • Input: { user_question, plan_json, schema_data? }
    • Output: { plan_json, needs_clarification, message }
    • Uses: dynamic entity-to-table resolution via DbIntrospectTool and metadata/terms.json
  • db_introspect (DbIntrospectTool)

    • Input: { include_columns }
    • Output: { tables: [{ name, columns? }] } from live SQLite
  • retrieve_vector (RetrieveVectorTool)

    • Input: { user_question, top_k }
    • Output: list of { content, metadata } from Milvus
  • retrieve_graph (RetrieveGraphTool)

    • Input: { plan_json }
    • Output: { graph: { entities, relationships }, entities, source_table, targets, filters }
  • synthesize (SynthesizeTool)

    • Input: { plan_json, vector_results_json, graph_context_json }
    • Output: draft JSON { plan, graph, evidence }
  • verify (VerifyTool)

    • Input: { draft_json }
    • Output: { draft, confidence, supported }
  • summarize (SummarizeTool)

    • Input: { verification_json }
    • Output: final text
  • sql_query (SqlQueryTool)

    • Input: { question, plan_json?, schema_data? }
    • Output: { sql, columns, rows }
    • Uses: api/agents/sql_agent.py for LLM SQL generation and read-only execution; guards write operations and caps results.
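Each tool's contract can be pictured as a typed callable over a small arg schema. A stdlib sketch of one such contract, modeled on verify above; the real tools use Pydantic arg schemas, and the confidence heuristic here is invented for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class VerifyArgs:
    """Mirrors VerifyTool's input: a serialized draft."""
    draft_json: str

def verify(args: VerifyArgs) -> dict:
    """Return the draft plus confidence/support flags (heuristic is illustrative)."""
    draft = json.loads(args.draft_json)
    supported = bool(draft.get("evidence"))
    return {
        "draft": draft,
        "confidence": 0.9 if supported else 0.3,
        "supported": supported,
    }
```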

Execution logging

api/agents/tools.py manages a per-execution session:

  • _start_execution_session() starts a session buffer.
  • _append_eval_log(event) records tool inputs/outputs with timestamps.
  • _end_execution_session() persists the buffered session into eval/logs.json and clears the buffer.

Sequence diagram

sequenceDiagram
    autonumber
    participant Client
    participant FastAPI as FastAPI (api/agent.py)
    participant Crew as CrewOrchestrator
    participant Tools as Tools (tools.py)
    participant SQLite as SQLite
    participant Milvus as Milvus
    participant OpenAI as OpenAI

    Client->>FastAPI: POST /agent/chat/stream
    FastAPI->>Crew: run_stream(question)
    Crew->>Tools: understand_query(question)
    Tools-->>Crew: plan JSON
    Crew->>Tools: ensure_valid_plan(question, plan)
    Tools-->>Crew: { plan_json, needs_clarification, message }
    alt needs_clarification
        Crew-->>FastAPI: clarification message
    else proceed
        Crew->>Tools: db_introspect()
        Tools->>SQLite: PRAGMA, sqlite_master
        SQLite-->>Tools: schema JSON
        Crew->>Tools: sql_query(question, plan, schema)
        Tools->>SQLite: SELECT (read-only)
        SQLite-->>Tools: { columns, rows }
        Crew->>Tools: retrieve_vector(question)
        Tools->>OpenAI: embed_texts
        Tools->>Milvus: search_by_vector
        Milvus-->>Tools: docs
        Crew->>Tools: retrieve_graph(plan)
        Tools-->>Crew: graph context
        Crew->>Tools: synthesize(plan, vector, graph)
        Tools-->>Crew: draft
        Crew->>Tools: verify(draft)
        Tools-->>Crew: verification
        Crew->>Tools: summarize(verification)
        Tools-->>Crew: final text
        Crew-->>FastAPI: final text
    end
    note over FastAPI: If Crew fails → fall back to OpenAI streaming; if streaming also fails → non-streaming completion

Fallbacks

  • Primary: Crew pipeline (multi-agent tools) via CrewOrchestrator.
  • Secondary: OpenAI Chat Completions streaming with the provided history.
  • Tertiary: Non-streaming single-shot completion to avoid total failure.
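The three tiers amount to a try-in-order chain. A sketch with stub backends, where the first deliberately fails to exercise the fallback:

```python
def crew_answer(q: str) -> str:
    raise RuntimeError("crew unavailable")   # simulate a Crew pipeline failure

def openai_stream_answer(q: str) -> str:
    return f"streamed answer to: {q}"        # stub for OpenAI streaming

def openai_single_shot(q: str) -> str:
    return f"single-shot answer to: {q}"     # stub for non-streaming completion

def answer(question: str) -> str:
    """Try each backend in order; fall through on any exception."""
    for backend in (crew_answer, openai_stream_answer, openai_single_shot):
        try:
            return backend(question)
        except Exception:
            continue
    return "All backends failed."

print(answer("hello"))  # streamed answer to: hello
```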

Slide

https://docs.google.com/presentation/d/1rvQRH_ex0IGg7xhUj9WUlba5_O-bcS6aYDFVa0_LH6s/edit?usp=sharing
