natserract/modern-rag

Modern RAG

A new strategy for understanding your business

Development

Setup

# Create python environment
uv venv

# Sync dependencies
uv sync

# Activate environment
source .venv/bin/activate

Environment Setup

  1. Create a .env file in the project root with the required variables:
OPENAI_API_KEY=your_openai_api_key_here

Optional environment variables:

  • MILVUS_HOST (default: localhost)
  • MILVUS_PORT (default: 19530)
  • MILVUS_DB (default: None)
  • MILVUS_COLLECTION (default: documents)
  • EMBEDDING_MODEL (default: text-embedding-3-small)
  • SQLITE_DB_PATH (default: db/app.db)
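The defaults above can be mirrored in a small settings helper. This is an illustrative sketch, not the repository's actual config code:

```python
import os

def load_settings() -> dict:
    """Read the optional variables with the defaults listed above."""
    return {
        "milvus_host": os.getenv("MILVUS_HOST", "localhost"),
        "milvus_port": int(os.getenv("MILVUS_PORT", "19530")),
        "milvus_db": os.getenv("MILVUS_DB"),  # default: None
        "milvus_collection": os.getenv("MILVUS_COLLECTION", "documents"),
        "embedding_model": os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"),
        "sqlite_db_path": os.getenv("SQLITE_DB_PATH", "db/app.db"),
    }
```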

Start Services

Start Milvus vector database and dependencies with Docker Compose:

docker compose up

This starts:

  • Milvus (vector database) on port 19530
  • etcd (Milvus metadata) on port 2379
  • MinIO (Milvus storage) on ports 9000/9001

Run API Server

Using Make:

make run-api

Equivalent command:

uvicorn api.app:app --reload

API

Endpoints

Agent Endpoints

POST /agent/chat/stream - Stream chat responses using the CrewAI pipeline, with OpenAI fallback

  • Body: { messages: [{ role, content }], model?: string }
  • Returns: Server-Sent Events (SSE) stream
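The SSE stream can be consumed with any HTTP client that supports chunked responses. A minimal sketch of extracting the `data:` payloads from a stream body; the exact event payload format here is an assumption:

```python
def parse_sse_data(raw: str) -> list[str]:
    """Extract the payload of each `data:` line from a raw SSE body."""
    return [
        line[len("data:"):].strip()
        for line in raw.splitlines()
        if line.startswith("data:")
    ]

# Illustrative stream body; real payloads may be JSON-encoded deltas.
chunks = parse_sse_data("data: Hello\n\ndata: world\n\n")
print(" ".join(chunks))  # Hello world
```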

Document Management

POST /uploads - Upload and process documents (PDF, Excel, CSV)

  • Extracts text, chunks it, and embeds the chunks into Milvus
  • CSV/Excel rows loaded into SQLite tables
  • Returns: { filename, chunk_count, inserted_ids, collection, doc_id, table?, row_count? }

GET /documents - List all documents from SQLite and Milvus (merged by doc_id)

  • Returns: { documents: [...], count: number, sqlite_tables: [...] }
  • Includes metadata for all SQLite tables

DELETE /documents/{doc_id} - Delete document from both SQLite and Milvus

  • Returns: { status, doc_id, deleted_from_sqlite, deleted_from_milvus }

Search & Utilities

POST /search - Semantic search in Milvus

  • Body: { query: string, top_k?: number, score_threshold?: number, document_ids_filter?: string[] }
  • Returns: { results: [{ content, metadata }] }
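A /search request can be assembled with only the standard library. The field names follow the body shape above; the host and port are assumed from the default uvicorn setup:

```python
import json
import urllib.request

def build_search_request(query: str, top_k: int = 5) -> urllib.request.Request:
    """Construct the POST /search request described above."""
    body = json.dumps({"query": query, "top_k": top_k}).encode()
    return urllib.request.Request(
        "http://127.0.0.1:8000/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("quarterly revenue", top_k=3)
# urllib.request.urlopen(req) would send it against a running server
```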

POST /milvus/reset - Drop Milvus collection (destructive)

  • Returns: { status, dropped_collection }

GET /health - Health check

  • Returns: { status: "ok" }

UI

Environment Setup

  1. Create a .env.local file in the ui/ directory:
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000

Run UI Server

Navigate to the ui/ directory and start the development server:

cd ui
pnpm dev
# or
npm run dev

Open http://localhost:3000 to access the UI.

The UI provides:

  • Knowledge Tab: Upload documents (PDF, Excel, CSV) with drag-and-drop interface, view all uploaded documents and SQLite tables, delete documents
  • Chat Tab: Interactive chat interface with streaming responses, response duration display, and loading indicators

Workflow

This section explains the end-to-end agent and tool workflow from request to response, how components interact, and where key logic lives.

How this differs from traditional RAG

Traditional RAG relies primarily on a single vector store to retrieve passages and then draft an answer. Our approach intentionally combines three complementary stores and fuses them during reasoning:

  • Relational store (SQLite): precise, schema-aware answers via NL→SQL for counts, aggregations, and table-grounded facts. Enables verifiable numbers and constraints.
  • Vector store (Milvus): semantic recall of unstructured content (PDFs/CSVs/XLS) to capture context and evidence beyond tables.
  • ER graph (from metadata/schema.mermaid): structural context over entities/relationships to inform retrieval scope and summarization.

This multi-signal design lets the agent both “look up” exact values (SQL) and “look around” for context (vectors/graph), improving faithfulness and coverage over vanilla RAG.

Orchestration (Crew pipeline)

Implemented in api/agents/crew.py using CrewAI agents and tasks:

  1. Query understanding

    • Agent: QueryUnderstandingAgent
    • Tool: UnderstandQueryTool
    • Output: normalized plan JSON with keys like intent, entities, targets, granularity, filters, source_table, count_like.
  2. Plan validation / guard

    • Agent: ContextGuardAgent
    • Tool: EnsureValidPlanTool
    • Output: { plan_json, needs_clarification, message }. If needs_clarification is true, the pipeline may short-circuit with a clarification message.
  3. Retrieval and reasoning

    • Agent: RetrieverAgent
    • Tools invoked in this step:
      • DbIntrospectTool to fetch live SQLite schema as JSON
      • SqlQueryTool to generate and execute a safe SELECT when applicable
      • RetrieveVectorTool to gather semantic evidence from Milvus
      • RetrieveGraphTool to build an ER subgraph context from metadata/schema.mermaid
      • SynthesizeTool to merge plan, graph, and evidence into a draft
      • VerifyTool to compute confidence and support flags
      • SummarizeTool to produce final text
  4. Finalization

    • Agent: ReasoningAgent (no-op placeholder; prior step already summarizes)
    • CrewOrchestrator appends the final result to the execution log.
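The four stages above amount to the following control flow. The tool functions here are stubs with assumed signatures and canned outputs, standing in for the real CrewAI tools:

```python
# Stub tools; real implementations live in api/agents/tools.py.
def understand_query(q):      return {"intent": "count", "entities": ["orders"]}
def ensure_valid_plan(q, p):  return {"plan_json": p, "needs_clarification": False, "message": ""}
def db_introspect():          return {"tables": [{"name": "orders"}]}
def sql_query(q, p, s):       return {"sql": "SELECT COUNT(*) FROM orders", "rows": [[42]]}
def retrieve_vector(q):       return [{"content": "...", "metadata": {}}]
def retrieve_graph(p):        return {"graph": {"entities": [], "relationships": []}}
def synthesize(p, v, g):      return {"plan": p, "graph": g, "evidence": v}
def verify(d):                return {"draft": d, "confidence": 0.9, "supported": True}
def summarize(v):             return "There are 42 orders."

def run_pipeline(question: str) -> str:
    plan = understand_query(question)               # 1. query understanding
    guard = ensure_valid_plan(question, plan)       # 2. plan validation / guard
    if guard["needs_clarification"]:
        return guard["message"]                     # short-circuit with clarification
    schema = db_introspect()                        # 3. retrieval and reasoning
    sql = sql_query(question, guard["plan_json"], schema)
    docs = retrieve_vector(question)
    graph = retrieve_graph(guard["plan_json"])
    draft = synthesize(guard["plan_json"], docs, graph)
    checked = verify(draft)
    return summarize(checked)                       # 4. finalization

print(run_pipeline("How many orders are there?"))
```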

Tools: responsibilities and I/O

Defined in api/agents/tools.py with Pydantic arg schemas and return values. Each tool appends an event to an execution log.

  • understand_query (UnderstandQueryTool)

    • Input: { user_question }
    • Output: plan JSON
    • Uses: api/agents/query_understanding.py (loads metadata/schema.mermaid and metadata/terms.json)
  • ensure_valid_plan (EnsureValidPlanTool)

    • Input: { user_question, plan_json, schema_data? }
    • Output: { plan_json, needs_clarification, message }
    • Uses: dynamic entity-to-table resolution via DbIntrospectTool and metadata/terms.json
  • db_introspect (DbIntrospectTool)

    • Input: { include_columns }
    • Output: { tables: [{ name, columns? }] } from live SQLite
  • retrieve_vector (RetrieveVectorTool)

    • Input: { user_question, top_k }
    • Output: list of { content, metadata } from Milvus
  • retrieve_graph (RetrieveGraphTool)

    • Input: { plan_json }
    • Output: { graph: { entities, relationships }, entities, source_table, targets, filters }
  • synthesize (SynthesizeTool)

    • Input: { plan_json, vector_results_json, graph_context_json }
    • Output: draft JSON { plan, graph, evidence }
  • verify (VerifyTool)

    • Input: { draft_json }
    • Output: { draft, confidence, supported }
  • summarize (SummarizeTool)

    • Input: { verification_json }
    • Output: final text
  • sql_query (SqlQueryTool)

    • Input: { question, plan_json?, schema_data? }
    • Output: { sql, columns, rows }
    • Uses: api/agents/sql_agent.py for LLM SQL generation and read-only execution; guards write operations and caps results.
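Each tool's contract can be pictured as a typed callable over a small arg schema. A stdlib sketch of one such contract, modeled on verify above; the real tools use Pydantic arg schemas, and the confidence heuristic here is invented for illustration:

```python
import json
from dataclasses import dataclass

@dataclass
class VerifyArgs:
    """Mirrors VerifyTool's input: a serialized draft."""
    draft_json: str

def verify(args: VerifyArgs) -> dict:
    """Return the draft plus confidence/support flags (heuristic is illustrative)."""
    draft = json.loads(args.draft_json)
    supported = bool(draft.get("evidence"))
    return {
        "draft": draft,
        "confidence": 0.9 if supported else 0.3,
        "supported": supported,
    }
```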

Execution logging

api/agents/tools.py manages a per-execution session:

  • _start_execution_session() starts a session buffer.
  • _append_eval_log(event) records tool inputs/outputs with timestamps.
  • _end_execution_session() persists the buffered session into eval/logs.json and clears the buffer.

Sequence diagram

sequenceDiagram
    autonumber
    participant Client
    participant FastAPI as FastAPI (api/agent.py)
    participant Crew as CrewOrchestrator
    participant Tools as Tools (tools.py)
    participant SQLite as SQLite
    participant Milvus as Milvus
    participant OpenAI as OpenAI

    Client->>FastAPI: POST /agent/chat/stream
    FastAPI->>Crew: run_stream(question)
    Crew->>Tools: understand_query(question)
    Tools-->>Crew: plan JSON
    Crew->>Tools: ensure_valid_plan(question, plan)
    Tools-->>Crew: { plan_json, needs_clarification, message }
    alt needs_clarification
        Crew-->>FastAPI: clarification message
    else proceed
        Crew->>Tools: db_introspect()
        Tools->>SQLite: PRAGMA, sqlite_master
        SQLite-->>Tools: schema JSON
        Crew->>Tools: sql_query(question, plan, schema)
        Tools->>SQLite: SELECT (read-only)
        SQLite-->>Tools: { columns, rows }
        Crew->>Tools: retrieve_vector(question)
        Tools->>OpenAI: embed_texts
        Tools->>Milvus: search_by_vector
        Milvus-->>Tools: docs
        Crew->>Tools: retrieve_graph(plan)
        Tools-->>Crew: graph context
        Crew->>Tools: synthesize(plan, vector, graph)
        Tools-->>Crew: draft
        Crew->>Tools: verify(draft)
        Tools-->>Crew: verification
        Crew->>Tools: summarize(verification)
        Tools-->>Crew: final text
        Crew-->>FastAPI: final text
    end
    note over FastAPI: If Crew fails → fall back to OpenAI streaming; if streaming also fails → non-streaming completion

Fallbacks

  • Primary: Crew pipeline (multi-agent tools) via CrewOrchestrator.
  • Secondary: OpenAI Chat Completions streaming with the provided history.
  • Tertiary: Non-streaming single-shot completion to avoid total failure.
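The three tiers amount to a try-in-order chain. A sketch with stub backends, where the first deliberately fails to exercise the fallback:

```python
def crew_answer(q: str) -> str:
    raise RuntimeError("crew unavailable")   # simulate a Crew pipeline failure

def openai_stream_answer(q: str) -> str:
    return f"streamed answer to: {q}"        # stub for OpenAI streaming

def openai_single_shot(q: str) -> str:
    return f"single-shot answer to: {q}"     # stub for non-streaming completion

def answer(question: str) -> str:
    """Try each backend in order; fall through on any exception."""
    for backend in (crew_answer, openai_stream_answer, openai_single_shot):
        try:
            return backend(question)
        except Exception:
            continue
    return "All backends failed."

print(answer("hello"))  # streamed answer to: hello
```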

Slide

https://docs.google.com/presentation/d/1rvQRH_ex0IGg7xhUj9WUlba5_O-bcS6aYDFVa0_LH6s/edit?usp=sharing
