A multi-domain RAG pipeline that automatically resolves or escalates customer support tickets for HackerRank, Anthropic/Claude, and Visa.
Given a CSV of support tickets (each with an issue, subject, and company), produce a structured output classifying each ticket as either replied (with a grounded auto-response) or escalated (with a justification), along with its product_area and request_type.
```
CSV Input
│
▼
① Validator: normalize encoding, deduplicate (SHA-256), detect language
│
▼
② Safety Filter: block prompt injection, PII, garbage input
│
▼
③ Router: route to domain: hackerrank | claude | visa | all
│
▼
④ Retriever: hybrid BM25 + FAISS → RRF → Cross-Encoder rerank → top-3 chunks
│
▼
⑤ Risk Classifier: regex-based scoring across 6 risk categories, escalate if score ≥ 2
│
▼
⑥ Generator: Ollama llama3 (local) → Claude 3.5 Sonnet (API fallback)
│
▼
⑦ Formatter: infer product_area from chunk source path, classify request_type
│
▼
output.csv
```
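The router stage (listed in the project tree as "explicit + weighted keywords") could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `agent/router.py`: the `ALIASES` table, the keyword weights, and the tie-breaking rule are all hypothetical. It trusts the ticket's `company` column when it names a known domain, otherwise scores keywords, and falls back to `all` when nothing matches or two domains tie.

```python
import re

# Hypothetical mapping from the ticket's company column to a domain.
ALIASES = {"hackerrank": "hackerrank", "claude": "claude", "anthropic": "claude", "visa": "visa"}

# Hypothetical keyword weights; the real vocabulary is not documented in this README.
KEYWORDS: dict[str, dict[str, float]] = {
    "hackerrank": {"assessment": 2.0, "interview": 1.5, "candidate": 1.5, "test": 1.0},
    "claude": {"claude": 3.0, "anthropic": 3.0, "api key": 1.5, "prompt": 1.0},
    "visa": {"visa": 3.0, "chargeback": 2.0, "card": 1.5, "transaction": 1.5},
}


def route(company: str, subject: str, issue: str) -> str:
    """Return 'hackerrank', 'claude', 'visa', or 'all' (search every domain)."""
    # 1. Explicit routing: trust the company column when it names a known domain.
    explicit = ALIASES.get(company.strip().lower())
    if explicit:
        return explicit
    # 2. Weighted keyword scoring over the subject and issue text (whole-word matches).
    text = f"{subject} {issue}".lower()
    scores = {
        domain: sum(w for kw, w in words.items() if re.search(rf"\b{re.escape(kw)}\b", text))
        for domain, words in KEYWORDS.items()
    }
    best = max(scores, key=lambda d: scores[d])
    # 3. No keyword hit, or a tie for first place: fall back to all domains.
    if scores[best] == 0 or list(scores.values()).count(scores[best]) > 1:
        return "all"
    return best
```

Returning `all` on ambiguity keeps the decision conservative: retrieval then searches every domain instead of committing to a possibly wrong one.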
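The core of the system is a hybrid retrieval pipeline:
- FAISS (dense): `all-MiniLM-L6-v2` embeddings, L2-normalized so that inner-product scores equal cosine similarity. Catches semantic paraphrases.
- BM25Okapi (sparse): keyword matching on the tokenized corpus. Catches exact product and feature names.
- Reciprocal Rank Fusion (RRF): merges both ranked lists using `score = 1/(60 + rank_faiss) + 1/(60 + rank_bm25)`, where a chunk missing from one list gets no term from it. Because RRF combines ranks rather than raw scores, no score normalization is required. k=60 follows the original Cormack et al. paper.
- Cross-Encoder reranking: `ms-marco-MiniLM-L-6-v2` scores each of the top-30 RRF candidates jointly with the query. Because query and chunk are encoded together rather than compared as separate embeddings, it is typically more accurate than a bi-encoder alone, at the cost of one model pass per candidate.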
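The fusion step can be sketched in a few lines of pure Python. This is a minimal illustration, assuming each retriever returns an ordered list of chunk IDs and that ranks are 1-based as in Cormack et al.; the function name `reciprocal_rank_fusion` and the chunk IDs are hypothetical, not taken from `agent/retriever.py`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with RRF; ranks are 1-based."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk absent from a list simply receives no contribution from it.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


faiss_ranking = ["c3", "c1", "c7"]  # dense retriever order
bm25_ranking = ["c1", "c9", "c3"]   # sparse retriever order
fused = reciprocal_rank_fusion([faiss_ranking, bm25_ranking])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c9', 'c7']
```

Here `c1` wins even though FAISS ranked `c3` first, because `c1` sits near the top of both lists. In the pipeline described above, the fused list would then be cut to the top 30 and handed to the cross-encoder.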
The knowledge base is 774 markdown files (322 Claude, 438 HackerRank, 14 Visa), chunked into ~3800 chunks using markdown-header-aware splitting with a 200-word sliding window and 50-word overlap.
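A header-aware sliding-window chunker of this kind might look like the following sketch. It assumes sections are split at markdown `#` headers first and the 200-word window (advancing 150 words, so consecutive chunks share 50 words) runs inside each section; the `Chunk` fields shown here are hypothetical and may differ from `corpus/schema.py`.

```python
import re
from dataclasses import dataclass

HEADER_RE = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    source: str  # path of the markdown file the chunk came from
    header: str  # nearest preceding markdown header ("" before the first header)
    text: str


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Split a markdown document into (header, body) sections, dropping empty bodies."""
    sections: list[tuple[str, str]] = []
    header, lines = "", []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            if any(ln.strip() for ln in lines):
                sections.append((header, "\n".join(lines)))
            header, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    if any(ln.strip() for ln in lines):
        sections.append((header, "\n".join(lines)))
    return sections


def chunk_document(source: str, markdown: str, window: int = 200, overlap: int = 50) -> list[Chunk]:
    """Slide a `window`-word window over each section, advancing `window - overlap` words."""
    step = window - overlap
    chunks: list[Chunk] = []
    for header, body in split_by_headers(markdown):
        words = body.split()
        # Stop once the remaining tail is already covered by the previous window's overlap.
        for start in range(0, max(len(words) - overlap, 1), step):
            chunks.append(Chunk(source, header, " ".join(words[start:start + window])))
    return chunks
```

With a 150-word step and a 200-word window, every word lands in at most two chunks, which keeps index size close to 1.33x the corpus while still giving each boundary sentence some surrounding context.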
Tickets are escalated when:
- Safety filter flags injection, PII, or garbage input
- Zero chunks survive retrieval after domain fallback
- Risk score ≥ 2 (financial fraud, account security, legal threats, system outages, security disclosures each score +2; urgency scores +1)
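The risk rule above can be sketched as a small table of weighted regexes. The category names and weights follow the list above; the patterns themselves are hypothetical placeholders, not the expressions in `agent/risk.py`.

```python
import re

# Hypothetical patterns; only the categories and weights come from the README.
RISK_RULES: dict[str, tuple[re.Pattern[str], int]] = {
    "financial_fraud": (re.compile(r"\b(fraud|unauthori[sz]ed (charge|transaction)|stolen card)\b", re.I), 2),
    "account_security": (re.compile(r"\b(hacked|account (takeover|compromised)|2fa)\b", re.I), 2),
    "legal_threat": (re.compile(r"\b(lawsuit|sue|attorney|legal action)\b", re.I), 2),
    "system_outage": (re.compile(r"\b(outage|down for everyone|500 error)\b", re.I), 2),
    "security_disclosure": (re.compile(r"\b(vulnerability|exploit|security bug)\b", re.I), 2),
    "urgency": (re.compile(r"\b(urgent|asap|immediately)\b", re.I), 1),
}
ESCALATION_THRESHOLD = 2


def risk_score(text: str) -> tuple[int, list[str]]:
    """Return the summed weight of matched categories and the category names."""
    hits = [name for name, (pattern, _) in RISK_RULES.items() if pattern.search(text)]
    return sum(RISK_RULES[name][1] for name in hits), hits


def should_escalate(text: str) -> bool:
    score, _ = risk_score(text)
    return score >= ESCALATION_THRESHOLD
```

Under these weights, urgency alone (+1) never reaches the threshold, while any single +2 category does; urgency only raises the score of tickets that already escalate.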
Borderline retrievals are not hard-escalated. Instead, the LLM system prompt instructs the model to state explicitly that it lacks documentation when the retrieved context is insufficient, which handles borderline cases more accurately than a fixed cutoff on the raw reranker logit.
- Python 3.10+
- (Optional) Ollama with `llama3` pulled for free local inference
- Anthropic API key for fallback generation (required if Ollama is not running)
```bash
git clone <repo-url>
cd hackerrank-orchestrate-may26
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```

Edit `.env` and set your API key:

```
ANTHROPIC_API_KEY=your_key_here
```
Run once (or whenever the docs in `data/` change):

```bash
cd code
python corpus/loader.py
```

This reads all `.md` files from `data/{claude,hackerrank,visa}/`, chunks them, computes embeddings (cached to `.cache/`), and builds the FAISS and BM25 indexes. The first run takes about 2-3 minutes; subsequent runs reuse the cached embeddings.
```bash
cd code
python main.py \
  --input ../support_tickets/support_tickets.csv \
  --output ../support_tickets/output.csv
```

Optional: run on the sample set, which has ground-truth answers for comparison:

```bash
python main.py \
  --input ../support_tickets/sample_support_tickets.csv \
  --output ../support_tickets/sample_output.csv
```

```
code/
├── main.py # CLI entry point
├── agent/
│ ├── validator.py # Input cleaning, dedup, language detection
│ ├── safety.py # Injection, PII, garbage detection
│ ├── router.py # Domain routing (explicit + weighted keywords)
│ ├── retriever.py # Hybrid retrieval: FAISS + BM25 + RRF + Cross-Encoder
│ ├── risk.py # Rule-based risk classification
│ ├── generator.py # LLM generation with Ollama → Claude failover
│ ├── formatter.py # Output structuring, product area inference
│ └── cache.py # SHA-256 keyed LLM response disk cache
├── corpus/
│ ├── schema.py # Chunk dataclass
│ ├── chunker.py # Markdown-aware sliding window chunker
│ └── loader.py # Index builder (FAISS + BM25 + diskcache)
└── api/
  └── api.py # FastAPI REST endpoint (async, threadpool for LLM)
data/
├── claude/ # 322 markdown support docs
├── hackerrank/ # 438 markdown support docs
└── visa/ # 14 markdown support docs
support_tickets/
├── support_tickets.csv # Evaluation input
├── sample_support_tickets.csv # Dev set with ground-truth answers
└── output.csv # Generated output
```
The pipeline writes exactly 5 columns to the output CSV:

| Column | Values | Description |
|---|---|---|
| `status` | `replied` / `escalated` | Whether the ticket was auto-resolved or needs human review |
| `product_area` | e.g. `Team And Enterprise Plans` | Inferred from the retrieved document's directory structure |
| `response` | string | Auto-generated grounded answer, or escalation message |
| `justification` | string | Why this decision was made |
| `request_type` | `bug` / `feature_request` / `product_issue` / `invalid` | Classified via regex |
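The two inferred columns could be produced along these lines. This is a sketch under assumptions: the `data/<domain>/<area-slug>/<file>.md` layout, the `General` fallback label, the request-type patterns, and the three-word `invalid` threshold are all hypothetical, not read from `agent/formatter.py`. Title-casing the slug does reproduce the `Team And Enterprise Plans` style shown in the table.

```python
import re
from pathlib import PurePosixPath


def infer_product_area(source_path: str) -> str:
    """Map e.g. data/claude/team-and-enterprise-plans/x.md to 'Team And Enterprise Plans'."""
    parts = PurePosixPath(source_path).parts
    try:
        # Assumed layout: data/<domain>/<area-slug>/.../<file>.md
        slug = parts[parts.index("data") + 2]
    except (ValueError, IndexError):
        return "General"  # hypothetical fallback label
    if slug.endswith(".md"):  # file sits directly under the domain folder
        return "General"
    return slug.replace("-", " ").replace("_", " ").title()


# Hypothetical patterns, checked in order; the first match wins.
REQUEST_TYPE_RULES: list[tuple[str, re.Pattern[str]]] = [
    ("feature_request", re.compile(r"\b(feature request|please add|can you add|would be (nice|great) if)\b", re.I)),
    ("bug", re.compile(r"\b(bug|error|crash(es|ed)?|broken|not working|fails?)\b", re.I)),
]


def classify_request_type(text: str) -> str:
    # Fewer than three real words: treat as invalid (threshold is hypothetical).
    if len(re.findall(r"[A-Za-z]{2,}", text)) < 3:
        return "invalid"
    for label, pattern in REQUEST_TYPE_RULES:
        if pattern.search(text):
            return label
    return "product_issue"  # default for ordinary product questions
```

Checking `feature_request` before `bug` means a ticket such as "please add a fix for this error" is treated as a request rather than a defect; the real ordering may differ.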
The generator tries providers in this order:

| Priority | Provider | Trigger |
|---|---|---|
| 1st | Ollama `llama3` (local) | Used when detected via Ollama's `/api/tags` endpoint; free and private, with no external API call |
| 2nd | Claude 3.5 Sonnet (API) | Fallback when Ollama is unavailable |
| 3rd | Escalation message | Fallback when no API key is configured |
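The failover order in the table can be expressed as an ordered list of `(name, is_available, call)` providers. This is a minimal sketch, not `agent/generator.py`: `ollama_has_model`, `anthropic_key_configured`, `generate`, and `ESCALATION_MESSAGE` are hypothetical names, `http://localhost:11434` is Ollama's default address, and falling through on a runtime error (not just on unavailability) is an added assumption.

```python
import json
import os
import urllib.request
from typing import Callable

# (name, is_available, call) triples, tried in priority order.
Provider = tuple[str, Callable[[], bool], Callable[[str], str]]

# Hypothetical wording; the real escalation message is not given in this README.
ESCALATION_MESSAGE = "No language model is available; escalating this ticket to a human agent."


def ollama_has_model(model: str = "llama3", base_url: str = "http://localhost:11434", timeout: float = 1.0) -> bool:
    """True if Ollama answers on /api/tags and lists `model` among its local models."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (OSError, ValueError):  # connection refused, timeout, or malformed JSON
        return False
    # Model names carry a tag suffix such as "llama3:latest".
    return any(m.get("name", "").split(":")[0] == model for m in tags.get("models", []))


def anthropic_key_configured() -> bool:
    return bool(os.environ.get("ANTHROPIC_API_KEY"))


def generate(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Return (provider_name, text) from the first available provider that succeeds."""
    for name, is_available, call in providers:
        if not is_available():
            continue
        try:
            return name, call(prompt)
        except Exception:
            continue  # assumption: a runtime failure also falls through to the next provider
    return "escalation", ESCALATION_MESSAGE
```

In use, the list would be `[("ollama", ollama_has_model, call_ollama), ("claude", anthropic_key_configured, call_claude)]`, where `call_ollama` and `call_claude` are hypothetical wrappers (for example around Ollama's `/api/generate` endpoint and the Anthropic SDK's `messages.create`). Passing the providers in keeps the failover logic testable without a running model.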
LLM responses are cached on disk, keyed by the SHA-256 hash of the prompt. Re-running identical tickets serves the cached responses without calling a model, so it is near-instant and incurs no API cost.