
himanshu2394i/hackerrank-orchestrate-may26

 
 


Support Triage Agent

A multi-domain RAG pipeline that automatically resolves or escalates customer support tickets for HackerRank, Anthropic/Claude, and Visa.


Approach

Problem

Given a CSV of support tickets (each with an issue, subject, and company), produce a structured output classifying each ticket as either replied (with a grounded auto-response) or escalated (with a justification), along with its product_area and request_type.

Solution: 7-Stage Pipeline

CSV Input
  │
  ▼
① Validator       — normalize encoding, deduplicate (SHA-256), detect language
  │
  ▼
② Safety Filter   — block prompt injection, PII, garbage input
  │
  ▼
③ Router          — route to domain: hackerrank | claude | visa | all
  │
  ▼
④ Retriever       — hybrid BM25 + FAISS → RRF → Cross-Encoder rerank → top-3 chunks
  │
  ▼
⑤ Risk Classifier — regex-based scoring across 6 risk categories, escalate if score ≥ 2
  │
  ▼
⑥ Generator       — Ollama llama3 (local) → Claude 3.5 Sonnet (API fallback)
  │
  ▼
⑦ Formatter       — infer product_area from chunk source path, classify request_type
  │
  ▼
output.csv
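As an illustration of the deduplication step in stage ①, here is a minimal sketch that hashes normalized ticket fields with SHA-256 and keeps the first occurrence of each fingerprint. The function names, the NFKC and lowercase normalization, and the choice of fields to hash are assumptions for illustration, not necessarily what validator.py does.

```python
import hashlib
import unicodedata


def ticket_fingerprint(issue: str, subject: str, company: str) -> str:
    """Hash normalized ticket fields so trivially different copies collide."""
    parts = []
    for field in (issue, subject, company):
        text = unicodedata.normalize("NFKC", field or "")
        # Lowercase and collapse whitespace (assumed normalization).
        parts.append(" ".join(text.lower().split()))
    joined = "\x1f".join(parts)  # unit separator keeps field boundaries distinct
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def deduplicate(tickets: list[dict]) -> list[dict]:
    """Keep the first ticket seen for each fingerprint, preserving order."""
    seen = set()
    unique = []
    for ticket in tickets:
        fp = ticket_fingerprint(
            ticket.get("issue", ""),
            ticket.get("subject", ""),
            ticket.get("company", ""),
        )
        if fp not in seen:
            seen.add(fp)
            unique.append(ticket)
    return unique
```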

Retrieval Design

The core of the system is a hybrid retrieval pipeline:

  1. FAISS (dense) — all-MiniLM-L6-v2 embeddings, L2-normalized, cosine similarity semantics. Catches semantic paraphrases.
  2. BM25Okapi (sparse) — keyword matching on tokenized corpus. Catches exact product/feature names.
  3. Reciprocal Rank Fusion — merges both ranked lists using score = 1/(60 + rank_faiss) + 1/(60 + rank_bm25). No score normalization required. k=60 per the original Cormack et al. paper.
  4. Cross-Encoder reranking (ms-marco-MiniLM-L-6-v2): scores the top-30 RRF candidates jointly against the query. Because the query and chunk are encoded together, this is typically more accurate than bi-encoder similarity alone.
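The fusion step in item 3 can be sketched in a few lines. This is a minimal illustration of the formula above, assuming 1-based ranks and that a chunk missing from one list simply receives no contribution from it; the function name rrf_fuse and the alphabetical tie-break are hypothetical.

```python
def rrf_fuse(faiss_ranking: list[str], bm25_ranking: list[str], k: int = 60) -> list[str]:
    """Merge two ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    A chunk absent from a list gets nothing from that list (assumption).
    """
    scores: dict[str, float] = {}
    for ranking in (faiss_ranking, bm25_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
print(fused)  # ['b', 'a', 'd', 'c']
```

Chunks that appear in both lists ("a" and "b") outrank chunks found by only one retriever, which is the behaviour the hybrid design relies on.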

The knowledge base is 774 markdown files (322 Claude, 438 HackerRank, 14 Visa), chunked into ~3800 chunks using markdown-header-aware splitting with a 200-word sliding window and 50-word overlap.
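The chunking scheme can be sketched as follows: split on markdown headers, then slide a 200-word window with a 50-word overlap (a step of 150 words) over each section. The helper names are hypothetical, and this sketch does not special-case `#` lines inside fenced code, which a real chunker would likely need to handle.

```python
import re

HEADER = re.compile(r"^#{1,6}\s+(.*)$")


def split_sections(markdown: str) -> list[tuple[str, list[str]]]:
    """Group words under the most recent markdown header."""
    sections = []
    title, words = "", []
    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            if words:
                sections.append((title, words))
            title, words = match.group(1).strip(), []
        else:
            words.extend(line.split())
    if words:
        sections.append((title, words))
    return sections


def sliding_window(words: list[str], size: int = 200, overlap: int = 50) -> list[str]:
    """Cut a word list into overlapping windows of at most `size` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the section
    return chunks


def chunk_markdown(markdown: str, size: int = 200, overlap: int = 50) -> list[tuple[str, str]]:
    """Return (section title, chunk text) pairs for one document."""
    return [
        (title, chunk)
        for title, words in split_sections(markdown)
        for chunk in sliding_window(words, size, overlap)
    ]
```

Keeping windows inside header boundaries means a chunk never mixes two unrelated sections, at the cost of some short chunks at section ends.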

Escalation Logic

Tickets are escalated when:

  • Safety filter flags injection, PII, or garbage input
  • Zero chunks survive retrieval after domain fallback
  • Risk score ≥ 2 (financial fraud, account security, legal threats, system outages, security disclosures each score +2; urgency scores +1)
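The risk rule in the last bullet might look like the sketch below. The category weights and the threshold of 2 come from the list above; the regex patterns themselves are illustrative placeholders, not the patterns in risk.py.

```python
import re

# (weight, pattern) per category. Patterns are hypothetical examples.
RISK_PATTERNS = {
    "financial_fraud": (2, re.compile(r"\b(fraud|unauthori[sz]ed (charge|transaction)|chargeback)\b", re.I)),
    "account_security": (2, re.compile(r"\b(hacked|account (was )?(compromised|taken over))\b", re.I)),
    "legal_threat": (2, re.compile(r"\b(lawsuit|lawyer|attorney|legal action)\b", re.I)),
    "system_outage": (2, re.compile(r"\b(outage|site is down|service is down)\b", re.I)),
    "security_disclosure": (2, re.compile(r"\b(vulnerability|data breach|security (bug|issue))\b", re.I)),
    "urgency": (1, re.compile(r"\b(urgent|asap|immediately)\b", re.I)),
}
ESCALATION_THRESHOLD = 2


def risk_score(text: str) -> tuple[int, list[str]]:
    """Return the summed weight and the names of the categories that matched."""
    hits = [name for name, (_, pattern) in RISK_PATTERNS.items() if pattern.search(text)]
    return sum(RISK_PATTERNS[name][0] for name in hits), hits


def should_escalate(text: str) -> bool:
    return risk_score(text)[0] >= ESCALATION_THRESHOLD
```

With these weights, urgency alone (score 1) never escalates a ticket, while any single high-risk category (score 2) does.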

Borderline retrievals are not hard-escalated — the LLM system prompt instructs the model to explicitly say it lacks documentation when context is insufficient, which is more accurate than a raw logit threshold.


Setup

Prerequisites

  • Python 3.10+
  • (Optional) Ollama with llama3 pulled for free local inference
  • Anthropic API key for fallback generation (required if Ollama is not running)

1. Clone and create environment

git clone <repo-url>
cd hackerrank-orchestrate-may26

python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
source .venv/bin/activate

2. Install dependencies

pip install -r requirements.txt

3. Configure environment

cp .env.example .env

Edit .env and set your API key:

ANTHROPIC_API_KEY=your_key_here

4. Build the knowledge index

Run once (or whenever the docs in data/ change):

cd code
python corpus/loader.py

This reads all .md files from data/{claude,hackerrank,visa}/, chunks them, computes embeddings (cached to .cache/), and builds the FAISS and BM25 indexes. Takes ~2-3 minutes on first run; subsequent runs reuse cached embeddings.

5. Run the pipeline

# From the repository root (skip if you are still in code/ after step 4)
cd code
python main.py \
  --input ../support_tickets/support_tickets.csv \
  --output ../support_tickets/output.csv

Optional — run on the sample set with ground-truth answers for comparison:

python main.py \
  --input ../support_tickets/sample_support_tickets.csv \
  --output ../support_tickets/sample_output.csv

Project Structure

code/
├── main.py                  # CLI entry point
├── agent/
│   ├── validator.py         # Input cleaning, dedup, language detection
│   ├── safety.py            # Injection, PII, garbage detection
│   ├── router.py            # Domain routing (explicit + weighted keywords)
│   ├── retriever.py         # Hybrid retrieval: FAISS + BM25 + RRF + Cross-Encoder
│   ├── risk.py              # Rule-based risk classification
│   ├── generator.py         # LLM generation with Ollama → Claude failover
│   ├── formatter.py         # Output structuring, product area inference
│   └── cache.py             # SHA-256 keyed LLM response disk cache
├── corpus/
│   ├── schema.py            # Chunk dataclass
│   ├── chunker.py           # Markdown-aware sliding window chunker
│   └── loader.py            # Index builder (FAISS + BM25 + diskcache)
└── api/
    └── api.py               # FastAPI REST endpoint (async, threadpool for LLM)

data/
├── claude/                  # 322 markdown support docs
├── hackerrank/              # 438 markdown support docs
└── visa/                    # 14 markdown support docs

support_tickets/
├── support_tickets.csv      # Evaluation input
├── sample_support_tickets.csv  # Dev set with ground-truth answers
└── output.csv               # Generated output

Output Format

The pipeline writes exactly 5 columns to the output CSV:

| Column | Values | Description |
|---|---|---|
| status | replied / escalated | Whether the ticket was auto-resolved or needs human review |
| product_area | e.g. Team And Enterprise Plans | Inferred from the retrieved document's directory structure |
| response | string | Auto-generated grounded answer, or escalation message |
| justification | string | Why this decision was made |
| request_type | bug / feature_request / product_issue / invalid | Classified via regex |
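To illustrate how product_area could be derived from a chunk's source path, the sketch below takes the first folder under the domain directory and title-cases it. The path layout (data/<domain>/<area-folder>/<file>.md), the function name, and the "General" fallback are all assumptions for illustration.

```python
from pathlib import PurePosixPath

DOMAINS = ("claude", "hackerrank", "visa")


def infer_product_area(source_path: str, domains: tuple[str, ...] = DOMAINS) -> str:
    """Turn the folder directly under the domain directory into a label.

    Assumes POSIX-style paths such as data/claude/<area-folder>/<file>.md.
    Falls back to "General" (hypothetical default) when the file sits
    directly inside the domain directory or no domain is found.
    """
    parts = PurePosixPath(source_path).parts
    for i, part in enumerate(parts[:-1]):
        # Require at least one folder between the domain and the file name.
        if part in domains and i + 1 < len(parts) - 1:
            folder = parts[i + 1]
            return folder.replace("-", " ").replace("_", " ").title()
    return "General"


print(infer_product_area("data/claude/team-and-enterprise-plans/billing.md"))
# Team And Enterprise Plans
```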

LLM Configuration

The generator tries providers in this order:

| Priority | Provider | Trigger |
|---|---|---|
| 1st | Ollama llama3 (local) | Detected via /api/tags; free, private, no network round-trip |
| 2nd | Claude 3.5 Sonnet (API) | Fallback when Ollama is unavailable |
| 3rd | Escalation message | Fallback when no API key is configured |

LLM responses are cached on disk, keyed by the SHA-256 hash of the prompt. Re-runs that produce an identical prompt skip the LLM call, so they return almost immediately and incur no API cost.
