A production-grade, terminal-based AI agent designed for the HackerRank Orchestrate (May 2026) hackathon. The agent autonomously triages, routes, and responds to support tickets across three major product ecosystems: HackerRank, Claude (Anthropic), and Visa. Every response is grounded strictly in a local support corpus.
- Strict Grounding: Every response is backed by evidence from a local 770+ document corpus.
- Hybrid Architecture: Combines fast rule-based safety gates with sophisticated LLM-based reasoning.
- Intelligent Routing: Automatically identifies the product domain and scopes retrieval to relevant documentation.
- Safety First: Built-in protection against prompt injections, adversarial requests, and sensitive data handling.
- Determinism: Uses high-relevance retrieval thresholds and JSON mode for reproducible, predictable results.
The agent employs a Retrieval-Augmented Generation (RAG) pipeline optimized for support workflows. It prioritizes safety and accuracy over generic conversation.
- Ingestion: Support tickets are batch-loaded from CSV.
- Domain Routing: The agent determines whether the ticket belongs to HackerRank, Claude, or Visa based on the company metadata.
- Safety Gate: An immediate rule-based check for adversarial patterns (SQL injection, attempts to extract the system prompt) or high-risk topics (identity theft, fraud).
- Retrieval: The system performs a TF-IDF search across the scoped corpus to find the most relevant documentation chunks.
- Grounding Gate: Before calling the LLM, the system assesses the strength of the retrieved evidence. If the match score is below the threshold, it escalates to a human agent instead of guessing.
- Reasoned Generation: The LLM (Groq Llama 3.3) generates a structured JSON response containing the status, product area, user response, and a clear justification.
- Final Validation: A post-generation check ensures the response contains no hallucinated URLs, follows the mandatory schema, and is linguistically grounded in the retrieved sources.
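
The retrieval and grounding-gate steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's code: the project uses scikit-learn for TF-IDF, whereas this sketch computes TF-IDF and cosine similarity by hand. The names `MIN_SCORE`, `build_index`, and `retrieve_or_escalate`, the threshold value, and the sample corpus layout are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical threshold; the project's real value lives in config.py.
MIN_SCORE = 0.2


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(chunks):
    """Return (idf, vectors) for a list of text chunks."""
    docs = [Counter(tokenize(chunk)) for chunk in chunks]
    n = len(docs)
    # Document frequency: number of chunks that contain each term.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_or_escalate(query, corpus, company):
    """Search only the chunks for `company`; escalate on weak evidence."""
    chunks = corpus.get(company, [])
    if not chunks:
        return {"action": "escalate", "score": 0.0}
    idf, vectors = build_index(chunks)
    counts = Counter(tokenize(query))
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    best_score, best_chunk = max(
        (cosine(q, vec), chunk) for vec, chunk in zip(vectors, chunks)
    )
    if best_score < MIN_SCORE:
        # Grounding gate: evidence too weak, so the LLM is never called.
        return {"action": "escalate", "score": best_score}
    return {"action": "generate", "score": best_score, "context": best_chunk}
```

In this sketch, a query that shares no vocabulary with the scoped chunks scores 0.0 and is escalated without an LLM call, which mirrors the "escalate instead of guessing" behavior described above.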
| Component | Technology | Rationale |
|---|---|---|
| LLM Backend | Groq (Llama-3.3-70b) | Ultra-low latency and state-of-the-art reasoning for JSON tasks. |
| Retrieval Engine | TF-IDF (Scikit-learn) | Fast, local, and highly effective for keyword-dense support docs. |
| Framework | Python 3.10+ | Standard for AI pipelines with robust CSV and text processing. |
| Safety Gate | Pattern matching & Regex | Low-cost, deterministic pre-filtering for high-risk requests. |
| Validation | Lexical Overlap Check | Ensures the LLM actually uses the provided context. |
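
As an illustration of the validation row, a lexical overlap check combined with a hallucinated-URL check might look like the sketch below. The function names, the `MIN_OVERLAP` value, and the problem labels are hypothetical; the actual rules in validators.py may differ.

```python
import re

URL_RE = re.compile(r"https?://[^\s)>\]]+")

# Hypothetical minimum share of response tokens that must appear in the sources.
MIN_OVERLAP = 0.5


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_unsupported_urls(response, sources):
    """Return URLs in the response that do not appear in any source chunk."""
    source_text = " ".join(sources)
    urls = [u.rstrip(".,;") for u in URL_RE.findall(response)]
    return [u for u in urls if u not in source_text]


def lexical_overlap(response, sources):
    """Fraction of distinct response tokens that also occur in the sources."""
    resp = tokens(response)
    if not resp:
        return 0.0
    return len(resp & tokens(" ".join(sources))) / len(resp)


def validate(response, sources):
    """Return a list of problem labels; an empty list means the checks passed."""
    problems = []
    if find_unsupported_urls(response, sources):
        problems.append("unsupported_url")
    if lexical_overlap(response, sources) < MIN_OVERLAP:
        problems.append("low_overlap")
    return problems
```

Token overlap is a coarse proxy for grounding: it catches responses that ignore the retrieved context entirely, but it cannot confirm that a response with high overlap is factually faithful.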
- main.py: CLI entry point and batch processing loop.
- agent.py: Core orchestration logic for the triage pipeline.
- safety.py: Specialized rules for pre-retrieval safety escalation.
- classifier.py: Detects invalid, off-topic, or spam requests.
- retriever.py: Handles query synthesis and scoped documentation retrieval.
- corpus_loader.py: Loads and chunks 770+ documents from the data/ directory.
- indexer.py: Builds and manages the TF-IDF search index.
- llm_client.py: Thread-safe interface for Groq API with automatic retries.
- validators.py: Post-generation grounding and schema validation.
- output_writer.py: Formats and writes results to the output CSV.
- config.py: Centralized configuration (thresholds, prompts, model params).
- utils.py: Text normalization, language detection, and logging utilities.
- eval_sample.py: Performance evaluator for the provided sample set.
- design_decisions.md: Rationale for architectural and technical choices.
- run.bat: Convenience batch script for Windows execution.
- requirements.txt: Project dependency definitions.
- .env.example: Template for environment variable configuration.
- Python 3.10+
- A Groq API Key
- Install dependencies:

      pip install -r requirements.txt

- Configure your environment: create a .env file in the project root with your API key:

      GROQ_API_KEY=gsk_...

Run the full triage on the main ticket set:

    python main.py

Test against the sample set:

    python main.py --sample

Run a specific ticket for debugging:

    python main.py --ticket 5 --verbose

The agent follows a zero-trust approach to knowledge. It is prohibited from drawing product facts from the model's own trained knowledge. If the retrieved documentation does not explicitly answer the user's query, the agent MUST escalate.
High-risk scenarios are automatically escalated to human agents:
- Financial Security: Fraud, stolen cards, or identity theft.
- Account Integrity: Password resets, access restoration, or account deletion.
- Platform Integrity: Bug reports, security vulnerabilities, or service outages.
- Adversarial Input: Attempts to extract system prompts or bypass safety rules.
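
A rule-based gate covering the four categories above could be sketched as follows. The regular expressions and category labels are illustrative assumptions only; the project's actual rules live in safety.py and are not reproduced here.

```python
import re

# Illustrative patterns only (assumptions), checked before any retrieval.
ADVERSARIAL = [
    re.compile(r"\b(ignore|disregard)\b.{0,40}\b(instructions|rules)\b", re.I),
    re.compile(r"\bsystem prompt\b", re.I),
    re.compile(r"['\"]\s*(or|and)\s+1\s*=\s*1|;\s*drop\s+table\b", re.I),
]

HIGH_RISK = {
    "financial_security": re.compile(
        r"\b(fraud\w*|stolen card|identity theft)\b", re.I
    ),
    "account_integrity": re.compile(
        r"\b(password reset|reset my password|delete my account|restore access)\b",
        re.I,
    ),
    "platform_integrity": re.compile(r"\b(bug|vulnerabilit\w+|outage)\b", re.I),
}


def safety_gate(text):
    """Return the escalation category for a ticket, or None if it may proceed."""
    # Adversarial input is checked first so it is never passed downstream.
    for pattern in ADVERSARIAL:
        if pattern.search(text):
            return "adversarial_input"
    for category, pattern in HIGH_RISK.items():
        if pattern.search(text):
            return category
    return None
```

Because the gate is plain pattern matching, the same ticket always produces the same decision, and a ticket that returns a category can be escalated without retrieval or an LLM call.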
A secondary script eval_sample.py is included to verify the agent's performance against the provided ground truth sample labels. This was used during development to tune retrieval thresholds and minimize over-escalation while maintaining 100% safety.
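
The kind of comparison such an evaluator performs can be sketched as below. This is not the contents of eval_sample.py: the status strings `"replied"` and `"escalated"` are hypothetical, and reading "100% safety" as "zero missed escalations" is an assumption made for the example.

```python
def evaluate(predicted, expected):
    """Compare predicted statuses against ground-truth labels.

    Both arguments are equal-length lists of status strings. The values
    "replied" and "escalated" are hypothetical labels for this sketch.
    """
    if len(predicted) != len(expected):
        raise ValueError("prediction and label counts differ")
    pairs = list(zip(predicted, expected))
    total = len(pairs)
    correct = sum(p == e for p, e in pairs)
    # Over-escalation: escalated although the label says a reply was expected.
    over = sum(p == "escalated" and e == "replied" for p, e in pairs)
    # Missed escalation: replied although the label says escalation was required.
    missed = sum(p == "replied" and e == "escalated" for p, e in pairs)
    return {
        "accuracy": correct / total if total else 0.0,
        "over_escalated": over,
        "missed_escalations": missed,
    }
```

Under these assumptions, threshold tuning means lowering `over_escalated` while keeping `missed_escalations` at zero.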
This project is licensed under the MIT License - see the LICENSE file for details.

