📐 Design Notes — Customer Email Agent

🧠 Memory Design

Per-Sender Isolated Memory: Each sender gets an independent memory store, which prevents cross-user context leakage and keeps every conversation's history separate.
Dual Memory Architecture: Combines LangChain's InMemoryChatMessageHistory (the full conversational context passed to the LLM) with a lightweight structured history log (category, action, subject) for fast system-level checks.
Context Injection Strategy: When a new email arrives, the agent builds a composite context string (chat history plus a system summary) and injects it into the LLM prompt so the model can recognize follow-ups.
Action-Aware Metadata Tracking: The lightweight history log supports duplicate detection, sticky escalation, and quick lookup of the last handled category without scanning the full chat memory.
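
The memory layout described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration rather than the agent's actual code: a plain list of (role, text) pairs stands in for InMemoryChatMessageHistory, and the names `SenderMemory`, `LogEntry`, `memories`, `record`, and `build_context`, along with the wording of the summary, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """One row of the lightweight structured log."""
    category: str
    action: str
    subject: str


@dataclass
class SenderMemory:
    """Memory for a single sender; never shared between senders."""
    # Stand-in for InMemoryChatMessageHistory: (role, text) pairs.
    chat: list[tuple[str, str]] = field(default_factory=list)
    log: list[LogEntry] = field(default_factory=list)


# Hypothetical registry: one SenderMemory per address, created on first use.
memories: defaultdict[str, SenderMemory] = defaultdict(SenderMemory)


def record(sender: str, subject: str, body: str, reply: str,
           category: str, action: str) -> None:
    """Store one handled email in both memories for this sender."""
    mem = memories[sender]
    mem.chat.append(("human", f"Subject: {subject}\n{body}"))
    mem.chat.append(("ai", reply))
    mem.log.append(LogEntry(category, action, subject))


def build_context(sender: str) -> str:
    """Compose chat history plus a short system summary for the prompt."""
    mem = memories[sender]
    if not mem.log:
        return "No prior emails from this sender."
    history = "\n".join(f"{role}: {text}" for role, text in mem.chat)
    last = mem.log[-1]
    summary = (f"Emails handled: {len(mem.log)}. "
               f"Last category: {last.category}. Last action: {last.action}.")
    return f"Conversation history:\n{history}\n\nSystem summary:\n{summary}"
```

Because the registry is keyed by sender address, `build_context` can only ever see one sender's messages. The structured log answers questions such as "what was the last category?" without walking the chat history.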

Structured Confidence Scoring: The LLM returns a confidence score (0.0–1.0) that is validated against a Pydantic schema, so routing decisions rest on a well-formed, in-range number rather than on heuristics applied to free-form output.
Threshold-Based Automation Control: Emails with confidence ≥ 0.75 are eligible for auto-reply; anything below 0.75 is escalated rather than answered automatically.
Confidence as Risk Guardrail: The threshold acts as a safety net, so ambiguous or unclear emails are routed to a human instead of receiving an automated reply.
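
As a rough illustration of the validation and threshold rules above, the sketch below uses a stdlib dataclass as a stand-in for the Pydantic schema so that it runs without third-party packages. In Pydantic the same range constraint would typically be written as `Field(ge=0.0, le=1.0)`. The names `Classification`, `route`, and `AUTO_REPLY_THRESHOLD` are hypothetical; only the 0.75 cutoff comes from these notes.

```python
from dataclasses import dataclass

AUTO_REPLY_THRESHOLD = 0.75  # cutoff stated in the notes


@dataclass(frozen=True)
class Classification:
    """Structured LLM output: a category plus a validated confidence."""
    category: str
    confidence: float

    def __post_init__(self) -> None:
        value = self.confidence
        # Reject booleans and non-numeric values before the range check.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("confidence must be a number")
        # NaN fails this comparison too, so it is rejected as well.
        if not 0.0 <= value <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def route(result: Classification) -> str:
    """Auto-reply at or above the threshold; escalate otherwise."""
    if result.confidence >= AUTO_REPLY_THRESHOLD:
        return "auto_reply"
    return "escalate"
```

Validation only guarantees that the score is a number in range. Whether 0.75 is the right cutoff still depends on how well the model's confidence tracks its actual accuracy.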

Multi-Layer Escalation Checks: An email is escalated if its confidence is below the 0.75 threshold, if it contains a high-risk keyword (fraud, hack, sue, legal, lawyer), or if the sender was previously escalated.
Sticky Escalation Rule: Once a sender is escalated, all later emails from that sender in the same session are escalated automatically, so handling stays consistent.
LLM-Bypass for Duplicates: Duplicate follow-ups skip LLM analysis entirely and receive a system-generated acknowledgment, which avoids an unnecessary API call.
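
The escalation and duplicate rules above might be combined along the following lines. This is a sketch under several assumptions that the notes do not confirm: a duplicate is taken to mean a repeated subject from the same sender (ignoring "Re:" and "Fwd:" prefixes), the duplicate check runs before the escalation checks, and keywords are matched at the start of a word so that "sue" does not fire on "issue". The names `handle`, `normalize`, `escalated_senders`, and `seen_subjects` are hypothetical, and `classify` is a placeholder for the LLM call that returns a confidence score.

```python
import re
from typing import Callable

AUTO_REPLY_THRESHOLD = 0.75
RISK_KEYWORDS = ("fraud", "hack", "sue", "legal", "lawyer")
# Leading word boundary only: matches "hacked" and "lawyers", not "issue".
RISK_PATTERN = re.compile(r"\b(" + "|".join(RISK_KEYWORDS) + r")", re.IGNORECASE)

# Session-scoped state (assumed to live in memory for the session).
escalated_senders: set[str] = set()
seen_subjects: dict[str, set[str]] = {}


def normalize(subject: str) -> str:
    """Strip reply/forward prefixes and case so follow-ups compare equal."""
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def handle(sender: str, subject: str, body: str,
           classify: Callable[[str, str], float]) -> str:
    """Return 'duplicate_ack', 'escalate', or 'auto_reply'."""
    key = normalize(subject)
    subjects = seen_subjects.setdefault(sender, set())
    if key in subjects:
        # Duplicate follow-up: acknowledge without calling the LLM.
        return "duplicate_ack"
    subjects.add(key)

    confidence = classify(subject, body)
    must_escalate = (
        sender in escalated_senders                    # sticky escalation
        or RISK_PATTERN.search(f"{subject} {body}") is not None
        or confidence < AUTO_REPLY_THRESHOLD
    )
    if must_escalate:
        escalated_senders.add(sender)
        return "escalate"
    return "auto_reply"
```

For example, with a stub classifier that always returns 0.9, a first email about "Order status" is auto-replied and a later "Re: Order status" from the same sender returns `duplicate_ack` without invoking the classifier. A sender whose email mentions being hacked is escalated, and so is every later email from that sender.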