TL;DR: Paste a support ticket, get a structured triage report in 2-5 minutes. The agent pulls evidence from PostHog's own data, live docs, and GitHub issues in parallel — then produces a confidence-graded root cause assessment with a ready-to-send customer response. No guessing, no stale knowledge, and a read-only investigation workflow by design.
Watch it diagnose a real unanswered issue
Every support engineer gets a head start on every ticket.
The bottleneck in support isn't answering tickets — it's the 15-30 minutes of investigation before an answer is even possible. Checking person properties, querying events, searching GitHub for known bugs, cross-referencing docs, figuring out which SDK version matters — all before a single word is written back to the customer.
This agent runs that same investigation in parallel — the same MCP queries, the same GitHub searches, the same docs lookups — and produces a structured report that a human can review, edit, and send. It doesn't replace support engineers; it gives them a head start so they can focus on the hard problems.
```
Without the agent                    With the agent

Ticket arrives                       Ticket arrives
      │                                    │
      ▼                                    ▼
TSE reads ticket (2 min)             Agent triages (5-10 min)
      │                                    │
      ▼                                    ▼
Check PostHog data (5 min)           TSE reviews report (2 min)
      │                                    │
      ▼                                    ▼
Search GitHub issues (5 min)         Edit + send response (3 min)
      │                                    │
      ▼                                    ▼
Read docs (5 min)                    Done ✓ (~10-15 min total)
      │
      ▼
Draft response (5 min)
      │
      ▼
Done ✓ (~20-40 min total)
```
Here's the agent triaging an actual support ticket — a customer reporting that `posthog.displaySurvey()` fails to show a survey on subsequent calls after closing it.
> "When I use `posthog.displaySurvey()` to programmatically show a survey, it displays correctly the first time but fails to appear on subsequent attempts after being closed. My survey is API-mode. I call `displaySurvey()` with `ignoreConditions: true` and `ignoreDelay: true`. After closing and calling `displaySurvey()` again with the same survey ID, nothing appears. Manually removing the DOM element (`.PostHogSurvey-{surveyId}`) before calling `displaySurvey()` fixes it. posthog-js latest. Chrome and Safari."
Phase 0 — Intake (5s): Parsed the ticket, identified product area (Surveys), extracted key details (API mode, displaySurvey, DOM cleanup workaround), routed to posthog-survey-diagnosis skill.
Phase 1 — Parallel Research (30s): Fired these simultaneously:
| Query | Source | Finding |
|---|---|---|
| `gh search issues --repo PostHog/posthog-js "displaySurvey"` | GitHub CLI | Exact match: Issue #2586 — closed Nov 2025 |
| `gh issue view 2586` | GitHub CLI | Root cause confirmed by @adboio: cleanup only triggered for Popover type, not API type displayed as popover |
| `gh pr view 2595` | GitHub CLI | Fix PR merged Nov 19 2025 — changed cleanup to check displayType not surveyType |
| `npm view posthog-js version` | npm | Current latest: v1.363.5 — fix available since v1.296.0 |
| `docs-search("displaySurvey API mode")` | PostHog MCP | Confirmed displaySurvey is the recommended method |
| Web search: displaySurvey not showing second time | WebSearch | Additional confirmation across multiple reports |
Phase 2 — Triage Report:
- Root Cause: Known fixed bug (posthog-js#2586)
- Confidence: ✅ Confirmed by data
- Fix: Upgrade to posthog-js >= 1.296.0
- Workaround: Remove DOM element before re-calling (customer already found this)
- Escalation: Not needed — unless the customer confirms they're already on >= 1.296.0
Thanks for the detailed report and reproduction steps — really helpful.
This is a known bug fixed in posthog-js v1.296.0 (November 2025). The close handler only cleaned up DOM elements for Popover-type surveys, not API-type surveys rendered as popovers. Fix: PR #2595.
To resolve:
Run `npm install posthog-js@latest` (current latest is v1.363.5). Could you confirm your exact version? Your workaround of removing the DOM element is safe to keep using until you upgrade.
Time: 2-5 minutes end-to-end versus 20-40 minutes of manual investigation.
```bash
# 1. Clone and enter the project
git clone https://github.com/mongo-ai/posthog-triage-agent.git
cd posthog-triage-agent

# 2. Create .env from the tracked template
cp .env.example .env

# 3. Fill in .env (or export these in your shell)
export POSTHOG_API_KEY="phx_..."   # Personal API key (Settings → Personal API Keys → MCP preset)
export POSTHOG_ORG_ID="..."        # Settings → Organization → General
export POSTHOG_PROJECT_ID="..."    # Visible in URL: posthog.com/project/<ID>
export GITHUB_PAT="ghp_..."        # GitHub PAT with repo read access

# 4. Install Playwright for browser inspection
npx playwright install chromium

# 5. Run the setup smoke test
./test-setup.sh

# 6. Open Claude Code
claude

# Then invoke the agent with a ticket:
#   /posthog-support-agent <paste ticket text here>
#
# Demo tickets test the workflow (intake, search, report structure).
# They reference synthetic entities that won't exist in your project —
# "not found" results are expected. For a full demo, use a real ticket.
```

The agent runs a three-phase workflow. Code gathers facts; the model connects them.
| Phase | What happens | Time |
|---|---|---|
| Phase 0: Intake | Parse the ticket — extract distinct_id, product area, flag keys, timeframe, urgency | ~5s |
| Phase 1: Parallel Research | Fire 7+ queries simultaneously across PostHog MCP, GitHub, DeepWiki, Context7, and web search | ~30s |
| Phase 2: Synthesis | Produce an evidence-graded triage report with root cause, known-bug match, and draft customer response | ~15s |
Total: 2-5 minutes per ticket, versus 20-40 minutes of manual investigation.
The ticket intake skill parses messy customer messages into structured investigation inputs:
- Identifiers: distinct_id, person ID, event names, flag keys, error fingerprints
- Context: product area, timeframe, blast radius, urgency level
- Routing: which SDK repo is relevant, which diagnosis skill to invoke
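As a rough sketch, the intake step's structured output might look like the following (field names and extraction heuristics are illustrative, not the skill's actual schema):

```python
import re

def parse_ticket(text: str) -> dict:
    """Hypothetical Phase 0 parser: raw ticket text -> structured investigation inputs."""
    return {
        "identifiers": {
            # Naive extraction: grab anything that looks like a distinct_id mention
            "distinct_id": (m.group(1) if (m := re.search(r"distinct_id[:=]\s*(\S+)", text)) else None),
            "flag_keys": re.findall(r"flag[:=]\s*([\w-]+)", text),
        },
        "context": {
            "product_area": "surveys" if "survey" in text.lower() else "unknown",
            "urgency": "P1" if "production" in text.lower() else "P2",
        },
        "routing": {
            "skill": "posthog-survey-diagnosis" if "survey" in text.lower() else "posthog-event-diagnosis",
        },
    }

ticket = parse_ticket("Survey not showing. distinct_id: user_123")
```

The real skill handles far messier input; the point is the shape of the output, which downstream phases consume directly.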
Parallelization is mandatory. Every tool call that could run simultaneously does:
| Track | Source | What it checks |
|---|---|---|
| PostHog MCP | Customer's pinned project | Person properties, events, flag definitions, error details, survey config, pipeline status |
| PostHog Docs | `docs-search` | Current feature config, known limitations, setup requirements |
| DeepWiki | Source code analysis | PostHog codebase architecture for the relevant component |
| Context7 | SDK documentation | Version-specific changes, migration guides, API surface |
| GitHub | Issue search (2+ variants) | Known bugs, PRs, fix status — searches BEFORE blaming the customer |
| Web Search | Broader symptom search | Community reports, Stack Overflow, related issues |
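The fan-out/fan-in shape of the research blast can be sketched with `asyncio`; the stubbed coroutine below stands in for the real MCP and CLI calls, which are not shown:

```python
import asyncio

async def run_query(source: str, query: str) -> dict:
    # Stand-in for an MCP or CLI tool call; a real implementation awaits I/O here.
    await asyncio.sleep(0)
    return {"source": source, "query": query, "finding": "..."}

async def research_blast(ticket: dict) -> list[dict]:
    tracks = [
        ("PostHog MCP", f"events for {ticket['distinct_id']}"),
        ("PostHog Docs", f"docs-search {ticket['product_area']}"),
        ("GitHub", f"issues matching {ticket['symptom']}"),
        ("Web Search", ticket["symptom"]),
    ]
    # Every query that can run simultaneously does: one gather, not a loop of awaits.
    return await asyncio.gather(*(run_query(s, q) for s, q in tracks))

results = asyncio.run(research_blast(
    {"distinct_id": "user_123", "product_area": "surveys", "symptom": "survey not showing"}
))
```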
The output is always structured:
- Evidence table — every claim cites a real tool/query
- Root cause assessment — with honest confidence grading
- Known bug check — GitHub search with 2+ query variants
- Draft customer response — empathetic, specific, actionable
- Escalation decision — with engineering-ready context if needed
Every conclusion is tagged with one of three levels. The agent downgrades confidence rather than overstates certainty.
| Level | Meaning | When |
|---|---|---|
| Confirmed by data | Direct evidence proves the cause | Flag definition shows wrong condition; events query shows zero ingestion |
| Likely based on pattern match | Evidence strongly suggests but doesn't prove | Symptoms match a known GitHub issue; docs say feature requires X |
| Suspected, needs human verification | Plausible hypothesis but insufficient evidence | Can't inspect customer's code; multiple possible causes |
If confidence is only "suspected," the agent produces an engineering-ready escalation packet instead of pretending certainty.
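A minimal sketch of downgrade-by-default grading (the `Evidence` type and `grade_confidence` helper are hypothetical, not from the repo):

```python
from dataclasses import dataclass

CONFIRMED = "Confirmed by data"
LIKELY = "Likely based on pattern match"
SUSPECTED = "Suspected, needs human verification"

@dataclass
class Evidence:
    source: str            # e.g. "PostHog MCP", "GitHub CLI"
    proves_cause: bool     # direct proof: flag definition, zero-ingestion query
    matches_pattern: bool  # symptom match against a known issue or doc

def grade_confidence(evidence: list[Evidence]) -> str:
    if any(e.proves_cause for e in evidence):
        return CONFIRMED
    if any(e.matches_pattern for e in evidence):
        return LIKELY
    # Insufficient evidence: emit an escalation packet, never a guess
    return SUSPECTED
```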
The tracked repo currently includes 18 skills. The agent routes across domain diagnosis skills plus supporting workflow skills.
| Skill | Covers | Key Checks |
|---|---|---|
| Ticket Intake | Normalize raw tickets | Issue type, product area, identifiers, blast radius, first investigation path |
| Feature Flag Diagnosis | Flags, experiments, A/B tests | Flag definition, evaluation events, identify timing, property race conditions, bootstrap flicker, local evaluation latency |
| Session Replay Diagnosis | Web + mobile replay | SDK version, capture vs playback, CSP/CORS, config flags, mobile OOM/throttle, masking issues |
| Event Diagnosis | Missing, delayed, duplicated, filtered events | Event query, SDK version/region, ad-blocker/proxy detection, ingestion warnings, pipeline filters, UUID dedup |
| Error Tracking Diagnosis | Exceptions, stack traces, source maps | Exception events, source map upload verification, release context, symbolication |
| Survey Diagnosis | Display, targeting, response collection | Survey status, targeting conditions, URL rules, repeat suppression, surveys_opt_in config, API-mode integration |
| Pipeline Diagnosis | CDP functions, destinations | Function status, execution logs, auth/rate-limit errors, trigger/filter conditions, credential/payload verification |
| Data Warehouse Diagnosis | Warehouse syncs and joins | Sync failures, schema mismatches, stale data, query errors, source-specific setup |
| Web Analytics Diagnosis | Pageviews, bounce rate, attribution | $pageview/$pageleave, SPA routing, UTMs, reverse proxy gaps, GA discrepancies |
| Billing Diagnosis | Plan/entitlement/account issues | Plan tier, feature availability, billing bugs, access vs role confusion |
| Self-Hosted Diagnosis | Docker/Helm/Kubernetes issues | Infra boundaries, ClickHouse/Postgres health, reverse proxy, migrations |
| Site Inspector | Browser evidence (escalation only) | PostHog SDK presence, config extraction, network resources, CSP violations — requires public URL and browser-relevant issue |
| Skill | Purpose |
|---|---|
| Triage Report | Final synthesis with evidence and confidence grading |
| Response Drafting | Customer-facing replies matched to severity and certainty |
| Escalation | Engineering-ready escalation packet |
| Ship Fix | PR-ready docs or small-fix brief |
| KB Article | Knowledge-base or docs draft from a resolved issue |
| Slack Triage | Slack formatting and posting workflow |
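Each skill is a directory under `.claude/skills/` containing a `SKILL.md` that the agent loads on demand. A minimal sketch of what one skill's frontmatter and body might look like (illustrative only; the tracked skills' actual names, descriptions, and checklists may differ):

```markdown
---
name: posthog-survey-diagnosis
description: Diagnose survey display, targeting, and response-collection issues. Use when a ticket mentions surveys, displaySurvey, or missing responses.
---

# Survey Diagnosis

1. Check survey status and targeting conditions via PostHog MCP.
2. Search posthog-js issues for known display bugs (2+ query variants).
3. Verify surveys_opt_in and API-mode integration details.
```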
```
Support Ticket
      │
      ▼
┌─────────────────────┐
│   Ticket Intake     │  Parse → extract identifiers → route to skill
│   (Phase 0, ~5s)    │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────┐
│              PARALLEL RESEARCH BLAST                    │
│                  (Phase 1, ~30s)                        │
│                                                         │
│  ┌──────────────┐  ┌───────────┐  ┌──────────────────┐  │
│  │ Domain Skill │  │  PostHog  │  │     GitHub       │  │
│  │ (specialized)│  │    MCP    │  │  (MCP + gh CLI)  │  │
│  │              │──│  EU / US  │  │  10 SDK repos    │  │
│  │ Flag diag    │  │           │  │  PostHog/posthog │  │
│  │ Replay diag  │  ├───────────┤  └──────────────────┘  │
│  │ Event diag   │  │ DeepWiki  │  ┌──────────────────┐  │
│  │ Error diag   │  │    MCP    │  │   Playwright     │  │
│  │ Survey diag  │  ├───────────┤  │ (escalation only)│  │
│  │ Pipeline diag│  │ Context7  │  └──────────────────┘  │
│  │ Site inspect │  │    MCP    │  ┌──────────────────┐  │
│  └──────────────┘  └───────────┘  │   Web Search     │  │
│                                   └──────────────────┘  │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│                    TRIAGE REPORT                        │
│                   (Phase 2, ~15s)                       │
│                                                         │
│  Evidence Table → Root Cause → Known Bugs → Draft Reply │
│                                                         │
│  ┌────────────────┐ ┌──────────┐ ┌───────────────────┐  │
│  │ ● Confirmed    │ │ ● Likely │ │ ● Suspected       │  │
│  └────────────────┘ └──────────┘ └───────────────────┘  │
└─────────────────────────────────────────────────────────┘
```
The agent investigates issues across PostHog's multi-layer ingestion pipeline. This reference shows how data flows from SDKs through Kafka to ClickHouse/Postgres, plus the diagnostic strategies the agent applies at each layer.
Credit: Generated with NotebookLM
The workflow is intended to be read-only. The agent prompt forbids mutations, and .claude/settings.json explicitly denies:
- `rm` — no file deletion
- `git push` — no repo mutations
- `curl -X POST/PUT/PATCH/DELETE` — no write API calls
Important nuance: this is a workflow-level guarantee, not a formally sandboxed proof of non-mutation. The shell allowlist is still broader than a hard read-only sandbox because local diagnostics rely on gh, node, npx, and Playwright. Treat the repo as operator-reviewed automation, not an unbreakable enforcement boundary.
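For context, deny rules of this kind take roughly the following shape in `.claude/settings.json` (an illustrative sketch; the tracked file's exact patterns may differ):

```json
{
  "permissions": {
    "deny": [
      "Bash(rm:*)",
      "Bash(git push:*)",
      "Bash(curl -X POST:*)",
      "Bash(curl -X PUT:*)",
      "Bash(curl -X PATCH:*)",
      "Bash(curl -X DELETE:*)"
    ]
  }
}
```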
| MCP Server | Purpose | What it provides |
|---|---|---|
| PostHog (EU + US) | Primary evidence source | Persons, events, flags, errors, surveys, logs, HogQL, CDP functions, docs search |
| DeepWiki | Source code analysis | Architecture-level understanding of PostHog codebase components |
| Context7 | Customer stack docs | Framework docs (Next.js, React, Django, etc.) for diagnosing integration-boundary issues |
| GitHub | Known-issue search (Agent SDK) | Issue tracking, PR status, release notes — required for headless Agent SDK deployments where no CLI is available |
| CLI Tool | Purpose | Why CLI over MCP |
|---|---|---|
| `gh` (GitHub CLI) | Local GitHub fallback and deep inspection | Useful for fast local inspection, `--json` output, and full comment threads. The tracked agent prompt prefers GitHub MCP first, then falls back to `gh` when needed. |
| Playwright | Browser inspection (escalation only) | SDK presence, config extraction, CSP/CORS detection — only for browser-relevant issues with a public URL |
The current prototype runs interactively in Claude Code. The production vision is a pipeline that complements existing support tooling (Zendesk, Pylon, HogHero) by pre-triaging tickets so engineers start with evidence, not a blank page:
```
 Zendesk MCP        PostHog MCP        Slack MCP
(ticket arrives) → (investigation) → (report delivered)
                                          │
                                          ▼
                                 TSE reviews + sends
                                   (5 min, not 20)
```
| Integration | MCP Server | Status | What it provides |
|---|---|---|---|
| Zendesk | `zendesk-mcp-server` | External option only | Pull tickets, read comments/tags/priority, search ticket history |
| Slack | Official Slack MCP | Configured in `.mcp.json` and permitted in `settings.json` — requires OAuth login on first use | Post triage reports to channels, thread follow-ups, notify on-call |
| PostHog | `mcp.posthog.com` | In use | Project data, docs search, event queries, flag definitions |
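For reference, a minimal `.mcp.json` covering the servers already in use might look like this (a sketch; the tracked file includes more servers and auth configuration):

```json
{
  "mcpServers": {
    "posthog": {
      "type": "http",
      "url": "https://mcp.posthog.com/mcp"
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```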
Target: "Triage Zendesk ticket #48291 and post the report to #support-escalations"
The path from prototype to production is the Claude Agent SDK — the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.
What it unlocks:
- Programmatic invocation — trigger triage from an API call, not a chat window
- Hooks — `PreToolUse`/`PostToolUse` for audit logging, cost tracking, and guardrails
- Subagents — spawn specialized diagnosis agents per product area in parallel
- Sessions — resume investigations across multiple exchanges with full context
```python
# Future: headless triage agent
from claude_agent_sdk import query, ClaudeAgentOptions

async for message in query(
    prompt=f"Triage this support ticket: {ticket_text}",
    options=ClaudeAgentOptions(
        allowed_tools=["Read", "Glob", "Grep", "WebSearch", "WebFetch", "Agent"],
        mcp_servers={
            "posthog": {"type": "http", "url": "https://mcp.posthog.com/mcp", ...},
            "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
        },
    ),
):
    if hasattr(message, "result"):
        post_to_slack(message.result)
```

Use PostHog itself to observe the agent. PostHog LLM Analytics tracks generations, traces, costs, and latency — the same product the agent investigates, now monitoring the agent.
```python
# `client` is assumed here to be an Anthropic client wrapped by PostHog's
# LLM analytics SDK, which accepts the posthog_* keyword arguments below.
response = client.messages.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": ticket_text}],
    posthog_distinct_id="support-agent-v1",
    posthog_trace_id=f"triage-{ticket_id}",
    posthog_properties={
        "product_area": "session_replay",
        "priority": "P1",
        "$ai_prompt_name": "posthog-triage-agent",
    },
)
```

A feedback loop: the agent triages PostHog issues while PostHog monitors the agent's performance.
This is a working prototype, not a production system. Deploying it for real support requires answering several questions that are intentionally left open.
| Question | Status | Notes |
|---|---|---|
| Can an AI agent directly query customer project data via MCP? | Needs investigation | The prototype currently pins to a single org/project. In production, would the agent query customer accounts directly, or only PostHog's internal instance? This is a policy decision, not a technical one. |
| What customer data flows through the LLM? | Needs review | Event properties, person properties, and distinct IDs pass through Claude during triage. A compliance review is needed to determine what data can be sent to third-party LLM providers, and whether PII scrubbing or data masking is required before queries. |
| EU data residency | Partially addressed | PostHog MCP supports EU/US region pinning. But the LLM provider (Anthropic) processes data — the compliance implications of routing EU customer data through a US-based LLM need evaluation. |
| Component | Status | What's needed |
|---|---|---|
| PostHog MCP | Working | Core evidence source — stable and in use |
| GitHub search | Working | gh CLI + MCP fallback — reliable |
| Zendesk MCP | Not configured in this repo | The community Zendesk MCP exists but hasn't been evaluated here for reliability, auth model, or feature completeness. |
| Slack MCP | Partially wired | .mcp.json and a Slack formatting skill exist, but end-to-end posting has not been validated in the tracked repo. |
| Claude Agent SDK | Not started | The headless deployment path. The prototype validates the workflow; migrating to the Agent SDK is the production path. |
- Cost per triage: Each triage involves multiple LLM calls + MCP queries. The per-ticket cost hasn't been measured. PostHog's own LLM Analytics could track this.
- Accuracy measurement: Initial blind tests score 97/100 across 5 real issues (see `evaluations.md`), but a production accuracy loop comparing agent output to TSE-written responses at scale would be needed.
- Internal vs customer data: The safest starting point may be pointing the agent at PostHog's internal dogfood instance rather than customer accounts — triaging based on what the support team can see, not direct customer data access. This sidesteps most compliance questions while still providing value.
- Hallucination risk: The agent is designed to cite sources and grade confidence, but LLMs can still present plausible-sounding information that isn't grounded in evidence. Human review of every triage report remains essential.
```
.
├── .claude/
│   ├── agents/
│   │   └── posthog-support-agent.md   # Agent brain — workflow, rules, speed requirements
│   ├── skills/
│   │   ├── posthog-ticket-intake/     # Phase 0: parse tickets into structured inputs
│   │   ├── posthog-feature-flag-diagnosis/
│   │   ├── posthog-session-replay-diagnosis/
│   │   ├── posthog-event-diagnosis/
│   │   ├── posthog-error-tracking-diagnosis/
│   │   ├── posthog-survey-diagnosis/
│   │   ├── posthog-pipeline-diagnosis/
│   │   ├── posthog-data-warehouse-diagnosis/
│   │   ├── posthog-web-analytics-diagnosis/
│   │   ├── posthog-billing-diagnosis/
│   │   ├── posthog-selfhosted-diagnosis/
│   │   ├── posthog-site-inspector/
│   │   ├── posthog-triage-report/     # Phase 2: synthesize evidence into report
│   │   ├── posthog-response-drafting/
│   │   ├── posthog-escalation/
│   │   ├── posthog-ship-fix/
│   │   ├── posthog-kb-article/
│   │   └── posthog-slack-triage/
│   └── settings.json                  # Read-only permissions + deny rules
├── .mcp.json                          # MCP server connections (PostHog EU/US, DeepWiki, Context7, GitHub, Slack)
├── .env.example                       # Copy to .env before running the smoke test
├── CLAUDE.md                          # Workflow instructions
├── demo-tickets.md                    # (local-only, gitignored) Synthetic tickets for demos
├── evaluations.md                     # Blind test results + live issue diagnoses
├── test-setup.sh                      # Smoke test for all connections
└── docs/
    ├── architecture.svg
    └── posthog-architecture-blueprint.jpeg
```
Inspired by PostHog's own lessons: What we wish we knew before building AI agents.
| Decision | Why |
|---|---|
| Read-only only | Support agents should never mutate customer state — one misconfigured flag could break production |
| Least privilege | MCP access is scoped to one org/project with feature filtering, not blanket admin access |
| Live docs over cached knowledge | SDK behavior changes every release — the agent fetches current docs before making claims |
| Known-bug search before user blame | Searching GitHub with 2+ query variants before concluding "misconfiguration" prevents false accusations |
| Confidence grading | Three explicit levels instead of fake certainty — support teams need to know what's proven vs suspected |
| Parallel-first architecture | 7+ queries fire simultaneously — a human doing this sequentially takes 15 minutes; the agent takes 30 seconds |
| Specialized skills over one big prompt | Multiple diagnosis and workflow skills with specific checks per product area, not one generic "investigate everything" instruction |
| Escalation packets, not guesses | When evidence is insufficient, the agent produces an engineering-ready escalation packet instead of pretending to know |
