PostHog Support Triage Agent

TL;DR: Paste a support ticket, get a structured triage report in 2-5 minutes. The agent pulls evidence from PostHog's own data, live docs, and GitHub issues in parallel — then produces a confidence-graded root cause assessment with a ready-to-send customer response. No guessing, no stale knowledge, and a read-only investigation workflow by design.

Watch it diagnose a real unanswered issue


The Vision

Every support engineer gets a head start on every ticket.

The bottleneck in support isn't answering tickets — it's the 15-30 minutes of investigation before an answer is even possible. Checking person properties, querying events, searching GitHub for known bugs, cross-referencing docs, figuring out which SDK version matters — all before a single word is written back to the customer.

This agent runs that same investigation in parallel — the same MCP queries, the same GitHub searches, the same docs lookups — and produces a structured report that a human can review, edit, and send. It doesn't replace support engineers; it gives them a head start so they can focus on the hard problems.

         Without the agent                    With the agent

  Ticket arrives                        Ticket arrives
       │                                     │
       ▼                                     ▼
  TSE reads ticket (2 min)              Agent triages (5-10 min)
       │                                     │
       ▼                                     ▼
  Check PostHog data (5 min)            TSE reviews report (2 min)
       │                                     │
       ▼                                     ▼
  Search GitHub issues (5 min)          Edit + send response (3 min)
       │                                     │
       ▼                                     ▼
  Read docs (5 min)                     Done ✓  (~10-15 min total)
       │
       ▼
  Draft response (5 min)
       │
       ▼
  Done ✓  (~20-40 min total)

Architecture at a Glance

PostHog Support Triage Agent Architecture


See It Work: A Real Triage

Here's the agent triaging an actual support ticket — a customer reporting that posthog.displaySurvey() fails to show a survey on subsequent calls after closing it.

The Ticket

"When I use posthog.displaySurvey() to programmatically show a survey, it displays correctly the first time but fails to appear on subsequent attempts after being closed. My survey is API-mode. I call displaySurvey() with ignoreConditions: true and ignoreDelay: true. After closing and calling displaySurvey() again with the same survey ID, nothing appears. Manually removing the DOM element (.PostHogSurvey-{surveyId}) before calling displaySurvey() fixes it. posthog-js latest. Chrome and Safari."

What the Agent Did (2-5 minutes)

Phase 0 — Intake (5s): Parsed the ticket, identified product area (Surveys), extracted key details (API mode, displaySurvey, DOM cleanup workaround), routed to posthog-survey-diagnosis skill.

Phase 1 — Parallel Research (30s): Fired these simultaneously:

| Query | Source | Finding |
|---|---|---|
| `gh search issues --repo PostHog/posthog-js "displaySurvey"` | GitHub CLI | Exact match: Issue #2586 — closed Nov 2025 |
| `gh issue view 2586` | GitHub CLI | Root cause confirmed by @adboio: cleanup only triggered for Popover type, not API type displayed as popover |
| `gh pr view 2595` | GitHub CLI | Fix PR merged Nov 19 2025 — changed cleanup to check displayType, not surveyType |
| `npm view posthog-js version` | npm | Current latest: v1.363.5 — fix available since v1.296.0 |
| `docs-search("displaySurvey API mode")` | PostHog MCP | Confirmed `displaySurvey` is the recommended method |
| Web search: "displaySurvey not showing second time" | WebSearch | Additional confirmation across multiple reports |

Phase 2 — Triage Report:

Root Cause:   Known fixed bug (posthog-js#2586)
Confidence:   ✅ Confirmed by data
Fix:          Upgrade to posthog-js >= 1.296.0
Workaround:   Remove DOM element before re-calling (customer already found this)
Escalation:   Not needed — unless customer confirms they're on >= 1.296.0

The Draft Customer Response (ready to edit and send)

Thanks for the detailed report and reproduction steps — really helpful.

This is a known bug fixed in posthog-js v1.296.0 (November 2025). The close handler only cleaned up DOM elements for Popover-type surveys, not API-type surveys rendered as popovers. Fix: PR #2595.

To resolve: npm install posthog-js@latest (current latest is v1.363.5).

Could you confirm your exact version? Your workaround of removing the DOM element is safe to keep using until you upgrade.

Time: 2-5 minutes end-to-end versus 20-40 minutes of manual investigation.


Quick Start

# 1. Clone and enter the project
git clone https://github.com/mongo-ai/posthog-triage-agent.git
cd posthog-triage-agent

# 2. Create .env from the tracked template
cp .env.example .env

# 3. Fill in .env (or export these in your shell)
export POSTHOG_API_KEY="phx_..."      # Personal API key (Settings → Personal API Keys → MCP preset)
export POSTHOG_ORG_ID="..."           # Settings → Organization → General
export POSTHOG_PROJECT_ID="..."       # Visible in URL: posthog.com/project/<ID>
export GITHUB_PAT="ghp_..."           # GitHub PAT with repo read access

# 4. Install Playwright for browser inspection
npx playwright install chromium

# 5. Run the setup smoke test
./test-setup.sh

# 6. Open Claude Code
claude

# Then invoke the agent with a ticket:
#   /posthog-support-agent <paste ticket text here>
#
# Demo tickets test the workflow (intake, search, report structure).
# They reference synthetic entities that won't exist in your project —
# "not found" results are expected. For a full demo, use a real ticket.

How It Works

The agent runs a three-phase workflow. Code gathers facts; the model connects them.

| Phase | What happens | Time |
|---|---|---|
| Phase 0: Intake | Parse the ticket — extract distinct_id, product area, flag keys, timeframe, urgency | ~5s |
| Phase 1: Parallel Research | Fire 7+ queries simultaneously across PostHog MCP, GitHub, DeepWiki, Context7, and web search | ~30s |
| Phase 2: Synthesis | Produce an evidence-graded triage report with root cause, known-bug match, and draft customer response | ~15s |

Total: 2-5 minutes per ticket, versus 20-40 minutes of manual investigation.

Phase 0: Intake

The ticket intake skill parses messy customer messages into structured investigation inputs:

  • Identifiers: distinct_id, person ID, event names, flag keys, error fingerprints
  • Context: product area, timeframe, blast radius, urgency level
  • Routing: which SDK repo is relevant, which diagnosis skill to invoke
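As a rough sketch, the structured inputs from intake might look like the snippet below. This is illustrative only — the real skill is an LLM prompt, not code, and the `TicketIntake` fields and `parse_ticket` heuristics here are hypothetical:

```python
# Hypothetical shape of Phase 0 output; field names are illustrative,
# not taken from the tracked skill definitions.
import re
from dataclasses import dataclass, field

@dataclass
class TicketIntake:
    distinct_ids: list = field(default_factory=list)
    product_area: str = "unknown"
    urgency: str = "normal"

def parse_ticket(text: str) -> TicketIntake:
    intake = TicketIntake()
    # Crude identifier extraction; the agent does this with an LLM skill.
    intake.distinct_ids = re.findall(r"distinct_id[:=]\s*(\S+)", text)
    lowered = text.lower()
    if "survey" in lowered:
        intake.product_area = "surveys"
    elif "flag" in lowered:
        intake.product_area = "feature_flags"
    if any(w in lowered for w in ("urgent", "production down")):
        intake.urgency = "high"
    return intake
```

The point is the contract, not the parsing: every downstream phase consumes the same small set of identifiers, a product area, and an urgency level.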

Phase 1: Parallel Research Blast

Parallelization is mandatory. Every tool call that could run simultaneously does:

| Track | Source | What it checks |
|---|---|---|
| PostHog MCP | Customer's pinned project | Person properties, events, flag definitions, error details, survey config, pipeline status |
| PostHog Docs | docs-search | Current feature config, known limitations, setup requirements |
| DeepWiki | Source code analysis | PostHog codebase architecture for the relevant component |
| Context7 | SDK documentation | Version-specific changes, migration guides, API surface |
| GitHub | Issue search (2+ variants) | Known bugs, PRs, fix status — searches BEFORE blaming the customer |
| Web Search | Broader symptom search | Community reports, Stack Overflow, related issues |
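Why parallelism matters can be shown in miniature. In reality the model issues tool calls concurrently through its tool-use loop, not through this code; the research functions below are stand-ins, but the wall-clock arithmetic is the same — total time approximates the slowest track, not the sum:

```python
# Illustrative only: stand-in research tracks fired concurrently.
import asyncio

async def run_query(source: str, delay: float) -> tuple:
    # Stand-in for one research track (an MCP query, a gh search, a web search).
    await asyncio.sleep(delay)
    return (source, "finding")

async def research_blast() -> list:
    # All tracks start at once; wall time ~= the slowest track.
    tracks = [
        run_query("posthog_mcp", 0.03),
        run_query("github_issues", 0.02),
        run_query("docs_search", 0.01),
        run_query("web_search", 0.02),
    ]
    return await asyncio.gather(*tracks)

findings = asyncio.run(research_blast())
```

A human doing these lookups one at a time pays the sum of the delays; the blast pays only the maximum.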

Phase 2: Triage Report

The output is always structured:

  • Evidence table — every claim cites a real tool/query
  • Root cause assessment — with honest confidence grading
  • Known bug check — GitHub search with 2+ query variants
  • Draft customer response — empathetic, specific, actionable
  • Escalation decision — with engineering-ready context if needed

Confidence Grading

Every conclusion is tagged with one of three levels. The agent downgrades confidence rather than overstates certainty.

| Level | Meaning | When |
|---|---|---|
| Confirmed by data | Direct evidence proves the cause | Flag definition shows the wrong condition; events query shows zero ingestion |
| Likely based on pattern match | Evidence strongly suggests but doesn't prove | Symptoms match a known GitHub issue; docs say the feature requires X |
| Suspected, needs human verification | Plausible hypothesis but insufficient evidence | Can't inspect the customer's code; multiple possible causes |

If confidence is only "suspected," the agent produces an engineering-ready escalation packet instead of pretending certainty.
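The downgrade rule can be stated precisely. The sketch below is an assumption about how one might encode it — the names are illustrative, not from the tracked skills — but it captures the invariant: a conclusion is only as strong as its weakest supporting evidence, so grading takes the minimum, never the average:

```python
# Hypothetical encoding of the three-level grading; names are illustrative.
from enum import IntEnum

class Confidence(IntEnum):
    SUSPECTED = 1   # plausible hypothesis, needs human verification
    LIKELY = 2      # strong pattern match, not directly proven
    CONFIRMED = 3   # direct evidence proves the cause

def grade(evidence_items: list) -> "Confidence":
    # Weakest-link rule: with no evidence at all, default to SUSPECTED.
    return min(evidence_items, default=Confidence.SUSPECTED)

def needs_escalation_packet(conf: "Confidence") -> bool:
    # SUSPECTED never ships as an answer; it ships as an escalation packet.
    return conf is Confidence.SUSPECTED
```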


Tracked Skills

The tracked repo currently includes 18 skills. The agent routes across domain diagnosis skills plus supporting workflow skills.

Diagnosis skills

| Skill | Covers | Key Checks |
|---|---|---|
| Ticket Intake | Normalize raw tickets | Issue type, product area, identifiers, blast radius, first investigation path |
| Feature Flag Diagnosis | Flags, experiments, A/B tests | Flag definition, evaluation events, identify timing, property race conditions, bootstrap flicker, local evaluation latency |
| Session Replay Diagnosis | Web + mobile replay | SDK version, capture vs playback, CSP/CORS, config flags, mobile OOM/throttle, masking issues |
| Event Diagnosis | Missing, delayed, duplicated, filtered events | Event query, SDK version/region, ad-blocker/proxy detection, ingestion warnings, pipeline filters, UUID dedup |
| Error Tracking Diagnosis | Exceptions, stack traces, source maps | Exception events, source map upload verification, release context, symbolication |
| Survey Diagnosis | Display, targeting, response collection | Survey status, targeting conditions, URL rules, repeat suppression, surveys_opt_in config, API-mode integration |
| Pipeline Diagnosis | CDP functions, destinations | Function status, execution logs, auth/rate-limit errors, trigger/filter conditions, credential/payload verification |
| Data Warehouse Diagnosis | Warehouse syncs and joins | Sync failures, schema mismatches, stale data, query errors, source-specific setup |
| Web Analytics Diagnosis | Pageviews, bounce rate, attribution | $pageview/$pageleave, SPA routing, UTMs, reverse proxy gaps, GA discrepancies |
| Billing Diagnosis | Plan/entitlement/account issues | Plan tier, feature availability, billing bugs, access vs role confusion |
| Self-Hosted Diagnosis | Docker/Helm/Kubernetes issues | Infra boundaries, ClickHouse/Postgres health, reverse proxy, migrations |
| Site Inspector | Browser evidence (escalation only) | PostHog SDK presence, config extraction, network resources, CSP violations — requires a public URL and a browser-relevant issue |

Workflow skills

| Skill | Purpose |
|---|---|
| Triage Report | Final synthesis with evidence and confidence grading |
| Response Drafting | Customer-facing replies matched to severity and certainty |
| Escalation | Engineering-ready escalation packet |
| Ship Fix | PR-ready docs or small-fix brief |
| KB Article | Knowledge-base or docs draft from a resolved issue |
| Slack Triage | Slack formatting and posting workflow |

Architecture

Support Ticket
    │
    ▼
┌─────────────────────┐
│   Ticket Intake      │  Parse → extract identifiers → route to skill
│   (Phase 0, ~5s)     │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────┐
│                PARALLEL RESEARCH BLAST                    │
│                (Phase 1, ~30s)                            │
│                                                          │
│  ┌──────────────┐  ┌───────────┐  ┌──────────────────┐  │
│  │ Domain Skill │  │ PostHog   │  │ GitHub           │  │
│  │ (specialized)│  │ MCP       │  │ (MCP + gh CLI)   │  │
│  │              │──│ EU / US   │  │ 10 SDK repos     │  │
│  │ Flag diag    │  │           │  │ PostHog/posthog  │  │
│  │ Replay diag  │  ├───────────┤  └──────────────────┘  │
│  │ Event diag   │  │ DeepWiki  │  ┌──────────────────┐  │
│  │ Error diag   │  │ MCP       │  │ Playwright       │  │
│  │ Survey diag  │  ├───────────┤  │ (escalation only)│  │
│  │ Pipeline diag│  │ Context7  │  └──────────────────┘  │
│  │ Site inspect │  │ MCP       │  ┌──────────────────┐  │
│  └──────────────┘  └───────────┘  │ Web Search       │  │
│                                    └──────────────────┘  │
└─────────────────────────┬───────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│                    TRIAGE REPORT                         │
│                    (Phase 2, ~15s)                        │
│                                                          │
│  Evidence Table → Root Cause → Known Bugs → Draft Reply  │
│                                                          │
│  ┌────────────────┐ ┌──────────┐ ┌───────────────────┐  │
│  │ ● Confirmed    │ │ ● Likely │ │ ● Suspected       │  │
│  └────────────────┘ └──────────┘ └───────────────────┘  │
└─────────────────────────────────────────────────────────┘

PostHog Platform Context

The agent investigates issues across PostHog's multi-layer ingestion pipeline. This reference shows how data flows from SDKs through Kafka to ClickHouse/Postgres, plus the diagnostic strategies the agent applies at each layer.

PostHog TSE: Technical Architecture & Diagnostic Blueprint

Credit: Generated with NotebookLM

Security Model

The workflow is intended to be read-only. The agent prompt forbids mutations, and .claude/settings.json explicitly denies:

  • rm — no file deletion
  • git push — no repo mutations
  • curl -X POST/PUT/PATCH/DELETE — no write API calls

Important nuance: this is a workflow-level guarantee, not a formally sandboxed proof of non-mutation. The shell allowlist is still broader than a hard read-only sandbox because local diagnostics rely on gh, node, npx, and Playwright. Treat the repo as operator-reviewed automation, not an unbreakable enforcement boundary.


Integrations

MCP Servers

| MCP Server | Purpose | What it provides |
|---|---|---|
| PostHog (EU + US) | Primary evidence source | Persons, events, flags, errors, surveys, logs, HogQL, CDP functions, docs search |
| DeepWiki | Source code analysis | Architecture-level understanding of PostHog codebase components |
| Context7 | Customer stack docs | Framework docs (Next.js, React, Django, etc.) for diagnosing integration-boundary issues |
| GitHub | Known-issue search (Agent SDK) | Issue tracking, PR status, release notes — required for headless Agent SDK deployments where no CLI is available |

CLI Tools

| CLI Tool | Purpose | Why CLI over MCP |
|---|---|---|
| gh (GitHub CLI) | Local GitHub fallback and deep inspection | Fast local inspection, `--json` output, and full comment threads. The tracked agent prompt prefers GitHub MCP first, then falls back to gh when needed. |
| Playwright | Browser inspection (escalation only) | SDK presence, config extraction, CSP/CORS detection — only for browser-relevant issues with a public URL |

Roadmap: From Prototype to Pipeline

North Star: Pre-Triage at Scale

The current prototype runs interactively in Claude Code. The production vision is a pipeline that complements existing support tooling (Zendesk, Pylon, HogHero) by pre-triaging tickets so engineers start with evidence, not a blank page:

Zendesk MCP              PostHog MCP              Slack MCP
(ticket arrives)    →    (investigation)     →    (report delivered)
                                                       │
                                                       ▼
                                              TSE reviews + sends
                                              (5 min, not 20)
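The pipeline above amounts to a thin orchestration layer. The sketch below is hypothetical — none of these helpers exist in the repo; they are stubs standing in for the Zendesk, triage, and Slack steps in the diagram:

```python
# Hypothetical end-to-end pipeline; every helper is a stub standing in
# for an MCP integration -- none exist in the tracked repo.
def fetch_zendesk_ticket(ticket_id: str) -> str:
    return f"ticket body for {ticket_id}"   # would call the Zendesk MCP

def run_triage_agent(ticket_text: str) -> str:
    return f"triage report: {ticket_text}"  # would invoke the agent headlessly

def post_to_slack(channel: str, report: str) -> str:
    return f"posted to {channel}"           # would call the Slack MCP

def pre_triage(ticket_id: str) -> str:
    ticket = fetch_zendesk_ticket(ticket_id)
    report = run_triage_agent(ticket)
    post_to_slack("#support-escalations", report)
    return report
```

The design point: the agent sits in the middle of existing tooling and only ever reads from the left and writes a report to the right.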

| Integration | MCP Server | Status | What it provides |
|---|---|---|---|
| Zendesk | zendesk-mcp-server | External option only | Pull tickets, read comments/tags/priority, search ticket history |
| Slack | Official Slack MCP | Configured in .mcp.json and permitted in settings.json — requires OAuth login on first use | Post triage reports to channels, thread follow-ups, notify on-call |
| PostHog | mcp.posthog.com | In use | Project data, docs search, event queries, flag definitions |
Target: "Triage Zendesk ticket #48291 and post the report to #support-escalations"

Claude Agent SDK

The path from prototype to production is the Claude Agent SDK — the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.

What it unlocks:

  • Programmatic invocation — trigger triage from an API call, not a chat window
  • Hooks — PreToolUse/PostToolUse for audit logging, cost tracking, and guardrails
  • Subagents — spawn specialized diagnosis agents per product area in parallel
  • Sessions — resume investigations across multiple exchanges with full context
# Future: headless triage agent
from claude_agent_sdk import query, ClaudeAgentOptions

async for message in query(
    prompt=f"Triage this support ticket: {ticket_text}",
    options=ClaudeAgentOptions(
        allowed_tools=["Read", "Glob", "Grep", "WebSearch", "WebFetch", "Agent"],
        mcp_servers={
            "posthog": {"type": "http", "url": "https://mcp.posthog.com/mcp", ...},
            "github": {"command": "npx", "args": ["-y", "@modelcontextprotocol/server-github"]},
        },
    ),
):
    if hasattr(message, "result"):
        post_to_slack(message.result)

Self-Monitoring with PostHog

Use PostHog itself to observe the agent. PostHog LLM Analytics tracks generations, traces, costs, and latency — the same product the agent investigates, now monitoring the agent.

# Assumes PostHog's instrumented Anthropic client (posthog.ai.anthropic),
# which forwards the posthog_* kwargs as LLM Analytics metadata.
from posthog import Posthog
from posthog.ai.anthropic import Anthropic

posthog = Posthog("<project_api_key>", host="https://us.i.posthog.com")
client = Anthropic(posthog_client=posthog)  # reads ANTHROPIC_API_KEY from env

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": ticket_text}],
    posthog_distinct_id="support-agent-v1",
    posthog_trace_id=f"triage-{ticket_id}",
    posthog_properties={
        "product_area": "session_replay",
        "priority": "P1",
        "$ai_prompt_name": "posthog-triage-agent",
    },
)

A feedback loop: the agent triages PostHog issues while PostHog monitors the agent's performance.


Limitations & Open Questions

This is a working prototype, not a production system. Deploying it for real support requires answering several questions that are intentionally left open.

Data Access & Compliance

| Question | Status | Notes |
|---|---|---|
| Can an AI agent directly query customer project data via MCP? | Needs investigation | The prototype currently pins to a single org/project. In production, would the agent query customer accounts directly, or only PostHog's internal instance? This is a policy decision, not a technical one. |
| What customer data flows through the LLM? | Needs review | Event properties, person properties, and distinct IDs pass through Claude during triage. A compliance review is needed to determine what data can be sent to third-party LLM providers, and whether PII scrubbing or data masking is required before queries. |
| EU data residency | Partially addressed | PostHog MCP supports EU/US region pinning, but the LLM provider (Anthropic) still processes data — the compliance implications of routing EU customer data through a US-based LLM need evaluation. |

Integration Maturity

| Component | Status | What's needed |
|---|---|---|
| PostHog MCP | Working | Core evidence source — stable and in use |
| GitHub search | Working | gh CLI + MCP fallback — reliable |
| Zendesk MCP | Not configured in this repo | The community Zendesk MCP exists but hasn't been evaluated here for reliability, auth model, or feature completeness |
| Slack MCP | Partially wired | .mcp.json and a Slack formatting skill exist, but end-to-end posting has not been validated in the tracked repo |
| Claude Agent SDK | Not started | The headless deployment path — the prototype validates the workflow; migrating to the Agent SDK is the production step |

Operational Questions

  • Cost per triage: Each triage involves multiple LLM calls + MCP queries. The per-ticket cost hasn't been measured. PostHog's own LLM Analytics could track this.
  • Accuracy measurement: Initial blind tests score 97/100 across 5 real issues (see evaluations.md), but a production accuracy loop comparing agent output to TSE-written responses at scale would be needed.
  • Internal vs customer data: The safest starting point may be pointing the agent at PostHog's internal dogfood instance rather than customer accounts — triaging based on what the support team can see, not direct customer data access. This sidesteps most compliance questions while still providing value.
  • Hallucination risk: The agent is designed to cite sources and grade confidence, but LLMs can still present plausible-sounding information that isn't grounded in evidence. Human review of every triage report remains essential.

Project Structure

.
├── .claude/
│   ├── agents/
│   │   └── posthog-support-agent.md    # Agent brain — workflow, rules, speed requirements
│   ├── skills/
│   │   ├── posthog-ticket-intake/      # Phase 0: parse tickets into structured inputs
│   │   ├── posthog-feature-flag-diagnosis/
│   │   ├── posthog-session-replay-diagnosis/
│   │   ├── posthog-event-diagnosis/
│   │   ├── posthog-error-tracking-diagnosis/
│   │   ├── posthog-survey-diagnosis/
│   │   ├── posthog-pipeline-diagnosis/
│   │   ├── posthog-data-warehouse-diagnosis/
│   │   ├── posthog-web-analytics-diagnosis/
│   │   ├── posthog-billing-diagnosis/
│   │   ├── posthog-selfhosted-diagnosis/
│   │   ├── posthog-site-inspector/
│   │   ├── posthog-triage-report/      # Phase 2: synthesize evidence into report
│   │   ├── posthog-response-drafting/
│   │   ├── posthog-escalation/
│   │   ├── posthog-ship-fix/
│   │   ├── posthog-kb-article/
│   │   └── posthog-slack-triage/
│   └── settings.json                   # Read-only permissions + deny rules
├── .mcp.json                           # MCP server connections (PostHog EU/US, DeepWiki, Context7, GitHub, Slack)
├── .env.example                        # Copy to .env before running the smoke test
├── CLAUDE.md                           # Workflow instructions
├── demo-tickets.md                     # (local-only, gitignored) Synthetic tickets for demos
├── evaluations.md                      # Blind test results + live issue diagnoses
├── test-setup.sh                       # Smoke test for all connections
└── docs/
    ├── architecture.svg
    └── posthog-architecture-blueprint.jpeg

Design Decisions

Inspired by PostHog's own lessons: What we wish we knew before building AI agents.

| Decision | Why |
|---|---|
| Read-only only | Support agents should never mutate customer state — one misconfigured flag could break production |
| Least privilege | MCP access is scoped to one org/project with feature filtering, not blanket admin access |
| Live docs over cached knowledge | SDK behavior changes every release — the agent fetches current docs before making claims |
| Known-bug search before user blame | Searching GitHub with 2+ query variants before concluding "misconfiguration" prevents false accusations |
| Confidence grading | Three explicit levels instead of fake certainty — support teams need to know what's proven vs suspected |
| Parallel-first architecture | 7+ queries fire simultaneously — a human doing this sequentially takes 15 minutes; the agent takes 30 seconds |
| Specialized skills over one big prompt | Multiple diagnosis and workflow skills with specific checks per product area, not one generic "investigate everything" instruction |
| Escalation packets, not guesses | When evidence is insufficient, the agent produces an engineering-ready escalation packet instead of pretending to know |

About

Read-only support triage agent that investigates PostHog customer issues using MCP and produces confidence-graded triage reports
