Troubleshooting
Tip
Before reading further, try /ops:doctor. It runs bin/ops-autofix silently — rebuilds wacli with FTS5, re-registers missing MCPs, fixes common registry issues, and reports what it repaired.
- Telegram
- Slack
- MCP general
- SessionStart hook warnings
- Missing tools detection
- Project registry
- AWS CLI
- Daemon
- Doppler
- Stripe
- RevenueCat
- Shopify
- Voice APIs
- Datadog
- New Relic
- OpenTelemetry
- gog (Gmail CLI)
- Memories
- ops-speedup
- infra-monitor alerts
Symptom: FLOOD_WAIT_X error, or dialogs stop loading mid-request.
Cause: Telegram's MTProto API enforces rate limits on getDialogs. The X in FLOOD_WAIT_X is the number of seconds to wait; cooldowns of up to 8 hours are typical after rapid consecutive calls.
Fix:
- Wait for the cooldown to expire (up to 8 hours)
- Use search_messages instead of list_dialogs (different endpoint)
- Reduce frequency: avoid calling /ops:inbox telegram more than once per hour
Note
The bundled Telegram MCP server handles this gracefully — it returns partial results rather than failing hard.
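The once-per-hour guidance above can be enforced in a wrapper script of your own. The following is a minimal sketch, not part of the plugin: the function name `should_run` and the stamp-file location are hypothetical, and it assumes the stamp file holds a single epoch timestamp.

```shell
# should_run STAMP_FILE MIN_INTERVAL_SECONDS [NOW_EPOCH]
# Returns 0 (go ahead) if at least MIN_INTERVAL_SECONDS have passed since the
# last recorded run, 1 (skip) otherwise. NOW_EPOCH defaults to the current time.
should_run() {
  local stamp_file="$1" min_interval="$2"
  local now="${3:-$(date +%s)}"
  if [ -s "$stamp_file" ]; then
    local last
    last=$(cat "$stamp_file")
    if [ $((now - last)) -lt "$min_interval" ]; then
      return 1   # called too recently; skip to stay under the rate limit
    fi
  fi
  printf '%s\n' "$now" > "$stamp_file"   # record this run
  return 0
}
```

For example, `should_run "$HOME/.cache/telegram-fetch.stamp" 3600 || echo "skipping: last fetch was under an hour ago"` would skip a fetch made within the last hour.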
Symptom: AUTH_KEY_INVALID or SESSION_REVOKED.
Cause: The gram.js session string was revoked (logged out elsewhere, or Telegram invalidated old sessions).
Fix:
/ops:setup telegram
Regenerates the session via bin/ops-telegram-autolink.mjs and writes it to .mcp.json automatically.
Symptom: wacli doctor shows CONNECTED: true but AUTH_STATE: degraded. Messages fail silently.
Cause: WhatsApp Web protocol requires periodic app-state key syncs. After 2–4 weeks of inactivity, local keys drift.
Fix:
wacli auth logout
wacli auth login # Re-scan QR code from phone
wacli doctor # Confirm AUTHENTICATED + CONNECTED
Tip
Use the session at least once every 2 weeks to keep it active. The setup wizard runs wacli doctor automatically.
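To catch the degraded state before messages start failing, a script can inspect the doctor output. This is a sketch under one assumption: that `wacli doctor` prints one `KEY: value` pair per line, as the symptom text suggests. The function name is hypothetical.

```shell
# wacli_auth_degraded: reads `wacli doctor` output on stdin.
# Returns 0 if an "AUTH_STATE: degraded" line is present, 1 otherwise.
wacli_auth_degraded() {
  grep -Eq '^[[:space:]]*AUTH_STATE:[[:space:]]*degraded'
}
```

Usage: `wacli doctor | wacli_auth_degraded && echo "re-auth needed: wacli auth logout && wacli auth login"`.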
Symptom: ○ wacli (not installed) in setup; WhatsApp sections silently skipped (Rule 3 surfaces the option).
Fix: Manual install — see the wacli repository, then re-run /ops:setup cli.
Symptom: invalid_auth / token_revoked.
Fix:
/ops:setup mcp # re-run OAuth via Claude.ai
# — OR —
/ops:setup channels # re-extract local bot token via Playwright
Symptom: ratelimited after several searches.
Fix: Switch to the local bot token path — unlimited search, private channels without bot membership:
/ops:setup channels # picks up local token via bin/ops-slack-autolink.mjs
Any MCP tool (Sentry / Linear / Slack) can hit quota mid-session. Skills automatically fall back to:
| MCP | Fallback |
|---|---|
| Sentry | sentry-cli issues list --project <slug> or curl the REST API |
| Linear | curl -X POST https://api.linear.app/graphql -H "Authorization: $LINEAR_API_KEY" |
| Slack | Local bot token (unlimited search) |
Symptom: ✗ lines at session start.
Cause: hooks/hooks.json runs scripts/setup.sh to detect missing config. A ✗ means something is unconfigured, not broken.
Fix: Run /ops:setup to address flagged items, or ignore warnings for integrations you don't use.
Skills silently skip sections when a tool is missing (Rule 3 surfaces the choice rather than silently skipping secrets).
Verify what's detected:
~/.claude/plugins/cache/ops-marketplace/ops/<version>/bin/ops-setup-detect
Look for false values in the JSON output. Fix: install the missing tool, then /ops:setup cli.
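Scanning the detection output by eye gets tedious, so here is a small filter that prints only the keys reported as false. It is a sketch that assumes the output contains simple `"key": false` pairs; the actual shape of the ops-setup-detect JSON is not documented here, and the function name is hypothetical.

```shell
# list_false_keys: reads JSON on stdin and prints each key whose value is the
# literal false, one per line. Pattern-based, so it only suits simple
# "key": false pairs; use a real JSON parser for anything more complex.
list_false_keys() {
  grep -oE '"[^"]+"[[:space:]]*:[[:space:]]*false' |
    sed -E 's/^"([^"]+)".*$/\1/'
}
```

Pipe the detector into it: `.../bin/ops-setup-detect | list_false_keys`.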
/ops:setup registry
# or manually:
cp scripts/registry.example.json scripts/registry.json
/ops:setup env
# Appends export to ~/.zshrc or ~/.bashrc (append-only, never rewrites)
Restart Claude Code afterwards.
Fix A (add clusters to the registry):
{
"infra": {
"ecs_clusters": ["myapp-production"],
"platform": "aws"
}
}
Fix B (authenticate):
aws configure
aws sts get-caller-identity # verify
Symptom: /ops:go takes >10 seconds; daemon-health.json shows last_briefing >5 min old.
Cause: briefing-pre-warm service crashed or the daemon stopped.
Fix:
/ops:doctor # runs auto-repair first
launchctl list | grep claude-ops
launchctl kickstart -k gui/$UID/com.claude-ops.daemon
tail -f ~/.claude/plugins/data/ops-ops-marketplace/logs/ops-daemon.log
Symptom: launchctl load fails with Load failed: 5: Input/output error.
Cause: Stale plist, or binary path changed after a plugin version bump.
Fix:
launchctl unload ~/Library/LaunchAgents/com.claude-ops.daemon.plist 2>/dev/null
rm ~/Library/LaunchAgents/com.claude-ops.daemon.plist
/ops:setup daemon # re-generates plist with current paths
Symptom: PreToolUse hook warns that wacli health is degraded, for example "last sync 14 min ago".
Cause: Daemon alive but wacli-sync service stuck. Usually a WhatsApp re-auth is needed.
Fix:
wacli auth logout && wacli auth login
launchctl kickstart -k gui/$UID/com.claude-ops.daemon
Warning
The hook fails closed — skills warn rather than silently retrying with stale auth. This is intentional (Rule 3 + Privacy).
See also: Daemon Guide.
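The staleness thresholds in this section (a briefing older than 5 minutes, a wacli sync 14 minutes old) come down to comparing a timestamp against a maximum age. A minimal sketch of that comparison follows, assuming the timestamp has already been converted to epoch seconds; how daemon-health.json encodes last_briefing is not specified here, and the function name is hypothetical.

```shell
# is_stale LAST_EPOCH MAX_AGE_SECONDS [NOW_EPOCH]
# Returns 0 if LAST_EPOCH is more than MAX_AGE_SECONDS in the past,
# 1 otherwise. NOW_EPOCH defaults to the current time.
is_stale() {
  local last="$1" max_age="$2"
  local now="${3:-$(date +%s)}"
  [ $((now - last)) -gt "$max_age" ]
}
```

For example, `is_stale "$last_briefing_epoch" 300 && echo "briefing is stale; run /ops:doctor"`.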
Symptom: doppler secrets get returns You must be logged in.
Fix:
doppler login # run in a real terminal (opens browser)
# or
/ops:setup doppler
Symptom: Secrets that exist in Doppler return empty.
Fix:
doppler setup --project <project> --config prd --no-interactive
doppler configure debug # confirm the bound project + config
Tip
Every skill supports doppler:KEY_NAME reference tokens — so secrets never leave the vault. See Privacy and Security.
Cause: Using a test key against live data (or vice-versa).
Fix: /ops:setup revenue will re-scan env + Doppler for STRIPE_SECRET_KEY. Verify prefix:
| Prefix | Meaning |
|---|---|
| sk_test_… | Test mode, no real charges |
| sk_live_… | Live mode, real money |
| rk_live_… | Restricted live key, use this for read-only dashboards |
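The prefix check in the table is easy to script. The sketch below classifies a key by prefix only; it does not validate the key against Stripe, and the function name is hypothetical.

```shell
# stripe_key_mode KEY
# Prints the mode implied by the key prefix: test, live, restricted-live,
# or unknown. Checks the prefix only, not whether the key is valid.
stripe_key_mode() {
  case "$1" in
    sk_test_*) echo "test" ;;
    sk_live_*) echo "live" ;;
    rk_live_*) echo "restricted-live" ;;
    *)         echo "unknown" ;;
  esac
}
```

Usage: `stripe_key_mode "$STRIPE_SECRET_KEY"`.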
Symptom: Stripe: not configured even though you have a key.
Fix: Paste via the auto-scan prompt, or export in ~/.zshrc:
export STRIPE_SECRET_KEY="rk_live_…"
Re-run /ops:setup revenue.
Caution
Never commit sk_live_… keys to source. tests/test-no-secrets.sh grep-checks pre-commit — if it trips, fix before pushing.
Symptom: /ops:revenue returns RevenueCat: Error (401) or the dashboard widget shows no MRR.
Fix: Verify the project ID in the RevenueCat dashboard URL (https://app.revenuecat.com/projects/<PROJECT_ID>/...). Then:
/ops:setup revenue # re-enter via the prompt, or
export REVENUECAT_PROJECT_ID="proj_xxxxx"
export REVENUECAT_API_KEY="sk_xxxxx"
Note
Secret API keys start with sk_. Public SDK keys (appl_…, goog_…) won't work for dashboards — they're for client apps only.
Cause: Admin token revoked in Shopify admin → Apps → Private apps, or token mis-scoped.
Fix: Create a new admin API token with scopes read_orders,read_products,read_inventory,read_customers,read_fulfillments. Then:
/ops:setup shopify # re-enter store URL + token
Symptom: THROTTLED errors mid-query from /ops:ecom.
Cause: Shopify Admin GraphQL uses a leaky-bucket cost system — heavy queries drain fast.
Fix:
- /ops:ecom auto-paginates with cost-aware throttling and usually self-recovers
- For burst load, add 5–10s between calls
- Consider upgrading the Shopify plan for a larger bucket
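For scripts of your own that call the Admin API directly, the "add 5–10s between calls" advice can be wrapped in a small retry helper. This is a sketch, not what /ops:ecom does internally: it assumes the wrapped command exits non-zero when throttled, and the function name is hypothetical.

```shell
# retry_spaced MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, sleeping
# DELAY_SECONDS between attempts. Returns 0 on success, otherwise the
# exit status of the last failed attempt.
retry_spaced() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 status=0
  while :; do
    "$@" && return 0
    status=$?
    [ "$attempt" -ge "$max_attempts" ] && return "$status"
    attempt=$((attempt + 1))
    sleep "$delay"   # give the leaky bucket time to refill
  done
}
```

For example, `retry_spaced 3 10 ./my-shopify-query.sh` tries the (hypothetical) script up to three times with 10 seconds between attempts.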
| API | Symptom | Fix |
|---|---|---|
| Bland AI | 402 Payment Required | Top up credits at app.bland.ai |
| ElevenLabs | quota_exceeded on TTS | Upgrade plan or wait for monthly reset |
| Groq (Whisper) | 429 Too Many Requests | Free tier: ~30 RPM. Throttle in /ops:voice transcribe |
Tip
/ops:voice always logs the estimated cost before calling. Cancel if the estimate exceeds your threshold.
Symptom: /ops:monitor datadog returns 403 Forbidden or Authentication error.
Cause: Either the API key is wrong or the App key is missing. Datadog's /v1/monitor endpoint needs both headers — DD-API-KEY and DD-APPLICATION-KEY.
Fix:
/ops:settings datadog # re-enter both keys and smoke-test
# or
/ops:monitor --setup
Verify directly:
curl -sf -H "DD-API-KEY: $DD_API_KEY" \
-H "DD-APPLICATION-KEY: $DD_APPLICATION_KEY" \
"https://api.datadoghq.com/api/v1/monitor" | jq '.[0]'
Note
Datadog has regional endpoints (api.datadoghq.com, api.datadoghq.eu, api.us3.datadoghq.com, …). If the 403 or authentication error persists with a known-good key, check the region; /ops:settings datadog will prompt for it.
Symptom: Keys authenticate but monitors list is empty.
Fix: Set DD_SITE explicitly — e.g., export DD_SITE=datadoghq.eu. /ops:settings datadog re-runs the smoke test against the right region.
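When verifying with curl against a non-US region, the API host has to match DD_SITE. The regional hosts listed above follow the pattern `api.<site>`, which the sketch below relies on; the function name is hypothetical.

```shell
# dd_api_base [SITE]
# Prints the API base URL for a Datadog site. SITE defaults to $DD_SITE,
# then to datadoghq.com.
dd_api_base() {
  local site="${1:-${DD_SITE:-datadoghq.com}}"
  printf 'https://api.%s\n' "$site"
}
```

The verification call then becomes `curl -sf -H "DD-API-KEY: $DD_API_KEY" -H "DD-APPLICATION-KEY: $DD_APPLICATION_KEY" "$(dd_api_base)/api/v1/monitor"`.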
Symptom: /ops:monitor newrelic returns User does not have required capabilities on account <ID> or an empty result where you expect alerts.
Cause: The User API key is authorized for account A, but the NerdGraph query is scoped to account B. Common after switching orgs or rotating keys.
Fix:
/ops:settings newrelic # prompts for User API key + account ID, smoke-tests
Verify directly (replace <KEY> and <ACCOUNT_ID>):
curl -sf -X POST https://api.newrelic.com/graphql \
-H "Api-Key: <KEY>" \
-H "Content-Type: application/json" \
-d '{"query":"{ actor { account(id: <ACCOUNT_ID>) { name } } }"}' | jq
Important
Use a User API key, not an Ingest, License, or Browser key. Only User keys can read NerdGraph alerts.
Symptom: 404 on NerdGraph even with a valid key.
Fix: If your New Relic account is in the EU region, set NEWRELIC_REGION=EU. /ops:monitor routes to https://api.eu.newrelic.com/graphql automatically.
Symptom: /ops:monitor otel returns connection refused or DNS lookup failed.
Fix checklist:
| Check | Command |
|---|---|
| Endpoint URL correct? | echo $OTEL_EXPORTER_OTLP_ENDPOINT, which should include the scheme (https://) and port |
| Reachable from this machine? | curl -sf "$OTEL_EXPORTER_OTLP_ENDPOINT/v1/traces" -o /dev/null -w "%{http_code}\n" |
| Auth header set? | /ops:settings otel → re-enter otel_headers (e.g. authorization=Bearer <token>) |
| TLS cert valid? | Retry curl without -k; if it works with -k but fails without, the trust chain is broken |
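The first check in the table (scheme and port present) can be automated before any network call is made. This is a minimal sketch with a hypothetical function name; it does string checks only and does not handle IPv6 literals.

```shell
# check_otlp_endpoint URL
# Prints "ok" if URL has an http(s) scheme and an explicit port, otherwise
# prints what is missing. Returns 0 only for "ok".
# String checks only; IPv6 literals such as [::1] are not handled.
check_otlp_endpoint() {
  local url="$1"
  case "$url" in
    http://*|https://*) ;;
    *) echo "missing scheme"; return 1 ;;
  esac
  local hostport="${url#*://}"   # strip the scheme
  hostport="${hostport%%/*}"     # strip any path
  case "$hostport" in
    *:[0-9]*) echo "ok" ;;
    *) echo "missing port"; return 1 ;;
  esac
}
```

Usage: `check_otlp_endpoint "$OTEL_EXPORTER_OTLP_ENDPOINT"`. For reference, 4318 is the conventional OTLP/HTTP port.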
Fix:
/ops:monitor --setup # re-enter endpoint + headers
/ops:settings otel # smoke-test after update
Note
monitor-agent uses OTLP/HTTP (not gRPC). The endpoint must accept HTTP POST on /v1/traces and /v1/metrics. Most OTLP collectors support both transports — pick HTTP in your collector config if you have a choice.
Symptom: HTTP 413 from the collector on large trace batches.
Fix: Increase the collector's max_request_body_size (e.g., in otel-collector-config.yaml → receivers.otlp.protocols.http.max_request_body_size: 8MB). monitor-agent is a read client, but some backends also reject oversized responses on trace search.
As of v1.1.0, gog is the public steipete/gogcli build. Cross-OS install:
| OS | Install |
|---|---|
| macOS | brew install steipete/tap/gogcli |
| Linux | curl -fsSL https://raw.githubusercontent.com/steipete/gogcli/main/install.sh \| sh |
| Windows (Scoop) | scoop bucket add steipete https://github.com/steipete/scoop-bucket && scoop install gogcli |
| Any | go install github.com/steipete/gogcli@latest |
The setup wizard runs the OS-appropriate command automatically (Rule 4).
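To see which command from the table applies on a given machine, the selection can be sketched as below. This illustrates the idea and is not the wizard's actual logic; the function name is hypothetical, and Windows (Scoop) is left out because `uname` is not available there natively.

```shell
# gog_install_cmd [KERNEL_NAME]
# Prints (does not run) the install command matching a `uname -s` value.
# Falls back to the "Any" row (go install) for unrecognised systems.
gog_install_cmd() {
  local kernel="${1:-$(uname -s)}"
  case "$kernel" in
    Darwin) echo "brew install steipete/tap/gogcli" ;;
    Linux)  echo "curl -fsSL https://raw.githubusercontent.com/steipete/gogcli/main/install.sh | sh" ;;
    *)      echo "go install github.com/steipete/gogcli@latest" ;;
  esac
}
```

Running `gog_install_cmd` with no argument prints the command for the current machine so you can review it before executing.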
Symptom: gog: token expired on send.
Fix:
gog auth add # v1.1.0 command (was: gog auth login)
# or
/ops:setup email
The wizard runs the command in the background (Rule 2).
Symptom: gog send reports scope read-only even after auth.
Cause: OAuth scope wasn't elevated to gmail.send on initial auth.
Fix: gog auth remove && gog auth add --scopes=full
Note
Command renames in v1.1.0 — gog auth login → gog auth add, and gog cal → gog calendar events. Old commands are aliased for one minor version, but the wizard and all skills emit the new forms.
Symptom: ~/.claude/plugins/data/ops-ops-marketplace/memories/ is empty after 30 minutes.
Checklist:
| Check | Command |
|---|---|
| Daemon running? | launchctl list \| grep claude-ops |
| Haiku API key set? | doppler secrets get ANTHROPIC_API_KEY --plain |
| wacli healthy? | wacli doctor |
| Last extractor run? | grep memory-extractor ~/.claude/plugins/data/ops-ops-marketplace/logs/ops-daemon.log |
Force run:
/ops:doctor --run-memory-extractor
See Memories System for the full extraction pipeline.
Symptom: /ops:speedup reports Unknown OS or runs the wrong cleanup.
Cause: Detection uses uname -s + /etc/os-release. WSL2 reports Linux even though the underlying host is Windows.
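To illustrate why WSL2 is easy to misclassify, here is a sketch of one common way to tell it apart from native Linux. Note the assumption: it inspects the kernel version string from /proc/version (which contains "microsoft" on WSL kernels) rather than /etc/os-release, so it is not the detection that ops-speedup itself uses. The function name is hypothetical; the printed values mirror the OPS_OS options.

```shell
# detect_os KERNEL_NAME PROC_VERSION_TEXT
# Prints macos, wsl, linux, or unknown. KERNEL_NAME is the `uname -s` value;
# PROC_VERSION_TEXT is the content of /proc/version (may be empty).
detect_os() {
  local kernel="$1" proc_version="$2"
  case "$kernel" in
    Darwin) echo "macos" ;;
    Linux)
      case "$proc_version" in
        *[Mm]icrosoft*) echo "wsl" ;;     # WSL kernels identify themselves here
        *)              echo "linux" ;;
      esac ;;
    *) echo "unknown" ;;
  esac
}
```

Usage: `detect_os "$(uname -s)" "$(cat /proc/version 2>/dev/null)"`.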
Fix:
bin/ops-speedup --json # inspect the detected OS
If detection is wrong, force with:
OPS_OS=macos /ops:speedup
OPS_OS=wsl /ops:speedup
OPS_OS=linux /ops:speedup
Warning
infra-monitor flags public S3 buckets as high-priority fires. This is intentional — public buckets are the #1 source of cloud data leaks.
Symptom: /ops:fires shows S3 bucket <name> is publicly accessible.
Decide:
| Scenario | Action |
|---|---|
| Bucket is genuinely public (static site, public assets) | Mark as known-good in registry.json → infra.s3.allowlist |
| Bucket shouldn't be public | Run aws s3api put-public-access-block --bucket <name> --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" |
Caution
Rule 5 applies — any bucket-level destructive change (delete, make-private, revoke) requires explicit per-action confirmation. infra-monitor only reports; it never auto-remediates.
- Daemon Guide — full daemon internals, logs, health contract
- Plugin Rules — the five rules that constrain every fix
- Privacy and Security — what the plugin accesses (and doesn't)
- /ops:doctor --verbose: emits every diagnostic and auto-fix attempted