Troubleshooting

Common issues, organized by component. Every fix is copy-pasteable.

Tip

Before reading further, try /ops:doctor. It runs bin/ops-autofix silently — rebuilds wacli with FTS5, re-registers missing MCPs, fixes common registry issues, and reports what it repaired.

Telegram

Rate limiting on `list_dialogs`

Symptom: FLOOD_WAIT_X error, or dialogs stop loading mid-request.

Cause: Telegram's MTProto API enforces rate limits on getDialogs. The X in FLOOD_WAIT_X is the required wait in seconds; after rapid consecutive calls it can reach 8 hours.

Fix:

Wait for the cooldown to expire (up to 8 hours)
Use search_messages instead of list_dialogs — different endpoint
Reduce frequency — avoid calling /ops:inbox telegram more than once per hour

Note

The bundled Telegram MCP server handles this gracefully — it returns partial results rather than failing hard.
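To see how long a given cooldown will last, the numeric suffix of the error can be converted to hours and minutes. This is a minimal sketch; the helper name flood_wait_human is hypothetical and not part of the plugin.

```shell
# Hypothetical helper, not shipped with claude-ops.
# Extracts the seconds from a FLOOD_WAIT_<seconds> error string
# and prints the wait as hours and minutes.
flood_wait_human() {
  secs=$(printf '%s\n' "$1" | sed -n 's/.*FLOOD_WAIT_\([0-9][0-9]*\).*/\1/p')
  if [ -z "$secs" ]; then
    echo "no FLOOD_WAIT code found" >&2
    return 1
  fi
  printf '%dh %dm\n' "$((secs / 3600))" "$(((secs % 3600) / 60))"
}
```

For example, flood_wait_human "FLOOD_WAIT_28800" prints 8h 0m, which corresponds to the 8-hour cooldown.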

Session string expired

Symptom: AUTH_KEY_INVALID or SESSION_REVOKED.

Cause: The gram.js session string was revoked (logged out elsewhere, or Telegram invalidated old sessions).

Fix:

/ops:setup telegram

Regenerates the session via bin/ops-telegram-autolink.mjs and auto-writes to .mcp.json.

WhatsApp

App-state key desync

Symptom: wacli doctor shows CONNECTED: true but AUTH_STATE: degraded. Messages fail silently.

Cause: WhatsApp Web protocol requires periodic app-state key syncs. After 2–4 weeks of inactivity, local keys drift.

Fix:

wacli auth logout
wacli auth login     # Re-scan QR code from phone
wacli doctor         # Confirm AUTHENTICATED + CONNECTED

Tip

Use the session at least once every 2 weeks to keep the keys in sync. The setup wizard runs wacli doctor automatically.

`wacli` command not found

Symptom: ○ wacli (not installed) in setup; WhatsApp sections silently skipped (Rule 3 surfaces the option).

Fix: Manual install — see the wacli repository, then re-run /ops:setup cli.

Slack

Token expiry

Symptom: invalid_auth / token_revoked.

Fix:

/ops:setup mcp        # re-run OAuth via Claude.ai
# — OR —
/ops:setup channels   # re-extract local bot token via Playwright

MCP quota exhaustion

Symptom: ratelimited after several searches.

Fix: Switch to the local bot token path — unlimited search, private channels without bot membership:

/ops:setup channels   # picks up local token via bin/ops-slack-autolink.mjs

MCP general — quota exhaustion and fallback

Any MCP tool (Sentry / Linear / Slack) can hit quota mid-session. Skills automatically fall back to:

MCP	Fallback
Sentry	`sentry-cli issues list --project <slug>` or `curl` the REST API
Linear	`curl -X POST https://api.linear.app/graphql -H "Authorization: $LINEAR_API_KEY"`
Slack	Local bot token (unlimited search)

SessionStart hook warnings

Symptom: ✗ lines at session start.

Cause: hooks/hooks.json runs scripts/setup.sh to detect missing config. A ✗ = something unconfigured, not broken.

Fix: Run /ops:setup to address flagged items, or ignore warnings for integrations you don't use.

Missing tools detection

Skills silently skip sections when a tool is missing (Rule 3 surfaces the choice rather than silently skipping secrets).

Verify what's detected:

~/.claude/plugins/cache/ops-marketplace/ops/<version>/bin/ops-setup-detect

Look for false values in the JSON output. Fix: install the missing tool, then re-run /ops:setup cli.

Project registry issues

`registry.json not found`

/ops:setup registry
# or manually:
cp scripts/registry.example.json scripts/registry.json

`CLAUDE_PLUGIN_ROOT` unset

/ops:setup env
# Appends export to ~/.zshrc or ~/.bashrc (append-only, never rewrites)

Restart Claude Code after.

AWS CLI issues

ECS shows empty

Fix A — Add clusters to registry:

{
  "infra": {
    "ecs_clusters": ["myapp-production"],
    "platform": "aws"
  }
}

Fix B — Authenticate:

aws configure
aws sts get-caller-identity   # verify

Daemon

Pre-warm cache not updating

Symptom: /ops:go takes >10 seconds; daemon-health.json shows last_briefing >5 min old.

Cause: briefing-pre-warm service crashed or the daemon stopped.

Fix:

/ops:doctor                     # runs auto-repair first
launchctl list | grep claude-ops
launchctl kickstart -k gui/$UID/com.claude-ops.daemon
tail -f ~/.claude/plugins/data/ops-ops-marketplace/logs/ops-daemon.log

launchd errors

Symptom: launchctl load fails with Load failed: 5: Input/output error.

Cause: Stale plist, or binary path changed after a plugin version bump.

Fix:

launchctl unload ~/Library/LaunchAgents/com.claude-ops.daemon.plist 2>/dev/null
rm ~/Library/LaunchAgents/com.claude-ops.daemon.plist
/ops:setup daemon               # re-generates plist with current paths

Health file stale

Symptom: PreToolUse hook warns wacli health degraded — last sync 14 min ago.

Cause: Daemon alive but wacli-sync service stuck. Usually a WhatsApp re-auth is needed.

Fix:

wacli auth logout && wacli auth login
launchctl kickstart -k gui/$UID/com.claude-ops.daemon

Warning

The hook fails closed — skills warn rather than silently retrying with stale auth. This is intentional (Rule 3 + Privacy).

Doppler

Not authenticated

Symptom: doppler secrets get returns You must be logged in.

Fix:

doppler login    # run in a real terminal (opens browser)
# or
/ops:setup doppler

Wrong project bound

Symptom: Secrets that exist in Doppler return empty.

Fix:

doppler setup --project <project> --config prd --no-interactive
doppler configure debug        # confirm the bound project + config

Tip

Every skill supports doppler:KEY_NAME reference tokens — so secrets never leave the vault. See Privacy and Security.

Stripe

`401 Unauthorized`

Cause: Using a test key against live data (or vice versa).

Fix: /ops:setup revenue re-scans the environment and Doppler for STRIPE_SECRET_KEY. Verify the key prefix:

Prefix	Meaning
`sk_test_…`	Test mode — no real charges
`sk_live_…`	Live mode — real money
`rk_live_…`	Restricted live key — use this for read-only dashboards
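The prefix table above can be checked mechanically before a key is pasted into the wizard. This is a rough sketch; the function name stripe_key_mode is an assumption for illustration, not a plugin command.

```shell
# Hypothetical helper: classifies a Stripe key by its prefix.
# Only the three prefixes listed in the table are recognized.
stripe_key_mode() {
  case "$1" in
    sk_test_*) echo "test" ;;
    sk_live_*) echo "live" ;;
    rk_live_*) echo "restricted-live" ;;
    *)         echo "unknown"; return 1 ;;
  esac
}
```

Calling stripe_key_mode "$STRIPE_SECRET_KEY" prints the mode without echoing the key itself.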

Missing key in auto-scan

Symptom: Stripe: not configured even though you have a key.

Fix: Paste via the auto-scan prompt, or export in ~/.zshrc:

export STRIPE_SECRET_KEY="rk_live_…"

Re-run /ops:setup revenue.

Caution

Never commit sk_live_… keys to source. tests/test-no-secrets.sh grep-checks pre-commit — if it trips, fix before pushing.

RevenueCat

`Invalid project ID`

Symptom: /ops:revenue returns RevenueCat: Error (401) or the dashboard widget shows no MRR.

Fix: Verify the project ID in the RevenueCat dashboard URL (https://app.revenuecat.com/projects/<PROJECT_ID>/...), then:

/ops:setup revenue    # re-enter via the prompt, or
export REVENUECAT_PROJECT_ID="proj_xxxxx"
export REVENUECAT_API_KEY="sk_xxxxx"

Note

Secret API keys start with sk_. Public SDK keys (appl_…, goog_…) won't work for dashboards — they're for client apps only.

Shopify

`Invalid access token` / 401

Cause: Admin token revoked in Shopify admin → Apps → Private apps, or token mis-scoped.

Fix: Create a new admin API token with scopes read_orders,read_products,read_inventory,read_customers,read_fulfillments. Then:

/ops:setup shopify    # re-enter store URL + token

GraphQL rate limit

Symptom: THROTTLED errors mid-query from /ops:ecom.

Cause: Shopify Admin GraphQL uses a leaky-bucket cost system — heavy queries drain fast.

Fix:

/ops:ecom auto-paginates with cost-aware throttling — usually self-recovers
For burst load, add 5–10s between calls
Consider upgrading the Shopify plan for a larger bucket
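The leaky-bucket arithmetic behind a THROTTLED response can be sketched as follows. Shopify's GraphQL responses report the bucket state under extensions.cost.throttleStatus (currentlyAvailable, restoreRate); the function name and the integer-only inputs are assumptions for illustration, and this is not necessarily how /ops:ecom computes its delay.

```shell
# Hypothetical helper: seconds to wait before a query of a given cost fits
# in the bucket. Arguments are whole numbers:
#   $1 = cost the query needs
#   $2 = points currently available
#   $3 = points restored per second
shopify_wait_seconds() {
  needed=$1
  available=$2
  restore_rate=$3
  deficit=$((needed - available))
  if [ "$deficit" -le 0 ]; then
    echo 0
    return 0
  fi
  # Round up so the bucket has fully refilled the deficit.
  echo $(( (deficit + restore_rate - 1) / restore_rate ))
}
```

For example, a query costing 500 points with 100 available and a restore rate of 50 per second gives shopify_wait_seconds 500 100 50, which prints 8.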

Voice APIs (Bland AI · ElevenLabs · Groq)

Quota exceeded

API	Symptom	Fix
Bland AI	`402 Payment Required`	Top up credits at app.bland.ai
ElevenLabs	`quota_exceeded` on TTS	Upgrade plan or wait for monthly reset
Groq (Whisper)	`429 Too Many Requests`	Free tier: ~30 RPM. Throttle in `/ops:voice transcribe`

Tip

/ops:voice always logs the estimated cost before calling. Cancel if the estimate exceeds your threshold.

Datadog

API key 401

Symptom: /ops:monitor datadog returns 403 Forbidden or Authentication error.

Cause: Either the API key is wrong or the App key is missing. Datadog's /v1/monitor endpoint needs both headers — DD-API-KEY and DD-APPLICATION-KEY.

Fix:

/ops:settings datadog      # re-enter both keys and smoke-test
# or
/ops:monitor --setup

Verify directly:

curl -sf -H "DD-API-KEY: $DD_API_KEY" \
        -H "DD-APPLICATION-KEY: $DD_APPLICATION_KEY" \
        "https://api.datadoghq.com/api/v1/monitor" | jq '.[0]'

Note

Datadog has regional endpoints (api.datadoghq.com, api.datadoghq.eu, api.us3.datadoghq.com, …). If the authentication error persists with a known-good key, check the region. /ops:settings datadog will prompt for it.

Wrong site / region

Symptom: Keys authenticate but monitors list is empty.

Fix: Set DD_SITE explicitly — e.g., export DD_SITE=datadoghq.eu. /ops:settings datadog re-runs the smoke test against the right region.
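The regional API hosts listed earlier all follow the pattern api.<site>, so the base URL can be derived from DD_SITE. A minimal sketch, assuming a hypothetical helper named dd_api_base:

```shell
# Hypothetical helper: builds the Datadog API base URL from a site name.
# Uses the first argument if given, else $DD_SITE, else the US1 default.
dd_api_base() {
  site="${1:-${DD_SITE:-datadoghq.com}}"
  printf 'https://api.%s\n' "$site"
}

# Usage (requires network access and both keys):
#   curl -sf -H "DD-API-KEY: $DD_API_KEY" \
#        -H "DD-APPLICATION-KEY: $DD_APPLICATION_KEY" \
#        "$(dd_api_base)/api/v1/monitor" | jq 'length'
```

If the count is zero against one site but non-zero against another, the keys belong to the second region.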

New Relic

Account ID mismatch

Symptom: /ops:monitor newrelic returns User does not have required capabilities on account <ID> or an empty result where you expect alerts.

Cause: The User API key is authorized for account A, but the NerdGraph query is scoped to account B. Common after switching orgs or rotating keys.

Fix:

/ops:settings newrelic    # prompts for User API key + account ID, smoke-tests

Verify directly (replace <KEY> and <ACCOUNT_ID>):

curl -sf -X POST https://api.newrelic.com/graphql \
  -H "Api-Key: <KEY>" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ actor { account(id: <ACCOUNT_ID>) { name } } }"}' | jq

Important

Use a User API key, not an Ingest, License, or Browser key. Only User keys can read NerdGraph alerts.

EU vs US endpoint

Symptom: 404 on NerdGraph even with a valid key.

Fix: If your New Relic account is in the EU region, set NEWRELIC_REGION=EU. /ops:monitor routes to https://api.eu.newrelic.com/graphql automatically.
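For manual verification with curl, the same routing can be reproduced by hand. This sketch only mirrors the two endpoints named in this section; the function name is hypothetical and the actual logic inside /ops:monitor may differ.

```shell
# Hypothetical helper: picks the NerdGraph endpoint for a region.
# Uses the first argument if given, else $NEWRELIC_REGION, else US.
newrelic_graphql_url() {
  case "${1:-${NEWRELIC_REGION:-US}}" in
    EU|eu) echo "https://api.eu.newrelic.com/graphql" ;;
    *)     echo "https://api.newrelic.com/graphql" ;;
  esac
}
```

Substituting "$(newrelic_graphql_url)" for the hard-coded URL in the verification command above makes it region-aware.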

OpenTelemetry

OTLP endpoint unreachable

Symptom: /ops:monitor otel returns connection refused or DNS lookup failed.

Fix checklist:

Check	Command
Endpoint URL correct?	`echo $OTEL_EXPORTER_OTLP_ENDPOINT` — should include scheme (`https://`) and port
Reachable from this machine?	`curl -sf "$OTEL_EXPORTER_OTLP_ENDPOINT/v1/traces" -o /dev/null -w "%{http_code}\n"`
Auth header set?	`/ops:settings otel` → re-enter `otel_headers` (e.g. `authorization=Bearer <token>`)
TLS cert valid?	Retry the `curl` with `-k`. If it succeeds only with `-k`, the trust chain is broken

Fix:

/ops:monitor --setup       # re-enter endpoint + headers
/ops:settings otel         # smoke-test after update

Note

monitor-agent uses OTLP/HTTP (not gRPC). The endpoint must accept HTTP POST on /v1/traces and /v1/metrics. Most OTLP collectors support both transports — pick HTTP in your collector config if you have a choice.
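The first row of the checklist (scheme and port present) can be tested without any network call. This is a rough shape check only, with a hypothetical function name; it does not validate the host or confirm that the collector is listening.

```shell
# Hypothetical helper: succeeds if the endpoint starts with http:// or
# https:// and contains a ":<digit>" port after the scheme.
otlp_endpoint_ok() {
  case "$1" in
    http://*:[0-9]*|https://*:[0-9]*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A typical call is otlp_endpoint_ok "$OTEL_EXPORTER_OTLP_ENDPOINT" || echo "endpoint is missing a scheme or port".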

413 Payload Too Large

Symptom: HTTP 413 from the collector on large trace batches.

Fix: Increase the collector's max_request_body_size (e.g., in otel-collector-config.yaml → receivers.otlp.protocols.http.max_request_body_size: 8MB). monitor-agent is a read client, but some backends also reject oversized responses on trace search.

gog (Gmail CLI)

Not installed

As of v1.1.0, gog is the public steipete/gogcli build. Cross-OS install:

OS	Install
macOS	`brew install steipete/tap/gogcli`
Linux	`curl -fsSL https://raw.githubusercontent.com/steipete/gogcli/main/install.sh | sh`
Windows (Scoop)	`scoop bucket add steipete https://github.com/steipete/scoop-bucket && scoop install gogcli`
Any	`go install github.com/steipete/gogcli@latest`

The setup wizard runs the OS-appropriate command automatically (Rule 4).

OAuth expired

Symptom: gog: token expired on send.

Fix:

gog auth add         # v1.1.0 command (was: gog auth login)
# or
/ops:setup email

The wizard runs the command in the background (Rule 2).

Sending disabled

Symptom: gog send reports scope read-only even after auth.

Cause: OAuth scope wasn't elevated to gmail.send on initial auth.

Fix: gog auth remove && gog auth add --scopes=full

Note

Command renames in v1.1.0 — gog auth login → gog auth add, and gog cal → gog calendar events. Old commands are aliased for one minor version, but the wizard and all skills emit the new forms.

Memories not extracting

Symptom: ~/.claude/plugins/data/ops-ops-marketplace/memories/ is empty after 30 minutes.

Checklist:

Check	Command
Daemon running?	`launchctl list | grep claude-ops`
Haiku API key set?	`doppler secrets get ANTHROPIC_API_KEY --plain`
wacli healthy?	`wacli doctor`
Last extractor run?	`grep memory-extractor ~/.claude/plugins/data/ops-ops-marketplace/logs/ops-daemon.log`

Force run:

/ops:doctor --run-memory-extractor

See Memories System for the full extraction pipeline.

ops-speedup OS detection

Symptom: /ops:speedup reports Unknown OS or runs the wrong cleanup.

Cause: Detection uses uname -s + /etc/os-release. WSL2 reports Linux even though the underlying host is Windows.

Fix:

bin/ops-speedup --json    # inspect the detected OS

If detection is wrong, force with:

OPS_OS=macos /ops:speedup
OPS_OS=wsl   /ops:speedup
OPS_OS=linux /ops:speedup
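To check by hand which value to force, the WSL2 case can be told apart from native Linux by the kernel release string, which contains "microsoft" under WSL. This is a sketch under that assumption; detect_os is a hypothetical name and does not reproduce the actual logic in bin/ops-speedup.

```shell
# Hypothetical helper: maps kernel name and release to an OPS_OS value.
# Arguments default to the live output of uname when omitted.
detect_os() {
  kernel_name="${1:-$(uname -s)}"
  kernel_release="${2:-$(uname -r)}"
  case "$kernel_name" in
    Darwin) echo "macos" ;;
    Linux)
      case "$kernel_release" in
        *[Mm]icrosoft*) echo "wsl" ;;
        *)              echo "linux" ;;
      esac
      ;;
    *) echo "unknown" ;;
  esac
}
```

Running OPS_OS="$(detect_os)" /ops:speedup would then pass the detected value explicitly.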

infra-monitor: public S3 bucket alerts

Warning

infra-monitor flags public S3 buckets as high-priority fires. This is intentional: public buckets are a common source of cloud data leaks.

Symptom: /ops:fires shows S3 bucket <name> is publicly accessible.

Decide:

Scenario	Action
Bucket is genuinely public (static site, public assets)	Mark as known-good in `registry.json` → `infra.s3.allowlist`
Bucket shouldn't be public	Run `aws s3api put-public-access-block --bucket <name> --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"`

Caution

Rule 5 applies — any bucket-level destructive change (delete, make-private, revoke) requires explicit per-action confirmation. infra-monitor only reports; it never auto-remediates.
