
Troubleshooting

auroracapital edited this page Apr 18, 2026 · 5 revisions


Common issues, organized by component. Every fix is copy-pasteable.


Tip

Before reading further, try /ops:doctor. It runs bin/ops-autofix silently — rebuilds wacli with FTS5, re-registers missing MCPs, fixes common registry issues, and reports what it repaired.




Telegram

Rate limiting on list_dialogs

Symptom: FLOOD_WAIT_X error, or dialogs stop loading mid-request.

Cause: Telegram's MTProto API rate-limits messages.getDialogs. The X in FLOOD_WAIT_X is the number of seconds you must wait. After rapid consecutive calls, a cooldown of up to 8 hours is normal.

Fix:

  • Wait for the cooldown to expire (up to 8 hours)
  • Use search_messages instead of list_dialogs — different endpoint
  • Reduce frequency — avoid calling /ops:inbox telegram more than once per hour
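The wait length is part of the error itself, so you don't have to assume the 8-hour worst case. Below is a minimal POSIX sh sketch that extracts it. flood_wait_seconds and flood_wait_human are hypothetical helper names and are not part of the plugin or the Telegram MCP server:

```sh
# Hypothetical helper: print X from a FLOOD_WAIT_X error string.
# X is the number of seconds Telegram asks the client to wait.
flood_wait_seconds() {
  local secs
  secs=$(printf '%s\n' "$1" | sed -n 's/.*FLOOD_WAIT_\([0-9][0-9]*\).*/\1/p')
  [ -n "$secs" ] || return 1   # not a flood-wait error
  printf '%s\n' "$secs"
}

# Hypothetical helper: the same wait as hours and minutes, e.g. "8h 00m".
flood_wait_human() {
  local s
  s=$(flood_wait_seconds "$1") || return 1
  printf '%dh %02dm\n' $((s / 3600)) $(( (s % 3600) / 60 ))
}
```

For example, flood_wait_human 'FLOOD_WAIT_28800' prints 8h 00m.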

Note

The bundled Telegram MCP server handles this gracefully — it returns partial results rather than failing hard.

Session string expired

Symptom: AUTH_KEY_INVALID or SESSION_REVOKED.

Cause: The gram.js session string was revoked (logged out elsewhere, or Telegram invalidated old sessions).

Fix:

/ops:setup telegram

Regenerates the session via bin/ops-telegram-autolink.mjs and writes it to .mcp.json automatically.


WhatsApp

App-state key desync

Symptom: wacli doctor shows CONNECTED: true but AUTH_STATE: degraded. Messages fail silently.

Cause: The WhatsApp Web protocol requires periodic app-state key syncs. After 2–4 weeks of inactivity, the local keys drift out of sync.

Fix:

wacli auth logout
wacli auth login     # Re-scan QR code from phone
wacli doctor         # Confirm AUTHENTICATED + CONNECTED
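To script the check above, you can use the minimal sketch below. It assumes wacli doctor prints KEY: value lines such as AUTH_STATE: degraded, as described in the symptom. That output format is an assumption, not a documented wacli contract, and wacli_needs_reauth is a hypothetical name:

```sh
# Hypothetical check: reads `wacli doctor` output on stdin and exits 0 when it
# reports a degraded auth state. The "AUTH_STATE: degraded" line format is an
# assumption based on the symptom described on this page.
wacli_needs_reauth() {
  grep -qiE '^[[:space:]]*AUTH_STATE:[[:space:]]*degraded'
}

# Usage:
# wacli doctor | wacli_needs_reauth && echo "run: wacli auth logout && wacli auth login"
```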

Tip

Use the session at least once every 2 weeks so the keys stay in sync. The setup wizard runs wacli doctor automatically.

wacli command not found

Symptom: Setup shows ○ wacli (not installed), and skills skip their WhatsApp sections (Rule 3 surfaces the install option rather than hiding it).

Fix: Manual install — see the wacli repository, then re-run /ops:setup cli.


Slack

Token expiry

Symptom: invalid_auth / token_revoked.

Fix:

/ops:setup mcp        # re-run OAuth via Claude.ai
# — OR —
/ops:setup channels   # re-extract local bot token via Playwright

MCP quota exhaustion

Symptom: ratelimited after several searches.

Fix: Switch to the local bot token path. It gives unlimited search and can read private channels without bot membership:

/ops:setup channels   # picks up local token via bin/ops-slack-autolink.mjs

MCP general — quota exhaustion and fallback

Any MCP tool (Sentry / Linear / Slack) can hit quota mid-session. Skills automatically fall back to:

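| MCP | Fallback |
| --- | --- |
| Sentry | sentry-cli issues list --project <slug>, or curl the REST API |
| Linear | curl -X POST https://api.linear.app/graphql -H "Authorization: $LINEAR_API_KEY" -H "Content-Type: application/json" -d '{"query":"{ viewer { id } }"}' |
| Slack | Local bot token (unlimited search) |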

SessionStart hook warnings

Symptom: Warning lines at session start.

Cause: hooks/hooks.json runs scripts/setup.sh to detect missing config. A warning means something is unconfigured, not broken.

Fix: Run /ops:setup to address flagged items, or ignore warnings for integrations you don't use.


Missing tools detection

When a tool is missing, skills skip the sections that depend on it. Rule 3 makes them surface that choice rather than skip silently.

Verify what's detected:

~/.claude/plugins/cache/ops-marketplace/ops/<version>/bin/ops-setup-detect

Look for false values in the JSON output. Fix: Install the missing tool, then re-run /ops:setup cli.
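To cross-check the detector by hand, here is a small POSIX sh sketch that lists which CLIs are missing from PATH. missing_tools is a hypothetical helper and is independent of ops-setup-detect. The tool list in the example is just the set of CLIs this page mentions:

```sh
# Hypothetical quick check: print every command from the argument list
# that is not found on PATH (prints nothing when all are present).
missing_tools() {
  local t
  for t in "$@"; do
    command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
  done
}

# Example, using CLIs mentioned on this page:
# missing_tools wacli gog doppler aws sentry-cli jq
```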


Project registry issues

registry.json not found

/ops:setup registry
# or manually:
cp scripts/registry.example.json scripts/registry.json

CLAUDE_PLUGIN_ROOT unset

/ops:setup env
# Appends export to ~/.zshrc or ~/.bashrc (append-only, never rewrites)

Restart Claude Code afterwards.
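To confirm the variable survived the restart, you can run a quick check in a fresh terminal. Below is a minimal sketch with a hypothetical check_plugin_root helper:

```sh
# Hypothetical check: is CLAUDE_PLUGIN_ROOT set, and does it point at an
# existing directory? Run it in a new shell so the updated rc file is loaded.
check_plugin_root() {
  if [ -z "${CLAUDE_PLUGIN_ROOT:-}" ]; then
    echo "CLAUDE_PLUGIN_ROOT is unset; run /ops:setup env" >&2
    return 1
  fi
  if [ ! -d "$CLAUDE_PLUGIN_ROOT" ]; then
    echo "CLAUDE_PLUGIN_ROOT=$CLAUDE_PLUGIN_ROOT is not a directory" >&2
    return 1
  fi
  echo "CLAUDE_PLUGIN_ROOT=$CLAUDE_PLUGIN_ROOT"
}
```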


AWS CLI issues

ECS section is empty

Fix A — Add clusters to registry:

{
  "infra": {
    "ecs_clusters": ["myapp-production"],
    "platform": "aws"
  }
}

Fix B — Authenticate:

aws configure
aws sts get-caller-identity   # verify

Daemon

Pre-warm cache not updating

Symptom: /ops:go takes >10 seconds; daemon-health.json shows last_briefing >5 min old.

Cause: briefing-pre-warm service crashed or the daemon stopped.

Fix:

/ops:doctor                     # runs auto-repair first
launchctl list | grep claude-ops
launchctl kickstart -k gui/$UID/com.claude-ops.daemon
tail -f ~/.claude/plugins/data/ops-ops-marketplace/logs/ops-daemon.log
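The staleness rule in the symptom (last_briefing more than 5 minutes old) can also be checked from a script. The sketch below assumes you have already read last_briefing out of daemon-health.json as Unix epoch seconds. The real field format isn't documented here, so convert it first if it differs. is_stale is a hypothetical name:

```sh
# Hypothetical helper: succeed if timestamp $1 (epoch seconds) is more than
# $2 seconds old. $3 optionally overrides "now" so it can be tested.
# Assumes last_briefing was already converted to epoch seconds.
is_stale() {
  local last="$1" max_age="$2" now="${3:-$(date +%s)}"
  [ $((now - last)) -gt "$max_age" ]
}

# Example: is_stale "$last_briefing" 300 && echo "pre-warm cache is stale"
```

The 300-second threshold in the example mirrors the 5-minute figure in the symptom.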

launchd errors

Symptom: launchctl load fails with Load failed: 5: Input/output error.

Cause: Stale plist, or binary path changed after a plugin version bump.

Fix:

launchctl unload ~/Library/LaunchAgents/com.claude-ops.daemon.plist 2>/dev/null
rm ~/Library/LaunchAgents/com.claude-ops.daemon.plist
/ops:setup daemon               # re-generates plist with current paths

Health file stale

Symptom: PreToolUse hook warns wacli health degraded — last sync 14 min ago.

Cause: Daemon alive but wacli-sync service stuck. Usually a WhatsApp re-auth is needed.

Fix:

wacli auth logout && wacli auth login
launchctl kickstart -k gui/$UID/com.claude-ops.daemon

Warning

The hook fails closed — skills warn rather than silently retrying with stale auth. This is intentional (Rule 3 + Privacy).

See also: Daemon Guide.


Doppler

Not authenticated

Symptom: doppler secrets get returns You must be logged in.

Fix:

doppler login    # run in a real terminal (opens browser)
# or
/ops:setup doppler

Wrong project bound

Symptom: Secrets that exist in Doppler return empty.

Fix:

doppler setup --project <project> --config prd --no-interactive
doppler configure debug        # confirm the bound project + config

Tip

Every skill supports doppler:KEY_NAME reference tokens — so secrets never leave the vault. See Privacy and Security.
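The sketch below shows how a doppler:KEY_NAME token could be resolved at use time. It is a hypothetical illustration, not the plugin's resolver, and the helper names are made up. The only real command it calls is doppler secrets get KEY --plain:

```sh
# Hypothetical: print KEY_NAME if $1 is a "doppler:KEY_NAME" token, else fail.
doppler_ref_key() {
  local key
  case "$1" in
    doppler:*) key=${1#doppler:} ;;
    *) return 1 ;;                       # not a reference token
  esac
  case "$key" in
    ''|*[!A-Za-z0-9_]*) return 1 ;;      # empty or not a valid key name
  esac
  printf '%s\n' "$key"
}

# Hypothetical: resolve a config value. Tokens are fetched from the bound
# Doppler project/config at use time; plain values pass through unchanged.
resolve_secret() {
  local key
  if key=$(doppler_ref_key "$1"); then
    doppler secrets get "$key" --plain   # requires `doppler login`
  else
    printf '%s\n' "$1"
  fi
}
```

Because only the token is stored in config, a leaked config file exposes a key name rather than the secret.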


Stripe

401 Unauthorized

Cause: The key is invalid or revoked, or it belongs to the wrong mode (a test key used against live data, or vice versa).

Fix: /ops:setup revenue will re-scan env + Doppler for STRIPE_SECRET_KEY. Verify the prefix:

| Prefix | Meaning |
| --- | --- |
| sk_test_… | Test mode, no real charges |
| sk_live_… | Live mode, real money |
| rk_live_… | Restricted live key; use this for read-only dashboards |
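Checking the prefix from a script means you never print the key itself. Below is a sketch with a hypothetical stripe_key_mode helper. It also covers rk_test_ (Stripe's restricted test-mode prefix), which the table above does not list:

```sh
# Hypothetical helper: describe a Stripe key by its prefix only.
stripe_key_mode() {
  case "$1" in
    sk_test_*) echo "secret key, test mode" ;;
    sk_live_*) echo "secret key, live mode" ;;
    rk_test_*) echo "restricted key, test mode" ;;
    rk_live_*) echo "restricted key, live mode" ;;
    *)         echo "unrecognized prefix"; return 1 ;;
  esac
}

# Example: stripe_key_mode "$STRIPE_SECRET_KEY"
```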

Missing key in auto-scan

Symptom: Stripe: not configured even though you have a key.

Fix: Paste via the auto-scan prompt, or export in ~/.zshrc:

export STRIPE_SECRET_KEY="rk_live_…"

Re-run /ops:setup revenue.

Caution

Never commit sk_live_… keys to source. tests/test-no-secrets.sh runs a grep check as a pre-commit step; if it trips, remove the key before pushing.
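For a local safety net, here is a sketch of a similar grep check. It is not the contents of tests/test-no-secrets.sh. Both contains_live_stripe_key and its pattern are assumptions:

```sh
# Hypothetical scan: exit 0 if stdin contains something shaped like a live
# Stripe secret (sk_live_) or restricted (rk_live_) key.
contains_live_stripe_key() {
  grep -qE '(sk|rk)_live_[[:alnum:]]+'
}

# Example, e.g. in .git/hooks/pre-commit:
# if git diff --cached | contains_live_stripe_key; then
#   echo "live Stripe key staged; remove it before committing" >&2
#   exit 1
# fi
```

The pattern requires at least one letter or digit after the prefix, so documentation placeholders such as sk_live_… do not match.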


RevenueCat

Invalid project ID

Symptom: /ops:revenue returns RevenueCat: Error (401) or the dashboard widget shows no MRR.

Fix: Read the project ID from the RevenueCat dashboard URL, https://app.revenuecat.com/projects/<PROJECT_ID>/..., then:

/ops:setup revenue    # re-enter via the prompt, or
export REVENUECAT_PROJECT_ID="proj_xxxxx"
export REVENUECAT_API_KEY="sk_xxxxx"

Note

Secret API keys start with sk_. Public SDK keys (appl_…, goog_…) won't work for dashboards — they're for client apps only.


Shopify

Invalid access token / 401

Cause: The Admin API token was revoked in the Shopify admin (Apps → Private apps on older stores; private apps have since been replaced by custom apps), or it lacks a required scope.

Fix: Create a new admin API token with scopes read_orders,read_products,read_inventory,read_customers,read_fulfillments. Then:

/ops:setup shopify    # re-enter store URL + token

GraphQL rate limit

Symptom: THROTTLED errors mid-query from /ops:ecom.

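Cause: Shopify's Admin GraphQL API uses a leaky-bucket cost system. Each query's calculated cost is drawn from a bucket that refills at a fixed restore rate, so heavy queries drain it fast.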

Fix:

  • /ops:ecom auto-paginates with cost-aware throttling — usually self-recovers
  • For burst load, add 5–10s between calls
  • Consider upgrading your Shopify plan for a larger bucket
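Cost-aware throttling means waiting just long enough for the bucket to refill, rather than pausing for a fixed 5–10s. The sketch below shows that arithmetic. It uses the requestedQueryCost, currentlyAvailable, and restoreRate values that Shopify returns in each response's extensions.cost block. shopify_wait_seconds is a hypothetical helper, not how /ops:ecom is implemented:

```sh
# Hypothetical helper: seconds to wait before a query costing $1 points can
# run, given currentlyAvailable ($2) and restoreRate ($3) from
# extensions.cost.throttleStatus. Decimals are truncated (restoreRate may be
# reported as e.g. 50.0); the result is rounded up to whole seconds.
shopify_wait_seconds() {
  local cost="${1%.*}" available="${2%.*}" rate="${3%.*}" deficit
  [ "$rate" -gt 0 ] || return 1
  deficit=$((cost - available))
  if [ "$deficit" -le 0 ]; then
    echo 0
  else
    echo $(( (deficit + rate - 1) / rate ))
  fi
}
```

For example, a 500-point query with 100 points available and a restore rate of 50 per second must wait 8 seconds. You can read the inputs from a response with jq, e.g. jq '.extensions.cost.throttleStatus'.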

Voice APIs (Bland AI · ElevenLabs · Groq)

Quota exceeded

| API | Symptom | Fix |
| --- | --- | --- |
| Bland AI | 402 Payment Required | Top up credits at app.bland.ai |
| ElevenLabs | quota_exceeded on TTS | Upgrade plan or wait for the monthly reset |
| Groq (Whisper) | 429 Too Many Requests | Free tier allows ~30 RPM; throttle in /ops:voice transcribe |

Tip

/ops:voice always logs the estimated cost before calling. Cancel if the estimate exceeds your threshold.
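For the 429 case, a generic retry wrapper with exponential backoff is a reasonable sketch. retry_with_backoff, MAX_ATTEMPTS, and BASE_DELAY are all hypothetical; none of them is an /ops:voice setting:

```sh
# Hypothetical wrapper: run a command, retrying with exponential backoff.
# MAX_ATTEMPTS and BASE_DELAY are illustrative knobs with made-up defaults.
retry_with_backoff() {
  local attempt=1 delay="${BASE_DELAY:-2}" max="${MAX_ATTEMPTS:-5}"
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: curl -f turns an HTTP 429 into a non-zero exit, so it is retried.
# retry_with_backoff curl -sf -o out.json "$URL"   # $URL is a placeholder
```

The wrapper retries any non-zero exit, including a 401 that will never succeed. A production client would inspect the status code and honor a Retry-After header when the API sends one.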


Datadog

API key rejected (401 / 403)

Symptom: /ops:monitor datadog returns 403 Forbidden or Authentication error.

Cause: Either the API key is wrong or the App key is missing. Datadog's /v1/monitor endpoint needs both headers — DD-API-KEY and DD-APPLICATION-KEY.

Fix:

/ops:settings datadog      # re-enter both keys and smoke-test
# or
/ops:monitor --setup

Verify directly:

# Uses DD_SITE when set (see "Wrong site / region"); defaults to datadoghq.com
curl -sf -H "DD-API-KEY: $DD_API_KEY" \
        -H "DD-APPLICATION-KEY: $DD_APPLICATION_KEY" \
        "https://api.${DD_SITE:-datadoghq.com}/api/v1/monitor" | jq '.[0]'

Note

Datadog has regional endpoints (api.datadoghq.com, api.datadoghq.eu, api.us3.datadoghq.com, …). If a known-good key is still rejected, check the region; /ops:settings datadog will prompt for it.

Wrong site / region

Symptom: Keys authenticate but monitors list is empty.

Fix: Set DD_SITE explicitly — e.g., export DD_SITE=datadoghq.eu. /ops:settings datadog re-runs the smoke test against the right region.


New Relic

Account ID mismatch

Symptom: /ops:monitor newrelic returns User does not have required capabilities on account <ID> or an empty result where you expect alerts.

Cause: The User API key is authorized for account A, but the NerdGraph query is scoped to account B. Common after switching orgs or rotating keys.

Fix:

/ops:settings newrelic    # prompts for User API key + account ID, smoke-tests

Verify directly (replace <KEY> and <ACCOUNT_ID>):

curl -sf -X POST https://api.newrelic.com/graphql \
  -H "Api-Key: <KEY>" \
  -H "Content-Type: application/json" \
  -d '{"query":"{ actor { account(id: <ACCOUNT_ID>) { name } } }"}' | jq

Important

Use a User API key, not an Ingest, License, or Browser key. Only User keys can read NerdGraph alerts.

EU vs US endpoint

Symptom: 404 on NerdGraph even with a valid key.

Fix: If your New Relic account is in the EU region, set NEWRELIC_REGION=EU. /ops:monitor routes to https://api.eu.newrelic.com/graphql automatically.
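When scripting the NerdGraph check above, a common mistake is leaving <ACCOUNT_ID> inside the single-quoted JSON, where the shell never substitutes a variable. The sketch below builds the same query with printf and picks the endpoint from NEWRELIC_REGION. The helper names and the NR_USER_KEY / NR_ACCOUNT_ID variables are hypothetical:

```sh
# Hypothetical: choose the NerdGraph endpoint from NEWRELIC_REGION.
nerdgraph_endpoint() {
  if [ "${NEWRELIC_REGION:-US}" = "EU" ]; then
    echo "https://api.eu.newrelic.com/graphql"
  else
    echo "https://api.newrelic.com/graphql"
  fi
}

# Hypothetical: build the account-lookup query body for a numeric account ID.
nerdgraph_account_query() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # account IDs are numeric
  esac
  printf '{"query":"{ actor { account(id: %s) { name } } }"}\n' "$1"
}

# Example (NR_USER_KEY and NR_ACCOUNT_ID are placeholder variable names):
# curl -sf -X POST "$(nerdgraph_endpoint)" \
#   -H "Api-Key: $NR_USER_KEY" -H "Content-Type: application/json" \
#   -d "$(nerdgraph_account_query "$NR_ACCOUNT_ID")" | jq
```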


OpenTelemetry

OTLP endpoint unreachable

Symptom: /ops:monitor otel returns connection refused or DNS lookup failed.

Fix checklist:

| Check | Command |
| --- | --- |
| Endpoint URL correct? | echo $OTEL_EXPORTER_OTLP_ENDPOINT (should include the scheme, e.g. https://, and the port) |
| Reachable from this machine? | curl -s -o /dev/null -w "%{http_code}\n" "$OTEL_EXPORTER_OTLP_ENDPOINT/v1/traces" (any HTTP status means reachable; 000 means no connection) |
| Auth header set? | /ops:settings otel → re-enter otel_headers (e.g. authorization=Bearer <token>) |
| TLS cert valid? | Run the curl above with and without -k; if it connects only with -k, the certificate trust chain is broken |

Fix:

/ops:monitor --setup       # re-enter endpoint + headers
/ops:settings otel         # smoke-test after update

Note

monitor-agent uses OTLP/HTTP (not gRPC). The endpoint must accept HTTP POST on /v1/traces and /v1/metrics. Most OTLP collectors support both transports — pick HTTP in your collector config if you have a choice.
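Two mistakes are easy to rule out before running curl: a missing scheme, and an endpoint that points at the gRPC port. Below is a sketch with a hypothetical check_otlp_endpoint helper. It relies on the OTLP default ports (4318 for HTTP, 4317 for gRPC), so it won't catch a custom port layout:

```sh
# Hypothetical check for common OTEL_EXPORTER_OTLP_ENDPOINT mistakes.
# 4318 is the default OTLP/HTTP port and 4317 the default OTLP/gRPC port.
check_otlp_endpoint() {
  case "$1" in
    http://*|https://*) ;;
    *) echo "missing scheme (http:// or https://): $1" >&2; return 1 ;;
  esac
  case "$1" in
    *:4317|*:4317/*)
      echo "port 4317 is the OTLP/gRPC default; monitor-agent needs OTLP/HTTP (default 4318)" >&2
      return 1 ;;
  esac
  echo "ok: $1"
}

# Example: check_otlp_endpoint "$OTEL_EXPORTER_OTLP_ENDPOINT"
```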

413 Payload Too Large

Symptom: HTTP 413 from the collector on large trace batches.

Fix: Increase the collector's request size limit. In otel-collector-config.yaml, set receivers.otlp.protocols.http.max_request_body_size to a byte count (e.g. 8388608 for 8 MiB). monitor-agent itself only reads, but some backends also reject oversized responses to trace searches.


gog (Gmail CLI)

Not installed

As of v1.1.0, gog is the public steipete/gogcli build. Cross-OS install:

| OS | Install |
| --- | --- |
| macOS | brew install steipete/tap/gogcli |
| Linux | curl -fsSL https://raw.githubusercontent.com/steipete/gogcli/main/install.sh \| sh |
| Windows (Scoop) | scoop bucket add steipete https://github.com/steipete/scoop-bucket && scoop install gogcli |
| Any | go install github.com/steipete/gogcli@latest |

The setup wizard runs the OS-appropriate command automatically (Rule 4).

OAuth expired

Symptom: gog: token expired on send.

Fix:

gog auth add         # v1.1.0 command (was: gog auth login)
# or
/ops:setup email

The wizard runs the command in the background (Rule 2).

Sending disabled

Symptom: gog send reports scope read-only even after auth.

Cause: OAuth scope wasn't elevated to gmail.send on initial auth.

Fix: gog auth remove && gog auth add --scopes=full

Note

Command renames in v1.1.0: gog auth login → gog auth add, and gog cal → gog calendar events. Old commands are aliased for one minor version, but the wizard and all skills emit the new forms.


Memories not extracting

Symptom: ~/.claude/plugins/data/ops-ops-marketplace/memories/ is empty after 30 minutes.

Checklist:

| Check | Command |
| --- | --- |
| Daemon running? | launchctl list \| grep claude-ops |
| Haiku API key set? | doppler secrets get ANTHROPIC_API_KEY --plain |
| wacli healthy? | wacli doctor |
| Last extractor run? | grep memory-extractor ~/.claude/plugins/data/ops-ops-marketplace/logs/ops-daemon.log |

Force run:

/ops:doctor --run-memory-extractor

See Memories System for the full extraction pipeline.


ops-speedup OS detection

Symptom: /ops:speedup reports Unknown OS or runs the wrong cleanup.

Cause: Detection uses uname -s + /etc/os-release. WSL2 reports Linux even though the underlying host is Windows.

Fix:

bin/ops-speedup --json    # inspect the detected OS

If detection is wrong, force with:

OPS_OS=macos /ops:speedup
OPS_OS=wsl   /ops:speedup
OPS_OS=linux /ops:speedup
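To see how WSL can be told apart from plain Linux, here is a sketch with a hypothetical detect_os function. It does not reproduce the bin/ops-speedup logic. Instead it uses a common heuristic: WSL kernels mention "microsoft" (WSL2) or "Microsoft" (WSL1) in /proc/version. It prints the same names that OPS_OS accepts:

```sh
# Hypothetical OS detection. Both inputs can be passed in for testing;
# otherwise they come from uname -s and /proc/version.
detect_os() {
  local kernel="${1:-$(uname -s)}"
  local proc_version="${2:-$(cat /proc/version 2>/dev/null)}"
  case "$kernel" in
    Darwin) echo macos ;;
    Linux)
      case "$proc_version" in
        *[Mm]icrosoft*) echo wsl ;;   # WSL1 and WSL2 kernel strings
        *)              echo linux ;;
      esac ;;
    *) echo unknown ;;
  esac
}
```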

infra-monitor: public S3 bucket alerts

Warning

infra-monitor flags public S3 buckets as high-priority fires. This is intentional — public buckets are the #1 source of cloud data leaks.

Symptom: /ops:fires shows S3 bucket <name> is publicly accessible.

Decide:

| Scenario | Action |
| --- | --- |
| Bucket is genuinely public (static site, public assets) | Mark it as known-good in registry.json → infra.s3.allowlist |
| Bucket shouldn't be public | Run aws s3api put-public-access-block --bucket <name> --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true" |

Caution

Rule 5 applies — any bucket-level destructive change (delete, make-private, revoke) requires explicit per-action confirmation. infra-monitor only reports; it never auto-remediates.


Still stuck?

  • Daemon Guide — full daemon internals, logs, health contract
  • Plugin Rules — the five rules that constrain every fix
  • Privacy and Security — what the plugin accesses (and doesn't)
  • /ops:doctor --verbose — emits every diagnostic + auto-fix attempted
