Skip to content

cathrynlavery/openclaw-ops

Repository files navigation

openclaw-ops

OpenClaw gateway operations skill for agent environments. Covers health checks, repair workflows, continuous monitoring, session analysis, update-change detection, and security review for a local or self-hosted OpenClaw install.

Tested against OpenClaw 2026.5.4.

What it does

Skill

  • /openclaw-ops — full triage and configuration: gateway, auth, exec approvals, cron jobs, channels, sessions, and installation

Scripts

Script Purpose
scripts/heal.sh One-shot auto-fix for the most common gateway issues
scripts/post-update.sh Explicit post-update orchestrator: check-update, heal, workspace reconcile, security scan, final health check, policy-guard sentinel trigger
scripts/watchdog.sh Runs every 5 min, restarts gateway if down, escalates after 3 failures
scripts/watchdog-install.sh Installs the watchdog as a macOS LaunchAgent
scripts/watchdog-uninstall.sh Removes the LaunchAgent
scripts/check-update.sh Detects version changes, explains broken config, auto-fix with --fix
scripts/health-check.sh Declarative URL/process health checks; auto-generates targets file on first run
scripts/security-scan.sh Config hardening and credential exposure scan (0–100 score); skips bulky runtime/log/session history unless --include-sessions is passed
scripts/skill-audit.sh Static security audit for third-party skills before installation
scripts/codex-perf-check.sh Check and fix GPT-5.x performance opt-ins that ship disabled by default; --fix to apply
scripts/workspace-auto-commit.sh Local-only git snapshot helper for OpenClaw workspace repos; defaults to ~/.openclaw/workspace, supports --workspace and --all, never pushes
scripts/workspace-git-audit.sh Audits ~/.openclaw/workspace* repos for git status and auto-commit cron coverage; --show-cron prints setup commands for uncovered repos
scripts/session-monitor.sh Behavioral checks over live session JSONL files; detects retry loops, stuck runs, auth errors
scripts/session-search.sh Full-text session search with structured output and secret redaction
scripts/session-resume.sh Compaction-first markdown resume for a single session, including failure context
scripts/daily-digest.sh Incident, activity, watchdog, and cost summary for the last N hours
scripts/incident-manager.sh Shared machine-readable incident lifecycle helper (sourced by other scripts; state/logs under ~/.openclaw/logs)
scripts/remediation-board.sh Human/agent repair board for recurring bugs, hacks/workarounds, upstream watches, incident notes, status transitions, evidence, and verification checks
scripts/lib.sh Shared helpers: logging, port resolution, state files, sanitization (sourced)

Prerequisites

Tool Required for
openclaw everything
python3 heal.sh, lib.sh, watchdog.sh, session scripts
curl watchdog.sh, health-check.sh HTTP checks
openssl heal.sh auth token generation
rg (ripgrep) session-search.sh
launchctl + macOS watchdog-install.sh (LaunchAgent)
osascript watchdog.sh macOS notifications (optional)

Linux: watchdog-install.sh is macOS only. Use cron instead:

*/5 * * * * bash /path/to/scripts/watchdog.sh >> ~/.openclaw/logs/watchdog.log 2>&1

Minimum version

v2026.2.12 or later. Versions before this contain critical CVEs (including CVE-2026-25253 plus additional SSRF, path traversal, and prompt-injection fixes).

openclaw --version

Quick start

# 1. One-time heal pass
bash scripts/heal.sh

# 2. After every OpenClaw update, run the explicit post-update hook
bash scripts/post-update.sh

#    The hook also runs the VPS workspace reconcile script when present and
#    touches ~/.openclaw/state/policy-guard.trigger so a VPS can react through
#    openclaw-policy-guard.path after the update.

# 3. If you only want the update triage report:
bash scripts/check-update.sh        # report only
bash scripts/check-update.sh --fix  # report + auto-fix

# 4. Install always-on watchdog (macOS)
bash scripts/watchdog-install.sh

# 5. View watchdog log
tail -f ~/.openclaw/logs/watchdog.log

# 6. View incident history
cat ~/.openclaw/logs/heal-incidents.jsonl

# 7. Run health checks — targets file is auto-generated on first run
bash scripts/health-check.sh --verbose

# 8. Audit workspace git protection and print cron setup suggestions
bash scripts/workspace-git-audit.sh --show-cron

Gateway port

Scripts read the gateway port directly from ~/.openclaw/openclaw.json — no hardcoded port, no manual setup. Override with the OPENCLAW_GATEWAY_PORT env var if needed.

Session monitoring

# Scan all active sessions for behavioral issues
bash scripts/session-monitor.sh --verbose

# Search session history
bash scripts/session-search.sh "unauthorized" --limit 10

# Build a resume for a specific session
bash scripts/session-resume.sh ~/.openclaw/agents/knox/sessions/<session>.jsonl

# 24-hour digest: incidents, activity, costs
bash scripts/daily-digest.sh --hours 24

# Track cron/ops findings to completion
bash scripts/remediation-board.sh import-cron-errors
bash scripts/remediation-board.sh list

# Mandatory: when investigation finds a real local OpenClaw error,
# regression, hack/workaround, security concern, or recurring ops finding,
# create/update a board item immediately. The board is the local repair loop;
# upstream links are optional metadata only when an external fix also exists.
bash scripts/remediation-board.sh add-incident log-gap "Log sweep missed runtime errors" --evidence "tmp OpenClaw log contained active failure"
bash scripts/remediation-board.sh close-criteria log-gap "Local log-sweep catches the failure class and the installed workflow is synced"

# Track a recurring bug without starting over next time
bash scripts/remediation-board.sh list --type incident
bash scripts/remediation-board.sh show telegram-split
bash scripts/remediation-board.sh add-incident telegram-split "Telegram topic replies split into many messages" --evidence "Observed in forum topic" --next "Check upstream issue and local channel config"
bash scripts/remediation-board.sh hypothesis telegram-split "Preview draft lane may send instead of edit" --confidence medium
bash scripts/remediation-board.sh tried telegram-split --step "Checked release notes" --result "Found related Telegram delivery fixes"
bash scripts/remediation-board.sh workaround telegram-split "Use explicit message.send for topic-visible replies"
bash scripts/remediation-board.sh export-note telegram-split

Watchdog escalation model

  1. Tier 1 — HTTP ping every 5 min (LaunchAgent or cron)
  2. Tier 2 — Gateway restart + heal.sh if simple restart fails
  3. Tier 3 — macOS notification after 3 failed attempts in 15 min

Platform support

Platform heal.sh watchdog LaunchAgent
macOS
Linux ✓ (via cron)
Windows WSL2 ✓ (via cron)

Viewing logs

macOS:

tail -f ~/.openclaw/logs/gateway.err.log
tail -f ~/.openclaw/logs/watchdog.log

Linux (systemd):

journalctl --user -u openclaw-gateway -f

Notes

  • health-check.sh creates ~/.openclaw/health-targets.conf automatically on first run using the port from openclaw.json. Edit the file to add custom targets or adjust thresholds.
  • health-check.sh can report a process uptime failure immediately after openclaw gateway restart if the target has a minimum uptime threshold (e.g. 300s). That is expected — lower the threshold during smoke tests, then restore it.
  • security-scan.sh reports file paths and line numbers for suspected secrets, but redacts the secret values themselves.
  • check-update.sh is intended for real post-upgrade triage. It is normal to report a version change the first time it runs after an upgrade.
  • post-update.sh is the explicit post-update orchestrator. It skips the heavy sequence when the current version matches the stored state and otherwise runs check-update.sh --fix, heal.sh, the workspace reconcile script if present, security-scan.sh, and a final openclaw health --json.
  • On the VPS, the workspace reconcile stage refreshes model policy, auth/profile state, voice defaults, and the gateway service through openclaw_post_update_reconcile.py (or the equivalent systemd oneshot wrapper).
  • After the health check it best-effort touches ~/.openclaw/state/policy-guard.trigger (creating parent dirs if needed). The VPS can wire openclaw-policy-guard.path to that sentinel after updates.
  • Set OPENCLAW_POST_UPDATE_RECONCILE_SCRIPT (and optionally OPENCLAW_POST_UPDATE_RECONCILE_INTERPRETER) if the reconcile script lives somewhere other than the default workspace path.
  • If another wrapper or automation layer launches the post-update hook, set OPENCLAW_SKIP_WRAPPER_BACKUP=1 for nested openclaw calls so internal subcommands do not trigger backup loops.
  • codex-perf-check.sh requires v2026.4.x or later — the four settings it checks do not exist in earlier releases.

Running tests

bash tests/run.sh

Open-Source Release Checklist

  • Remove any local ~/.openclaw state, logs, or example outputs from the repository.
  • Do not publish screenshots or pasted scan output that contain real tokens, session material, or private channel identifiers.
  • Keep examples generic: placeholder tokens, placeholder user IDs, and non-sensitive hostnames only.
  • Run bash tests/run.sh before publishing changes.

About

OpenClaw operations skill with health checks, repair scripts, watchdogs, update triage, and security scans.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages