
AutoShip — ship GitHub issues on autopilot


Issues in. PRs out. Opus budget: intact.

☕ Keep the agents running


A Claude Code plugin that autonomously routes GitHub issues to AI agents — Codex, Gemini, Copilot, or Claude — verifies their work, opens pull requests, merges them, and loops back for the next one. One command starts the loop. You watch it ship.

┌──────────────────────────────────────────┐
│  ISSUE ROUTING         ████████ AUTO     │
│  AGENT DISPATCH        ████████ AUTO     │
│  PR CREATION           ████████ AUTO     │
│  CI MONITORING         ████████ AUTO     │
│  MERGE + CLOSE         ████████ AUTO     │
│  YOUR EFFORT           █        ~5%      │
└──────────────────────────────────────────┘

Before / After

📋 Without AutoShip (20 issues)

  1. Open GitHub issues backlog
  2. Read each issue, estimate complexity
  3. Pick an issue manually
  4. Open a worktree or branch
  5. Write a dispatch prompt from scratch
  6. Paste it into Codex / Gemini / Claude
  7. Watch the agent, babysit if it gets stuck
  8. Review the output yourself
  9. Open a PR manually
  10. Wait for CI, watch for failures
  11. Merge manually
  12. Close the issue manually
  13. Repeat × 19 more issues

Time: 3–8 hrs. You did most of it.

🚀 With AutoShip (20 issues)

/autoship:start

AutoShip reads your open issues, classifies each one (research / docs / simple code / complex), picks the best agent for the job, creates an isolated worktree, dispatches the agent with a focused prompt, verifies the result against acceptance criteria, opens a PR, waits for CI, merges, closes the issue — and immediately starts the next one.

You come back to merged PRs.

Time: you typed one command.

Same issues. One command. Brain free.

  • Third-party first — burns Codex, Gemini, and Copilot quota before touching Claude tokens
  • Parallel workers — up to 20 issues in flight simultaneously
  • Task-type routing — classifies issues into 8 categories (incl. rust_unsafe), routes each to the best agent
  • Live routing config — edit AUTOSHIP.md front matter to change agent priorities, takes effect immediately
  • Verification pipeline — every result reviewed against acceptance criteria before a PR opens
  • Token ledger — per-issue and per-session token spend tracked in .autoship/token-ledger.json
  • Event-driven — bash monitors watch agent output, PR CI, and GitHub issues in real time
  • Durable state — survives restarts via .autoship/state.json and GitHub labels
  • Project context injection — CLAUDE.md/AGENTS.md conventions extracted at startup and injected into every dispatch prompt
  • Opus pre-dispatch advisor — complex/unsafe issues get a 200-word architectural brief before worker dispatch
  • Codex health-check & stuck tracking — fast-fail probe + per-tool stuck_count; exhausted at 3 consecutive STUCKs
  • Rust/Windows routing — rust_unsafe task type + rust_windows profile detection route unsafe Rust to Claude

Install

claude plugin marketplace add Maleick/AutoShip && claude plugin install autoship@autoship

Done. Start a new session and run /autoship:start.

Requirements

  • jq — JSON processing (brew install jq)
  • gh — GitHub CLI, authenticated (brew install gh && gh auth login)
  • tmux — terminal multiplexer (brew install tmux)
  • Git repo with a GitHub remote and open issues

Optional agents (more dispatch power)

| Tool | What it adds | Install |
| --- | --- | --- |
| codex | OpenAI-powered workers via JSON-RPC app-server (preferred for simple/medium) | Codex CLI |
| gemini | Google-powered workers | Gemini CLI |
| gh copilot | GitHub Copilot workers | `gh extension install github/gh-copilot` |
| Claude fallback | Always available — no install needed | built-in |

AutoShip detects available tools at startup and routes work accordingly.
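A minimal sketch of that startup probe, in the spirit of detect-tools.sh (the real hook also estimates quota; the function name here is illustrative and this version only checks for the binaries):

```shell
# Probe PATH for each optional CLI; fall back to Claude when none exist.
detect_tools() {
  available=""
  for tool in codex gemini; do
    command -v "$tool" >/dev/null 2>&1 && available="$available $tool"
  done
  # Copilot ships as a gh extension rather than a standalone binary
  if command -v gh >/dev/null 2>&1 && gh extension list 2>/dev/null | grep -q gh-copilot; then
    available="$available copilot"
  fi
  echo "${available:- claude-fallback}"
}

detect_tools
```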

Commands

| Command | What it does |
| --- | --- |
| `/autoship:start` | Launch orchestration — classify issues, dispatch agents, loop until done |
| `/autoship:plan` | Dry run — analyze issues and show dispatch plan without executing |
| `/autoship:stop` | Gracefully stop all agents, save state, add `autoship:paused` labels |
| `/autoship:status` | Live dashboard — active agents, quota bars, PR backlog, token spend |

How It Works

GitHub issue opened
      ↓
Agent tier assigned automatically:
  • Config / YAML / docs   →  Claude Haiku
  • Single-module feature  →  Gemini Flash
  • Docs / README          →  GPT-4o Mini
  • Multi-file / security  →  Claude Sonnet
  • Architecture / advisor →  Claude Opus
      ↓
PR opened, CI runs, branch auto-deleted on merge
      ↓
You review decisions. Not boilerplate.
```mermaid
flowchart TD
    A([GitHub Issues]) --> B["/autoship:start"]
    B --> C[Classify Issues\ntask-type classifier]

    C -->|research / docs| D[Gemini · Haiku]
    C -->|simple_code / mechanical| E[Codex · Gemini · Copilot]
    C -->|medium_code| F[Codex-GPT · Sonnet]
    C -->|complex| G[Sonnet + Opus advisor]

    D & E & F & G --> H[Create Worktree\nWrite Prompt]
    H --> I[Agent Works]

    I --> J{Status Word}
    J -->|COMPLETE| K[Reviewer verifies\nAUTOSHIP_RESULT.md]
    J -->|BLOCKED\nSTUCK| L[Re-dispatch\nor Escalate]

    K -->|PASS| M[Open PR]
    K -->|FAIL| L
    L --> H

    M --> N[Wait for CI]
    N --> O[Merge + Close Issue]
    O --> B
```

Every agent writes AUTOSHIP_RESULT.md and emits COMPLETE, BLOCKED, or STUCK as its final line. AutoShip never trusts conversation output — only the result file.
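The status-word contract can be illustrated with a small shell sketch (the helper name and monitoring logic here are assumptions, not AutoShip's actual code): the last non-empty line of the result file is the agent's verdict.

```shell
# Take the final non-empty line's last word as the status verdict.
read_status() {
  tail -n 5 "$1" | awk 'NF { last = $NF } END { print last }'
}

printf 'Implemented the fix.\n\nCOMPLETE\n' > AUTOSHIP_RESULT.md

case "$(read_status AUTOSHIP_RESULT.md)" in
  COMPLETE)       echo "hand off to reviewer" ;;
  BLOCKED|STUCK)  echo "re-dispatch or escalate" ;;
  *)              echo "no verdict yet" ;;
esac
```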

Metrics are refreshed at lifecycle boundaries the orchestrator already crosses: merge, worktree cleanup, and session close. A scheduled job can read .autoship/state.json, .autoship/token-ledger.json, and GitHub PR metadata, then regenerate the metrics snapshot in the README or wiki without hand-editing numbers.

Dispatch Matrix

Task type classification drives agent selection. Edit AUTOSHIP.md front matter to customize:

| Task type | Primary Agent | Fallback | Last Resort |
| --- | --- | --- | --- |
| research | Gemini | Claude Haiku | |
| docs | Gemini | Claude Haiku | |
| simple_code | Codex Spark | Gemini · Copilot | Claude Haiku |
| medium_code | Codex GPT | Claude Sonnet | |
| mechanical | Claude Haiku | Gemini | Codex Spark |
| ci_fix | Claude Haiku | Gemini | |
| complex | Claude Sonnet + Opus advisor | Claude Sonnet retry | Opus re-slice |

Quota thresholds gate dispatch: if a tool hits 0% estimated quota, it's skipped and the next in line picks up the work. Quotas are estimated via a decay model and reset daily.
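As a rough sketch of what a decay-style estimate could look like (the linear recovery rate and cap below are assumptions; the actual model lives in quota-update.sh):

```shell
# Estimated quota recovers toward 100% as idle time grows.
estimate_quota() {
  # $1 = last observed quota %, $2 = minutes since last dispatch
  awk -v p="$1" -v m="$2" \
    'BEGIN { q = p + m / 15; if (q > 100) q = 100; printf "%d\n", q }'
}

q=$(estimate_quota 40 300)          # 40% observed, idle 5 hours → 60
if [ "$q" -eq 0 ]; then
  echo "skip tool, fall through to next in line"
else
  echo "eligible for dispatch at ${q}%"
fi
```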

How Codex dispatch works (no tmux)

AutoShip drives Codex via codex app-server using the JSON-RPC protocol over stdin/stdout FIFOs — no terminal pane required.

```
# What happens under the hood:
mkfifo .codex-stdin .codex-stdout
codex app-server < .codex-stdin > .codex-stdout &

# AutoShip sends:
{"jsonrpc":"2.0","method":"initialize",...}
{"jsonrpc":"2.0","method":"thread/start",...}
{"jsonrpc":"2.0","method":"turn/start","params":{"input":[{"type":"text","text":"<issue prompt>"}]}}

# Waits for:
{"method":"turn/completed",...}   # → COMPLETE
{"method":"turn/failed",...}      # → STUCK
{"method":"thread/tokenUsage/updated","params":{"totalTokens":1247}}
```

Token counts from thread/tokenUsage/updated events feed directly into the token ledger.
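A sketch of folding one such event into the ledger (the event shape comes from the snippet above; this flat issue-to-tokens schema is an assumption, not AutoShip's real ledger format):

```shell
mkdir -p .autoship
ledger=.autoship/token-ledger.json
echo '{"42": 500}' > "$ledger"          # prior count for issue #42

event='{"method":"thread/tokenUsage/updated","params":{"totalTokens":1247}}'
tokens=$(printf '%s' "$event" | jq '.params.totalTokens')

tmp=$(mktemp)
jq --arg issue 42 --argjson t "$tokens" '.[$issue] = $t' "$ledger" > "$tmp"
mv "$tmp" "$ledger"                     # write-then-rename keeps the file consistent
```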

Verification pipeline

After every agent reports COMPLETE, a dedicated Sonnet reviewer runs:

  1. Validates AUTOSHIP_RESULT.md exists and passes content check
  2. Runs git diff main...HEAD to read the actual changes
  3. Checks every acceptance criterion against the diff
  4. Runs the test suite if one is detected
  5. Returns VERDICT: PASS | FAIL with confidence level

On FAIL: re-dispatches with failure context appended. After 2 fails: escalates to Sonnet. After 3 fails with Sonnet: spawns Opus advisor to re-slice the issue.
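The ladder above can be encoded as a trivial dispatch table (function name is illustrative; the thresholds come straight from the text):

```shell
# Map consecutive FAIL count to the next escalation step.
next_action() {
  case "$1" in
    1) echo "re-dispatch with failure context" ;;
    2) echo "escalate to Claude Sonnet" ;;
    *) echo "spawn Opus advisor to re-slice the issue" ;;
  esac
}

next_action 1   # first FAIL
next_action 3   # still failing with Sonnet
```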

State durability

AutoShip keeps state in two places simultaneously:

  • .autoship/state.json — local, fast, rebuilt from GitHub labels on restart
  • GitHub labels — autoship:queued, autoship:in-progress, autoship:paused — durable, visible in the GitHub UI

If you kill the session and restart, /autoship:start reads the labels and picks up where it left off.
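A sketch of that restart path (assuming `gh issue list --json number,labels` output; the real state.json carries more fields than shown here):

```shell
# Group issue numbers by their autoship label.
labels_to_state() {
  jq '{queued:      [.[] | select(any(.labels[]; .name == "autoship:queued"))      | .number],
       in_progress: [.[] | select(any(.labels[]; .name == "autoship:in-progress")) | .number]}'
}

# Live:  gh issue list --json number,labels | labels_to_state > .autoship/state.json
sample='[{"number":7,"labels":[{"name":"autoship:queued"}]},
         {"number":9,"labels":[{"name":"autoship:in-progress"}]}]'
printf '%s' "$sample" | labels_to_state
```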

Architecture

Four-tier model: Bash watches → Haiku thinks → Sonnet orchestrates → Opus advises

```mermaid
flowchart LR
    subgraph Monitors["🔭 Monitors (bash)"]
        M1["monitor-agents.sh\nevery 5s"]
        M2["monitor-prs.sh\nevery 30s"]
        M3["monitor-issues.sh\nevery 60s"]
    end

    subgraph Triage["🧠 Triage"]
        H["Claude Haiku\nEvent Interpreter"]
    end

    subgraph Executor["⚙️ Executor"]
        S["Claude Sonnet\nOrchestrator"]
    end

    subgraph Advisor["🎯 Advisor"]
        O["Claude Opus\nStrategic Decisions"]
    end

    subgraph Workers["🤖 Workers"]
        W1[Codex]
        W2[Gemini]
        W3[Copilot]
        W4[Claude Haiku\nor Sonnet]
    end

    Monitors -->|events| H
    H -->|"event-queue.json"| S
    S <-->|escalations| O
    S -->|dispatch| Workers
    Workers -->|"AUTOSHIP_RESULT.md"| S
```
| Tier | Role | Model |
| --- | --- | --- |
| Monitors | 3 bash scripts watching agents, PRs, issues | bash |
| Triage | Interprets events, categorizes issues, queues actions | Claude Haiku |
| Executor | Orchestration, dispatch, verification, PR pipeline | Claude Sonnet |
| Advisor | Strategic decisions, UltraPlan, escalations | Claude Opus |
| Workers | Actual code changes | Codex / Gemini / Claude |

Plugin Structure

```
.claude-plugin/
  plugin.json                 ← hooks + metadata
  marketplace.json            ← one-liner install target
hooks/
  activate.sh                 ← SessionStart: init + system context injection
  init.sh                     ← .autoship/ directory structure
  detect-tools.sh             ← detect Codex/Gemini/Copilot + quota
  monitor-agents.sh           ← watch pane.log for status words (5s)
  monitor-prs.sh              ← watch PR CI + merge status (30s)
  monitor-issues.sh           ← poll GitHub for new/closed issues (60s)
  update-state.sh             ← write issue state + token counts
  cleanup-worktree.sh         ← archive result, remove worktree, close issue
  quota-update.sh             ← decay-based API quota estimation
  classify-issue.sh           ← label issues by task type (8 categories)
  dispatch-codex-appserver.sh ← drive Codex via JSON-RPC (no tmux)
  emit-event.sh               ← atomic flock write to event-queue.json
skills/
  orchestrate/                ← orchestration protocol (v3)
  dispatch/                   ← agent dispatch (worktree, prompt, third-party first)
  verify/                     ← post-completion pipeline (verify, PR, merge)
  status/                     ← status dashboard with quota bars
  poll/                       ← GitHub issue sync safety net
agents/
  haiku-triage.md             ← event triage agent
  reviewer.md                 ← verification reviewer
commands/
  start.md / stop.md / plan.md / status.md / autoship.md
AUTOSHIP.md                   ← routing matrix + quota config (YAML front matter, hot-reload)
```
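The atomic-append pattern emit-event.sh is named for might look like this sketch: an exclusive flock serializes concurrent monitors so the queue JSON never interleaves (paths from the README; the event schema and helper name are assumed).

```shell
mkdir -p .autoship
queue=.autoship/event-queue.json
[ -f "$queue" ] || echo '[]' > "$queue"

emit_event() {
  (
    flock -x 9                         # block until we own the lock
    tmp=$(mktemp)
    jq --arg type "$1" --arg detail "$2" \
       '. + [{type: $type, detail: $detail}]' "$queue" > "$tmp"
    mv "$tmp" "$queue"                 # rename is atomic on the same filesystem
  ) 9> "$queue.lock"
}

emit_event agent_status "issue-42 COMPLETE"
```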

Benchmarks

Real dispatch results from AutoShip running on its own codebase:

| Issue type | Agent dispatched | Time to merged PR | Notes |
| --- | --- | --- | --- |
| Simple bug fix | Codex Spark | ~4 min | JSON-RPC dispatch, no interaction |
| Hook refactor | Claude Haiku | ~7 min | 2 files changed, tests pass |
| Routing matrix feat | Claude Sonnet | ~18 min | AUTOSHIP.md + 3 hooks updated |
| Token ledger schema | Claude Sonnet | ~12 min | New JSON schema + recording logic |
| Emit-event refactor | Claude Haiku | ~5 min | 4 files deduplicated |
| Archival bug fix | Claude Sonnet | ~9 min | Content validation added |
| **Average** | | ~9 min | issue open → PR merged |

AutoShip shipped all 9 v1.4.0 self-improvement issues — open to merged PR — in a single session. Zero manual PRs. Zero manual merges.

Automated Metrics Snapshot

| Metric | Source | Refresh | Notes |
| --- | --- | --- | --- |
| Open PRs | `gh pr list --state open` | nightly | Backlog size |
| Merged PRs | GitHub merge events | on merge | Completed work by session |
| Time to merge | PR created/merged timestamps | nightly | Median and p90 are useful |
| Token spend | `.autoship/token-ledger.json` | on session close | Per issue and per session |

A cron, GitHub Action, or local scheduled run can regenerate this table and commit it back into the README/wiki as the live metrics snapshot.
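One row of that job could be sketched like so: a median time-to-merge computed from gh's JSON output (the createdAt/mergedAt field names match `gh pr list --json`; the function name is illustrative).

```shell
median_merge_minutes() {
  # stdin: JSON array of {createdAt, mergedAt} pairs
  jq '[.[] | ((.mergedAt | fromdate) - (.createdAt | fromdate)) / 60]
      | sort
      | if length == 0 then 0 else .[length/2 | floor] end'
}

# Live:  gh pr list --state merged --limit 100 --json createdAt,mergedAt | median_merge_minutes
sample='[{"createdAt":"2024-01-01T00:00:00Z","mergedAt":"2024-01-01T00:30:00Z"},
         {"createdAt":"2024-01-01T00:00:00Z","mergedAt":"2024-01-01T01:00:00Z"},
         {"createdAt":"2024-01-01T00:00:00Z","mergedAt":"2024-01-01T02:00:00Z"}]'
printf '%s' "$sample" | median_merge_minutes   # → 60
```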

Star This Repo

If AutoShip ships issues you didn't have to touch — leave a star. ⭐


License

MIT
