
AutoShip — ship GitHub issues on autopilot


Issues in. PRs out. Opus budget: intact.

☕ Keep the agents running


A Claude Code plugin that autonomously routes GitHub issues to AI agents — Codex, Gemini, Copilot, or Claude — verifies their work, opens pull requests, merges them, and loops back for the next one. One command starts the loop. You watch it ship.

┌──────────────────────────────────────────┐
│  ISSUE ROUTING         ████████ AUTO     │
│  AGENT DISPATCH        ████████ AUTO     │
│  PR CREATION           ████████ AUTO     │
│  CI MONITORING         ████████ AUTO     │
│  MERGE + CLOSE         ████████ AUTO     │
│  YOUR EFFORT           █        ~5%      │
└──────────────────────────────────────────┘

Before / After

📋 Without AutoShip (20 issues)

  1. Open GitHub issues backlog
  2. Read each issue, estimate complexity
  3. Pick an issue manually
  4. Open a worktree or branch
  5. Write a dispatch prompt from scratch
  6. Paste it into Codex / Gemini / Claude
  7. Watch the agent, babysit if it gets stuck
  8. Review the output yourself
  9. Open a PR manually
  10. Wait for CI, watch for failures
  11. Merge manually
  12. Close the issue manually
  13. Repeat × 19 more issues

Time: 3–8 hrs. You did most of it.

🚀 With AutoShip (20 issues)

/autoship:start

AutoShip reads your open issues, classifies each one (research / docs / simple code / complex), picks the best agent for the job, creates an isolated worktree, dispatches the agent with a focused prompt, verifies the result against acceptance criteria, opens a PR, waits for CI, merges, closes the issue — and immediately starts the next one.

You come back to merged PRs.

Time: you typed one command.

Same issues. One command. Brain free.

  • Third-party first — burns Codex, Gemini, and Copilot quota before touching Claude tokens
  • Parallel workers — up to 20 issues in flight simultaneously
  • Task-type routing — classifies issues into 8 categories (incl. rust_unsafe), routes each to the best agent
  • Live routing config — edit AUTOSHIP.md front matter to change agent priorities, takes effect immediately
  • Verification pipeline — every result reviewed against acceptance criteria before a PR opens
  • Token ledger — per-issue and per-session token spend tracked in .autoship/token-ledger.json
  • Event-driven — bash monitors watch agent output, PR CI, and GitHub issues in real time
  • Durable state — survives restarts via .autoship/state.json and GitHub labels
  • Project context injection — CLAUDE.md/AGENTS.md conventions extracted at startup and injected into every dispatch prompt
  • Opus pre-dispatch advisor — complex/unsafe issues get a 200-word architectural brief before worker dispatch
  • Codex health-check & stuck tracking — fast-fail probe + per-tool stuck_count; exhausted at 3 consecutive STUCKs
  • Rust/Windows routing — rust_unsafe task type + rust_windows profile detection route unsafe Rust to Claude

Install

claude plugin marketplace add Maleick/AutoShip && claude plugin install autoship@autoship

Done. Start a new session and run /autoship:start.

Requirements

  • jq — JSON processing (brew install jq)
  • gh — GitHub CLI, authenticated (brew install gh && gh auth login)
  • tmux — terminal multiplexer (brew install tmux)
  • Git repo with a GitHub remote and open issues

Optional agents (more dispatch power)

| Tool | What it adds | Install |
| --- | --- | --- |
| codex | OpenAI-powered workers via JSON-RPC app-server (preferred for simple/medium) | Codex CLI |
| gemini | Google-powered workers | Gemini CLI |
| gh copilot | GitHub Copilot workers | `gh extension install github/gh-copilot` |
| Claude fallback | Always available — no install needed | built-in |

AutoShip detects available tools at startup and routes work accordingly.
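A minimal sketch of that startup probe, in the spirit of detect-tools.sh (the real hook also estimates quota; the function name here is illustrative and this version only checks for the binaries):

```shell
# Probe PATH for each optional CLI; fall back to Claude when none exist.
detect_tools() {
  available=""
  for tool in codex gemini; do
    command -v "$tool" >/dev/null 2>&1 && available="$available $tool"
  done
  # Copilot ships as a gh extension rather than a standalone binary
  if command -v gh >/dev/null 2>&1 && gh extension list 2>/dev/null | grep -q gh-copilot; then
    available="$available copilot"
  fi
  echo "${available:- claude-fallback}"
}

detect_tools
```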

Commands

| Command | What it does |
| --- | --- |
| `/autoship:start` | Launch orchestration — classify issues, dispatch agents, loop until done |
| `/autoship:plan` | Dry run — analyze issues and show dispatch plan without executing |
| `/autoship:stop` | Gracefully stop all agents, save state, add `autoship:paused` labels |
| `/autoship:status` | Live dashboard — active agents, quota bars, PR backlog, token spend |

How It Works

GitHub issue opened
      ↓
Agent tier assigned automatically:
  • Config / YAML / docs   →  Claude Haiku
  • Single-module feature  →  Gemini Flash
  • Docs / README          →  GPT-4o Mini
  • Multi-file / security  →  Claude Sonnet
  • Architecture / advisor →  Claude Opus
      ↓
PR opened, CI runs, branch auto-deleted on merge
      ↓
You review decisions. Not boilerplate.
```mermaid
flowchart TD
    A([GitHub Issues]) --> B["/autoship:start"]
    B --> C[Classify Issues\ntask-type classifier]

    C -->|research / docs| D[Gemini · Haiku]
    C -->|simple_code / mechanical| E[Codex · Gemini · Copilot]
    C -->|medium_code| F[Codex-GPT · Sonnet]
    C -->|complex| G[Sonnet + Opus advisor]

    D & E & F & G --> H[Create Worktree\nWrite Prompt]
    H --> I[Agent Works]

    I --> J{Status Word}
    J -->|COMPLETE| K[Reviewer verifies\nAUTOSHIP_RESULT.md]
    J -->|BLOCKED\nSTUCK| L[Re-dispatch\nor Escalate]

    K -->|PASS| M[Open PR]
    K -->|FAIL| L
    L --> H

    M --> N[Wait for CI]
    N --> O[Merge + Close Issue]
    O --> B
```

Every agent writes AUTOSHIP_RESULT.md and emits COMPLETE, BLOCKED, or STUCK as its final line. AutoShip never trusts conversation output — only the result file.
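The status-word contract can be illustrated with a small shell sketch (the helper name and monitoring logic here are assumptions, not AutoShip's actual code): the last non-empty line of the result file is the agent's verdict.

```shell
# Take the final non-empty line's last word as the status verdict.
read_status() {
  tail -n 5 "$1" | awk 'NF { last = $NF } END { print last }'
}

printf 'Implemented the fix.\n\nCOMPLETE\n' > AUTOSHIP_RESULT.md

case "$(read_status AUTOSHIP_RESULT.md)" in
  COMPLETE)       echo "hand off to reviewer" ;;
  BLOCKED|STUCK)  echo "re-dispatch or escalate" ;;
  *)              echo "no verdict yet" ;;
esac
```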

Metrics are refreshed at lifecycle boundaries the orchestrator already crosses: merge, worktree cleanup, and session close. A scheduled job can read .autoship/state.json, .autoship/token-ledger.json, and GitHub PR metadata, then regenerate the metrics snapshot in the README or wiki without hand-editing numbers.

Dispatch Matrix

Task type classification drives agent selection. Edit AUTOSHIP.md front matter to customize:

| Task type | Primary Agent | Fallback | Last Resort |
| --- | --- | --- | --- |
| research | Gemini | Claude Haiku | |
| docs | Gemini | Claude Haiku | |
| simple_code | Codex Spark | Gemini · Copilot | Claude Haiku |
| medium_code | Codex GPT | Claude Sonnet | |
| mechanical | Claude Haiku | Gemini | Codex Spark |
| ci_fix | Claude Haiku | Gemini | |
| complex | Claude Sonnet + Opus advisor | Claude Sonnet retry | Opus re-slice |

Quota thresholds gate dispatch: if a tool hits 0% estimated quota, it's skipped and the next in line picks up the work. Quotas are estimated via a decay model and reset daily.
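As a rough sketch of what a decay-style estimate could look like (the linear recovery rate and cap below are assumptions; the actual model lives in quota-update.sh):

```shell
# Estimated quota recovers toward 100% as idle time grows.
estimate_quota() {
  # $1 = last observed quota %, $2 = minutes since last dispatch
  awk -v p="$1" -v m="$2" \
    'BEGIN { q = p + m / 15; if (q > 100) q = 100; printf "%d\n", q }'
}

q=$(estimate_quota 40 300)          # 40% observed, idle 5 hours → 60
if [ "$q" -eq 0 ]; then
  echo "skip tool, fall through to next in line"
else
  echo "eligible for dispatch at ${q}%"
fi
```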

How Codex dispatch works (no tmux)

AutoShip drives Codex via codex app-server using the JSON-RPC protocol over stdin/stdout FIFOs — no terminal pane required.

```
# What happens under the hood:
mkfifo .codex-stdin .codex-stdout
codex app-server < .codex-stdin > .codex-stdout &

# AutoShip sends:
{"jsonrpc":"2.0","method":"initialize",...}
{"jsonrpc":"2.0","method":"thread/start",...}
{"jsonrpc":"2.0","method":"turn/start","params":{"input":[{"type":"text","text":"<issue prompt>"}]}}

# Waits for:
{"method":"turn/completed",...}   # → COMPLETE
{"method":"turn/failed",...}      # → STUCK
{"method":"thread/tokenUsage/updated","params":{"totalTokens":1247}}
```

Token counts from thread/tokenUsage/updated events feed directly into the token ledger.
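A sketch of folding one such event into the ledger (the event shape comes from the snippet above; this flat issue-to-tokens schema is an assumption, not AutoShip's real ledger format):

```shell
mkdir -p .autoship
ledger=.autoship/token-ledger.json
echo '{"42": 500}' > "$ledger"          # prior count for issue #42

event='{"method":"thread/tokenUsage/updated","params":{"totalTokens":1247}}'
tokens=$(printf '%s' "$event" | jq '.params.totalTokens')

tmp=$(mktemp)
jq --arg issue 42 --argjson t "$tokens" '.[$issue] = $t' "$ledger" > "$tmp"
mv "$tmp" "$ledger"                     # write-then-rename keeps the file consistent
```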

Verification pipeline

After every agent reports COMPLETE, a dedicated Sonnet reviewer runs:

  1. Validates AUTOSHIP_RESULT.md exists and passes content check
  2. Runs git diff main...HEAD to read the actual changes
  3. Checks every acceptance criterion against the diff
  4. Runs the test suite if one is detected
  5. Returns VERDICT: PASS | FAIL with confidence level

On FAIL: re-dispatches with failure context appended. After 2 fails: escalates to Sonnet. After 3 fails with Sonnet: spawns Opus advisor to re-slice the issue.
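The ladder above can be encoded as a trivial dispatch table (function name is illustrative; the thresholds come straight from the text):

```shell
# Map consecutive FAIL count to the next escalation step.
next_action() {
  case "$1" in
    1) echo "re-dispatch with failure context" ;;
    2) echo "escalate to Claude Sonnet" ;;
    *) echo "spawn Opus advisor to re-slice the issue" ;;
  esac
}

next_action 1   # first FAIL
next_action 3   # still failing with Sonnet
```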

State durability

AutoShip keeps state in two places simultaneously:

  • .autoship/state.json — local, fast, rebuilt from GitHub labels on restart
  • GitHub labels — autoship:queued, autoship:in-progress, autoship:paused — durable, visible in the GitHub UI

If you kill the session and restart, /autoship:start reads the labels and picks up where it left off.
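A sketch of that restart path (assuming `gh issue list --json number,labels` output; the real state.json carries more fields than shown here):

```shell
# Group issue numbers by their autoship label.
labels_to_state() {
  jq '{queued:      [.[] | select(any(.labels[]; .name == "autoship:queued"))      | .number],
       in_progress: [.[] | select(any(.labels[]; .name == "autoship:in-progress")) | .number]}'
}

# Live:  gh issue list --json number,labels | labels_to_state > .autoship/state.json
sample='[{"number":7,"labels":[{"name":"autoship:queued"}]},
         {"number":9,"labels":[{"name":"autoship:in-progress"}]}]'
printf '%s' "$sample" | labels_to_state
```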

Architecture

Four-tier model: Bash watches → Haiku thinks → Sonnet orchestrates → Opus advises

```mermaid
flowchart LR
    subgraph Monitors["🔭 Monitors (bash)"]
        M1["monitor-agents.sh\nevery 5s"]
        M2["monitor-prs.sh\nevery 30s"]
        M3["monitor-issues.sh\nevery 60s"]
    end

    subgraph Triage["🧠 Triage"]
        H["Claude Haiku\nEvent Interpreter"]
    end

    subgraph Executor["⚙️ Executor"]
        S["Claude Sonnet\nOrchestrator"]
    end

    subgraph Advisor["🎯 Advisor"]
        O["Claude Opus\nStrategic Decisions"]
    end

    subgraph Workers["🤖 Workers"]
        W1[Codex]
        W2[Gemini]
        W3[Copilot]
        W4[Claude Haiku\nor Sonnet]
    end

    Monitors -->|events| H
    H -->|"event-queue.json"| S
    S <-->|escalations| O
    S -->|dispatch| Workers
    Workers -->|"AUTOSHIP_RESULT.md"| S
```
| Tier | Role | Model |
| --- | --- | --- |
| Monitors | 3 bash scripts watching agents, PRs, issues | bash |
| Triage | Interprets events, categorizes issues, queues actions | Claude Haiku |
| Executor | Orchestration, dispatch, verification, PR pipeline | Claude Sonnet |
| Advisor | Strategic decisions, UltraPlan, escalations | Claude Opus |
| Workers | Actual code changes | Codex / Gemini / Claude |

Plugin Structure

```
.claude-plugin/
  plugin.json                 ← hooks + metadata
  marketplace.json            ← one-liner install target
hooks/
  activate.sh                 ← SessionStart: init + system context injection
  init.sh                     ← .autoship/ directory structure
  detect-tools.sh             ← detect Codex/Gemini/Copilot + quota
  monitor-agents.sh           ← watch pane.log for status words (5s)
  monitor-prs.sh              ← watch PR CI + merge status (30s)
  monitor-issues.sh           ← poll GitHub for new/closed issues (60s)
  update-state.sh             ← write issue state + token counts
  cleanup-worktree.sh         ← archive result, remove worktree, close issue
  quota-update.sh             ← decay-based API quota estimation
  classify-issue.sh           ← label issues by task type (8 categories)
  dispatch-codex-appserver.sh ← drive Codex via JSON-RPC (no tmux)
  emit-event.sh               ← atomic flock write to event-queue.json
skills/
  orchestrate/                ← orchestration protocol (v3)
  dispatch/                   ← agent dispatch (worktree, prompt, third-party first)
  verify/                     ← post-completion pipeline (verify, PR, merge)
  status/                     ← status dashboard with quota bars
  poll/                       ← GitHub issue sync safety net
agents/
  haiku-triage.md             ← event triage agent
  reviewer.md                 ← verification reviewer
commands/
  start.md / stop.md / plan.md / status.md / autoship.md
AUTOSHIP.md                   ← routing matrix + quota config (YAML front matter, hot-reload)
```
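The atomic-append pattern emit-event.sh is named for might look like this sketch: an exclusive flock serializes concurrent monitors so the queue JSON never interleaves (paths from the README; the event schema and helper name are assumed).

```shell
mkdir -p .autoship
queue=.autoship/event-queue.json
[ -f "$queue" ] || echo '[]' > "$queue"

emit_event() {
  (
    flock -x 9                         # block until we own the lock
    tmp=$(mktemp)
    jq --arg type "$1" --arg detail "$2" \
       '. + [{type: $type, detail: $detail}]' "$queue" > "$tmp"
    mv "$tmp" "$queue"                 # rename is atomic on the same filesystem
  ) 9> "$queue.lock"
}

emit_event agent_status "issue-42 COMPLETE"
```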

Benchmarks

Real dispatch results from AutoShip running on its own codebase:

| Issue type | Agent dispatched | Time to merged PR | Notes |
| --- | --- | --- | --- |
| Simple bug fix | Codex Spark | ~4 min | JSON-RPC dispatch, no interaction |
| Hook refactor | Claude Haiku | ~7 min | 2 files changed, tests pass |
| Routing matrix feat | Claude Sonnet | ~18 min | AUTOSHIP.md + 3 hooks updated |
| Token ledger schema | Claude Sonnet | ~12 min | New JSON schema + recording logic |
| Emit-event refactor | Claude Haiku | ~5 min | 4 files deduplicated |
| Archival bug fix | Claude Sonnet | ~9 min | Content validation added |
| **Average** | | ~9 min | issue open → PR merged |

AutoShip shipped all 9 v1.4.0 self-improvement issues — open to merged PR — in a single session. Zero manual PRs. Zero manual merges.

Automated Metrics Snapshot

| Metric | Source | Refresh | Notes |
| --- | --- | --- | --- |
| Open PRs | `gh pr list --state open` | nightly | Backlog size |
| Merged PRs | GitHub merge events | on merge | Completed work by session |
| Time to merge | PR created/merged timestamps | nightly | Median and p90 are useful |
| Token spend | `.autoship/token-ledger.json` | on session close | Per issue and per session |

A cron, GitHub Action, or local scheduled run can regenerate this table and commit it back into the README/wiki as the live metrics snapshot.
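One row of that job could be sketched like so: a median time-to-merge computed from gh's JSON output (the createdAt/mergedAt field names match `gh pr list --json`; the function name is illustrative).

```shell
median_merge_minutes() {
  # stdin: JSON array of {createdAt, mergedAt} pairs
  jq '[.[] | ((.mergedAt | fromdate) - (.createdAt | fromdate)) / 60]
      | sort
      | if length == 0 then 0 else .[length/2 | floor] end'
}

# Live:  gh pr list --state merged --limit 100 --json createdAt,mergedAt | median_merge_minutes
sample='[{"createdAt":"2024-01-01T00:00:00Z","mergedAt":"2024-01-01T00:30:00Z"},
         {"createdAt":"2024-01-01T00:00:00Z","mergedAt":"2024-01-01T01:00:00Z"},
         {"createdAt":"2024-01-01T00:00:00Z","mergedAt":"2024-01-01T02:00:00Z"}]'
printf '%s' "$sample" | median_merge_minutes   # → 60
```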

Star This Repo

If AutoShip ships issues you didn't have to touch — leave a star. ⭐


License

MIT
