Releases: justrach/codegraff2

codegraff v0.2.17

12 Jun 08:55

Changes

  • No changes

codegraff v0.2.16

03 Jun 05:08

Released 2026-06-03

A fast follow-up to v0.2.15 that fixes two rough edges in the context stack
that release introduced, and ships the macOS desktop app (signed +
notarized) for the first time.

Fixes

@-mentioning a PDF/binary no longer errors

v0.2.15 tried to reflect a binary @-mention as a file reference, but it gated
that on a null-byte binary check while the file reader rejects binaries with
a magic-number (infer) check. The two disagree on files with an ASCII
header — a PDF's %PDF-1.x bytes have no null bytes — so the mention fell
through to the UTF-8 reader and still failed with "Binary files are not
supported. File detected as application/pdf".

Now a non-text/non-image @-mention resolves to the bare absolute path (plain
text, as if you had typed it). The agent treats it like any path and opens it
with the read tool, which already handles PDFs and images. The attachment path
also catches the reader's binary rejection directly, so the two detectors can no
longer disagree their way into an error.
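The disagreement between the two detectors can be reproduced in a few lines. This is a minimal sketch, not the project's code: `has_null_byte`, `looks_like_pdf`, and `is_text` are hypothetical names, and the hand-rolled `%PDF-` prefix test only stands in for a real magic-number library.

```rust
/// Heuristic 1 (hypothetical): call a file binary only if it contains a NUL byte.
pub fn has_null_byte(bytes: &[u8]) -> bool {
    bytes.contains(&0)
}

/// Heuristic 2 (hypothetical): a magic-number check, reduced here to the PDF prefix.
pub fn looks_like_pdf(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}

/// Assumed resolution: inline a mention as text only when no detector objects
/// and the bytes are valid UTF-8; anything else falls back to a bare path.
pub fn is_text(bytes: &[u8]) -> bool {
    !has_null_byte(bytes) && !looks_like_pdf(bytes) && std::str::from_utf8(bytes).is_ok()
}
```

A PDF header such as `%PDF-1.7` passes the null-byte check but is caught by the magic-number check, which is exactly the gap described above.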

Tool-result offload no longer fights the prompt cache

v0.2.15's lossless tool-result offload ran every turn unconditionally,
rewriting older tool-result messages in place (full → [offloaded] marker).
Because the prompt cache is prefix-based, rewriting a message before the kept
window broke the cache there and forced the whole keep-window to be reprocessed
at full (uncached) price every turn — roughly a 10× cost bump on the large
MCP/codedb results offloading was meant to save, and it fired even at trivial
context usage (e.g. 16k/180k).

Offloading is now gated on real context pressure (≥80% of the lossy
summarization threshold). Below that, the warm prefix cache is worth far more
than the few KB offloading would reclaim, so the context is left untouched and
the cache stays intact; offloading only kicks in as you approach the
summarization point — exactly when the cache would churn anyway.
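The gate itself is a single comparison. The sketch below assumes token counts are compared against the summarization threshold; the function name `should_offload` and the 162,000-token threshold used in the usage note are illustrative, not taken from the codebase.

```rust
/// Returns true once usage reaches 80% of the lossy summarization threshold.
/// Integer arithmetic avoids floating-point rounding at the boundary.
pub fn should_offload(used_tokens: u64, summarize_threshold: u64) -> bool {
    used_tokens * 100 >= summarize_threshold * 80
}
```

With a hypothetical threshold of 162,000 tokens, `should_offload(16_000, 162_000)` is false, so the cached prefix is left alone, while `should_offload(129_600, 162_000)` is true.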

Desktop app (macOS)

The Codegraff desktop app — a Tauri GUI with first-class slash commands,
contributed by @pranavp311 — is now available as a signed and notarized, universal (arm64 + x86_64) macOS build. It runs on the same core as the CLI/TUI.

Contributors

  • @pranavp311 — the Tauri desktop app (feat(gui): add Tauri desktop app,
    Add desktop slash commands) and the macOS GUI it's built from.

Install

curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh

Pin this version with sh install.sh v0.2.16. macOS graff/codegraff binaries
are codesigned with a Developer ID Application certificate (hardened runtime) and
notarized by Apple; the CLI ships for macOS (arm64 + x86_64), Linux
(gnu/musl × x86_64/aarch64), Windows (x86_64/aarch64), and Android (graff). The
desktop app .dmg is a notarized Apple-Silicon (arm64) build.

Upgrade notes

codegraff v0.2.15

03 Jun 03:51

Released 2026-06-03

This release is about reach and resilience. codegraff gains a native desktop
app (a Tauri GUI with first-class slash commands) to sit alongside the CLI and
TUI, so the agent is now usable from a window as well as a terminal. Under the
hood, 0.2.15 lands the full context engineering stack: a two-tier reversible
tool-result offload, a lean tools_list that cuts the tool catalog from ~46 KB
to ~11.5 KB, live-turn preservation so compaction stops re-answering the first
prompt, and MCP-server globs surfaced as first-class tools (codedb included).
It fixes the long-standing papercut where @-mentioning a PDF or other binary
errored out; those mentions now reflect the file's absolute path so the read
tool takes over. The SDKs keep hardening (TypeScript can never publish without
its native loader again, Python gains JSON-schema / structured-output mode,
agent selection, and headless MCP trust), and codex gpt-5.5 moves to a
1,000,000-token context window.

macOS binaries for this release are signed with a Developer ID Application
certificate (hardened runtime) and notarized by Apple, and every platform's
assets are versioned 1:1 with the v0.2.15 tag so install.sh resolves them
cleanly.

Highlights

  • New Tauri desktop app (codegraff-gui) with desktop slash commands — a
    windowed coding-agent interface powered by the same core as the CLI/TUI.
  • @-mentioning a PDF or binary no longer errors. The mention now reflects
    the file's absolute path as a <file_reference>, handing the agent off to
    the read tool instead of failing with "Binary files are not supported".
  • Two-tier reversible tool-result offload for compaction — large tool
    results are offloaded in tiers and can be brought back, instead of being lost.
  • Lean tools_list: one-line tool summaries shrink the tool catalog from
    ~46 KB to ~11.5 KB, leaving far more room for real work.
  • Live-turn preservation in compaction — the agent no longer re-answers the
    first prompt after a compaction.
  • MCP-server globs are now first-class tools, so specific MCP servers (e.g.
    codedb) surface their tools directly rather than hiding behind a generic
    wrapper.
  • codex gpt-5.5 context window raised to 1,000,000 tokens.
  • TypeScript SDK never ships without its N-API loader again — a publish-time
    guarantee that @codegraff/sdk always includes the native binding it needs.
  • Python SDK gains JSON-schema / structured-output mode, agent selection,
    per-call prompt/temperature overrides, and a headless MCP-trust fix.
  • Signed + notarized macOS binaries (arm64 and x86_64), version-matched to
    the release so the installer never 404s.

Desktop app (codegraff-gui)

  • A new Tauri-based desktop application lands under gui/, giving codegraff a
    native windowed interface in addition to the CLI and TUI. It is built on the
    same forge_* core crates as the rest of the project (consumed as path
    dependencies), so behavior stays consistent across surfaces.
  • Desktop slash commands bring the familiar slash-command workflow into the
    GUI.
  • The GUI is its own decoupled cargo workspace (gui/src-tauri) so its build
    artifacts and lockfile stay isolated from the CLI workspace, and it is versioned
    in lockstep with the CLI (now 0.2.15).

Attachments & @-mentions

  • Binary / PDF @-mentions are no longer a dead end. Previously, mentioning a
    non-text file (a PDF, an image, any binary) returned "Binary files are not
    supported" and stopped there. Now the mention reflects the file's absolute
    path and emits a <file_reference path=… mime_type=…> marker, which routes
    the agent to the read tool (the correct handler for that content) instead
    of erroring.

Context engineering & compaction

This release completes the context stack that 0.2.14 began:

  • Two-tier reversible tool-result offload. Oversized tool results are offloaded
    in two tiers and remain recoverable, so compaction reclaims context without
    permanently dropping information the agent may still need.
  • Live-turn preservation. Compaction now protects the in-flight turn, fixing
    the bug where the agent would re-answer the very first prompt after
    compacting.
  • Lean tools_list. Tool definitions are summarized to one line each,
    taking the serialized tool catalog from ~46 KB down to ~11.5 KB — a large
    reduction in fixed per-request overhead.
  • MCP-server globs as first-class tools. Specific MCP servers' tools (such as
    codedb's) are now surfaced directly in the tool list rather than behind a
    single generic glob, so they're discoverable and callable like any built-in.
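To illustrate the lean tools_list idea, the sketch below reduces each tool description to its first non-empty line. It assumes "one-line summary" means exactly that; the names `one_line_summary` and `lean_tools_list` and the `name: summary` output format are hypothetical and may differ from what codegraff actually serializes.

```rust
/// Returns the first non-empty line of a tool description, trimmed.
pub fn one_line_summary(description: &str) -> &str {
    description
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty())
        .unwrap_or("")
}

/// Renders a catalog of (name, description) pairs as one line per tool.
pub fn lean_tools_list(tools: &[(&str, &str)]) -> String {
    tools
        .iter()
        .map(|(name, desc)| format!("{name}: {}", one_line_summary(desc)))
        .collect::<Vec<_>>()
        .join("\n")
}
```

The saving comes from dropping everything after the first line of each description, so the fixed per-request cost scales with the number of tools rather than with the length of their documentation.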

Providers & models

  • codex gpt-5.5 is configured with a 1,000,000-token context window,
    unlocking long-context runs on that model.

TypeScript SDK (@codegraff/sdk)

  • Never publish without the native loader. The publish pipeline now guarantees
    @codegraff/sdk ships with its N-API loader every time — the package can no
    longer be released in a broken, loader-less state.
  • Publishing fixes to ensure the main @codegraff/sdk package is the one that
    goes out, with the SDK moving to 0.2.4.
  • response_format plumbing added to the core so the 0.1.3 Python SDK's
    structured-output mode has the support it depends on.

Python SDK (codegraff)

  • JSON-schema / structured-output mode (SDK 0.1.3): constrain model output
    to a schema for reliable structured results.
  • Agent selection plus per-call prompt and temperature overrides.
  • Headless MCP-trust fix, so MCP servers work correctly in non-interactive
    (SDK/headless) usage.
  • BYOK fix: extra_params are now passed through to upsert_credential in the
    BYOK constructor.
  • Packaging & CI: full-matrix PyPI release, pyo3 upgraded 0.24 → 0.28 for
    Python 3.14 / 3.14t, protoc installed inside the manylinux container,
    aarch64-linux built on a native arm64 runner, a PyPI readme field so the
    package renders its README, and an honest platform-support note
    (mac-arm64 wheels today, no sdist).

CI, telemetry & docs

  • GRAFF_TRAJECTORY_ENDPOINT is baked into release builds from a CI secret, so
    released binaries point at the correct trajectory endpoint.
  • README rewritten around the CLI + SDKs, with in-depth TypeScript and
    Python guides.
  • Housekeeping: forge.md renamed to codegraff.md, and duplicate pipeline
    tests dropped.

Install

curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh

The installer detects your platform and pulls the matching graff and codegraff
binaries plus fzf and codedb. To pin this exact version:

curl -fsSL https://github.com/justrach/codegraff/releases/download/v0.2.15/install.sh | sh -s v0.2.15

Platforms & artifacts

| Platform | Targets |
|----------|---------|
| macOS | aarch64-apple-darwin, x86_64-apple-darwin (signed + notarized) |
| Linux | x86_64/aarch64, both gnu and musl |
| Windows | x86_64/aarch64 (pc-windows-msvc) |
| Android | aarch64-linux-android (graff only) |

macOS signing & notarization

  • The macOS graff and codegraff binaries are codesigned with the Developer ID
    Application: Rachit Pradhan (Team WWP9DLJ27P) certificate using a hardened
    runtime and a secure timestamp, then notarized by Apple.
  • Because they are bare CLI executables (not .app/.dmg/.pkg bundles), the
    notarization ticket is validated online by Gatekeeper on first run; there is
    nothing to staple. The install.sh path additionally clears the quarantine
    attribute on the installed binary.
  • Each tool is published both as a raw binary (e.g. graff-aarch64-apple-darwin,
    consumed by install.sh) and as a .zip (e.g. graff-aarch64-apple-darwin.zip,
    the notarized archive). All four binaries (graff/codegraff × arm64/x86_64)
    report version 0.2.15, matching the v0.2.15 tag so install.sh — whether it
    resolves latest or a pinned v0.2.15 — always finds the right asset.

Upgrade notes

  • This is a non-breaking release; existing configuration and credentials carry
    over. Re-run the installer (above) or download the new binaries from the release
    assets.
  • The full runbook for cutting the signed + notarized macOS artifacts lives at
    docs/releases/macos-build-and-notarize.md.

v0.2.14

01 Jun 02:30

codegraff v0.2.14

Released 2026-06-01

This release is about SDKs growing up. codegraff now ships a mouldable, Next.js-ready TypeScript SDK (@codegraff/sdk) and a new Python SDK (codegraff), both guarded by dhi-backed input validation so bad options fail fast with clear errors instead of surprising you at runtime. Alongside the SDKs, 0.2.14 introduces client trajectory upload with opt-in OTLP GenAI telemetry and RLVR reward labels, makes MCP / tool-grammar calls visible across the stream, hooks, and trajectory, hardens the orchestrator against the pending-todos doom loop, tightens compaction behavior for MCP-heavy and image-heavy contexts, and reworks the usage tracker to be privacy-respecting by default. It also folds in three upstream fixes from tailcallhq/forgecode.

Highlights

  • New first-class TypeScript SDK (@codegraff/sdk) with a one-call BYOK entry point, a mouldable system prompt, a cloud Sandbox class, and Next.js-ready packaging.
  • New first-class Python SDK (codegraff) sharing the same validation contract as the TypeScript SDK.
  • dhi-backed input validation in both SDKs validates Graff.init and chat options before any request leaves the client.
  • Client trajectory upload plus opt-in OTLP GenAI telemetry, with RLVR sparse-ORM reward labels attached to uploaded trajectories.
  • MCP / tool-grammar calls are now surfaced consistently across the stream, hooks, and trajectory.
  • Orchestrator reliability fixes that break the pending-todos doom loop.
  • Compaction now scales to 90% of the context window (codex parity) and accounts for JSON/image tool results.
  • The usage tracker no longer harvests user email, and the opt-out env var is renamed to a real opt-out: GRAFF_TRACK.
  • SDK versions are now 1:1 with the CLI, bumping @codegraff/sdk from 0.2.0 to 0.2.3 with native codedb on install.

TypeScript SDK (@codegraff/sdk)

  • Ships as a proper N-API package built and published through cross-platform CI, with native codedb installed on npm install.
  • One-call BYOK initialization: Graff.init({ provider, apiKey, model, maxTokens }) gives bring-your-own-key auth with thin JS wrappers over the native core.
  • Mouldable system prompt so integrators can shape the agent's base instructions to fit their product.
  • New Sandbox class for managing cloud sandboxes from the SDK.
  • Next.js-ready packaging, with a companion Next.js example app demonstrating end-to-end usage.
  • dhi-backed validation of Graff.init/chat options (see "Validation & verification").
  • Versioned 1:1 with the CLI: @codegraff/sdk moves from 0.2.0 to 0.2.3.

Python SDK (codegraff)

  • New Python SDK sharing the same option contract and BYOK model as the TypeScript SDK.
  • dhi-backed input validation for Graff.init/chat options, pinned to dhi >= 1.3.3 (native cp314 / cp314t wheels).
  • New turboAPI example: a dhi-validated, SSE-streaming server showing how to put the Python SDK behind an HTTP API.

Telemetry & RLVR

  • Client trajectory upload: completed agent trajectories can be uploaded from the client.
  • Opt-in OTLP GenAI telemetry: standards-aligned OpenTelemetry GenAI signals, off unless you explicitly enable them.
  • RLVR sparse-ORM reward labels on uploaded trajectories, derived from the run outcome:
    • accepted → reward 1.0
    • error → reward 0.0
    • incomplete → reward null (masked, so partial/unfinished runs don't pollute the reward signal)
  • This makes uploaded trajectories directly usable for Reinforcement Learning from Verifiable Rewards workflows, where only verifiably terminal outcomes carry a reward and everything else is masked.
  • Test coverage for all three outcome paths (accepted / incomplete / error).
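The outcome-to-reward mapping above can be written as a small total function. This is a sketch of the labeling rule only; the `Outcome` enum, `reward`, and `mean_reward` are illustrative names rather than the SDK's types, and `None` stands in for the masked (null) label.

```rust
/// Terminal state of a run, as described in the release notes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Accepted,
    Error,
    Incomplete,
}

/// Sparse reward label: `None` means the trajectory is masked.
pub fn reward(outcome: Outcome) -> Option<f64> {
    match outcome {
        Outcome::Accepted => Some(1.0),
        Outcome::Error => Some(0.0),
        Outcome::Incomplete => None,
    }
}

/// Averages only the labeled trajectories, so masked runs do not dilute the signal.
pub fn mean_reward(outcomes: &[Outcome]) -> Option<f64> {
    let labeled: Vec<f64> = outcomes.iter().filter_map(|o| reward(*o)).collect();
    if labeled.is_empty() {
        None
    } else {
        Some(labeled.iter().sum::<f64>() / labeled.len() as f64)
    }
}
```

Using `Option<f64>` rather than a sentinel value keeps an unfinished run from being mistaken for a failure when rewards are aggregated.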

MCP & tool calls

  • MCP / tool-grammar calls are now surfaced across the stream, hooks, and trajectory, so tool activity that previously stayed hidden is now observable end-to-end.
  • Unwrap double-encoded tool args: arguments that arrived JSON-encoded-inside-JSON are now correctly decoded before dispatch.
  • MCP tools are sent non-strict, improving compatibility with MCP servers whose schemas don't satisfy strict tool-call constraints.

Orchestrator & reliability

  • Break the pending-todos doom loop: the orchestrator no longer gets stuck re-running because of lingering pending todos.
    • Bounded End-hook rearms cap how many times the End hook can re-arm itself.
    • A doom-loop detector strips volatile keys before comparing state, so cosmetic churn no longer looks like genuine new work.
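A rough sketch of both safeguards follows, assuming state can be viewed as a flat string map. The volatile key names (`timestamp`, `request_id`) and the function names are hypothetical; the real detector's state shape and key list are not given in these notes.

```rust
use std::collections::BTreeMap;

/// Keys assumed to change on every pass without reflecting new work (hypothetical list).
const VOLATILE_KEYS: [&str; 2] = ["timestamp", "request_id"];

/// Copies the state with volatile keys removed.
pub fn stable_view(state: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    state
        .iter()
        .filter(|(key, _)| !VOLATILE_KEYS.contains(&key.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Two states count as a repeat when they differ only in volatile keys.
pub fn is_repeat(previous: &BTreeMap<String, String>, current: &BTreeMap<String, String>) -> bool {
    stable_view(previous) == stable_view(current)
}

/// Bounded re-arm: the End hook may fire again only while under the cap.
pub fn may_rearm(rearms_so_far: u32, max_rearms: u32) -> bool {
    rearms_so_far < max_rearms
}
```

Comparing the stripped views means a changed timestamp alone is treated as a repeat, while a todo moving from pending to done still registers as progress.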

Compaction

  • Threshold scaled to 90% of the context window for parity with codex behavior.
  • JSON and image tool results are now counted toward context size, so MCP-heavy contexts actually compact instead of silently overflowing.
  • Cost-only ping frames no longer shadow real token usage, fixing cases where bookkeeping frames masked the true token count used to drive compaction.

Privacy

  • The usage tracker no longer harvests user email.
  • The opt-out environment variable is renamed from FORGE_TRACKER to GRAFF_TRACK, and is now a real opt-out that genuinely disables tracking.

Upstream sync (tailcallhq/forgecode)

Pulled in three fixes from upstream tailcallhq/forgecode:

  • #3418: apply the Opus 4.7 API contract to Claude Opus 4.8, keeping the newer model on a known-good request/response contract.
  • #3414: add provider.json + vertex.json model entries for newly supported models.
  • #3350: replay reasoning_content for Xiaomi MiMo tool calls, so reasoning is preserved correctly across tool-call turns for that model.

Validation & verification

  • Both SDKs validate inputs with dhi, a Zod-4- / Pydantic-compatible, SIMD-WASM validator. Options passed to Graff.init and chat are checked against a shared schema contract in both the TypeScript and Python SDKs, so invalid configuration is rejected at the boundary rather than deep inside a request.
  • dhi is pinned to >= 1.3.3, which provides native cp314 / cp314t wheels for the Python side.
  • Added regression tests covering RLVR outcome labeling for the accepted, incomplete, and error paths.

Packaging & versioning

  • @codegraff/sdk bumped 0.2.0 -> 0.2.3, now versioned 1:1 with the CLI.
  • Native codedb is installed on package install, so the SDK is usable without a separate build step.
  • New example apps land alongside the SDKs: a Next.js example (TypeScript) and a turboAPI example (Python).

Install

curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh

Once v0.2.14 is published it becomes latest; until then pin the tag: .../releases/download/v0.2.14/install.sh.

Prebuilt binaries are attached for every supported platform:

| Platform | Arch | Assets |
|----------|------|--------|
| Linux (gnu) | x86_64, aarch64 | graff-*-unknown-linux-gnu, codegraff-*-unknown-linux-gnu |
| Linux (musl) | x86_64, aarch64 | graff-*-unknown-linux-musl, codegraff-*-unknown-linux-musl |
| Windows | x86_64, aarch64 | graff-*-pc-windows-msvc.exe, codegraff-*-pc-windows-msvc.exe |
| Android | aarch64 | graff-aarch64-linux-android |
| macOS | x86_64, aarch64 | graff-*-apple-darwin, codegraff-*-apple-darwin (+ .zip) |

macOS binaries — signed & notarized

The macOS graff and codegraff binaries (both x86_64-apple-darwin and aarch64-apple-darwin) are:

  • Signed with Developer ID Application: Rachit Pradhan (WWP9DLJ27P)
  • Built with hardened runtime + secure timestamp
  • Notarized by Apple — submission 34a24b8c-e5f4-454a-905d-20cace04840a, status Accepted

They launch without Gatekeeper warnings. As bare CLI tools they can't be stapled, so first launch performs a one-time online notarization check (needs network).

v0.2.13

28 May 18:37

Highlights

  • Fix: Codex response.completed / response.incomplete events are now parsed correctly — restores gpt-5.5 (and other models) over the Codex backend (HTTP/SSE and WebSocket transports). Without this, codex turns would fail to deserialize because the backend omits oai::Response.output on terminal events. Cherry-picked from upstream #3405; extended on our branch to handle the same dispatch over the Codex WebSocket transport (which upstream didn't carry when the fix landed).
  • New: Python SDK (codegraff on PyPI — published separately under the sdk/python-v0.1.0 tag). PyO3 bindings exposing the codegraff agent to Python; mirrors the TS SDK's Graff / GraffSession / Sandbox surface.

Install

curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh

macOS binaries

All four macOS binaries (graff and codegraff, x86_64 + arm64) are codesigned with Developer ID Application: Rachit Pradhan (WWP9DLJ27P) (hardened runtime + Apple timestamp) and notarized with Apple. Gatekeeper accepts them online on first launch.

Full changelog

v0.2.12...v0.2.13

v0.2.12

26 May 06:10

Full Changelog: v0.2.1...v0.2.12

v0.2.1

25 May 05:33

What's new

  • Codegraff gateway provider: graff provider login codegraff authenticates via device flow (opens a browser, approve in one click, done)
  • 4 models available through the gateway: DeepSeek V4 Pro, GPT-5.5, Grok Build (xAI), Kimi K2.6 (Moonshot)
  • Pay-as-you-go credits, no subscription — top up from the dashboard at codegraff.com/dashboard/billing

Install

curl -fsSL https://codegraff.com/install-graff.sh | sh

Or download the macOS arm64 binary from the assets below.

v0.2.0

21 May 07:29

Highlights

WebSocket codex-parity sweep — all 8 deepwiki audit gaps closed. The chatgpt.com Codex backend's WebSocket transport now ships at full wire-level parity with upstream openai/codex for the things that affect production behavior. Together with everything from v0.1.9, this is a substantial minor release.

What's new vs v0.1.9

  • Wrapped WebSocket errors now map to typed errors (#117): when the chatgpt.com Codex backend sends a type: "error" text frame instead of a proper HTTP-level status (429 plan-usage, 401 token-expiry, 5xx, websocket_connection_limit_reached), we now decode the envelope and surface the same Error::UsageLimitReached / Error::InvalidStatusCode / forge_domain::Error::Retryable the HTTP path would have. Before: the orchestrator retried with generic EmptyCompletion. Now: existing retry / refresh / fallback policies fire identically across transports.
  • x-codex-beta-features (#118): comma-separated session-scoped beta-feature opt-ins via openai_responses_beta_features = ["foo", "bar"] in .forge.toml.
  • x-codex-turn-metadata (#125): per-turn observability metadata via openai_responses_turn_metadata = { repo = "...", env = "..." }. JSON-encoded, BTreeMap-stable key ordering.
  • x-openai-attestation (#125): client attestation token forwarding via openai_responses_attestation_token = "v1.<opaque>". Static-token model (upstream uses rotating tokens via JSON-RPC; to be followed up separately if needed).
  • x-codex-turn-state (#125): sticky backend-routing token captured from WS upgrade response headers and replayed on subsequent reconnects within the conversation. Mirrors ModelClientSession::turn_state's capture-then-replay pattern.
  • Handshake probe + close-frame diagnostic (#125): 50ms post-upgrade poll for an immediate Close frame, surfacing the close code + reason as a typed ConnectError so policy rejections (rate-limited, auth-invalid) come with actionable context instead of an opaque "stream ended".
  • Structured WebSocket telemetry (#125): tracing::info!/warn! events tagged event.kind = "codex.websocket_{connect,request,event}" with duration_ms, success, error.message, sub-kinds for idle_timeout / transport_error / stream_end / response_completed / response_failed. Wire your own OTel / JSON / stdout subscriber.
  • Connection-only WebSocket prewarm (#125): opt-in via openai_responses_prewarm = true; the new OpenAIResponsesProvider::preconnect_websocket(conv_id) API opens + stashes the socket so the first real turn skips TLS+upgrade latency. Connection-only for now (no generate=false stub roundtrip yet).
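The "BTreeMap-stable key ordering" noted for x-codex-turn-metadata can be shown with the standard library alone. This is a sketch under the assumption that the metadata is a flat string-to-string map; `encode_turn_metadata` and the hand-written `escape` helper are illustrative, and a real implementation would more likely rely on a JSON library.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Encodes the map as a JSON object. BTreeMap iterates in sorted key order,
/// so the header value is identical regardless of insertion order.
pub fn encode_turn_metadata(meta: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = meta
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", escape(k), escape(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}
```

Inserting `repo` before `env` still produces `{"env":"...","repo":"..."}`, which keeps the header byte-stable across runs.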

Carried over from v0.1.9

  • Codex Responses-API parity rounds 1–3 (#66–#106): wire-level parity for parallel_tool_calls, client_metadata body fields incl. W3C trace context, SSE output_item.done + reasoning_summary_part.added, structured 429 UsageLimitReached / UsageNotIncluded, reactive 401→refresh + proactive refresh-before-expiry.
  • ReadWithoutWriteDetector (#109, closes #27): orchestrator hook for the analysis-paralysis loop pattern.
  • macOS screenshot drag-drop (#52, closes #51).
  • MCP completeness (#108, closes #26): Audio / ResourceLink / Resource / structuredContent / output_schema variant coverage.
  • Subagent trajectory recording (#112/#114, closes #33): /trace <root_conversation_id> walks the whole subagent tree with parent_agent_id linkage. Live-verified.
  • Credentials hardening (#69): ~/forge/credentials.json is now chmod 0o600.
  • WS protocol pin (#66): OpenAI-Beta: responses_websockets=2026-02-06 on the upgrade.

Install

Recommended (POSIX shell installer, auto-detects OS + arch + libc):

curl -fsSL https://github.com/justrach/codegraff/releases/download/v0.2.0/install.sh | sh

Supported binary downloads

| Platform | graff | codegraff |
|----------|-------|-----------|
| macOS arm64 (Apple Silicon) | graff-aarch64-apple-darwin | codegraff-aarch64-apple-darwin |
| Linux x86_64 (glibc) | graff-x86_64-unknown-linux-gnu | codegraff-x86_64-unknown-linux-gnu |
| Linux x86_64 (musl, static) | graff-x86_64-unknown-linux-musl | codegraff-x86_64-unknown-linux-musl |
| Linux aarch64 (glibc) | graff-aarch64-unknown-linux-gnu | codegraff-aarch64-unknown-linux-gnu |
| Linux aarch64 (musl, static) | graff-aarch64-unknown-linux-musl | codegraff-aarch64-unknown-linux-musl |
| Windows x86_64 (MSVC) | graff-x86_64-pc-windows-msvc.exe | codegraff-x86_64-pc-windows-msvc.exe |
| Windows aarch64 (MSVC) | graff-aarch64-pc-windows-msvc.exe | codegraff-aarch64-pc-windows-msvc.exe |

CodeDB-bundled tarballs for the Linux-x86_64 line are available as graff-x86_64-unknown-linux-{gnu,musl}-bundle.tar.gz.

Signing / build provenance

| Platform | Status |
|----------|--------|
| macOS arm64 | Codesigned (Developer ID WWP9DLJ27P, hardened runtime, RFC 3161 timestamp) and notarized via Apple notary service. Built locally on the maintainer's workstation from the v0.2.0 tag. |
| Linux + Windows | Unsigned. Built by this CI run via the tag-driven Multi Channel Release workflow on ubuntu-latest / windows-latest runners. |
| macOS x86_64 (Intel) | Not shipped in this release; pending CODEDB_LOCAL_APPLE_* GitHub Secrets configuration for CI signing. |

Verification

graff --version  # → graff 0.2.0
codesign -dvv $(which graff)  # macOS only → Authority=Developer ID Application: Rachit Pradhan (WWP9DLJ27P)

Known gaps (not shipped, deferred follow-ups)

  • aarch64-linux-android — build fails because arboard (clipboard library in forge_main) does not compile on Android. Needs #[cfg(not(target_os = "android"))] guards.
  • x86_64-apple-darwin (Intel mac) binary — needs the Apple CI secrets so it can land in the same release pipeline as the other platforms.
  • TUI startup wiring for preconnect_websocket() — building blocks shipped (#125), but no caller fires it from session-start yet. Suggested follow-up: tokio::spawn it after auth resolves, before the TUI's main event loop blocks on input.
  • HTTP/SSE-side x-codex-turn-state capture — WS-only today (#125); SSE response-header extraction needs an eventsource-stream plumbing change.
  • prewarm_websocket generate=false stub — the additional roundtrip that primes previous_response_id. Benchmark first.
  • Rotating attestation tokens — openai_responses_attestation_token is static today. If rotation is needed, add openai_responses_attestation_command that shells out.
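For the Android gap, the suggested `#[cfg(not(target_os = "android"))]` guard might take roughly the following shape. This is a sketch only: `clipboard_backend` is a hypothetical function, and the real fix would wrap the arboard call sites in forge_main rather than return a label.

```rust
/// Desktop targets: the real code would call into the clipboard crate here.
#[cfg(not(target_os = "android"))]
pub fn clipboard_backend() -> Option<&'static str> {
    Some("system")
}

/// Android: no clipboard support is compiled in, so callers get `None`.
#[cfg(target_os = "android")]
pub fn clipboard_backend() -> Option<&'static str> {
    None
}
```

Because both variants share one signature, call sites compile unchanged on every target and only need to handle the `None` case.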

v0.1.9 — Codex Responses parity · analysis-paralysis fix · MCP coverage · macOS screenshots

20 May 19:07
cf5bb63

Note: this release was re-cut on 2026-05-21 to include the trajectory-recording fixes for subagents (#112, #114, closes #33). If you downloaded graff from the original v0.1.9 binary on 2026-05-20, please re-install — your /trace won't walk subagent trees correctly.

Highlights

  • Codex Responses-API parity rounds 1–3 (#66–#106): wire-level parity with upstream openai/codex for the chatgpt.com Codex backend across body fields (parallel_tool_calls, client_metadata with installation/window IDs + W3C trace context), SSE event coverage (output_item.done, reasoning_summary_part.added), structured 429 envelopes (UsageLimitReached / UsageNotIncluded), reactive 401 → token refresh, and proactive refresh-before-expiry.
  • ReadWithoutWriteDetector (#109, closes #27 P0): new request-phase orchestrator hook that catches the analysis-paralysis pattern — re-reading the same files without writing code — and injects a forcing-function reminder.
  • Parallel agent dispatch — and observability for it (#112, #114, closes #33): the orchestrator has long fanned Task tool calls out in parallel via futures::join_all, but until this release the child agents' work was invisible to /trace. The trajectory recorder now records every dispatched child under the root's conversation_id with parent_agent_id linked to its dispatcher, so a single /trace <root_conversation_id> walks the whole fan-out tree. See Spotlight: parallel agent dispatch below.
  • macOS screenshot drag-drop (#52, closes #51): TUI now correctly recognises temporary screenshot paths and file:// URLs and attaches them as images instead of pasting raw text.
  • MCP completeness (#108, closes #26): full content-variant coverage (Audio, ResourceLink, Resource, structuredContent, output_schema/annotations/title metadata) on the rmcp adapter.
  • Credentials hardening (#69): ~/forge/credentials.json is now chmod 0o600 after writes.
  • WS protocol pin (#66): OpenAI-Beta: responses_websockets=2026-02-06 header for the chatgpt.com WS upgrade.
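The credentials hardening amounts to resetting the file mode after each write. Below is a Unix-only sketch using the standard library; `write_private` and `mode_bits` are illustrative names and not the functions touched in #69.

```rust
use std::fs;
use std::io::Write;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Writes `contents` to `path`, then restricts the file to owner read/write (0o600).
pub fn write_private(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let mut file = fs::File::create(path)?;
    file.write_all(contents)?;
    fs::set_permissions(path, fs::Permissions::from_mode(0o600))
}

/// Returns the permission bits of `path`, masked to the rwx triplets.
pub fn mode_bits(path: &Path) -> std::io::Result<u32> {
    Ok(fs::metadata(path)?.permissions().mode() & 0o777)
}
```

Setting the mode after the write, as the note describes, leaves a brief window in which the file carries default permissions; opening it with mode 0o600 from the start would close that window, at the cost of a slightly different code path.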

Spotlight: parallel agent dispatch

graff has three layers of parallelism for sub-agents, and v0.1.9 closes the missing one (observability). They compose as follows:

Layer 1 — wire-level: the LLM emits multiple tool calls per turn

By default we send parallel_tool_calls: true to OpenAI / Codex / Anthropic, and the per-model supports_parallel_tool_calls capability in forge_domain/src/agent.rs declares which models accept it:

// crates/forge_app/src/dto/openai/request.rs:408
parallel_tool_calls: Some(true),   // transformers downgrade if a model
                                   // doesn't support it

This was wired across the Codex Responses backend in #95 (round-2 parity, backported via #100). Without it the model emits one tool call per turn and there's no parallelism to dispatch.

Layer 2 — orchestrator: fan Task calls out, run the rest sequentially

Orchestrator::execute_tool_calls partitions the tool calls the model emitted into Task (dispatch-a-subagent) versus everything else. Task calls run concurrently via futures::join_all; everything else stays sequential so the UI notifier handshake and per-tool hooks behave the same as before:

// crates/forge_app/src/orch.rs:108–135
let (task_calls, other_calls): (Vec<_>, Vec<_>) =
    tool_calls.iter().partition(is_task_call);

// record dispatches on parent's trajectory *before* kicking them off
if let Some(recorder) = &self.trajectory_recorder {
    for tc in &task_calls {
        recorder.record_tool_call(tc).await;
    }
}

let task_results = join_all(
    task_calls.iter().map(|tc|
        self.services.call(&self.agent, tool_context, (*tc).clone())
    ),
).await;

When ≥ 2 tool calls land in one assistant turn the REPL surfaces them with a banner so you can see the batch as a group:

⇉ 3 parallel tool calls (2× Task, read)
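The partition step and the banner can be approximated without the async machinery. This sketch works on tool names only; `split_calls` and `banner` are hypothetical helpers, and the banner format is inferred from the single example line above rather than from the REPL source.

```rust
/// Splits tool names into Task dispatches and everything else, preserving order.
pub fn split_calls<'a>(calls: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    calls.iter().copied().partition(|name| *name == "task")
}

/// Builds the grouped banner when two or more calls land in one turn.
pub fn banner(calls: &[&str]) -> Option<String> {
    if calls.len() < 2 {
        return None;
    }
    let (tasks, others) = split_calls(calls);
    let mut parts: Vec<String> = Vec::new();
    if !tasks.is_empty() {
        parts.push(format!("{}× Task", tasks.len()));
    }
    parts.extend(others.iter().map(|name| name.to_string()));
    Some(format!(
        "⇉ {} parallel tool calls ({})",
        calls.len(),
        parts.join(", ")
    ))
}
```

Calling `banner(&["task", "task", "read"])` yields the same text as the example banner, and a single call yields `None`.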

Layer 3 — observability: child events under the root's conversation

This is the piece v0.1.9 adds. Before #112/#114, Task dispatch trajectories looked like this: the parent's view of the dispatch (tool_call + tool_result rows for the Task itself) recorded fine, but every child agent's internal tool calls were dropped on the floor because AgentExecutor::execute constructed a fresh ToolRegistry via Services::call(...) with no trajectory repo threaded in.

PR #111/#112 plumbed the repo through ForgeApp::tool_registry, but the orchestrator's actual dispatch path goes via services.call(...) which builds a fresh registry per call through the blanket AgentService::call impl — so the recorder never reached the children in production.

PR #113/#114 (this release) finishes the wiring:

  • adds Services::trajectory_repo() so the blanket AgentService::call impl can thread the repo into the per-call ToolRegistry, and
  • threads parent_conversation_id through ToolCallContext so the child agent's events land under the root's conversation_id.

What /trace <root_id> now looks like for a parent that fanned out 3 Task calls in parallel:

     0  run     agent=forge
     1  call    task   agent=forge        ⇉ dispatched in parallel
     2  call    task   agent=forge        ⇉ dispatched in parallel
     3  call    task   agent=forge        ⇉ dispatched in parallel
       0  run     agent=sage     parent=forge
       1  call    read   agent=sage
       2  result read   agent=sage  duration=1ms
       3  end     agent=sage
       0  run     agent=grep     parent=forge
       1  call    grep   agent=grep
       2  result grep   agent=grep  duration=43ms
       3  end     agent=grep
       0  run     agent=read     parent=forge
       1  call    read   agent=read
       2  result read   agent=read  duration=2ms
       3  end     agent=read
     4  result task   agent=forge  duration=7984ms
     5  result task   agent=forge  duration=210ms
     6  result task   agent=forge  duration=87ms
     7  end     agent=forge

The three children are properly nested under the parent and timestamped independently, so you can see at a glance which fork dominated the latency budget.

Verifying it on your own runs

In the REPL:

/trace 20            # last 20 events on the current conversation
/trace all           # whole tree, walks subagent dispatches

Pipe-friendly from the shell (e.g. for grepping or diffing across runs):

graff conversation list                              # find the root id
graff conversation trace <root_conversation_id>      # mirrors /trace

Direct against the SQLite store:

-- ~/forge/.forge.db
SELECT seq, kind, agent_id, parent_agent_id
FROM trajectory_events
WHERE conversation_id = '<root_conversation_id>'
ORDER BY id;

You should see one root-agent run plus N child-agent runs, each carrying parent_agent_id linked back to its dispatcher, and child rows interleaving with the parent's tool_call/tool_result rows in seq order.
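
If you export those rows, the "one root plus N children" check is easy to script. The sketch below works on an in-memory copy of the selected columns; `EventRow` and `count_runs` are hypothetical names, and the string stored in `kind` for a run event is passed in as a parameter because the exact value is not stated here.

```rust
/// Hypothetical in-memory mirror of the columns selected above.
#[derive(Debug, Clone)]
pub struct EventRow {
    pub kind: String,
    pub agent_id: String,
    pub parent_agent_id: Option<String>,
}

/// Counts (root_runs, child_runs) among rows whose kind equals `run_kind`.
/// A run row with no parent_agent_id is treated as the root.
pub fn count_runs(rows: &[EventRow], run_kind: &str) -> (usize, usize) {
    let mut roots = 0;
    let mut children = 0;
    for row in rows.iter().filter(|r| r.kind.as_str() == run_kind) {
        if row.parent_agent_id.is_some() {
            children += 1;
        } else {
            roots += 1;
        }
    }
    (roots, children)
}
```

For the trace shown earlier (forge fanning out to three children) this would be expected to return `(1, 3)`.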

🚀 Features

  • feat(orch): ReadWithoutWriteDetector hook for analysis-paralysis loops (closes #27) (#109)
  • feat(openai-responses): inject W3C trace context into client_metadata (closes #104) (#106)
  • feat(openai-responses): per-model default_reasoning_level + prefer_websockets metadata (closes #102 / #103) (#105)
  • feat(provider): proactive OAuth refresh-before-expiry on credential load (closes #89) (#99)
  • feat(openai-responses): reactive 401 → token refresh + retry on Codex backend (closes #88) (#98)
  • feat(openai-responses): parse 429 UsageErrorResponse envelope from Codex backend (closes #90) (#97)
  • feat(openai-responses): handle output_item.done + reasoning_summary_part.added SSE events (closes #93 / #94) (#96)
  • feat(openai-responses): send parallel_tool_calls + Codex client_metadata body fields (closes #91 / #92) (#95)
  • feat(openai-responses): x-codex-window-id + opt-in timing metrics (#83)
  • feat(openai-responses): per-model default_verbosity / support_verbosity for Codex (#81)
  • feat(providers): add codex-auto-review to Codex catalog (#79)
  • feat(openai-responses): send Codex identity headers (originator, x-codex-installation-id) on chatgpt.com requests (#77)
  • feat(openai-responses): wire text.verbosity for gpt-5.x Codex models (#75)

🐛 Bug Fixes

  • fix(trajectory): record subagent runs under parent's conversation_id (closes #33) (#112, #114)
  • fix(openai-responses): pin Codex Responses WS protocol via OpenAI-Beta (#66)
  • fix(security): chmod credentials file to 0o600 after write (closes #68) (#69)
  • fix(openai-responses): Codex header fidelity bugs from deepwiki audit (#85)
  • fix: support macOS screenshot image drops (closes #51) (#52)

🧰 Maintenance

  • test(mcp): cover Audio + ResourceLink JSON fallback paths (closes #26) (#108)
  • fix(ci): enable pixo simd for linux coverage builds (#71)
  • fix(test): refresh stale tool-description snapshots (#73, #74)

Closed issues

#6, #9, #11, #26, #27, #33, #51, #65, #68, #88, #89, #90, #91, #92, #93, #94, #102, #103, #104

What's deferred

The OpenAI/Codex parity tracker (#65) closed with these still-open follow-ups for the orchestrator-level work:

  • 4 remaining Codex identity headers (x-codex-turn-state, x-codex-turn-metadata, x-codex-parent-thread-id, x-openai-subagent) — need turn-descriptor + subagent-context propagation
  • /responses/compact and /memories/trace_summarize endpoints
  • AgentIdentity auth mode

Install

Recommended (POSIX shell installer, auto-detects OS + arch + libc):

curl -fsSL https://github.com/justrach/codegraff/releases/download/v0.1.9/install.sh | sh

Supported binary downloads

Platform graff codegraff
macOS arm64 (Apple Silicon) graff-aarch64-apple-darwin [codegraff-aarch64-apple-darwin](https://github.com/justrach/codegra...

v0.1.5 — subagent model override · observability · debug-mcp

07 May 09:12


What's new

feat(task) — per-spawn model override for subagents

The Task tool now accepts an optional model field, so the parent can
spawn a subagent on a different model from itself in the same run:

{
  "tasks": ["summarize crates/forge_app"],
  "agent_id": "muse",
  "model": "gpt-5.5-medium"
}

The override is validated against the agent's already-authenticated
provider — pass a model that isn't on the parent agent's authenticated
provider list and you get a clean error listing what is, instead of a
silent cross-provider switch or a confused 401 deep inside the request
path. The subagent banner surfaces the override
(MUSE [Agent · gpt-5.5-medium]) so the parent/child story stays
legible in the trace.

Plumbed end-to-end: TaskInput.model → ChatRequest.model_override →
validated in ForgeApp::chat → agent.model(override) → recorded in
the trajectory as a requested-vs-resolved diagnostic round-trip.
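
The validation step described above might look roughly like this. It is a sketch under assumptions: `resolve_model`, `UnknownModel`, and the flat list of model ids are hypothetical, and the real check in ForgeApp::chat works against the provider's actual catalog.

```rust
use std::fmt;

/// Hypothetical error: the requested model plus what the provider does offer.
#[derive(Debug, PartialEq)]
pub struct UnknownModel {
    pub requested: String,
    pub available: Vec<String>,
}

impl fmt::Display for UnknownModel {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "model '{}' is not available on this provider; available: {}",
            self.requested,
            self.available.join(", ")
        )
    }
}

/// Returns the model the subagent should run on. With no override the
/// agent's default is used; an override must be in the provider's list.
pub fn resolve_model(
    default_model: &str,
    override_model: Option<&str>,
    provider_models: &[&str],
) -> Result<String, UnknownModel> {
    match override_model {
        None => Ok(default_model.to_string()),
        Some(m) if provider_models.iter().any(|p| *p == m) => Ok(m.to_string()),
        Some(m) => Err(UnknownModel {
            requested: m.to_string(),
            available: provider_models.iter().map(|p| p.to_string()).collect(),
        }),
    }
}
```

Failing before the request is built is the point of the design: the error names the valid choices instead of surfacing later as an opaque authentication failure.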

feat(observability) — agent_run + agent_run_end trajectory events

Two new TrajectoryPayload variants make per-spawn behaviour
inspectable:

  • agent_run captures the spawn diagnostic: agent_id,
    parent_agent_id, requested_model, resolved_model, plus an
    agent_version: a SHA-256 prefix of the agent's system prompt template.
    Two edits to forge.md are still both forge but produce different
    hashes — so a rollup query can group runs by behaviourally distinct
    variants rather than by agent_id alone.
  • agent_run_end carries the per-spawn fitness vector: turns,
    prompt/completion tokens, total tool calls, tool errors, wall-clock
    ms, and an interrupt_reason if the run terminated abnormally. It sums
    what's already computed turn-by-turn, so the bottom of run() writes
    the full picture without re-walking events.
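
The "sum as you go" idea behind agent_run_end can be sketched as a running accumulator. The field names follow the list above, but `TurnStats`, `RunSummary`, and their methods are hypothetical shapes rather than the actual TrajectoryPayload definition.

```rust
/// Hypothetical per-turn counters, already known when a turn finishes.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct TurnStats {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub tool_calls: u64,
    pub tool_errors: u64,
}

/// Hypothetical per-spawn summary written once at the end of a run.
#[derive(Debug, Default, PartialEq)]
pub struct RunSummary {
    pub turns: u64,
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub tool_calls: u64,
    pub tool_errors: u64,
    pub wall_clock_ms: u64,
    pub interrupt_reason: Option<String>,
}

impl RunSummary {
    /// Folds one finished turn into the running totals.
    pub fn record_turn(&mut self, turn: TurnStats) {
        self.turns += 1;
        self.prompt_tokens += turn.prompt_tokens;
        self.completion_tokens += turn.completion_tokens;
        self.tool_calls += turn.tool_calls;
        self.tool_errors += turn.tool_errors;
    }

    /// Stamps the values only known at the end of the run.
    pub fn finish(mut self, wall_clock_ms: u64, interrupt_reason: Option<String>) -> Self {
        self.wall_clock_ms = wall_clock_ms;
        self.interrupt_reason = interrupt_reason;
        self
    }
}
```

Because the totals are maintained incrementally, emitting the end event is constant-time regardless of how many events the run produced.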

/trace renders both, including the requested-vs-resolved diff when
they differ. This is the substrate for an empirical archive of agent
variants — no mutation logic yet, just observation.

feat(debug) — graff debug last-mcp-call

A new top-level subcommand that prints recent MCP request/response
pairs as JSONL, no log-spelunking required:

graff debug last-mcp-call -n 5 --server codedb --tool codedb_bundle --pretty

forge_infra::mcp_debug writes a ring buffer at
<base>/debug/mcp-recent.jsonl capturing the literal arguments that
hit rmcp::call_tool, the round-trip duration, and the outcome
(returned vs failed + error). When an MCP server complains "arguments
arrived empty," the buffer shows whether the arguments were lost on the
wire or upstream of the client.
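
For readers unfamiliar with the pattern, a bounded "keep the most recent N" buffer can be sketched as follows. This is an in-memory illustration only: `RecentCalls` is a hypothetical type, each entry is assumed to be an already-serialized JSON line, and the on-disk format and eviction policy of forge_infra::mcp_debug are not specified here.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded buffer of serialized MCP call records (one JSON line each).
pub struct RecentCalls {
    capacity: usize,
    entries: VecDeque<String>,
}

impl RecentCalls {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    /// Appends a record, evicting the oldest one once the buffer is full.
    pub fn push(&mut self, line: String) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(line);
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Returns up to `n` most recent records, oldest first.
    pub fn last(&self, n: usize) -> Vec<&str> {
        let skip = self.entries.len().saturating_sub(n);
        self.entries.iter().skip(skip).map(String::as_str).collect()
    }
}
```

A call such as `last(5)` corresponds loosely to what `-n 5` asks for on the command line.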

fix(tools) — rewrite oneOf → anyOf for OpenAI Responses

OpenAI's tool-schema validator (including
chatgpt.com/codex/responses) rejects oneOf outright with
'oneOf' is not permitted', regardless of strict mode. anyOf is
accepted by both OpenAI and Anthropic, and for discriminated unions
whose branches pin a property to different const values (like
codedb_bundle's ops schema) the two are functionally equivalent —
no input matches more than one branch anyway.

A new forge_app::utils::rewrite_one_of_to_any_of recursively
rewrites every oneOf to anyOf before strict-mode normalization
runs, in both the legacy chat-completions and Responses-API paths.
Bundled codedb is now started with CODEDB_DISCRIMINATED_SCHEMA=1
so its discriminated branches actually flow through the rewrite (vs
arriving as a bare {type: "object"} and triggering missing-path /
missing-pattern runtime errors). Regression test in
forge_repo covers the codedb_bundle shape end-to-end.
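
A recursive key rewrite of this kind can be sketched on a toy JSON type. The `Json` enum below is a hypothetical stand-in so the example needs no external crates, and the function is an illustration of the technique rather than the body of forge_app::utils::rewrite_one_of_to_any_of.

```rust
/// Minimal JSON value, defined here only to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Renames every "oneOf" key to "anyOf", at any depth.
/// Simplification: this also renames a property that happens to be called
/// "oneOf", and does not merge with an "anyOf" key already on the same object.
pub fn rewrite_one_of_to_any_of(value: &mut Json) {
    match value {
        Json::Array(items) => {
            for item in items.iter_mut() {
                rewrite_one_of_to_any_of(item);
            }
        }
        Json::Object(fields) => {
            for (key, child) in fields.iter_mut() {
                if key.as_str() == "oneOf" {
                    *key = "anyOf".to_string();
                }
                rewrite_one_of_to_any_of(child);
            }
        }
        // Scalars contain no keys to rewrite.
        _ => {}
    }
}
```

Running the rewrite before any strict-mode normalization, as described above, means later passes only ever see `anyOf` and need no special case for `oneOf`.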

feat(ui) — banner logs path, softer status icons, reasoning hidden by default

  • The interactive banner now shows the log directory under Logs: so
    agents debugging graff can find it without spelunking through
    forge_tracker.
  • Status icons softened: gentler glyphs for completion and error, a
    dimmer info dot, brighter timestamps. Reads less like a syslog and
    more like a chat trace.
  • Reasoning summaries (Evaluating ..., Exploring ...) are hidden
    unless --verbose. They're available in the trajectory if you want
    them; the live REPL stays signal-rich.
  • Tracing: FORGE_LOG falls back to RUST_LOG so anyone with Rust
    muscle memory just works. Default filter is module-segment-agnostic
    (debug not forge=debug) so events from forge_infra,
    forge_domain, forge_main, etc. actually land in the log file.
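
The fallback order in the last bullet amounts to a two-step lookup with a default. The function names below are hypothetical, and the sketch treats an empty variable as set, which may not match the real behaviour.

```rust
use std::env;

/// Picks the log filter: FORGE_LOG first, then RUST_LOG, then "debug".
pub fn resolve_log_filter(forge_log: Option<&str>, rust_log: Option<&str>) -> String {
    forge_log.or(rust_log).unwrap_or("debug").to_string()
}

/// Reads both variables from the process environment.
pub fn log_filter_from_env() -> String {
    let forge = env::var("FORGE_LOG").ok();
    let rust = env::var("RUST_LOG").ok();
    resolve_log_filter(forge.as_deref(), rust.as_deref())
}
```

Splitting the pure decision from the environment read keeps the precedence rule testable without mutating process-wide state.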

Parallel tool calls — visible header

When the model emits ≥2 tool calls in a single assistant turn, an
⇉ N parallel tool calls (breakdown) header lands above them — the
batch is now visible as a group rather than dissolving into a stream
of unrelated icons.
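
One way such a header could be built is shown below. The format is inferred from the single example in these notes (`⇉ 3 parallel tool calls (2× Task, read)`); `parallel_banner` is a hypothetical function, and the first-seen ordering of the breakdown is an assumption.

```rust
/// Builds the group header for a batch of tool calls, or None when the
/// turn contains fewer than two calls.
pub fn parallel_banner(tool_names: &[&str]) -> Option<String> {
    if tool_names.len() < 2 {
        return None;
    }

    // Count occurrences per tool name, preserving first-seen order.
    let mut counts: Vec<(&str, usize)> = Vec::new();
    for &name in tool_names {
        if let Some(i) = counts.iter().position(|(n, _)| *n == name) {
            counts[i].1 += 1;
        } else {
            counts.push((name, 1));
        }
    }

    // Repeated tools are shown as "N× name"; single ones by name only.
    let breakdown: Vec<String> = counts
        .iter()
        .map(|(name, count)| {
            if *count > 1 {
                format!("{count}× {name}")
            } else {
                name.to_string()
            }
        })
        .collect();

    Some(format!(
        "⇉ {} parallel tool calls ({})",
        tool_names.len(),
        breakdown.join(", ")
    ))
}
```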

Release pipeline

The CI release workflow is removed in this version. The previous
generator pushed to antinomyhq/npm-code-forge,
antinomyhq/npm-forgecode, and antinomyhq/homebrew-code-forge
upstream-fork repos this codegraff fork doesn't own. Releases are
manual from 0.1.5 onwards: build locally, gh release upload. We'll
re-add CI builds once we have target distribution channels under our
own org.

Binaries

This release ships macOS arm64 only (codesigned with Developer ID,
notarized by Apple). Other platforms can be built from source:

cargo build --release --bin graff --bin codegraff

Notes

  • Workspace version bumped to 0.1.5.
  • Tag: v0.1.5.

🤖 Generated with Claude Code