Releases: justrach/codegraff2
codegraff v0.2.17
Changes
- No changes
codegraff v0.2.16
Released 2026-06-03
A fast follow-up to v0.2.15 that fixes two rough edges in the context stack
that release introduced, and ships the macOS desktop app (signed +
notarized) for the first time.
Fixes
@-mentioning a PDF/binary no longer errors
v0.2.15 tried to reflect a binary @-mention as a file reference, but it gated
that on a null-byte binary check while the file reader rejects binaries with
a magic-number (infer) check. The two disagree on files with an ASCII
header — a PDF's %PDF-1.x bytes have no null bytes — so the mention fell
through to the UTF-8 reader and still failed with "Binary files are not
supported. File detected as application/pdf".
Now a non-text/non-image @-mention resolves to the bare absolute path (plain
text, as if you had typed it). The agent treats it like any path and opens it
with the read tool, which already handles PDFs and images. The attachment path
also catches the reader's binary rejection directly, so the two detectors can no
longer disagree their way into an error.
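The disagreement between the two detectors can be reproduced in a few lines. The following is a minimal sketch, not the project's actual code: the function names are hypothetical, and the magic-number check is reduced to two hard-coded signatures (PDF and PNG) rather than the infer crate mentioned above.

```rust
/// Heuristic 1: treat the bytes as binary only if they contain a NUL byte.
fn looks_binary_by_null_byte(head: &[u8]) -> bool {
    head.contains(&0)
}

/// Heuristic 2: treat the bytes as binary if they start with a known magic
/// number. Only two signatures are listed here, purely for illustration.
fn looks_binary_by_magic(head: &[u8]) -> bool {
    const SIGNATURES: [&[u8]; 2] = [b"%PDF-", b"\x89PNG\r\n\x1a\n"];
    SIGNATURES.iter().any(|sig| head.starts_with(sig))
}

/// True when the two heuristics give different answers for the same bytes.
fn detectors_disagree(head: &[u8]) -> bool {
    looks_binary_by_null_byte(head) != looks_binary_by_magic(head)
}
```

A PDF header such as `%PDF-1.7` has no NUL byte but does match a magic number, so `detectors_disagree` returns true for it. Plain text matches neither check, so the detectors agree on it.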
Tool-result offload no longer fights the prompt cache
v0.2.15's lossless tool-result offload ran every turn unconditionally,
rewriting older tool-result messages in place (full → [offloaded] marker).
Because the prompt cache is prefix-based, rewriting a message before the kept
window broke the cache there and forced the whole keep-window to be reprocessed
at full (uncached) price every turn — roughly a 10× cost bump on the large
MCP/codedb results offloading was meant to save, and it fired even at trivial
context usage (e.g. 16k/180k).
Offloading is now gated on real context pressure (≥80% of the lossy
summarization threshold). Below that, the warm prefix cache is worth far more
than the few KB offloading would reclaim, so the context is left untouched and
the cache stays intact; offloading only kicks in as you approach the
summarization point — exactly when the cache would churn anyway.
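The gating rule can be sketched as a single predicate. This is an illustration under stated assumptions: the function name and the token-count parameters are hypothetical, and only the 80% figure comes from the note above.

```rust
/// Percentage of the summarization threshold at which offloading may start
/// (the 80% figure from the release note).
const OFFLOAD_PRESSURE_PERCENT: u64 = 80;

/// Returns true once context usage reaches the pressure point.
/// Integer arithmetic avoids floating-point rounding at the boundary.
fn should_offload(used_tokens: u64, summarization_threshold: u64) -> bool {
    used_tokens * 100 >= summarization_threshold * OFFLOAD_PRESSURE_PERCENT
}
```

With an assumed summarization threshold of 162,000 tokens, a 16k-token context stays untouched, and offloading first becomes eligible at 129,600 tokens.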
Desktop app (macOS)
The Codegraff desktop app — a Tauri GUI with first-class slash commands,
contributed by @pranavp311 — is now available as a signed and notarized, universal (arm64 + x86_64) macOS build. It runs on the same core as the CLI/TUI.
Contributors
- @pranavp311: the Tauri desktop app (feat(gui): add Tauri desktop app, Add desktop slash commands) and the macOS GUI it's built from.
Install
curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh
Pin this version with sh install.sh v0.2.16. macOS graff/codegraff binaries
are codesigned with a Developer ID Application certificate (hardened runtime) and
notarized by Apple; the CLI ships for macOS (arm64 + x86_64), Linux
(gnu/musl × x86_64/aarch64), Windows (x86_64/aarch64), and Android (graff). The
desktop app .dmg is a notarized Apple-Silicon (arm64) build.
Upgrade notes
- Non-breaking patch over v0.2.15; configuration and credentials carry over.
- The macOS build/notarization runbook lives at
docs/releases/macos-build-and-notarize.md.
codegraff v0.2.15
Released 2026-06-03
This release is about reach and resilience. codegraff gains a native desktop
app (a Tauri GUI with first-class slash commands) to sit alongside the CLI and
TUI, so the agent is now usable from a window as well as a terminal. Under the
hood, 0.2.15 lands the full context engineering stack — a two-tier
reversible tool-result offload, a lean tools_list that cuts the tool
catalog from ~46 KB to ~11.5 KB, live-turn preservation so compaction stops
re-answering the first prompt, and MCP-server globs surfaced as first-class
tools (codedb included). It fixes the long-standing papercut where @-mentioning
a PDF or other binary errored out — those mentions now reflect the file's
absolute path so the read tool takes over. The SDKs keep hardening
(TypeScript can never publish without its native loader again, Python gains
JSON-schema / structured-output mode, agent selection, and headless MCP trust),
and codex gpt-5.5 moves to a 1,000,000-token context window.
macOS binaries for this release are signed with a Developer ID Application
certificate (hardened runtime) and notarized by Apple, and every platform's
assets are versioned 1:1 with the v0.2.15 tag so install.sh resolves them
cleanly.
Highlights
- New Tauri desktop app (codegraff-gui) with desktop slash commands: a windowed coding-agent interface powered by the same core as the CLI/TUI.
- @-mentioning a PDF or binary no longer errors. The mention now reflects the file's absolute path as a <file_reference>, handing the agent off to the read tool instead of failing with "Binary files are not supported".
- Two-tier reversible tool-result offload for compaction: large tool results are offloaded in tiers and can be brought back, instead of being lost.
- Lean tools_list: one-line tool summaries shrink the tool catalog from ~46 KB to ~11.5 KB, leaving far more room for real work.
- Live-turn preservation in compaction: the agent no longer re-answers the first prompt after a compaction.
- MCP-server globs are now first-class tools, so specific MCP servers (e.g. codedb) surface their tools directly rather than hiding behind a generic wrapper.
- codex gpt-5.5 context window raised to 1,000,000 tokens.
- TypeScript SDK never ships without its N-API loader again: a publish-time guarantee that @codegraff/sdk always includes the native binding it needs.
- Python SDK gains JSON-schema / structured-output mode, agent selection, per-call prompt/temperature overrides, and a headless MCP-trust fix.
- Signed + notarized macOS binaries (arm64 and x86_64), version-matched to the release so the installer never 404s.
Desktop app (codegraff-gui)
- A new Tauri-based desktop application lands under gui/, giving codegraff a native windowed interface in addition to the CLI and TUI. It is built on the same forge_* core crates as the rest of the project (consumed as path dependencies), so behavior stays consistent across surfaces.
- Desktop slash commands bring the familiar slash-command workflow into the GUI.
- The GUI is its own decoupled cargo workspace (gui/src-tauri) so its build artifacts and lockfile stay isolated from the CLI workspace, and it is versioned in lockstep with the CLI (now 0.2.15).
Attachments & @-mentions
- Binary / PDF @-mentions are no longer a dead end. Previously, mentioning a non-text file (a PDF, an image, any binary) returned "Binary files are not supported" and stopped there. Now the mention reflects the file's absolute path and emits a <file_reference path=… mime_type=…> marker, which routes the agent to the read tool (the correct handler for that content) instead of erroring.
Context engineering & compaction
This release completes the context stack that 0.2.14 began:
- Two-tier reversible tool-result offload. Oversized tool results are offloaded in two tiers and remain recoverable, so compaction reclaims context without permanently dropping information the agent may still need.
- Live-turn preservation. Compaction now protects the in-flight turn, fixing the bug where the agent would re-answer the very first prompt after compacting.
- Lean tools_list. Tool definitions are summarized to one line each, taking the serialized tool catalog from ~46 KB down to ~11.5 KB, a large reduction in fixed per-request overhead.
- MCP-server globs as first-class tools. Specific MCP servers' tools (such as codedb's) are now surfaced directly in the tool list rather than behind a single generic glob, so they're discoverable and callable like any built-in.
Providers & models
- codex gpt-5.5 is configured with a 1,000,000-token context window, unlocking long-context runs on that model.
TypeScript SDK (@codegraff/sdk)
- Never publish without the native loader. The publish pipeline now guarantees @codegraff/sdk ships with its N-API loader every time; the package can no longer be released in a broken, loader-less state.
- Publishing fixes to ensure the main @codegraff/sdk package is the one that goes out, with the SDK moving to 0.2.4.
- response_format plumbing added to the core so the 0.1.3 Python SDK's structured-output mode has the support it depends on.
Python SDK (codegraff)
- JSON-schema / structured-output mode (SDK 0.1.3): constrain model output to a schema for reliable structured results.
- Agent selection plus per-call prompt and temperature overrides.
- Headless MCP-trust fix, so MCP servers work correctly in non-interactive (SDK/headless) usage.
- BYOK fix: extra_params are now passed through to upsert_credential in the BYOK constructor.
- Packaging & CI: full-matrix PyPI release, pyo3 upgraded 0.24 → 0.28 for Python 3.14 / 3.14t, protoc installed inside the manylinux container, aarch64-linux built on a native arm64 runner, a PyPI readme field so the package renders its README, and an honest platform-support note (mac-arm64 wheels today, no sdist).
CI, telemetry & docs
- GRAFF_TRAJECTORY_ENDPOINT is baked into release builds from a CI secret, so released binaries point at the correct trajectory endpoint.
- README rewritten around the CLI + SDKs, with in-depth TypeScript and Python guides.
- Housekeeping: forge.md renamed to codegraff.md, and duplicate pipeline tests dropped.
Install
curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh
The installer detects your platform and pulls the matching graff and codegraff
binaries plus fzf and codedb. To pin this exact version (sh -s -- passes the version to the piped script as its first argument):
curl -fsSL https://github.com/justrach/codegraff/releases/download/v0.2.15/install.sh | sh -s -- v0.2.15
Platforms & artifacts
| Platform | Targets |
|---|---|
| macOS | aarch64-apple-darwin, x86_64-apple-darwin (signed + notarized) |
| Linux | x86_64/aarch64, both gnu and musl |
| Windows | x86_64/aarch64 (pc-windows-msvc) |
| Android | aarch64-linux-android (graff only) |
macOS signing & notarization
- The macOS graff and codegraff binaries are codesigned with the Developer ID Application: Rachit Pradhan (Team WWP9DLJ27P) certificate using a hardened runtime and a secure timestamp, then notarized by Apple.
- Because they are bare CLI executables (not .app/.dmg/.pkg bundles), the notarization ticket is validated online by Gatekeeper on first run; there is nothing to staple. The install.sh path additionally clears the quarantine attribute on the installed binary.
- Each tool is published both as a raw binary (e.g. graff-aarch64-apple-darwin, consumed by install.sh) and as a .zip (e.g. graff-aarch64-apple-darwin.zip, the notarized archive). All four binaries (graff/codegraff × arm64/x86_64) report version 0.2.15, matching the v0.2.15 tag so install.sh, whether it resolves latest or a pinned v0.2.15, always finds the right asset.
Upgrade notes
- This is a non-breaking release; existing configuration and credentials carry over. Re-run the installer (above) or download the new binaries from the release assets.
- The full runbook for cutting the signed + notarized macOS artifacts lives at docs/releases/macos-build-and-notarize.md.
v0.2.14
codegraff v0.2.14
Released 2026-06-01
This release is about SDKs growing up. codegraff now ships a mouldable, Next.js-ready TypeScript SDK (@codegraff/sdk) and a new Python SDK (codegraff), both guarded by dhi-backed input validation so bad options fail fast with clear errors instead of surprising you at runtime. Alongside the SDKs, 0.2.14 introduces client trajectory upload with opt-in OTLP GenAI telemetry and RLVR reward labels, makes MCP / tool-grammar calls visible across the stream, hooks, and trajectory, hardens the orchestrator against the pending-todos doom loop, tightens compaction behavior for MCP-heavy and image-heavy contexts, and reworks the usage tracker to be privacy-respecting by default. It also folds in three upstream fixes from tailcallhq/forgecode.
Highlights
- New first-class TypeScript SDK (@codegraff/sdk) with a one-call BYOK entry point, a mouldable system prompt, a cloud Sandbox class, and Next.js-ready packaging.
- New first-class Python SDK (codegraff) sharing the same validation contract as the TypeScript SDK.
- dhi-backed input validation in both SDKs validates Graff.init and chat options before any request leaves the client.
- Client trajectory upload plus opt-in OTLP GenAI telemetry, with RLVR sparse-ORM reward labels attached to uploaded trajectories.
- MCP / tool-grammar calls are now surfaced consistently across the stream, hooks, and trajectory.
- Orchestrator reliability fixes that break the pending-todos doom loop.
- Compaction now scales to 90% of the context window (codex parity) and accounts for JSON/image tool results.
- The usage tracker no longer harvests user email, and the opt-out env var is renamed to a real opt-out: GRAFF_TRACK.
- SDK versions are now 1:1 with the CLI, bumping @codegraff/sdk from 0.2.0 to 0.2.3 with native codedb on install.
TypeScript SDK (@codegraff/sdk)
- Ships as a proper N-API package built and published through cross-platform CI, with native codedb installed on npm install.
- One-call BYOK initialization: Graff.init({ provider, apiKey, model, maxTokens }) gives bring-your-own-key auth with thin JS wrappers over the native core.
- Mouldable system prompt so integrators can shape the agent's base instructions to fit their product.
- New Sandbox class for managing cloud sandboxes from the SDK.
- Next.js-ready packaging, with a companion Next.js example app demonstrating end-to-end usage.
- dhi-backed validation of Graff.init / chat options (see "Validation & verification").
- Versioned 1:1 with the CLI: @codegraff/sdk moves from 0.2.0 to 0.2.3.
Python SDK (codegraff)
- New Python SDK sharing the same option contract and BYOK model as the TypeScript SDK.
- dhi-backed input validation for Graff.init / chat options, pinned to dhi >= 1.3.3 (native cp314 / cp314t wheels).
- New turboAPI example: a dhi-validated, SSE-streaming server showing how to put the Python SDK behind an HTTP API.
Telemetry & RLVR
- Client trajectory upload: completed agent trajectories can be uploaded from the client.
- Opt-in OTLP GenAI telemetry: standards-aligned OpenTelemetry GenAI signals, off unless you explicitly enable them.
- RLVR sparse-ORM reward labels on uploaded trajectories, derived from the run outcome:
  - accepted → reward 1.0
  - error → reward 0.0
  - incomplete → reward null (masked, so partial/unfinished runs don't pollute the reward signal)
- This makes uploaded trajectories directly usable for Reinforcement Learning from Verifiable Rewards workflows, where only verifiably-terminal outcomes carry a dense reward and everything else is masked.
- Test coverage for all three outcome paths (accepted / incomplete / error).
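The outcome-to-reward mapping above can be sketched as follows. The type and function names are hypothetical. Only the three outcomes and their labels come from the notes, with the null label modelled as None.

```rust
/// Terminal state of a run (names are illustrative, not the project's types).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Accepted,
    Error,
    Incomplete,
}

/// Sparse reward label: Some(value) for verifiably terminal outcomes,
/// None (null / masked) for unfinished runs.
fn reward_label(outcome: Outcome) -> Option<f64> {
    match outcome {
        Outcome::Accepted => Some(1.0),
        Outcome::Error => Some(0.0),
        Outcome::Incomplete => None,
    }
}

/// Mean reward over a batch, skipping masked entries so that incomplete
/// runs do not pull the average toward zero.
fn mean_reward(outcomes: &[Outcome]) -> Option<f64> {
    let labels: Vec<f64> = outcomes.iter().filter_map(|o| reward_label(*o)).collect();
    if labels.is_empty() {
        None
    } else {
        Some(labels.iter().sum::<f64>() / labels.len() as f64)
    }
}
```

Masking matters in the aggregate: a batch of one accepted, one incomplete, and one errored run averages to 0.5, not 0.33, because the incomplete run contributes nothing.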
MCP & tool calls
- MCP / tool-grammar calls are now surfaced across the stream, hooks, and trajectory, so tool activity that previously stayed hidden is now observable end-to-end.
- Unwrap double-encoded tool args: arguments that arrived JSON-encoded-inside-JSON are now correctly decoded before dispatch.
- MCP tools are sent non-strict, improving compatibility with MCP servers whose schemas don't satisfy strict tool-call constraints.
Orchestrator & reliability
- Break the pending-todos doom loop: the orchestrator no longer gets stuck re-running because of lingering pending todos.
- Bounded End-hook rearms cap how many times the End hook can re-arm itself.
- A doom-loop detector strips volatile keys before comparing state, so cosmetic churn no longer looks like genuine new work.
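Stripping volatile keys before comparison can be sketched like this. It is an illustration only: the state is modelled as a flat string map, and the key names timestamp and request_id are assumed examples of volatile fields, not the keys the orchestrator actually strips.

```rust
use std::collections::BTreeMap;

type State = BTreeMap<String, String>;

/// Keys assumed to change on every iteration without signalling new work.
const VOLATILE_KEYS: [&str; 2] = ["timestamp", "request_id"];

/// Build a state map from string pairs (helper for the example).
fn state_from(pairs: &[(&str, &str)]) -> State {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

/// Copy of the state with the volatile keys removed.
fn strip_volatile(state: &State) -> State {
    state
        .iter()
        .filter(|(k, _)| !VOLATILE_KEYS.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

/// Two states count as a repeat when they match after stripping.
fn is_repeat(previous: &State, current: &State) -> bool {
    strip_volatile(previous) == strip_volatile(current)
}
```

Two states that differ only in timestamp compare as a repeat, while a changed todo entry still registers as new work.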
Compaction
- Threshold scaled to 90% of the context window for parity with codex behavior.
- JSON and image tool results are now counted toward context size, so MCP-heavy contexts actually compact instead of silently overflowing.
- Cost-only ping frames no longer shadow real token usage, fixing cases where bookkeeping frames masked the true token count used to drive compaction.
Privacy
- The usage tracker no longer harvests user email.
- The opt-out environment variable is renamed from FORGE_TRACKER to GRAFF_TRACK, and is now a real opt-out that genuinely disables tracking.
Upstream sync (tailcallhq/forgecode)
Pulled in three fixes from upstream tailcallhq/forgecode:
- #3418: apply the Opus 4.7 API contract to Claude Opus 4.8, keeping the newer model on a known-good request/response contract.
- #3414: add provider.json + vertex.json model entries for newly supported models.
- #3350: replay reasoning_content for Xiaomi MiMo tool calls, so reasoning is preserved correctly across tool-call turns for that model.
Validation & verification
- Both SDKs validate inputs with dhi, a Zod-4- / Pydantic-compatible, SIMD-WASM validator. Options passed to Graff.init and chat are checked against a shared schema contract in both the TypeScript and Python SDKs, so invalid configuration is rejected at the boundary rather than deep inside a request.
- dhi is pinned to >= 1.3.3, which provides native cp314 / cp314t wheels for the Python side.
- Added regression tests covering RLVR outcome labeling for the accepted, incomplete, and error paths.
Packaging & versioning
- @codegraff/sdk bumped 0.2.0 -> 0.2.3, now versioned 1:1 with the CLI.
- Native codedb is installed on package install, so the SDK is usable without a separate build step.
- New example apps land alongside the SDKs: a Next.js example (TypeScript) and a turboAPI example (Python).
Install
curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh
Once v0.2.14 is published it becomes latest; until then pin the tag: .../releases/download/v0.2.14/install.sh.
Prebuilt binaries are attached for every supported platform:
| Platform | Arch | Assets |
|---|---|---|
| Linux (gnu) | x86_64, aarch64 | graff-*-unknown-linux-gnu, codegraff-*-unknown-linux-gnu |
| Linux (musl) | x86_64, aarch64 | graff-*-unknown-linux-musl, codegraff-*-unknown-linux-musl |
| Windows | x86_64, aarch64 | graff-*-pc-windows-msvc.exe, codegraff-*-pc-windows-msvc.exe |
| Android | aarch64 | graff-aarch64-linux-android |
| macOS | x86_64, aarch64 | graff-*-apple-darwin, codegraff-*-apple-darwin (+ .zip) |
macOS binaries — signed & notarized
The macOS graff and codegraff binaries (both x86_64-apple-darwin and aarch64-apple-darwin) are:
- Signed with Developer ID Application: Rachit Pradhan (WWP9DLJ27P)
- Built with hardened runtime + secure timestamp
- Notarized by Apple (submission 34a24b8c-e5f4-454a-905d-20cace04840a, status Accepted)
They launch without Gatekeeper warnings. As bare CLI tools they can't be stapled, so first launch performs a one-time online notarization check (needs network).
v0.2.13
Highlights
- Fix: Codex response.completed / response.incomplete events are now parsed correctly, which restores gpt-5.5 (and other models) over the Codex backend (HTTP/SSE and WebSocket transports). Without this, codex turns would fail to deserialize because the backend omits oai::Response.output on terminal events. Cherry-picked from upstream #3405; extended on our branch to handle the same dispatch over the Codex WebSocket transport (which upstream didn't carry when the fix landed).
- New: Python SDK (codegraff on PyPI, published separately under the sdk/python-v0.1.0 tag). PyO3 bindings exposing the codegraff agent to Python; mirrors the TS SDK's Graff / GraffSession / Sandbox surface.
Install
curl -fsSL https://github.com/justrach/codegraff/releases/latest/download/install.sh | sh
macOS binaries
All four macOS binaries (graff and codegraff, x86_64 + arm64) are codesigned with Developer ID Application: Rachit Pradhan (WWP9DLJ27P) (hardened runtime + Apple timestamp) and notarized with Apple. Gatekeeper accepts them online on first launch.
Full changelog
v0.2.12
Full Changelog: v0.2.1...v0.2.12
v0.2.1
What's new
- Codegraff gateway provider: graff provider login codegraff authenticates via device flow (opens browser, approve in one click, done)
- 4 models available through the gateway: DeepSeek V4 Pro, GPT-5.5, Grok Build (xAI), Kimi K2.6 (Moonshot)
- Pay-as-you-go credits, no subscription — top up from the dashboard at codegraff.com/dashboard/billing
Install
curl -fsSL https://codegraff.com/install-graff.sh | sh
Or download the macOS arm64 binary from the assets below.
v0.2.0
Highlights
WebSocket codex-parity sweep — all 8 deepwiki audit gaps closed. The chatgpt.com Codex backend's WebSocket transport now ships at full wire-level parity with upstream openai/codex for the things that affect production behavior. Together with everything from v0.1.9, this is a substantial minor release.
What's new vs v0.1.9
- Wrapped WebSocket errors now map to typed errors (#117): when the chatgpt.com Codex backend sends a type: "error" text frame instead of a proper HTTP-level status (429 plan-usage, 401 token-expiry, 5xx, websocket_connection_limit_reached), we now decode the envelope and surface the same Error::UsageLimitReached / Error::InvalidStatusCode / forge_domain::Error::Retryable the HTTP path would have. Before: the orchestrator retried with generic EmptyCompletion. Now: existing retry / refresh / fallback policies fire identically across transports.
- x-codex-beta-features (#118): comma-separated session-scoped beta-feature opt-ins via openai_responses_beta_features = ["foo", "bar"] in .forge.toml.
- x-codex-turn-metadata (#125): per-turn observability metadata via openai_responses_turn_metadata = { repo = "...", env = "..." }. JSON-encoded, BTreeMap-stable key ordering.
- x-openai-attestation (#125): client attestation token forwarding via openai_responses_attestation_token = "v1.<opaque>". Static-token model (upstream uses rotating tokens via JSON-RPC; followed-up separately if needed).
- x-codex-turn-state (#125): sticky backend-routing token captured from WS upgrade response headers and replayed on subsequent reconnects within the conversation. Mirrors ModelClientSession::turn_state's capture-then-replay pattern.
- Handshake probe + close-frame diagnostic (#125): 50ms post-upgrade poll for an immediate Close frame, surfacing the close code + reason as a typed ConnectError so policy rejections (rate-limited, auth-invalid) come with actionable context instead of an opaque "stream ended".
- Structured WebSocket telemetry (#125): tracing::info! / warn! events tagged event.kind = "codex.websocket_{connect,request,event}" with duration_ms, success, error.message, sub-kinds for idle_timeout / transport_error / stream_end / response_completed / response_failed. Wire your own OTel / JSON / stdout subscriber.
- Connection-only WebSocket prewarm (#125): opt-in via openai_responses_prewarm = true; the new OpenAIResponsesProvider::preconnect_websocket(conv_id) API opens + stashes the socket so the first real turn skips TLS+upgrade latency. Connection-only for now (no generate=false stub roundtrip yet).
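Two of the header behaviours above are simple enough to sketch with the standard library alone: joining the beta-feature list into a comma-separated value, and the capture-then-replay handling of x-codex-turn-state. The TurnState type and both function names are hypothetical. How the real client stores the token, and what it does when a later upgrade response omits the header, are assumptions here (this sketch keeps the last captured value).

```rust
/// Value for the x-codex-beta-features header: the configured feature
/// names joined with commas.
fn beta_features_header(features: &[&str]) -> String {
    features.join(",")
}

/// Sticky routing token for one conversation (illustrative type).
#[derive(Default)]
struct TurnState {
    token: Option<String>,
}

impl TurnState {
    /// Capture the token from the upgrade response headers, if present.
    /// Header names are compared case-insensitively.
    fn capture(&mut self, response_headers: &[(&str, &str)]) {
        for (name, value) in response_headers {
            if name.eq_ignore_ascii_case("x-codex-turn-state") {
                self.token = Some(value.to_string());
            }
        }
    }

    /// Header to replay on the next reconnect, or None before any capture.
    fn replay(&self) -> Option<(&'static str, &str)> {
        self.token.as_deref().map(|t| ("x-codex-turn-state", t))
    }
}
```

Before the first upgrade response there is nothing to replay. After a capture, every reconnect in the conversation would send the same token back.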
Carried over from v0.1.9
- Codex Responses-API parity rounds 1–3 (#66 → #106): wire-level parity for parallel_tool_calls, client_metadata body fields incl. W3C trace context, SSE output_item.done + reasoning_summary_part.added, structured 429 UsageLimitReached / UsageNotIncluded, reactive 401→refresh + proactive refresh-before-expiry.
- ReadWithoutWriteDetector (#109, closes #27): orchestrator hook for the analysis-paralysis loop pattern.
- macOS screenshot drag-drop (#52, closes #51).
- MCP completeness (#108, closes #26): Audio / ResourceLink / Resource / structuredContent / output_schema variant coverage.
- Subagent trajectory recording (#112/#114, closes #33): /trace <root_conversation_id> walks the whole subagent tree with parent_agent_id linkage. Live-verified.
- Credentials hardening (#69): ~/forge/credentials.json is now chmod 0o600.
- WS protocol pin (#66): OpenAI-Beta: responses_websockets=2026-02-06 on the upgrade.
Install
Recommended (POSIX shell installer, auto-detects OS + arch + libc):
curl -fsSL https://github.com/justrach/codegraff/releases/download/v0.2.0/install.sh | sh
Supported binary downloads
| Platform | graff | codegraff |
|---|---|---|
| macOS arm64 (Apple Silicon) | graff-aarch64-apple-darwin | codegraff-aarch64-apple-darwin |
| Linux x86_64 (glibc) | graff-x86_64-unknown-linux-gnu | codegraff-x86_64-unknown-linux-gnu |
| Linux x86_64 (musl, static) | graff-x86_64-unknown-linux-musl | codegraff-x86_64-unknown-linux-musl |
| Linux aarch64 (glibc) | graff-aarch64-unknown-linux-gnu | codegraff-aarch64-unknown-linux-gnu |
| Linux aarch64 (musl, static) | graff-aarch64-unknown-linux-musl | codegraff-aarch64-unknown-linux-musl |
| Windows x86_64 (MSVC) | graff-x86_64-pc-windows-msvc.exe | codegraff-x86_64-pc-windows-msvc.exe |
| Windows aarch64 (MSVC) | graff-aarch64-pc-windows-msvc.exe | codegraff-aarch64-pc-windows-msvc.exe |
CodeDB-bundled tarballs for the Linux-x86_64 line are available as graff-x86_64-unknown-linux-{gnu,musl}-bundle.tar.gz.
Signing / build provenance
| Platform | Status |
|---|---|
| macOS arm64 | Codesigned (Developer ID WWP9DLJ27P, hardened runtime, RFC 3161 timestamp) and notarized via Apple notary service. Built locally on the maintainer's workstation from the v0.2.0 tag. |
| Linux + Windows | Unsigned. Built by this CI run via the tag-driven Multi Channel Release workflow on ubuntu-latest / windows-latest runners. |
| macOS x86_64 (Intel) | Not shipped in this release — pending CODEDB_LOCAL_APPLE_* GitHub Secrets configuration for CI signing. |
Verification
graff --version # → graff 0.2.0
codesign -dvv $(which graff) # macOS only → Authority=Developer ID Application: Rachit Pradhan (WWP9DLJ27P)
Known gaps (not shipped, deferred follow-ups)
- aarch64-linux-android: build fails because arboard (clipboard library in forge_main) does not compile on Android. Needs #[cfg(not(target_os = "android"))] guards.
- x86_64-apple-darwin (Intel mac) binary: needs the Apple CI secrets so it can land in the same release pipeline as the other platforms.
- TUI startup wiring for preconnect_websocket(): building blocks shipped (#125), but no caller fires it from session-start yet. Suggested follow-up: tokio::spawn it after auth resolves, before the TUI's main event loop blocks on input.
- HTTP/SSE-side x-codex-turn-state capture: WS-only today (#125); SSE response-header extraction needs an eventsource-stream plumbing change.
- prewarm_websocket generate=false stub: the additional roundtrip that primes previous_response_id. Benchmark first.
- Rotating attestation tokens: openai_responses_attestation_token is static today. If rotation is needed, add openai_responses_attestation_command that shells out.
v0.1.9 — Codex Responses parity · analysis-paralysis fix · MCP coverage · macOS screenshots
Note: this release was re-cut on 2026-05-21 to include the trajectory-recording fixes for subagents (#112, #114, closes #33). If you downloaded the original v0.1.9 graff binary on 2026-05-20, please re-install; otherwise your /trace won't walk subagent trees correctly.
Highlights
- Codex Responses-API parity round 1–3 (#66 → #106): wire-level parity with upstream openai/codex for the chatgpt.com Codex backend across body fields (parallel_tool_calls, client_metadata with installation/window IDs + W3C trace context), SSE event coverage (output_item.done, reasoning_summary_part.added), structured 429 envelopes (UsageLimitReached / UsageNotIncluded), reactive 401 → token refresh, and proactive refresh-before-expiry.
- ReadWithoutWriteDetector (#109, closes #27 P0): new request-phase orchestrator hook that catches the analysis-paralysis pattern (re-reading the same files without writing code) and injects a forcing-function reminder.
- Parallel agent dispatch, and observability for it (#112, #114, closes #33): the orchestrator has long fanned Task tool calls out in parallel via futures::join_all, but until this release the child agents' work was invisible to /trace. The trajectory recorder now records every dispatched child under the root's conversation_id with parent_agent_id linked to its dispatcher, so a single /trace <root_conversation_id> walks the whole fan-out tree. See Spotlight: parallel agent dispatch below.
- macOS screenshot drag-drop (#52, closes #51): TUI now correctly recognises temporary screenshot paths and file:// URLs and attaches them as images instead of pasting raw text.
- MCP completeness (#108, closes #26): full content-variant coverage (Audio, ResourceLink, Resource, structuredContent, output_schema/annotations/title metadata) on the rmcp adapter.
- Credentials hardening (#69): ~/forge/credentials.json is now chmod 0o600 after writes.
- WS protocol pin (#66): OpenAI-Beta: responses_websockets=2026-02-06 header for the chatgpt.com WS upgrade.
Spotlight: parallel agent dispatch
graff has three layers of parallelism for sub-agents, and v0.1.9 closes the missing one (observability). They compose as follows:
Layer 1 — wire-level: the LLM emits multiple tool calls per turn
By default we send parallel_tool_calls: true to OpenAI / Codex / Anthropic, and the per-model supports_parallel_tool_calls capability in forge_domain/src/agent.rs declares which models accept it:
// crates/forge_app/src/dto/openai/request.rs:408
parallel_tool_calls: Some(true), // transformers downgrade if a model
                                 // doesn't support it
This was wired across the Codex Responses backend in #95 (round-2 parity, backported via #100). Without it the model emits one tool call per turn and there's no parallelism to dispatch.
Layer 2 — orchestrator: fan Task calls out, run the rest sequentially
Orchestrator::execute_tool_calls partitions the tool calls the model emitted into Task (dispatch-a-subagent) versus everything else. Task calls run concurrently via futures::join_all; everything else stays sequential so the UI notifier handshake and per-tool hooks behave the same as before:
// crates/forge_app/src/orch.rs:108–135
let (task_calls, other_calls): (Vec<_>, Vec<_>) =
    tool_calls.iter().partition(is_task_call);
// record dispatches on parent's trajectory *before* kicking them off
if let Some(recorder) = &self.trajectory_recorder {
    for tc in &task_calls {
        recorder.record_tool_call(tc).await;
    }
}
let task_results = join_all(
    task_calls.iter().map(|tc|
        self.services.call(&self.agent, tool_context, (*tc).clone())
    ),
).await;
When ≥ 2 tool calls land in one assistant turn the REPL surfaces them with a banner so you can see the batch as a group:
⇉ 3 parallel tool calls (2× Task, read)
Layer 3 — observability: child events under the root's conversation
This is the piece v0.1.9 adds. Before #112/#114, Task dispatch trajectories looked like this: the parent's view of the dispatch (tool_call + tool_result rows for the Task itself) recorded fine, but every child agent's internal tool calls were dropped on the floor because AgentExecutor::execute constructed a fresh ToolRegistry via Services::call(...) with no trajectory repo threaded in.
PR #111/#112 plumbed the repo through ForgeApp::tool_registry, but the orchestrator's actual dispatch path goes via services.call(...) which builds a fresh registry per call through the blanket AgentService::call impl — so the recorder never reached the children in production.
PR #113/#114 (this release) finishes the wiring:
- adds Services::trajectory_repo() so the blanket AgentService::call impl can thread the repo into the per-call ToolRegistry, and
- threads parent_conversation_id through ToolCallContext so the child agent's events land under the root's conversation_id.
What /trace <root_id> now looks like for a parent that fanned out 3 Task calls in parallel:
0 run agent=forge
1 call task agent=forge ⇉ dispatched in parallel
2 call task agent=forge ⇉ dispatched in parallel
3 call task agent=forge ⇉ dispatched in parallel
0 run agent=sage parent=forge
1 call read agent=sage
2 result read agent=sage duration=1ms
3 end agent=sage
0 run agent=grep parent=forge
1 call grep agent=grep
2 result grep agent=grep duration=43ms
3 end agent=grep
0 run agent=read parent=forge
1 call read agent=read
2 result read agent=read duration=2ms
3 end agent=read
4 result task agent=forge duration=7984ms
5 result task agent=forge duration=210ms
6 result task agent=forge duration=87ms
7 end agent=forge
The three children are properly nested under the parent and timestamped independently, so you can see at a glance which fork dominated the latency budget.
Verifying it on your own runs
In the REPL:
/trace 20 # last 20 events on the current conversation
/trace all # whole tree, walks subagent dispatches
Pipe-friendly from the shell (e.g. for grepping or diffing across runs):
graff conversation list # find the root id
graff conversation trace <root_conversation_id> # mirrors /trace
Direct against the SQLite store:
-- ~/forge/.forge.db
SELECT seq, kind, agent_id, parent_agent_id
FROM trajectory_events
WHERE conversation_id = '<root_conversation_id>'
ORDER BY id;
You should see one root-agent run plus N child-agent runs, each carrying parent_agent_id linked back to its dispatcher, and child rows interleaving with the parent's tool_call/tool_result rows in seq order.
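The same check can be scripted. The sketch below runs the query above against an in-memory SQLite table so it is self-contained; the table and column names are taken from the query, but the CREATE TABLE statement, the kind strings, and the sample rows are assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query above touches.
conn.execute(
    """
    CREATE TABLE trajectory_events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        conversation_id TEXT NOT NULL,
        seq INTEGER NOT NULL,
        kind TEXT NOT NULL,
        agent_id TEXT NOT NULL,
        parent_agent_id TEXT
    )
    """
)
# Hypothetical rows: one root agent that dispatched two children.
sample = [
    ("root-1", 0, "agent_run", "forge", None),
    ("root-1", 1, "tool_call", "forge", None),
    ("root-1", 0, "agent_run", "sage", "forge"),
    ("root-1", 1, "tool_call", "sage", "forge"),
    ("root-1", 0, "agent_run", "grep", "forge"),
    ("root-1", 2, "tool_result", "forge", None),
]
conn.executemany(
    "INSERT INTO trajectory_events "
    "(conversation_id, seq, kind, agent_id, parent_agent_id) VALUES (?, ?, ?, ?, ?)",
    sample,
)

query = """
    SELECT seq, kind, agent_id, parent_agent_id
    FROM trajectory_events
    WHERE conversation_id = ?
    ORDER BY id
"""
events = conn.execute(query, ("root-1",)).fetchall()

# A root run has no parent; a child run points back at its dispatcher.
root_runs = [e for e in events if e[1] == "agent_run" and e[3] is None]
child_runs = [e for e in events if e[1] == "agent_run" and e[3] is not None]
```

To run it against a real store, replace the in-memory connection with the path to your database and drop the CREATE TABLE and INSERT steps.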
🚀 Features
- feat(orch): ReadWithoutWriteDetector hook for analysis-paralysis loops (closes #27) (#109)
- feat(openai-responses): inject W3C trace context into client_metadata (closes #104) (#106)
- feat(openai-responses): per-model default_reasoning_level + prefer_websockets metadata (closes #102 / #103) (#105)
- feat(provider): proactive OAuth refresh-before-expiry on credential load (closes #89) (#99)
- feat(openai-responses): reactive 401 → token refresh + retry on Codex backend (closes #88) (#98)
- feat(openai-responses): parse 429 UsageErrorResponse envelope from Codex backend (closes #90) (#97)
- feat(openai-responses): handle output_item.done + reasoning_summary_part.added SSE events (closes #93 / #94) (#96)
- feat(openai-responses): send parallel_tool_calls + Codex client_metadata body fields (closes #91 / #92) (#95)
- feat(openai-responses): x-codex-window-id + opt-in timing metrics (#83)
- feat(openai-responses): per-model default_verbosity / support_verbosity for Codex (#81)
- feat(providers): add codex-auto-review to Codex catalog (#79)
- feat(openai-responses): send Codex identity headers (originator, x-codex-installation-id) on chatgpt.com requests (#77)
- feat(openai-responses): wire text.verbosity for gpt-5.x Codex models (#75)
🐛 Bug Fixes
- fix(trajectory): record subagent runs under parent's conversation_id (closes #33) (#112, #114)
- fix(openai-responses): pin Codex Responses WS protocol via OpenAI-Beta (#66)
- fix(security): chmod credentials file to 0o600 after write (closes #68) (#69)
- fix(openai-responses): Codex header fidelity bugs from deepwiki audit (#85)
- fix: support macOS screenshot image drops (closes #51) (#52)
🧰 Maintenance
- test(mcp): cover Audio + ResourceLink JSON fallback paths (closes #26) (#108)
- fix(ci): enable pixo simd for linux coverage builds (#71)
- fix(test): refresh stale tool-description snapshots (#73, #74)
Closed issues
#6, #9, #11, #26, #27, #33, #51, #65, #68, #88, #89, #90, #91, #92, #93, #94, #102, #103, #104
What's deferred
The OpenAI/Codex parity tracker (#65) closed with these still-open follow-ups for the orchestrator-level work:
- 4 remaining Codex identity headers (x-codex-turn-state, x-codex-turn-metadata, x-codex-parent-thread-id, x-openai-subagent); these need turn-descriptor + subagent-context propagation
- /responses/compact and /memories/trace_summarize endpoints
- AgentIdentity auth mode
Install
Recommended (POSIX shell installer, auto-detects OS + arch + libc):
curl -fsSL https://github.com/justrach/codegraff/releases/download/v0.1.9/install.sh | sh
Supported binary downloads
| Platform | graff | codegraff |
|---|---|---|
| macOS arm64 (Apple Silicon) | graff-aarch64-apple-darwin | [codegraff-aarch64-apple-darwin](https://github.com/justrach/codegra... |
v0.1.5 — subagent model override · observability · debug-mcp
What's new
feat(task) — per-spawn model override for subagents
The Task tool now accepts an optional model field, so the parent can
spawn a subagent on a different model from itself in the same run:
{
"tasks": ["summarize crates/forge_app"],
"agent_id": "muse",
"model": "gpt-5.5-medium"
}
The override is validated against the agent's already-authenticated
provider — pass a model that isn't on the parent agent's authenticated
provider list and you get a clean error listing what is, instead of a
silent cross-provider switch or a confused 401 deep inside the request
path. The subagent banner surfaces the override
(MUSE [Agent · gpt-5.5-medium]) so the parent/child story stays
legible in the trace.
Plumbed end-to-end: TaskInput.model → ChatRequest.model_override →
validated in ForgeApp::chat → agent.model(override) → recorded in
the trajectory as a requested-vs-resolved diagnostic round-trip.
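The validation step can be sketched as follows. This is an illustration in Python, not the logic in ForgeApp::chat: resolve_model, ModelOverrideError, the provider_models argument, and the example-* model names are all hypothetical; only gpt-5.5-medium comes from the example above.

```python
class ModelOverrideError(ValueError):
    """Raised when a requested model is not offered by the authenticated provider."""


def resolve_model(default_model, requested_model, provider_models):
    """Return the model a subagent should run on.

    provider_models stands in for the model list of the agent's
    already-authenticated provider (assumed input shape).
    """
    if requested_model is None:
        # No override: the subagent keeps its configured model.
        return default_model
    if requested_model not in provider_models:
        # Fail early with the list of valid choices instead of letting the
        # request fail later with an authentication error.
        available = ", ".join(sorted(provider_models))
        raise ModelOverrideError(
            f"model '{requested_model}' is not available on this provider; "
            f"available: {available}"
        )
    return requested_model


models = {"example-default", "gpt-5.5-medium"}
resolved = resolve_model("example-default", "gpt-5.5-medium", models)
```

Recording both the requested and the resolved value, as the trajectory does, makes a rejected or ignored override visible after the fact.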
feat(observability) — agent_run + agent_run_end trajectory events
Two new TrajectoryPayload variants make per-spawn behaviour
inspectable:
- agent_run captures the spawn diagnostic: agent_id, parent_agent_id, requested_model, resolved_model, plus an agent_version SHA-256 prefix of the agent's system prompt template. Two edits to forge.md are still both forge but produce different hashes, so a rollup query can group runs by behaviourally distinct variants rather than by agent_id alone.
- agent_run_end carries the per-spawn fitness vector: turns, prompt/completion tokens, total tool calls, tool errors, wall-clock ms, and an interrupt_reason if the run terminated abnormally. It sums what's already computed turn-by-turn, so the bottom of run() writes the full picture without re-walking events.
/trace renders both, including the requested-vs-resolved diff when
they differ. This is the substrate for an empirical archive of agent
variants — no mutation logic yet, just observation.
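To make the agent_version idea concrete, here is a minimal sketch of hashing a prompt template and grouping runs by the result. The 12-character prefix length, the sample prompt strings, and the run tuples are assumptions; the notes above only say that the version is a SHA-256 prefix of the system prompt template.

```python
import hashlib
from collections import Counter

PREFIX_LEN = 12  # assumed; the release notes do not state the prefix length


def agent_version(system_prompt_template, prefix_len=PREFIX_LEN):
    """Return a short, stable fingerprint of an agent's system prompt template."""
    digest = hashlib.sha256(system_prompt_template.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


# Two edits of the same agent definition (hypothetical prompt text).
v1 = agent_version("You are forge. Be concise.")
v2 = agent_version("You are forge. Be concise and cite files.")

# Hypothetical agent_run rows: (agent_id, agent_version).
runs = [("forge", v1), ("forge", v1), ("forge", v2)]

# Grouping by the pair separates the two variants that share one agent_id.
by_variant = Counter(runs)
by_agent = Counter(agent_id for agent_id, _ in runs)
```

Here by_agent sees a single forge bucket with three runs, while by_variant splits them two and one.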
feat(debug) — graff debug last-mcp-call
A new top-level subcommand that prints recent MCP request/response
pairs as JSONL, no log-spelunking required:
graff debug last-mcp-call -n 5 --server codedb --tool codedb_bundle --pretty
forge_infra::mcp_debug writes a ring buffer at
<base>/debug/mcp-recent.jsonl capturing the literal arguments that
hit rmcp::call_tool, the round-trip duration, and the outcome
(returned vs failed + error). When an MCP server complains "arguments
arrived empty," this pins the loss to the wire vs upstream of the
client.
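A bounded JSONL log of this kind is simple to model. The sketch below is not the forge_infra::mcp_debug implementation: the record fields (server, tool, arguments, duration_ms, outcome), the capacity of 3, and both helper names are hypothetical, chosen to mirror the description and the CLI filters above.

```python
import json
import tempfile
from pathlib import Path


def append_record(path, record, capacity=50):
    """Append one JSON line and keep only the newest `capacity` lines."""
    lines = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    lines.append(json.dumps(record, sort_keys=True))
    path.write_text("\n".join(lines[-capacity:]) + "\n", encoding="utf-8")


def last_calls(path, n, server=None, tool=None):
    """Return the newest `n` records, optionally filtered by server and tool."""
    records = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
    if server is not None:
        records = [r for r in records if r.get("server") == server]
    if tool is not None:
        records = [r for r in records if r.get("tool") == tool]
    return records[-n:]


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "mcp-recent.jsonl"
    for i in range(5):
        append_record(
            log,
            {
                "server": "codedb",
                "tool": "codedb_bundle",
                "arguments": {"i": i},  # the literal arguments sent on the wire
                "duration_ms": i,
                "outcome": "returned",
            },
            capacity=3,
        )
    # Oldest entries have been evicted; only calls 2, 3, and 4 remain on disk.
    recent = last_calls(log, n=2, server="codedb", tool="codedb_bundle")
```

Because each record stores the arguments exactly as sent, an empty arguments object here would point at the client side, while a populated one would point at the server.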
fix(tools) — rewrite oneOf → anyOf for OpenAI Responses
OpenAI's tool-schema validator (including
chatgpt.com/codex/responses) rejects oneOf outright with
'oneOf' is not permitted', regardless of strict mode. anyOf is
accepted by both OpenAI and Anthropic, and for discriminated unions
whose branches pin a property to different const values (like
codedb_bundle's ops schema) the two are functionally equivalent —
no input matches more than one branch anyway.
A new forge_app::utils::rewrite_one_of_to_any_of recursively
rewrites every oneOf to anyOf before strict-mode normalization
runs, in both the legacy chat-completions and Responses-API paths.
Bundled codedb is now started with CODEDB_DISCRIMINATED_SCHEMA=1
so its discriminated branches actually flow through the rewrite (vs
arriving as a bare {type: "object"} and triggering missing-path /
missing-pattern runtime errors). Regression test in
forge_repo covers the codedb_bundle shape end-to-end.
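The recursive rewrite is easy to picture. This is a Python sketch of the idea, not the Rust function in forge_app::utils; the example schema is a hypothetical discriminated union, not codedb_bundle's actual ops schema.

```python
def rewrite_one_of_to_any_of(schema):
    """Return a copy of a JSON-schema fragment with every oneOf key renamed to anyOf.

    Naive by design: it renames any dict key called "oneOf", so a schema
    property literally named oneOf would be renamed too.
    """
    if isinstance(schema, dict):
        rewritten = {}
        for key, value in schema.items():
            new_key = "anyOf" if key == "oneOf" else key
            rewritten[new_key] = rewrite_one_of_to_any_of(value)
        return rewritten
    if isinstance(schema, list):
        return [rewrite_one_of_to_any_of(item) for item in schema]
    # Scalars (strings, numbers, booleans, None) pass through unchanged.
    return schema


# Hypothetical discriminated union: each branch pins "op" to a different const,
# so no input can match more than one branch.
schema = {
    "type": "object",
    "properties": {
        "ops": {
            "type": "array",
            "items": {
                "oneOf": [
                    {
                        "type": "object",
                        "properties": {"op": {"const": "read"}, "path": {"type": "string"}},
                        "required": ["op", "path"],
                    },
                    {
                        "type": "object",
                        "properties": {"op": {"const": "search"}, "pattern": {"type": "string"}},
                        "required": ["op", "pattern"],
                    },
                ]
            },
        }
    },
}

rewritten = rewrite_one_of_to_any_of(schema)
```

The function builds a new structure and does not mutate its input, so the original schema stays available for providers that accept oneOf.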
feat(ui) — banner logs path, softer status icons, reasoning hidden by default
- The interactive banner now shows the log directory under Logs: so agents debugging graff can find it without spelunking through forge_tracker.
- Status icons softened: ✓ for completion, ✗ for error, dimmer info dot, brighter timestamps. Reads less like a syslog and more like a chat trace.
- Reasoning summaries (Evaluating ..., Exploring ...) are hidden unless --verbose. They're available in the trajectory if you want them; the live REPL stays signal-rich.
- Tracing: FORGE_LOG falls back to RUST_LOG so anyone with Rust muscle memory just works. Default filter is module-segment-agnostic (debug not forge=debug) so events from forge_infra, forge_domain, forge_main, etc. actually land in the log file.
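The tracing fallback in the last bullet amounts to a short precedence chain. A minimal sketch, assuming blank values are treated as unset (the notes do not say); resolve_log_filter is a hypothetical name and the real logic lives in the Rust binary.

```python
def resolve_log_filter(env):
    """Pick the tracing filter: FORGE_LOG first, then RUST_LOG, then a bare level."""
    for name in ("FORGE_LOG", "RUST_LOG"):
        value = env.get(name, "").strip()
        if value:
            return value
    # Bare "debug" rather than "forge=debug", so every crate's events match.
    return "debug"


default_filter = resolve_log_filter({})
rust_only = resolve_log_filter({"RUST_LOG": "info"})
both_set = resolve_log_filter({"FORGE_LOG": "trace", "RUST_LOG": "info"})
```

Passing the environment in as a mapping keeps the function easy to test; in a script you would call resolve_log_filter(os.environ).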
Parallel tool calls — visible header
When the model emits ≥2 tool calls in a single assistant turn, an
⇉ N parallel tool calls (breakdown) header lands above them — the
batch is now visible as a group rather than dissolving into a stream
of unrelated icons.
Release pipeline
The CI release workflow is removed in this version. The previous
generator pushed to antinomyhq/npm-code-forge,
antinomyhq/npm-forgecode, and antinomyhq/homebrew-code-forge —
upstream-fork repos this codegraff fork doesn't own. Releases are
manual from 0.1.5 onwards: build locally, gh release upload. We'll
re-add CI builds once we have target distribution channels under our
own org.
Binaries
This release ships macOS arm64 only (codesigned with Developer ID,
notarized by Apple). Other platforms can be built from source:
cargo build --release --bin graff --bin codegraff
Notes
- Workspace version bumped to 0.1.5.
- Tag: v0.1.5.
🤖 Generated with Claude Code