Add Codex CLI driver for OpenAI agents #61
Merged
moodmosaic merged 24 commits into master on Apr 13, 2026
New driver (lib/drivers/codex-cli.sh) implements the full 13-function interface: agent_run via `codex exec --json`, stats extraction by summing turn.completed events, activity streaming for command_execution/reasoning/file_change/mcp_tool_call/web_search items, fatal/retriable error detection, and OPENAI_API_KEY auth injection. Dockerfile refactored to share Node.js 22 install between gemini-cli and codex-cli drivers. Test configs (codex-only.json, codex-mixed.json) and 92 new unit tests across test_drivers.sh and test_config.sh.
file_change path lives in .changes[].path not .file_path (verified from production logs). Remove phantom reasoning item type -- Codex thinks internally with no event emitted. Update tests to match real event structure with proper fields (id, status, changes, etc).
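A minimal jq sketch of reading the path from its real location. The sample event is an assumed shape built only from the fields named above (id, status, changes), not a copy of a production log, and `codex_changed_paths` is a hypothetical helper name.

```shell
# Hypothetical helper: print the paths touched by completed file_change items.
# Reads Codex JSONL on stdin; the event shape is an assumption for illustration.
codex_changed_paths() {
  jq -r 'select(.type == "item.completed" and .item.type == "file_change")
         | .item.changes[].path'
}

# Assumed sample event, for illustration only.
sample='{"type":"item.completed","item":{"id":"item_1","type":"file_change","status":"completed","changes":[{"path":"src/main.go","kind":"update"}]}}'

# Usage: printf '%s\n' "$sample" | codex_changed_paths
```

Reading `.item.file_path` on the same event would yield null, which is the failure mode this commit removes.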
Cover all 15 Codex-compatible models: gpt-5.4 family (5.4/mini/nano), codex-specific (gpt-5.3-codex, gpt-5.2-codex), gpt-5.2, gpt-5 family (5/mini/nano), gpt-4.1 family (4.1/mini/nano), and reasoning models (o3, o4-mini, o3-mini). Prices per 1M tokens from OpenAI's standard tier as of April 2026.
Shows "codex" in the Driver column, matching how claude-code shows "claude" and gemini-cli shows "gemini".
Drivers like Codex CLI that lack native timing in their JSONL output now get wall-clock elapsed time for dur and api_ms, so the dashboard shows Tok/s and Time instead of blanks. Drivers that report their own timing (Claude Code, Gemini CLI) are unaffected.
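The fallback can be sketched roughly as follows. Both helper names are hypothetical, and the sketch uses whole seconds from `date +%s` for portability, so it is coarser than whatever the harness actually measures.

```shell
# Hypothetical helper: run a command and record wall-clock time in ELAPSED_MS.
# Resolution is one second because only `date +%s` is used.
run_with_wall_clock() {
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  ELAPSED_MS=$(( (end - start) * 1000 ))
  return "$status"
}

# Hypothetical helper: integer tokens per second, guarding against a zero duration.
tok_per_sec() {
  local tokens ms
  tokens=$1
  ms=$2
  if [ "$ms" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( tokens * 1000 / ms ))
}
```

For example, `tok_per_sec 3000 2000` prints 1500.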
Exercise multiple Codex model variants in a single mixed-driver swarm: claude-opus-4-6, gpt-5.4, gpt-5.3-codex, and gpt-5.2.
Codex CLI agent_docker_auth now supports three auth modes:
- chatgpt: bind-mounts ~/.codex/auth.json into containers
- apikey: passes OPENAI_API_KEY only
- auto (default): uses whichever credentials are available

CODEX_AUTH_JSON env var overrides the default auth.json path. Docs updated with per-driver auth tables and usage examples.
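One way the mode resolution could look, as a sketch only. `resolve_codex_auth` is a hypothetical name, and preferring auth.json over the API key when both exist in auto mode is an assumption; the commit message only says auto uses whichever credentials are available.

```shell
# Hypothetical resolver: prints the effective auth mode ("chatgpt" or "apikey").
# Usage: resolve_codex_auth [chatgpt|apikey|auto]
resolve_codex_auth() {
  local mode auth_json
  mode=${1:-auto}
  # CODEX_AUTH_JSON overrides the default auth.json location.
  auth_json=${CODEX_AUTH_JSON:-$HOME/.codex/auth.json}
  case "$mode" in
    chatgpt|apikey)
      echo "$mode"
      ;;
    auto)
      # Assumption: auth.json wins when both credentials are present.
      if [ -f "$auth_json" ]; then
        echo chatgpt
      elif [ -n "${OPENAI_API_KEY:-}" ]; then
        echo apikey
      else
        echo "no Codex credentials found" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown auth mode: $mode" >&2
      return 1
      ;;
  esac
}
```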
New configs: codex-chatgpt.json (chatgpt-only auth) and codex-auth-mixed.json (chatgpt + apikey + auto-detect). test_config.sh gains sections 31-33 that parse these configs and exercise agent_docker_auth resolution for each auth mode, including CODEX_AUTH_JSON env var override and default path.
- agent_settings writes config.toml to ~/.codex/ (where the CLI looks) instead of /workspace/.codex/
- Dockerfile pre-creates /home/agent/.codex/ with correct ownership so bind-mounted auth.json doesn't create a root-owned directory
- agent_settings falls back to sudo mkdir if the directory isn't writable
- agent_detect_fatal excludes the harmless "could not update PATH" warning
Docker's -v silently creates a directory when the source file is missing, corrupting the host filesystem. --mount type=bind errors out cleanly instead, preventing this footgun.
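A sketch of building the mount arguments. `bind_mount_file_args` is a hypothetical helper, and the extra existence check is an addition of this sketch: `--mount type=bind` already fails on a missing source, but checking first gives a clearer message.

```shell
# Hypothetical helper: print docker run arguments that bind-mount one file read-only.
# Prints one argument per line; refuses to continue when the source is not a regular file.
bind_mount_file_args() {
  local src dst
  src=$1
  dst=$2
  if [ ! -f "$src" ]; then
    echo "bind source is not a regular file: $src" >&2
    return 1
  fi
  printf '%s\n' "--mount" "type=bind,source=$src,target=$dst,readonly"
}
```

With `-v "$src:$dst"` the same missing file would instead be created on the host as an empty directory.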
Adds gpt-5.4-pro, gpt-5.2-pro, gpt-5-pro, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, gpt-5.1-codex-mini, gpt-5-codex, and o3-pro to both codex-only and codex-mixed pricing configs. Pro models omit cached pricing (unsupported).
Pass effort level into Codex containers via CODEX_EFFORT env var and relay it to `codex exec -c model_reasoning_effort=...`. Add effort fields to Codex test configs and corresponding assertions.
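The relay might look like this inside the container. `codex_exec_args` is a hypothetical helper that only assembles the arguments named in this commit; the real driver may build its command line differently.

```shell
# Hypothetical helper: print the codex exec arguments, one per line.
# The -c override is added only when CODEX_EFFORT is set and non-empty.
codex_exec_args() {
  printf '%s\n' exec --json
  if [ -n "${CODEX_EFFORT:-}" ]; then
    printf '%s\n' -c "model_reasoning_effort=$CODEX_EFFORT"
  fi
}
```

Leaving the variable unset produces no override, so the CLI falls back to its own default effort.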
Detect when ~/.codex/auth.json has been turned into a directory (by a stale Docker -v mount) and print a recovery command.
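A minimal sketch of the check. `check_codex_auth_path` is a hypothetical name, and the recovery hint printed here is an assumption, since the commit message does not quote the actual command.

```shell
# Hypothetical check: fail when the auth.json path is a directory.
# Usage: check_codex_auth_path [PATH]
check_codex_auth_path() {
  local path
  path=${1:-${CODEX_AUTH_JSON:-$HOME/.codex/auth.json}}
  if [ -d "$path" ]; then
    echo "error: $path is a directory, not a file (likely left by a stale docker -v mount)" >&2
    # Assumed recovery hint; the directory may need elevated rights to remove.
    echo "recover with: rmdir \"$path\" and then sign in again" >&2
    return 1
  fi
}
```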
The tail -20 CI output window hides earlier failures. Collect failed assertion labels and print them in the summary block.
jq 1.8+ preserves source formatting (2.50 stays 2.50) while jq 1.6 normalizes to 2.5. Use values without trailing zeros in the test config so the assertion works on both versions. Also collect failed test names in summary for CI tail-20 visibility.
Context slim/none was broken for non-Claude drivers: the harness stripped .claude/ once on checkout, but git pull --rebase restored the files. Adds post-merge, post-checkout, and post-rewrite hooks that re-strip automatically. Also adds "usage limit" and "hit your...limit" to Codex CLI's retriable error patterns so ChatGPT subscription caps trigger backoff retry instead of fatal exit.
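The hook installation could be sketched as below. `install_restrip_hooks` is a hypothetical name, and removing `.claude` wholesale stands in for the real slim/none strip logic, which is not shown in this PR description.

```shell
# Hypothetical installer: write the same re-strip hook for the three git events.
# $1 is the repository root.
install_restrip_hooks() {
  local repo hook
  repo=$1
  mkdir -p "$repo/.git/hooks"
  for hook in post-merge post-checkout post-rewrite; do
    cat > "$repo/.git/hooks/$hook" <<'EOF'
#!/bin/sh
# Re-strip agent context that git just restored (placeholder strip logic).
rm -rf .claude
EOF
    chmod +x "$repo/.git/hooks/$hook"
  done
}
```

Git runs these hooks from the top of the working tree in a non-bare repository, so the relative `.claude` path resolves against the checkout.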
codex --version prints "codex-cli 0.120.0"; extract the version number after the last space instead of before the first.
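In shell, taking the text after the last space is a single parameter expansion. `codex_version_from` is a hypothetical helper name.

```shell
# Hypothetical helper: keep everything after the last space of the version line.
codex_version_from() {
  local line
  line=$1
  printf '%s\n' "${line##* }"
}

codex_version_from "codex-cli 0.120.0"   # prints 0.120.0
```

The previous approach, equivalent to `${line%% *}`, returns `codex-cli` for the same input.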
Codex activity filter selected both item.started and item.completed, causing every shell command/edit to appear twice. Now shows commands on start (immediate feedback) and edits/mcp on completion (full path data). Push safety net now cleans up stale .git/rebase-merge before retrying, preventing "already a rebase-merge directory" from failing all retries.
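The rebase cleanup can be sketched as follows. `clear_stale_rebase` is a hypothetical name, and the sketch assumes no rebase is genuinely in progress when it runs.

```shell
# Hypothetical helper: remove leftover rebase state before retrying a push.
# $1 is the repository root. Assumes no rebase is actually in progress.
clear_stale_rebase() {
  local repo
  repo=$1
  if [ -d "$repo/.git/rebase-merge" ]; then
    echo "removing stale .git/rebase-merge" >&2
    rm -rf "$repo/.git/rebase-merge"
  fi
}
```

Where a rebase may still be live, `git rebase --abort` is the gentler option because it also restores the original branch state.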
codex exec with --skip-git-repo-check does not load .codex/instructions.md, so the git coordination rules were never reaching the model. Prepend them directly into the prompt argument instead.
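Prepending the rules might look like this. `build_codex_prompt` is a hypothetical helper, and separating the rules from the task with a blank line is an assumption of this sketch.

```shell
# Hypothetical helper: print the prompt with coordination rules prepended.
# $1 is the rules file, $2 is the task prompt.
build_codex_prompt() {
  local rules_file task
  rules_file=$1
  task=$2
  if [ -f "$rules_file" ]; then
    # Rules first, then a blank line, then the task.
    printf '%s\n\n%s\n' "$(cat "$rules_file")" "$task"
  else
    printf '%s\n' "$task"
  fi
}
```

The result would then be passed as the single prompt argument to `codex exec`.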
OpenAI includes cached tokens in input_tokens, but Claude does not. The harness pricing formula adds tok_in * input_price + cache * cached_price, so cached tokens were charged twice. Subtract cached from input in Codex stats extraction for consistency.
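A worked sketch of the corrected arithmetic. `codex_cost` is a hypothetical helper, the output-token term is an assumption added for completeness, and the prices in the usage line are made up for the example rather than taken from the pricing configs.

```shell
# Hypothetical helper: cost in dollars for one run, prices given per 1M tokens.
# input_tokens follows the OpenAI convention and already includes cached tokens.
# Usage: codex_cost INPUT CACHED OUTPUT PRICE_IN PRICE_CACHED PRICE_OUT
codex_cost() {
  local input cached output p_in p_cached p_out
  input=$1; cached=$2; output=$3
  p_in=$4; p_cached=$5; p_out=$6
  # Subtract cached from input so cached tokens are billed once, at the cached rate.
  awk -v i="$(( input - cached ))" -v c="$cached" -v o="$output" \
      -v pin="$p_in" -v pc="$p_cached" -v po="$p_out" \
      'BEGIN { printf "%.4f\n", (i * pin + c * pc + o * po) / 1000000 }'
}

codex_cost 1000000 400000 100000 2.5 0.25 10   # prints 2.6000
```

Without the subtraction the same run would be priced at 3.6000, because the 400,000 cached tokens would be charged at both the input and the cached rate.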
This resolves #26.
Codex CLI reads AGENTS.md for project instructions (not .claude/CLAUDE.md) and .agents/skills/ for skills (not .claude/skills/). In agent_settings, bridge both when the Codex locations are absent:
- Copy .claude/CLAUDE.md (or root CLAUDE.md) to AGENTS.md
- Symlink .claude/skills/ to .agents/skills/

Both are added to .git/info/exclude so the agent doesn't commit the bridged files. The skills symlink only fires when .claude/skills/ exists (context=full); slim/none strip it so it's a natural no-op.
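The bridging steps above can be sketched as follows. Both function names are hypothetical, and the relative symlink target is a choice of this sketch rather than something the commit message specifies.

```shell
# Hypothetical helper: add a path to .git/info/exclude once.
exclude_path() {
  mkdir -p "$1/.git/info"
  grep -qxF "$2" "$1/.git/info/exclude" 2>/dev/null ||
    printf '%s\n' "$2" >> "$1/.git/info/exclude"
}

# Hypothetical bridge: expose Claude conventions at the Codex locations.
# $1 is the repository root.
bridge_claude_conventions() {
  local repo src
  repo=$1
  # Instructions: copy only when AGENTS.md is absent.
  if [ ! -e "$repo/AGENTS.md" ]; then
    for src in "$repo/.claude/CLAUDE.md" "$repo/CLAUDE.md"; do
      if [ -f "$src" ]; then
        cp "$src" "$repo/AGENTS.md"
        exclude_path "$repo" "AGENTS.md"
        break
      fi
    done
  fi
  # Skills: symlink only when the Claude directory exists and the Codex one does not.
  if [ -d "$repo/.claude/skills" ] && [ ! -e "$repo/.agents/skills" ]; then
    mkdir -p "$repo/.agents"
    ln -s ../.claude/skills "$repo/.agents/skills"
    exclude_path "$repo" ".agents/skills"
  fi
}
```

Running it twice is harmless: the existence checks skip both steps and `exclude_path` does not append duplicate lines.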
ee4c0f7 to 3bbd5a1
Adds changelog entry covering the full Codex CLI driver feature set: driver implementation, ChatGPT subscription auth, .claude/ convention bridging, context stripping hooks, stale rebase cleanup, inline system prompt, and per-driver effort documentation.
Tested extensively overnight on gethfuzz with 3 Codex agents
Summary
- `codex-cli` driver (lib/drivers/codex-cli.sh) implementing the full 13-function interface: non-interactive `codex exec --json`, stats extraction by summing `turn.completed` events, activity streaming for command/reasoning/file_change/mcp/web_search items, fatal and retriable error detection, and `OPENAI_API_KEY` auth injection
- Dockerfile refactored to share the Node.js 22 install between `gemini-cli` and `codex-cli` instead of duplicating it
- Test configs (`codex-only.json`, `codex-mixed.json`) and 92 new unit tests across `test_drivers.sh` and `test_config.sh`

Test plan
- `bash tests/test_drivers.sh`: 242 passed, 0 failed
- `bash tests/test_config.sh`: 235 passed, 0 failed
- `bash tests/test_setup.sh`: 66 passed, 0 failed
- `./tests/test.sh --config tests/configs/codex-only.json` with `OPENAI_API_KEY` set
- `./tests/test.sh --config tests/configs/codex-mixed.json` with both `CLAUDE_CODE_OAUTH_TOKEN` and `OPENAI_API_KEY` set
- `SWARM_AGENTS=codex-cli`