0x0funky · nicoecheza · Apr 29, 2026
diff --git a/README.md b/README.md
@@ -274,7 +274,7 @@ Pipeline:
 image_gen tileset + prop_pack_3x3 + layered_tilemap + separate_props + trigger_zones + Godot_TileMap
 ```
 
-Codex-first 2D game asset skills for game-ready 2D sprites, props, FX, and playable map scenes.
+Agent-portable 2D game asset skills for game-ready 2D sprites, props, FX, and playable map scenes. Works with Codex (built-in image generation) and, via `scripts/image_gen.py`, with Claude Code, Cursor, and any other agent that can run a Python script. The script's default model is `gpt-image-2` on the OpenAI backend.
 
 This repository currently ships two skills:
 
@@ -283,9 +283,9 @@ This repository currently ships two skills:
 
 `$generate2dmap` uses `$generate2dsprite` when the chosen map pipeline needs reusable transparent props. Small environmental props can be batched into `2x2`, `3x3`, or `4x4` prop packs, then extracted into individual transparent props. Simple maps can stay as a single baked image.
 
-When a visual reference is involved, both skills use the same wrapper rule: make the image visible in the conversation first. Attached images and freshly generated images are already visible; local files should be opened with `view_image` before asking built-in image generation to preserve identity, style, map layout, or sprite lineage.
+When a visual reference is involved, both skills use the same wrapper rule: make the image visible in the conversation first. Attached images and freshly generated images are already visible. Open local files before any image edit or reference call so that identity, style, map layout, and sprite lineage are preserved: use `view_image` on Codex, the agent's native file-read tool on Claude Code, Cursor, and similar hosts, or `scripts/view_image.py` when no native tool exists.
 
-Codex is the primary target because Codex already has built-in image generation. That lets one agent handle the full loop:
+Codex remains the most ergonomic host because its built-in image generation lets one agent handle the full loop without a separate API call. Other agents reach the same loop via `scripts/image_gen.py`:
 
 1. Plan the asset or map pipeline.
 2. Generate the raw sprite sheet, prop, or map image.
@@ -314,29 +314,27 @@ The current focus is 2D game assets and map scenes, not full game-pack automatio
 - Flattened map previews for QA and showcase
 - Godot-ready editable maps with `TileMapLayer`, separate props, `Area2D` encounter grass, `StaticBody2D` collision, exit zones, and debug player scenes
 
-## Why Codex First
+## Supported Agents
 
-This repo is intentionally Codex-first because Codex can generate images directly inside the same workflow.
+The skills work with any agent that can run Python scripts. Codex is the most ergonomic host because it ships a built-in `image_gen` tool, but other agents are fully supported through a small CLI wrapper.
 
-That gives you a much cleaner pipeline:
+| Agent       | Image generation                              | Reference handling                              |
+| ----------- | --------------------------------------------- | ----------------------------------------------- |
+| Codex       | built-in `image_gen`                          | built-in `view_image`                           |
+| Claude Code | `scripts/image_gen.py` (OpenAI / Gemini)      | `Read` tool on the file path                    |
+| Cursor      | `scripts/image_gen.py` (OpenAI / Gemini)      | Cursor's native file-read tool                  |
+| Generic CLI | `scripts/image_gen.py` (OpenAI / Gemini)      | `scripts/view_image.py` shim or stdout metadata |
 
-- No separate image API wiring
-- No external sprite backend
-- No extra prompt-builder service
-- One agent decides the asset plan
-- One local processor handles deterministic cleanup and export
+Either way, the agent stays the creative brain (asset type, action, bundle shape, sheet layout, frame count, alignment) and the Python scripts only perform deterministic pixel operations and (when needed) the API call to the image backend.
 
-The script is not the creative brain. The agent decides:
+### Image generation backends
 
-- Asset type
-- Action type
-- Bundle shape
-- Sheet layout
-- Frame count
-- Alignment strategy
-- Whether detached effects should be kept or filtered
+`scripts/image_gen.py` supports two backends:
 
-The Python script only performs deterministic pixel operations.
+- **OpenAI** (default) — model `gpt-image-2` via the Images API. Set `OPENAI_API_KEY`.
+- **Gemini** — Google `gemini-2.5-flash-image`. Set `GEMINI_API_KEY` (or `GOOGLE_API_KEY`).
+
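+Backend selection defaults to `auto`, which picks the first backend whose API key is set. Override it with `SPRITE_FORGE_BACKEND=openai|gemini` (or `--backend`) and the model with `SPRITE_FORGE_MODEL=<id>` (or `--model`). Codex users do not need to install either SDK.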
 
 ## Repository Layout
 
@@ -345,6 +343,9 @@ agent-sprite-forge/
   README.md
   README.zh-TW.md
   requirements.txt
+  scripts/
+    image_gen.py        # agent-agnostic image generation wrapper (OpenAI / Gemini)
+    view_image.py       # optional shim for non-Codex hosts
   src/
   skills/
     generate2dmap/
@@ -371,31 +372,43 @@ agent-sprite-forge/
 
 ## Install
 
-### Option 1: Windows PowerShell
-
-Clone the repo, install the local processor dependencies, then copy both skills into your Codex skills directory:
+Pick the section that matches your agent. Every section assumes you have already cloned the repo and installed the dependencies once:
 
-```powershell
+```bash
 git clone https://github.com/0x0funky/agent-sprite-forge.git
-cd .\agent-sprite-forge
-python -m pip install -r .\requirements.txt
-New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.codex\skills" | Out-Null
-Copy-Item -Recurse -Force `
-  ".\skills\*" `
-  "$env:USERPROFILE\.codex\skills\"
+cd ./agent-sprite-forge
+python3 -m pip install -r ./requirements.txt
 ```
 
-### Option 2: macOS / Linux
+### Codex (macOS / Linux)
 
 ```bash
-git clone https://github.com/0x0funky/agent-sprite-forge.git
-cd ./agent-sprite-forge
-python3 -m pip install -r ./requirements.txt
 mkdir -p ~/.codex/skills
 cp -R ./skills/* ~/.codex/skills/
 ```
 
-Start a new Codex session after installation so the skill is loaded cleanly.
+### Codex (Windows PowerShell)
+
+```powershell
+New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.codex\skills" | Out-Null
+Copy-Item -Recurse -Force ".\skills\*" "$env:USERPROFILE\.codex\skills\"
+```
+
+### Claude Code (macOS / Linux)
+
+```bash
+python3 -m pip install "openai>=1.50"   # or: "google-genai>=0.3" for Gemini
+mkdir -p ~/.claude/skills
+cp -R ./skills/* ~/.claude/skills/
+cp -R ./scripts ~/.claude/skills/_shared/   # or keep scripts/ in $PWD; the SKILL.md uses a relative path
+export OPENAI_API_KEY=<your key>
+```
+
+### Cursor / generic CLI agent
+
+The skills are plain markdown plus Python scripts. Point your agent at `skills/generate2dsprite/SKILL.md` (and `skills/generate2dmap/SKILL.md`), and make sure the agent is allowed to run `scripts/image_gen.py` and can reach it from its working directory or `PATH`. Set `OPENAI_API_KEY` (or `GEMINI_API_KEY`).
+
+Start a new agent session after installation so the skill is loaded cleanly.
 
 ## Python Requirements
 
@@ -412,6 +425,17 @@ They are listed in [`requirements.txt`](./requirements.txt). Codex handles image
 - Alignment and rescaling
 - Transparent GIF and PNG export
 
+### Optional extras for non-Codex agents
+
+If you are running the skill from Claude Code, Cursor, or any other agent without a built-in image tool, install the SDK that matches your chosen backend:
+
+```bash
+pip install "openai>=1.50"        # OpenAI (default, model: gpt-image-2)
+pip install "google-genai>=0.3"   # Gemini 2.5 Flash Image
+```
+
+These are intentionally **not** in `requirements.txt` so Codex users do not need to install them. Both are also listed in `requirements-optional.txt`.
+
 ## Suggested Prompts
 
 ### Basic

diff --git a/requirements-optional.txt b/requirements-optional.txt
@@ -0,0 +1,10 @@
+# Optional extras for non-Codex hosts (Claude Code, Cursor, generic CLI agents).
+# Only install the line that matches the backend you plan to use with
+# scripts/image_gen.py. Codex users do not need either of these because
+# Codex's built-in image_gen handles generation directly.
+
+# OpenAI Images API (default backend, model: gpt-image-2)
+openai>=1.50
+
+# Google Gemini 2.5 Flash Image
+google-genai>=0.3
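
The README changes above describe non-Codex agents shelling out to `scripts/image_gen.py` and reading back a JSON status line. A minimal sketch of how a host-side helper might assemble that call is shown below. The helper names `build_image_gen_command` and `parse_status_line` are hypothetical and not part of this PR; the flags and the status-line fields are taken from the script in this diff.

```python
import json
import sys
from pathlib import Path
from typing import List, Optional


def build_image_gen_command(
    prompt: str,
    out: Path,
    size: str = "1024x1024",
    reference: Optional[Path] = None,
    backend: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for scripts/image_gen.py (hypothetical helper)."""
    cmd = [
        sys.executable,
        "scripts/image_gen.py",  # assumes the repo root is the working directory
        "--prompt", prompt,
        "--out", str(out),
        "--size", size,
    ]
    if reference is not None:
        cmd += ["--reference", str(reference)]
    if backend is not None:
        cmd += ["--backend", backend]
    return cmd


def parse_status_line(stdout: str) -> dict:
    """Parse the last non-empty stdout line as the script's JSON status."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    return json.loads(lines[-1])
```

A caller would typically pass the list to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and hand `result.stdout` to `parse_status_line`. Passing an argv list rather than a shell string avoids quoting problems with prompts that contain spaces or `#` characters.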
diff --git a/scripts/image_gen.py b/scripts/image_gen.py
@@ -0,0 +1,228 @@
+#!/usr/bin/env python3
+"""Agent-agnostic image generation wrapper.
+
+Codex provides built-in `image_gen`. Other agents (Claude Code, Cursor, generic
+CLI agents) shell out to this script instead. It produces the same artifact:
+a PNG written to a known path that the local sprite/map post-processor can read.
+
+Backends, in priority order:
+
+1. ``openai`` - OpenAI Images API (default model: ``gpt-image-2``).
+2. ``gemini`` - Google Gemini 2.5 Flash Image.
+
+The backend is chosen via ``--backend`` or the ``SPRITE_FORGE_BACKEND`` env var.
+``auto`` (default) picks the first backend whose API key is present.
+
+Usage:
+
+    python scripts/image_gen.py \\
+        --prompt "fire mage cast 2x3 sheet, solid #FF00FF background" \\
+        --out raw-sheet.png \\
+        --size 1024x1024
+
+    # With a reference image (image edit / variation)
+    python scripts/image_gen.py \\
+        --prompt "same character, walk cycle 4x4" \\
+        --reference path/to/character.png \\
+        --out walk-sheet.png
+
+Env vars:
+
+    OPENAI_API_KEY       Required for the ``openai`` backend.
+    GEMINI_API_KEY       Required for the ``gemini`` backend (``GOOGLE_API_KEY`` also works).
+    SPRITE_FORGE_BACKEND One of: auto | openai | gemini. Default: auto.
+    SPRITE_FORGE_MODEL   Override the default model id for the chosen backend.
+"""
+
+from __future__ import annotations
+
+import argparse
+import base64
+import json
+import os
+import sys
+from pathlib import Path
+from typing import NoReturn, Optional
+
+
+DEFAULT_OPENAI_MODEL = "gpt-image-2"
+DEFAULT_GEMINI_MODEL = "gemini-2.5-flash-image"
+
+
+def _err(msg: str, code: int = 1) -> NoReturn:
+    """Print an error to stderr and exit with ``code``; never returns."""
+    print(f"image_gen: {msg}", file=sys.stderr)
+    sys.exit(code)
+
+
+def _detect_backend(requested: str) -> str:
+    if requested != "auto":
+        return requested
+    if os.environ.get("OPENAI_API_KEY"):
+        return "openai"
+    if os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY"):
+        return "gemini"
+    _err(
+        "no backend available. Set OPENAI_API_KEY or GEMINI_API_KEY "
+        "(or GOOGLE_API_KEY), or pass --backend explicitly."
+    )
+    return ""  # unreachable
+
+
+def _generate_openai(
+    prompt: str,
+    out_path: Path,
+    size: str,
+    reference: Optional[Path],
+    model: str,
+) -> dict:
+    try:
+        from openai import OpenAI
+    except ImportError:
+        _err(
+            "openai SDK not installed. Run: pip install 'openai>=1.50' "
+            "(or install the optional extras: pip install -r requirements-optional.txt)"
+        )
+
+    api_key = os.environ.get("OPENAI_API_KEY")
+    if not api_key:
+        _err("OPENAI_API_KEY is not set")
+
+    client = OpenAI(api_key=api_key)
+
+    if reference is not None:
+        if not reference.exists():
+            _err(f"reference not found: {reference}")
+        with reference.open("rb") as fh:
+            response = client.images.edit(
+                model=model,
+                image=fh,
+                prompt=prompt,
+                size=size,
+            )
+    else:
+        response = client.images.generate(
+            model=model,
+            prompt=prompt,
+            size=size,
+        )
+
+    data = response.data[0]
+    b64 = getattr(data, "b64_json", None)
+    if b64 is None:
+        # Some models return a URL instead of b64
+        url = getattr(data, "url", None)
+        if not url:
+            _err("OpenAI response contained neither b64_json nor url")
+        import urllib.request
+
+        with urllib.request.urlopen(url) as r:
+            out_path.write_bytes(r.read())
+    else:
+        out_path.write_bytes(base64.b64decode(b64))
+
+    return {"backend": "openai", "model": model, "path": str(out_path)}
+
+
+def _generate_gemini(
+    prompt: str,
+    out_path: Path,
+    size: str,
+    reference: Optional[Path],
+    model: str,
+) -> dict:
+    try:
+        from google import genai
+        from google.genai import types
+    except ImportError:
+        _err(
+            "google-genai SDK not installed. Run: pip install 'google-genai>=0.3'"
+        )
+
+    api_key = os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY")
+    if not api_key:
+        _err("GEMINI_API_KEY (or GOOGLE_API_KEY) is not set")
+
+    client = genai.Client(api_key=api_key)
+
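+    contents: list = [prompt]
+    if reference is not None:
+        if not reference.exists():
+            _err(f"reference not found: {reference}")
+        import mimetypes
+
+        # Guess the MIME type from the file extension; fall back to PNG.
+        mime_type = mimetypes.guess_type(reference.name)[0] or "image/png"
+        contents.append(
+            types.Part.from_bytes(
+                data=reference.read_bytes(),
+                mime_type=mime_type,
+            )
+        )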
+
+    response = client.models.generate_content(
+        model=model,
+        contents=contents,
+    )
+
+    for part in response.candidates[0].content.parts:
+        if getattr(part, "inline_data", None) is not None:
+            out_path.write_bytes(part.inline_data.data)
+            return {"backend": "gemini", "model": model, "path": str(out_path)}
+
+    _err("Gemini response contained no inline image data")
+    return {}  # unreachable
+
+
+def main() -> None:
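+    parser = argparse.ArgumentParser(
+        description=__doc__,
+        # Keep the docstring's usage examples and env-var list formatted as written.
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+    )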
+    parser.add_argument("--prompt", required=True, help="Creative image prompt.")
+    parser.add_argument(
+        "--out",
+        required=True,
+        type=Path,
+        help="Output PNG path (will be created or overwritten).",
+    )
+    parser.add_argument(
+        "--size",
+        default="1024x1024",
+        help=(
+            "Image size (e.g. 1024x1024, 1024x1536). Default: 1024x1024. "
+            "Currently only passed to the openai backend."
+        ),
+    )
+    parser.add_argument(
+        "--reference",
+        type=Path,
+        default=None,
+        help="Optional reference image for image-edit / image-to-image flows.",
+    )
+    parser.add_argument(
+        "--backend",
+        default=os.environ.get("SPRITE_FORGE_BACKEND", "auto"),
+        choices=["auto", "openai", "gemini"],
+        help="Which provider to use. Default: auto (first with an API key set).",
+    )
+    parser.add_argument(
+        "--model",
+        default=os.environ.get("SPRITE_FORGE_MODEL"),
+        help="Override the default model id for the chosen backend.",
+    )
+    parser.add_argument(
+        "--quiet",
+        action="store_true",
+        help="Suppress the JSON status line on stdout.",
+    )
+    args = parser.parse_args()
+
+    backend = _detect_backend(args.backend)
+    args.out.parent.mkdir(parents=True, exist_ok=True)
+
+    if backend == "openai":
+        model = args.model or DEFAULT_OPENAI_MODEL
+        result = _generate_openai(args.prompt, args.out, args.size, args.reference, model)
+    elif backend == "gemini":
+        model = args.model or DEFAULT_GEMINI_MODEL
+        result = _generate_gemini(args.prompt, args.out, args.size, args.reference, model)
+    else:
+        _err(f"unknown backend: {backend}")
+        return
+
+    if not args.quiet:
+        print(json.dumps(result))
+
+
+if __name__ == "__main__":
+    main()
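
Review note: `_detect_backend` reads `os.environ` directly and exits the process on failure, which makes the `auto` priority order awkward to unit-test. One possible refactor, sketched here under the assumption that the same priority rules apply, is a pure function that takes the environment as a mapping. The name `pick_backend` is hypothetical and does not exist in this PR.

```python
from typing import Mapping, Optional


def pick_backend(requested: str, env: Mapping[str, str]) -> Optional[str]:
    """Side-effect-free variant of the backend selection logic.

    Returns "openai" or "gemini", or None when `requested` is "auto"
    and no API key is present in `env`. An explicit request is returned
    unchanged, mirroring the script's behavior.
    """
    if requested != "auto":
        return requested
    # OpenAI is checked first, so it wins when both keys are set.
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"):
        return "gemini"
    return None
```

With this shape, the CLI entry point could call `pick_backend(args.backend, os.environ)` and report the "no backend available" error itself when the result is `None`, while tests pass plain dictionaries.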