diff --git a/README.md b/README.md index 68204ce..9542ea7 100644 --- a/README.md +++ b/README.md @@ -274,7 +274,7 @@ Pipeline: image_gen tileset + prop_pack_3x3 + layered_tilemap + separate_props + trigger_zones + Godot_TileMap ``` -Codex-first 2D game asset skills for game-ready 2D sprites, props, FX, and playable map scenes. +Agent-portable 2D game asset skills for game-ready 2D sprites, props, FX, and playable map scenes. Works with Codex (built-in image generation), Claude Code, Cursor, and any other agent that can run a Python script (via `scripts/image_gen.py`, default backend `gpt-image-2`). This repository currently ships two skills: @@ -283,9 +283,9 @@ This repository currently ships two skills: `$generate2dmap` uses `$generate2dsprite` when the chosen map pipeline needs reusable transparent props. Small environmental props can be batched into `2x2`, `3x3`, or `4x4` prop packs, then extracted into individual transparent props. Simple maps can stay as a single baked image. -When a visual reference is involved, both skills use the same wrapper rule: make the image visible in the conversation first. Attached images and freshly generated images are already visible; local files should be opened with `view_image` before asking built-in image generation to preserve identity, style, map layout, or sprite lineage. +When a visual reference is involved, both skills use the same wrapper rule: make the image visible in the conversation first. Attached images and freshly generated images are already visible. Local files should be opened with `view_image` on Codex, with the agent's native file-read tool on Claude Code / Cursor (and similar), or surfaced via `scripts/view_image.py` when no native tool exists, so identity, style, map layout, or sprite lineage is preserved before image edit/reference calls. -Codex is the primary target because Codex already has built-in image generation. 
That lets one agent handle the full loop: +Codex remains the most ergonomic host because its built-in image generation lets one agent handle the full loop without a separate API call. Other agents reach the same loop via `scripts/image_gen.py`: 1. Plan the asset or map pipeline. 2. Generate the raw sprite sheet, prop, or map image. @@ -314,29 +314,27 @@ The current focus is 2D game assets and map scenes, not full game-pack automatio - Flattened map previews for QA and showcase - Godot-ready editable maps with `TileMapLayer`, separate props, `Area2D` encounter grass, `StaticBody2D` collision, exit zones, and debug player scenes -## Why Codex First +## Supported Agents -This repo is intentionally Codex-first because Codex can generate images directly inside the same workflow. +The skills work with any agent that can run Python scripts. Codex is the most ergonomic host because it ships a built-in `image_gen` tool, but other agents are first-class via a small CLI fallback. -That gives you a much cleaner pipeline: +| Agent | Image generation | Reference handling | +| ----------- | --------------------------------------------- | ----------------------------------------------- | +| Codex | built-in `image_gen` | built-in `view_image` | +| Claude Code | `scripts/image_gen.py` (OpenAI / Gemini) | `Read` tool on the file path | +| Cursor | `scripts/image_gen.py` (OpenAI / Gemini) | Cursor's native file-read tool | +| Generic CLI | `scripts/image_gen.py` (OpenAI / Gemini) | `scripts/view_image.py` shim or stdout metadata | -- No separate image API wiring -- No external sprite backend -- No extra prompt-builder service -- One agent decides the asset plan -- One local processor handles deterministic cleanup and export +Either way, the agent stays the creative brain (asset type, action, bundle shape, sheet layout, frame count, alignment) and the Python scripts only perform deterministic pixel operations and (when needed) the API call to the image backend. 
-The script is not the creative brain. The agent decides: +### Image generation backends -- Asset type -- Action type -- Bundle shape -- Sheet layout -- Frame count -- Alignment strategy -- Whether detached effects should be kept or filtered +`scripts/image_gen.py` supports two backends: -The Python script only performs deterministic pixel operations. +- **OpenAI** (default) — model `gpt-image-2` via the Images API. Set `OPENAI_API_KEY`. +- **Gemini** — Google `gemini-2.5-flash-image`. Set `GEMINI_API_KEY` (or `GOOGLE_API_KEY`). + +Override selection with `SPRITE_FORGE_BACKEND=openai|gemini` and the model with `SPRITE_FORGE_MODEL=`. Codex users do not need to install either SDK. ## Repository Layout @@ -345,6 +343,9 @@ agent-sprite-forge/ README.md README.zh-TW.md requirements.txt + scripts/ + image_gen.py # agent-agnostic image generation wrapper (OpenAI / Gemini) + view_image.py # optional shim for non-Codex hosts src/ skills/ generate2dmap/ @@ -371,31 +372,43 @@ agent-sprite-forge/ ## Install -### Option 1: Windows PowerShell - -Clone the repo, install the local processor dependencies, then copy both skills into your Codex skills directory: +Pick the section that matches your agent. All paths below assume you cloned the repo and ran the dependency install once. 
-```powershell +```bash git clone https://github.com/0x0funky/agent-sprite-forge.git -cd .\agent-sprite-forge -python -m pip install -r .\requirements.txt -New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.codex\skills" | Out-Null -Copy-Item -Recurse -Force ` - ".\skills\*" ` - "$env:USERPROFILE\.codex\skills\" +cd ./agent-sprite-forge +python3 -m pip install -r ./requirements.txt ``` -### Option 2: macOS / Linux +### Codex (macOS / Linux) ```bash -git clone https://github.com/0x0funky/agent-sprite-forge.git -cd ./agent-sprite-forge -python3 -m pip install -r ./requirements.txt mkdir -p ~/.codex/skills cp -R ./skills/* ~/.codex/skills/ ``` -Start a new Codex session after installation so the skill is loaded cleanly. +### Codex (Windows PowerShell) + +```powershell +New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.codex\skills" | Out-Null +Copy-Item -Recurse -Force ".\skills\*" "$env:USERPROFILE\.codex\skills\" +``` + +### Claude Code (macOS / Linux) + +```bash +python3 -m pip install "openai>=1.50" # or: "google-genai>=0.3" for Gemini +mkdir -p ~/.claude/skills +cp -R ./skills/* ~/.claude/skills/ +cp -R ./scripts ~/.claude/skills/_shared/ # or keep scripts/ in $PWD; the SKILL.md uses a relative path +export OPENAI_API_KEY= +``` + +### Cursor / generic CLI agent + +The skills are plain markdown plus Python scripts. Point your agent at `skills/generate2dsprite/SKILL.md` (and `skills/generate2dmap/SKILL.md`) and ensure `scripts/image_gen.py` is on the agent's allowed-tools/PATH. Set `OPENAI_API_KEY` (or `GEMINI_API_KEY`). + +Start a new agent session after installation so the skill is loaded cleanly. ## Python Requirements @@ -412,6 +425,17 @@ They are listed in [`requirements.txt`](./requirements.txt). 
Codex handles image - Alignment and rescaling - Transparent GIF and PNG export +### Optional extras for non-Codex agents + +If you are running the skill from Claude Code, Cursor, or any other agent without a built-in image tool, install the SDK that matches your chosen backend: + +```bash +pip install "openai>=1.50" # OpenAI (default, model: gpt-image-2) +pip install "google-genai>=0.3" # Gemini 2.5 Flash Image +``` + +These are intentionally **not** in `requirements.txt` so Codex users do not need to install them. + ## Suggested Prompts ### Basic diff --git a/requirements-optional.txt b/requirements-optional.txt new file mode 100644 index 0000000..59ce8d6 --- /dev/null +++ b/requirements-optional.txt @@ -0,0 +1,10 @@ +# Optional extras for non-Codex hosts (Claude Code, Cursor, generic CLI agents). +# Only install the line that matches the backend you plan to use with +# scripts/image_gen.py. Codex users do not need either of these because +# Codex's built-in image_gen handles generation directly. + +# OpenAI Images API (default backend, model: gpt-image-2) +openai>=1.50 + +# Google Gemini 2.5 Flash Image +google-genai>=0.3 diff --git a/scripts/image_gen.py b/scripts/image_gen.py new file mode 100755 index 0000000..bacc32b --- /dev/null +++ b/scripts/image_gen.py @@ -0,0 +1,228 @@ +#!/usr/bin/env python3 +"""Agent-agnostic image generation wrapper. + +Codex provides built-in `image_gen`. Other agents (Claude Code, Cursor, generic +CLI agents) shell out to this script instead. It produces the same artifact: +a PNG written to a known path that the local sprite/map post-processor can read. + +Backends, in priority order: + +1. ``openai`` - OpenAI Images API (default model: ``gpt-image-2``). +2. ``gemini`` - Google Gemini 2.5 Flash Image. + +The backend is chosen via ``--backend`` or the ``SPRITE_FORGE_BACKEND`` env var. +``auto`` (default) picks the first backend whose API key is present. 
+ +Usage: + + python scripts/image_gen.py \\ + --prompt "fire mage cast 2x3 sheet, solid #FF00FF background" \\ + --out raw-sheet.png \\ + --size 1024x1024 + + # With a reference image (image edit / variation) + python scripts/image_gen.py \\ + --prompt "same character, walk cycle 4x4" \\ + --reference path/to/character.png \\ + --out walk-sheet.png + +Env vars: + + OPENAI_API_KEY Required for the ``openai`` backend. + GEMINI_API_KEY Required for the ``gemini`` backend. + SPRITE_FORGE_BACKEND One of: auto | openai | gemini. Default: auto. + SPRITE_FORGE_MODEL Override the default model id for the chosen backend. +""" + +from __future__ import annotations + +import argparse +import base64 +import json +import os +import sys +from pathlib import Path +from typing import Optional + + +DEFAULT_OPENAI_MODEL = "gpt-image-2" +DEFAULT_GEMINI_MODEL = "gemini-2.5-flash-image" + + +def _err(msg: str, code: int = 1) -> None: + print(f"image_gen: {msg}", file=sys.stderr) + sys.exit(code) + + +def _detect_backend(requested: str) -> str: + if requested != "auto": + return requested + if os.environ.get("OPENAI_API_KEY"): + return "openai" + if os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY"): + return "gemini" + _err( + "no backend available. Set OPENAI_API_KEY or GEMINI_API_KEY, " + "or pass --backend explicitly." + ) + return "" # unreachable + + +def _generate_openai( + prompt: str, + out_path: Path, + size: str, + reference: Optional[Path], + model: str, +) -> dict: + try: + from openai import OpenAI + except ImportError: + _err( + "openai SDK not installed. 
Run: pip install 'openai>=1.50' " + "(or install the optional extras: pip install -r requirements-optional.txt)" + ) + + api_key = os.environ.get("OPENAI_API_KEY") + if not api_key: + _err("OPENAI_API_KEY is not set") + + client = OpenAI(api_key=api_key) + + if reference is not None: + if not reference.exists(): + _err(f"reference not found: {reference}") + with reference.open("rb") as fh: + response = client.images.edit( + model=model, + image=fh, + prompt=prompt, + size=size, + ) + else: + response = client.images.generate( + model=model, + prompt=prompt, + size=size, + ) + + data = response.data[0] + b64 = getattr(data, "b64_json", None) + if b64 is None: + # Some models return a URL instead of b64 + url = getattr(data, "url", None) + if not url: + _err("OpenAI response contained neither b64_json nor url") + import urllib.request + + with urllib.request.urlopen(url) as r: + out_path.write_bytes(r.read()) + else: + out_path.write_bytes(base64.b64decode(b64)) + + return {"backend": "openai", "model": model, "path": str(out_path)} + + +def _generate_gemini( + prompt: str, + out_path: Path, + size: str, + reference: Optional[Path], + model: str, +) -> dict: + try: + from google import genai + from google.genai import types + except ImportError: + _err( + "google-genai SDK not installed. 
Run: pip install 'google-genai>=0.3'" + ) + + api_key = os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY") + if not api_key: + _err("GEMINI_API_KEY (or GOOGLE_API_KEY) is not set") + + client = genai.Client(api_key=api_key) + + contents: list = [prompt] + if reference is not None: + if not reference.exists(): + _err(f"reference not found: {reference}") + contents.append( + types.Part.from_bytes( + data=reference.read_bytes(), + mime_type="image/png", + ) + ) + + response = client.models.generate_content( + model=model, + contents=contents, + ) + + for part in response.candidates[0].content.parts: + if getattr(part, "inline_data", None) is not None: + out_path.write_bytes(part.inline_data.data) + return {"backend": "gemini", "model": model, "path": str(out_path)} + + _err("Gemini response contained no inline image data") + return {} # unreachable + + +def main() -> None: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument("--prompt", required=True, help="Creative image prompt.") + parser.add_argument( + "--out", + required=True, + type=Path, + help="Output PNG path (will be created or overwritten).", + ) + parser.add_argument( + "--size", + default="1024x1024", + help="Image size (e.g. 1024x1024, 1024x1536). Default: 1024x1024.", + ) + parser.add_argument( + "--reference", + type=Path, + default=None, + help="Optional reference image for image-edit / image-to-image flows.", + ) + parser.add_argument( + "--backend", + default=os.environ.get("SPRITE_FORGE_BACKEND", "auto"), + choices=["auto", "openai", "gemini"], + help="Which provider to use. 
Default: auto (first with an API key set).", + ) + parser.add_argument( + "--model", + default=os.environ.get("SPRITE_FORGE_MODEL"), + help="Override the default model id for the chosen backend.", + ) + parser.add_argument( + "--quiet", + action="store_true", + help="Suppress the JSON status line on stdout.", + ) + args = parser.parse_args() + + backend = _detect_backend(args.backend) + args.out.parent.mkdir(parents=True, exist_ok=True) + + if backend == "openai": + model = args.model or DEFAULT_OPENAI_MODEL + result = _generate_openai(args.prompt, args.out, args.size, args.reference, model) + elif backend == "gemini": + model = args.model or DEFAULT_GEMINI_MODEL + result = _generate_gemini(args.prompt, args.out, args.size, args.reference, model) + else: + _err(f"unknown backend: {backend}") + return + + if not args.quiet: + print(json.dumps(result)) + + +if __name__ == "__main__": + main() diff --git a/scripts/view_image.py b/scripts/view_image.py new file mode 100755 index 0000000..c7614fd --- /dev/null +++ b/scripts/view_image.py @@ -0,0 +1,69 @@ +#!/usr/bin/env python3 +"""Compatibility shim for ``view_image``. + +Codex has a built-in ``view_image`` tool that surfaces a local image into the +conversation context. Other agents (Claude Code, Cursor, generic CLI agents) +typically use a generic file-read tool that already supports images. + +For non-Codex agents the recommended flow is: + +1. Use the agent's native file-read tool on the path. Claude Code's ``Read`` + tool, Cursor's file-read, and most others render images directly. +2. Then call ``scripts/image_gen.py --reference ...`` so the image is + passed to the image-edit endpoint. + +This script is provided for parity. It validates that the path exists and is +an image, prints its dimensions and format, and exits non-zero if the image is +unusable. Useful inside scripted pipelines where the agent wants a single +"reference is ready" signal. 
+""" + +from __future__ import annotations + +import argparse +import json +import sys +from pathlib import Path + + +def main() -> None: + parser = argparse.ArgumentParser(description=__doc__) + parser.add_argument("path", type=Path, help="Path to a local image file.") + parser.add_argument( + "--quiet", + action="store_true", + help="Suppress the JSON status line on stdout.", + ) + args = parser.parse_args() + + if not args.path.exists(): + print(f"view_image: not found: {args.path}", file=sys.stderr) + sys.exit(1) + + try: + from PIL import Image + except ImportError: + print( + "view_image: Pillow not installed. Run: pip install -r requirements.txt", + file=sys.stderr, + ) + sys.exit(1) + + try: + with Image.open(args.path) as img: + info = { + "path": str(args.path.resolve()), + "format": img.format, + "mode": img.mode, + "size": list(img.size), + } + except Exception as exc: # noqa: BLE001 + print(f"view_image: cannot open {args.path}: {exc}", file=sys.stderr) + sys.exit(1) + + if not args.quiet: + print(json.dumps(info)) + + +if __name__ == "__main__": + main() diff --git a/skills/generate2dmap/SKILL.md b/skills/generate2dmap/SKILL.md index 7ccf50d..e3392b0 100644 --- a/skills/generate2dmap/SKILL.md +++ b/skills/generate2dmap/SKILL.md @@ -1,6 +1,6 @@ --- name: generate2dmap -description: "Generate and revise production-oriented 2D game maps with built-in image generation as the default visual asset source, choosing a visual model, runtime object model, collision model, art direction, and engine/export target. Use when Codex needs to create or integrate RPG maps, monster-taming maps, tactical arenas, battle backgrounds, side-scroller/parallax scenes, tilemaps, layered raster maps, clean HD hand-painted maps, pixel-inspired maps, prop packs, collision zones, walkable areas, or map previews." 
+description: "Generate and revise production-oriented 2D game maps with image generation as the default visual asset source, choosing a visual model, runtime object model, collision model, art direction, and engine/export target. Use when an agent (Codex, Claude Code, Cursor, or any CLI agent) needs to create or integrate RPG maps, monster-taming maps, tactical arenas, battle backgrounds, side-scroller/parallax scenes, tilemaps, layered raster maps, clean HD hand-painted maps, pixel-inspired maps, prop packs, collision zones, walkable areas, or map previews." --- # Generate2dmap @@ -20,7 +20,10 @@ Read [references/map-strategies.md](references/map-strategies.md) when the pipel ## Image Generation First -This skill is image-generation-first for visual assets. Use built-in `image_gen` as the default creative art source for base maps, dressed references, prop sheets, prop sprites, tileset art, parallax layers, battle backgrounds, and other visible map assets. +This skill is image-generation-first for visual assets. Use the host agent's image-generation tool as the default creative art source for base maps, dressed references, prop sheets, prop sprites, tileset art, parallax layers, battle backgrounds, and other visible map assets. + +- **Codex**: built-in `image_gen`. +- **Claude Code, Cursor, and other agents without a built-in image tool**: shell out to `scripts/image_gen.py` (see the sibling `generate2dsprite` SKILL.md "Image generation backends" section for backend selection and env vars). The agent must write the creative image prompts itself. Do not use scripts to generate creative prompts or to procedurally draw final visual art. Scripts may assemble, slice, chroma-key, crop, validate, compose previews, emit JSON metadata, and wire image-generated assets into engine-native files such as Godot `.tscn` scenes. @@ -66,8 +69,8 @@ When unspecified: - Treat `hybrid` as a result of combining axes, not as a primary category. 3. Produce assets. 
- - Write the creative prompts manually and use built-in `image_gen` for visible map art unless the user explicitly chose existing assets or procedural placeholders. - - For baked raster maps, generate one background with built-in `image_gen`, or edit/use an existing image when supplied, then add optional collision/zones metadata. + - Write the creative prompts manually and use the host agent's image-generation tool for visible map art unless the user explicitly chose existing assets or procedural placeholders. On Codex this is built-in `image_gen`; on Claude Code, Cursor, or other agents without a built-in image tool, run `scripts/image_gen.py` (see "Image generation backends" in the `generate2dsprite` SKILL.md). + - For baked raster maps, generate one background with the host agent's image tool, or edit/use an existing image when supplied, then add optional collision/zones metadata. - For layered raster maps, generate a ground-only base map first. Then show that base image in context and generate a dressed reference from the visible base before making final props and placements. - For tilemaps, generate or reuse tileset art first, then follow the engine/editor format for layers, objects, collision, and scene files. Do not script-draw the tileset as the final art source. - For parallax scenes, generate background/midground/foreground visual layers first, then produce scroll metadata. @@ -97,7 +100,7 @@ Prop packs save image-generation calls and prompt overhead, but reduce per-prop For layered maps with generated props, prefer this reference pipeline: 1. Generate `assets/map/-base.png` as ground-only terrain. -2. Make the base image visible in conversation context. If the base is a local file, use `view_image` before calling built-in `image_gen`; do not rely on a path string as the reference. +2. Make the base image visible in conversation context. On Codex, use `view_image` for local paths. 
On other agents, surface the file with the host agent's native file/image read tool, then pass `--reference ` to `scripts/image_gen.py`. Do not rely on a path string in the prompt as the reference. 3. Generate `assets/map/-dressed-reference.png` from the visible base, preserving camera, terrain, size, road/water shapes, anchor pads, and boundaries. Treat this as a planning/reference image, not the final runtime map. 4. Generate one-by-one props or a prop pack based on the dressed reference. 5. Place extracted props over the original base and compose a flattened preview. diff --git a/skills/generate2dsprite/SKILL.md b/skills/generate2dsprite/SKILL.md index 6c8a238..6aff2c5 100644 --- a/skills/generate2dsprite/SKILL.md +++ b/skills/generate2dsprite/SKILL.md @@ -1,6 +1,6 @@ --- name: generate2dsprite -description: "Generate and postprocess general 2D game assets and animation sheets: pixel-art sprites, clean HD map props, creatures, characters, NPCs, spells, projectiles, impacts, props, summons, and transparent GIF exports. Use when Codex should infer the asset plan from a natural-language request, match a reference or map art style, call built-in `image_gen` for solid-magenta raw sheets, and use the local processor only for chroma-key cleanup, frame extraction, alignment, QC, and transparent exports." +description: "Generate and postprocess general 2D game assets and animation sheets: pixel-art sprites, clean HD map props, creatures, characters, NPCs, spells, projectiles, impacts, props, summons, and transparent GIF exports. Use when an agent (Codex, Claude Code, Cursor, or any CLI agent) should infer the asset plan from a natural-language request, match a reference or map art style, generate solid-magenta raw sheets via the agent's image-generation tool or the bundled `scripts/image_gen.py` fallback, and use the local processor only for chroma-key cleanup, frame extraction, alignment, QC, and transparent exports." 
--- # Generate2dsprite @@ -34,8 +34,8 @@ Read [references/modes.md](references/modes.md) when the request is ambiguous. - Decide the asset plan yourself. Do not force the user to spell out sheet size, frame count, or bundle structure when the request already implies them. - Write the art prompt yourself. Do not default to the prompt-builder script. -- Use built-in `image_gen` for every raw image. -- When the user provides or implies a visual reference, use built-in image edit/reference semantics only after the reference image is visible in the conversation context. If the reference is a local file, call `view_image` first; do not rely on a filesystem path in the prompt as the visual reference. +- Generate every raw image with whichever image-generation tool the host agent provides. On Codex, that is built-in `image_gen`. On Claude Code, Cursor, or any other agent without a built-in image tool, shell out to `scripts/image_gen.py` (see "Image generation backends" below). +- When the user provides or implies a visual reference, use image-edit/reference semantics only after the reference image is visible in the conversation context. On Codex, call `view_image` for local paths. On other agents, surface the reference with the host agent's native file/image read tool, then pass `--reference ` to `scripts/image_gen.py`. Do not rely on a filesystem path in the prompt as the visual reference. - Do not force pixel art when the asset is a map prop for `$generate2dmap` or when the user/project requests a different style. Match the map or reference style first. - Use the script only as a deterministic processor: magenta cleanup, frame splitting, component filtering, scaling, alignment, QC metadata, transparent sheet export, and GIF export. - Do not use scripts to generate the creative image prompt. If a legacy prompt-builder command exists, treat it as historical compatibility only, not the normal skill workflow. 
@@ -75,7 +75,7 @@ Choose `art_style` before writing the prompt: If a reference is involved: -- Make the reference visible first. For local paths, use `view_image`; for freshly generated references, rely on the image already shown in context. +- Make the reference visible first. On Codex, use `view_image` for local paths. On other agents (Claude Code, Cursor, generic CLI), use the host agent's native file/image read tool. For freshly generated references, rely on the image already shown in context. - State the reference role explicitly: preserve identity/style, create an animation sheet for the same subject, create an evolution/variant, or derive a matching prop/FX. - Preserve the stable identity markers from the reference: silhouette, palette, face/eye features, costume marks, major accessories, and material language. - Let only the requested action or evolution change. Do not redesign the subject unless the user asks. @@ -91,13 +91,21 @@ Keep the strict parts: ### 3. Generate the raw image -Use built-in `image_gen`. +Use the host agent's image-generation tool. -After generation: +- **Codex**: call built-in `image_gen`. Find the raw PNG under `$CODEX_HOME/generated_images/...`, then copy or reference it from the working output folder. +- **Claude Code, Cursor, and other agents without a built-in image tool**: shell out to `scripts/image_gen.py`. Example: -- find the raw PNG under `$CODEX_HOME/generated_images/...` -- copy or reference it from the working output folder -- keep the original generated image in place + ```bash + python scripts/image_gen.py \ + --prompt "" \ + --out work/raw-sheet.png \ + --size 1024x1024 + ``` + + See "Image generation backends" below for backend selection and env vars. + +In both cases, keep the original raw image on disk so it can be regenerated or QA'd later. ### 4. Postprocess locally @@ -161,8 +169,32 @@ For `spell_bundle` or `unit_bundle`, create one folder per asset in the bundle. 
- use `shared_scale` by default for any multi-frame asset where frame-to-frame consistency matters - use `largest` component mode when detached sparkles or edge debris make the main body unstable +## Image generation backends + +The skill works with any agent that can either (a) produce a PNG via a built-in image tool, or (b) run a Python script. The repo ships `scripts/image_gen.py` as a unified CLI for case (b). + +Supported backends: + +- `openai` (default) - calls the OpenAI Images API. Default model: `gpt-image-2`. +- `gemini` - calls Google Gemini 2.5 Flash Image. + +Selection: + +- `--backend auto` (default) picks the first backend whose API key is available. +- `SPRITE_FORGE_BACKEND=openai|gemini` overrides the default. +- `SPRITE_FORGE_MODEL=` overrides the default model id. + +Required env vars (only for the backend you actually use): + +- `OPENAI_API_KEY` for `openai`. +- `GEMINI_API_KEY` (or `GOOGLE_API_KEY`) for `gemini`. + +Codex users do not need to install the optional SDKs because Codex's built-in `image_gen` handles generation directly. + ## Resources - `references/modes.md`: asset, action, bundle, and sheet selection - `references/prompt-rules.md`: manual prompt patterns and containment rules - `scripts/generate2dsprite.py`: postprocess primitive for cleanup, extraction, alignment, QC, and GIF export +- `../../scripts/image_gen.py`: agent-agnostic image generation wrapper for non-Codex hosts +- `../../scripts/view_image.py`: optional shim that reports image metadata for non-Codex hosts
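
The backend selection rules listed under "Image generation backends" can be restated as a small pure function. This is only a sketch of the precedence described in this change, not the script's actual code: `pick_backend` is a hypothetical name, and the environment is passed in as a plain mapping so the logic can be checked without real API keys.

```python
from typing import Mapping, Optional


def pick_backend(cli_backend: Optional[str], env: Mapping[str, str]) -> str:
    """Hypothetical helper restating the documented selection precedence.

    1. An explicit --backend value wins.
    2. Otherwise SPRITE_FORGE_BACKEND is used, falling back to "auto".
    3. "auto" picks "openai" when OPENAI_API_KEY is set, else "gemini"
       when GEMINI_API_KEY or GOOGLE_API_KEY is set.
    """
    requested = cli_backend or env.get("SPRITE_FORGE_BACKEND", "auto")
    if requested != "auto":
        return requested
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY"):
        return "gemini"
    # Assumption: the sketch raises instead of exiting the process.
    raise RuntimeError(
        "no backend available: set OPENAI_API_KEY or GEMINI_API_KEY"
    )
```

Calling `pick_backend(None, os.environ)` would mirror a run with no `--backend` flag. Note that when both keys are present, `auto` resolves to `openai`, so a Gemini user with an OpenAI key in the environment needs to set the backend explicitly.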