94 changes: 59 additions & 35 deletions README.md
Expand Up @@ -274,7 +274,7 @@ Pipeline:
image_gen tileset + prop_pack_3x3 + layered_tilemap + separate_props + trigger_zones + Godot_TileMap
```

Codex-first 2D game asset skills for game-ready 2D sprites, props, FX, and playable map scenes.
Agent-portable 2D game asset skills for game-ready 2D sprites, props, FX, and playable map scenes. Works with Codex (built-in image generation), Claude Code, Cursor, and any other agent that can run a Python script (via `scripts/image_gen.py`, default backend `gpt-image-2`).

This repository currently ships two skills:

Expand All @@ -283,9 +283,9 @@ This repository currently ships two skills:

`$generate2dmap` uses `$generate2dsprite` when the chosen map pipeline needs reusable transparent props. Small environmental props can be batched into `2x2`, `3x3`, or `4x4` prop packs, then extracted into individual transparent props. Simple maps can stay as a single baked image.

When a visual reference is involved, both skills use the same wrapper rule: make the image visible in the conversation first. Attached images and freshly generated images are already visible; local files should be opened with `view_image` before asking built-in image generation to preserve identity, style, map layout, or sprite lineage.
When a visual reference is involved, both skills use the same wrapper rule: make the image visible in the conversation first. Attached images and freshly generated images are already visible. Local files should be opened with `view_image` on Codex, with the agent's native file-read tool on Claude Code / Cursor (and similar), or surfaced via `scripts/view_image.py` when no native tool exists. Do this before any image edit or reference call so that identity, style, map layout, and sprite lineage are preserved.

Codex is the primary target because Codex already has built-in image generation. That lets one agent handle the full loop:
Codex remains the most ergonomic host because its built-in image generation lets one agent handle the full loop without a separate API call. Other agents reach the same loop via `scripts/image_gen.py`:

1. Plan the asset or map pipeline.
2. Generate the raw sprite sheet, prop, or map image.
Expand Down Expand Up @@ -314,29 +314,27 @@ The current focus is 2D game assets and map scenes, not full game-pack automatio
- Flattened map previews for QA and showcase
- Godot-ready editable maps with `TileMapLayer`, separate props, `Area2D` encounter grass, `StaticBody2D` collision, exit zones, and debug player scenes

## Why Codex First
## Supported Agents

This repo is intentionally Codex-first because Codex can generate images directly inside the same workflow.
The skills work with any agent that can run Python scripts. Codex is the most ergonomic host because it ships a built-in `image_gen` tool; other agents are fully supported through a small CLI wrapper, `scripts/image_gen.py`.

That gives you a much cleaner pipeline:
| Agent | Image generation | Reference handling |
| ----------- | --------------------------------------------- | ----------------------------------------------- |
| Codex | built-in `image_gen` | built-in `view_image` |
| Claude Code | `scripts/image_gen.py` (OpenAI / Gemini) | `Read` tool on the file path |
| Cursor | `scripts/image_gen.py` (OpenAI / Gemini) | Cursor's native file-read tool |
| Generic CLI | `scripts/image_gen.py` (OpenAI / Gemini) | `scripts/view_image.py` shim or stdout metadata |

- No separate image API wiring
- No external sprite backend
- No extra prompt-builder service
- One agent decides the asset plan
- One local processor handles deterministic cleanup and export
Either way, the agent stays the creative brain (asset type, action, bundle shape, sheet layout, frame count, alignment) and the Python scripts only perform deterministic pixel operations and (when needed) the API call to the image backend.

The script is not the creative brain. The agent decides:
### Image generation backends

- Asset type
- Action type
- Bundle shape
- Sheet layout
- Frame count
- Alignment strategy
- Whether detached effects should be kept or filtered
`scripts/image_gen.py` supports two backends:

The Python script only performs deterministic pixel operations.
- **OpenAI** (default) — model `gpt-image-2` via the Images API. Set `OPENAI_API_KEY`.
- **Gemini** — Google `gemini-2.5-flash-image`. Set `GEMINI_API_KEY` (or `GOOGLE_API_KEY`).

Override selection with `SPRITE_FORGE_BACKEND=openai|gemini` and the model with `SPRITE_FORGE_MODEL=<id>`. Codex users do not need to install either SDK.

## Repository Layout

Expand All @@ -345,6 +343,9 @@ agent-sprite-forge/
README.md
README.zh-TW.md
requirements.txt
scripts/
image_gen.py # agent-agnostic image generation wrapper (OpenAI / Gemini)
view_image.py # optional shim for non-Codex hosts
src/
skills/
generate2dmap/
Expand All @@ -371,31 +372,43 @@ agent-sprite-forge/

## Install

### Option 1: Windows PowerShell

Clone the repo, install the local processor dependencies, then copy both skills into your Codex skills directory:
Pick the section that matches your agent. All paths below assume you cloned the repo and ran the dependency install once.

```powershell
```bash
git clone https://github.com/0x0funky/agent-sprite-forge.git
cd .\agent-sprite-forge
python -m pip install -r .\requirements.txt
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.codex\skills" | Out-Null
Copy-Item -Recurse -Force `
".\skills\*" `
"$env:USERPROFILE\.codex\skills\"
cd ./agent-sprite-forge
python3 -m pip install -r ./requirements.txt
```

### Option 2: macOS / Linux
### Codex (macOS / Linux)

```bash
git clone https://github.com/0x0funky/agent-sprite-forge.git
cd ./agent-sprite-forge
python3 -m pip install -r ./requirements.txt
mkdir -p ~/.codex/skills
cp -R ./skills/* ~/.codex/skills/
```

Start a new Codex session after installation so the skill is loaded cleanly.
### Codex (Windows PowerShell)

```powershell
New-Item -ItemType Directory -Force -Path "$env:USERPROFILE\.codex\skills" | Out-Null
Copy-Item -Recurse -Force ".\skills\*" "$env:USERPROFILE\.codex\skills\"
```

### Claude Code (macOS / Linux)

```bash
python3 -m pip install "openai>=1.50" # or: "google-genai>=0.3" for Gemini
mkdir -p ~/.claude/skills
cp -R ./skills/* ~/.claude/skills/
cp -R ./scripts ~/.claude/skills/_shared/ # or keep scripts/ in $PWD; the SKILL.md uses a relative path
export OPENAI_API_KEY=<your key>
```

### Cursor / generic CLI agent

The skills are plain markdown plus Python scripts. Point your agent at `skills/generate2dsprite/SKILL.md` (and `skills/generate2dmap/SKILL.md`) and ensure `scripts/image_gen.py` is on the agent's allowed-tools/PATH. Set `OPENAI_API_KEY` (or `GEMINI_API_KEY`).

Start a new agent session after installation so the skill is loaded cleanly.

## Python Requirements

Expand All @@ -412,6 +425,17 @@ They are listed in [`requirements.txt`](./requirements.txt). Codex handles image
- Alignment and rescaling
- Transparent GIF and PNG export

### Optional extras for non-Codex agents

If you are running the skill from Claude Code, Cursor, or any other agent without a built-in image tool, install the SDK that matches your chosen backend:

```bash
pip install "openai>=1.50" # OpenAI (default, model: gpt-image-2)
pip install "google-genai>=0.3" # Gemini 2.5 Flash Image
```

These are intentionally **not** in `requirements.txt` so Codex users do not need to install them.

## Suggested Prompts

### Basic
10 changes: 10 additions & 0 deletions requirements-optional.txt
@@ -0,0 +1,10 @@
# Optional extras for non-Codex hosts (Claude Code, Cursor, generic CLI agents).
# Only install the line that matches the backend you plan to use with
# scripts/image_gen.py. Codex users do not need either of these because
# Codex's built-in image_gen handles generation directly.

# OpenAI Images API (default backend, model: gpt-image-2)
openai>=1.50

# Google Gemini 2.5 Flash Image
google-genai>=0.3
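
Since only one of these SDKs needs to be installed, a host could check for the right one before calling the wrapper. The following is a minimal sketch, not part of this PR: the helper names `sdk_available` and `install_hint` are hypothetical, and the mapping simply pairs the import names used in `scripts/image_gen.py` with the requirement strings listed above.

```python
from importlib import util
from typing import Optional

# Hypothetical mapping: backend name -> (import name, pip requirement).
# The requirement strings are copied from requirements-optional.txt.
OPTIONAL_SDKS = {
    "openai": ("openai", "openai>=1.50"),
    "gemini": ("google.genai", "google-genai>=0.3"),
}


def sdk_available(module_name: str) -> bool:
    """Return True if the module can be located without importing it fully."""
    try:
        return util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package (e.g. "google") is missing.
        return False


def install_hint(backend: str) -> Optional[str]:
    """Return a pip command for the backend's SDK, or None if it is installed."""
    module_name, requirement = OPTIONAL_SDKS[backend]
    if sdk_available(module_name):
        return None
    return f'pip install "{requirement}"'


if __name__ == "__main__":
    for name in OPTIONAL_SDKS:
        print(name, "->", install_hint(name) or "SDK already installed")
```

A check like this only reports what is missing; it deliberately does not install anything on the user's behalf.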
228 changes: 228 additions & 0 deletions scripts/image_gen.py
@@ -0,0 +1,228 @@
#!/usr/bin/env python3
"""Agent-agnostic image generation wrapper.

Codex provides built-in `image_gen`. Other agents (Claude Code, Cursor, generic
CLI agents) shell out to this script instead. It produces the same artifact:
a PNG written to a known path that the local sprite/map post-processor can read.

Backends, in priority order:

1. ``openai`` - OpenAI Images API (default model: ``gpt-image-2``).
2. ``gemini`` - Google Gemini 2.5 Flash Image.

The backend is chosen via ``--backend`` or the ``SPRITE_FORGE_BACKEND`` env var.
``auto`` (default) picks the first backend whose API key is present.

Usage:

    python scripts/image_gen.py \\
        --prompt "fire mage cast 2x3 sheet, solid #FF00FF background" \\
        --out raw-sheet.png \\
        --size 1024x1024

    # With a reference image (image edit / variation)
    python scripts/image_gen.py \\
        --prompt "same character, walk cycle 4x4" \\
        --reference path/to/character.png \\
        --out walk-sheet.png

Env vars:

    OPENAI_API_KEY         Required for the ``openai`` backend.
    GEMINI_API_KEY         Required for the ``gemini`` backend
                           (``GOOGLE_API_KEY`` is accepted as a fallback).
    SPRITE_FORGE_BACKEND   One of: auto | openai | gemini. Default: auto.
    SPRITE_FORGE_MODEL     Override the default model id for the chosen backend.
"""

from __future__ import annotations

import argparse
import base64
import json
import os
import sys
from pathlib import Path
from typing import Optional


DEFAULT_OPENAI_MODEL = "gpt-image-2"
DEFAULT_GEMINI_MODEL = "gemini-2.5-flash-image"


def _err(msg: str, code: int = 1) -> None:
    # Print a prefixed message to stderr and exit with the given status code.
    print(f"image_gen: {msg}", file=sys.stderr)
    sys.exit(code)


def _detect_backend(requested: str) -> str:
    # An explicit choice wins; "auto" picks the first backend with an API key set.
    if requested != "auto":
        return requested
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    if os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY"):
        return "gemini"
    _err(
        "no backend available. Set OPENAI_API_KEY or GEMINI_API_KEY, "
        "or pass --backend explicitly."
    )
    return ""  # unreachable


def _generate_openai(
    prompt: str,
    out_path: Path,
    size: str,
    reference: Optional[Path],
    model: str,
) -> dict:
    try:
        from openai import OpenAI
    except ImportError:
        _err(
            "openai SDK not installed. Run: pip install 'openai>=1.50' "
            "(or install the optional extras: pip install -r requirements-optional.txt)"
        )

    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        _err("OPENAI_API_KEY is not set")

    client = OpenAI(api_key=api_key)

    if reference is not None:
        # Image-edit flow: send the reference image along with the prompt.
        if not reference.exists():
            _err(f"reference not found: {reference}")
        with reference.open("rb") as fh:
            response = client.images.edit(
                model=model,
                image=fh,
                prompt=prompt,
                size=size,
            )
    else:
        response = client.images.generate(
            model=model,
            prompt=prompt,
            size=size,
        )

    data = response.data[0]
    b64 = getattr(data, "b64_json", None)
    if b64 is None:
        # Some models return a URL instead of b64
        url = getattr(data, "url", None)
        if not url:
            _err("OpenAI response contained neither b64_json nor url")
        import urllib.request

        with urllib.request.urlopen(url) as r:
            out_path.write_bytes(r.read())
    else:
        out_path.write_bytes(base64.b64decode(b64))

    return {"backend": "openai", "model": model, "path": str(out_path)}


def _generate_gemini(
    prompt: str,
    out_path: Path,
    size: str,
    reference: Optional[Path],
    model: str,
) -> dict:
    # NOTE: `size` is accepted for signature parity with the OpenAI backend
    # but is not forwarded to the Gemini call below.
    try:
        from google import genai
        from google.genai import types
    except ImportError:
        _err(
            "google-genai SDK not installed. Run: pip install 'google-genai>=0.3'"
        )

    api_key = os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        _err("GEMINI_API_KEY (or GOOGLE_API_KEY) is not set")

    client = genai.Client(api_key=api_key)

    contents: list = [prompt]
    if reference is not None:
        if not reference.exists():
            _err(f"reference not found: {reference}")
        # The MIME type is hardcoded, so the reference is assumed to be a PNG.
        contents.append(
            types.Part.from_bytes(
                data=reference.read_bytes(),
                mime_type="image/png",
            )
        )

    response = client.models.generate_content(
        model=model,
        contents=contents,
    )

    # Write the first inline image part found in the first candidate.
    for part in response.candidates[0].content.parts:
        if getattr(part, "inline_data", None) is not None:
            out_path.write_bytes(part.inline_data.data)
            return {"backend": "gemini", "model": model, "path": str(out_path)}

    _err("Gemini response contained no inline image data")
    return {}  # unreachable


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--prompt", required=True, help="Creative image prompt.")
    parser.add_argument(
        "--out",
        required=True,
        type=Path,
        help="Output PNG path (will be created or overwritten).",
    )
    parser.add_argument(
        "--size",
        default="1024x1024",
        help="Image size (e.g. 1024x1024, 1024x1536). Default: 1024x1024.",
    )
    parser.add_argument(
        "--reference",
        type=Path,
        default=None,
        help="Optional reference image for image-edit / image-to-image flows.",
    )
    parser.add_argument(
        "--backend",
        default=os.environ.get("SPRITE_FORGE_BACKEND", "auto"),
        choices=["auto", "openai", "gemini"],
        help="Which provider to use. Default: auto (first with an API key set).",
    )
    parser.add_argument(
        "--model",
        default=os.environ.get("SPRITE_FORGE_MODEL"),
        help="Override the default model id for the chosen backend.",
    )
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="Suppress the JSON status line on stdout.",
    )
    args = parser.parse_args()

    backend = _detect_backend(args.backend)
    args.out.parent.mkdir(parents=True, exist_ok=True)

    if backend == "openai":
        model = args.model or DEFAULT_OPENAI_MODEL
        result = _generate_openai(args.prompt, args.out, args.size, args.reference, model)
    elif backend == "gemini":
        model = args.model or DEFAULT_GEMINI_MODEL
        result = _generate_gemini(args.prompt, args.out, args.size, args.reference, model)
    else:
        _err(f"unknown backend: {backend}")
        return

    if not args.quiet:
        print(json.dumps(result))


if __name__ == "__main__":
    main()
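
The script prints a single JSON status line with `backend`, `model`, and `path` unless `--quiet` is passed, so a host agent can shell out to it and read the result back. The sketch below shows one way that might look. The helper names `build_command`, `parse_status`, and `generate` are hypothetical and not part of this PR, and the default script path assumes the caller runs from the repository root. Only flags defined in `main()` above are used.

```python
from __future__ import annotations

import json
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Assumption: the caller's working directory is the repository root.
DEFAULT_SCRIPT = Path("scripts/image_gen.py")


def build_command(
    prompt: str,
    out: Path,
    size: str = "1024x1024",
    reference: Optional[Path] = None,
    backend: Optional[str] = None,
    script: Path = DEFAULT_SCRIPT,
) -> list[str]:
    """Assemble the argv list for one image_gen.py invocation."""
    cmd = [sys.executable, str(script), "--prompt", prompt,
           "--out", str(out), "--size", size]
    if reference is not None:
        cmd += ["--reference", str(reference)]
    if backend is not None:
        cmd += ["--backend", backend]
    return cmd


def parse_status(stdout: str) -> dict:
    """Parse the JSON status line (the last non-empty line of stdout)."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no status line on stdout (was --quiet passed?)")
    return json.loads(lines[-1])


def generate(prompt: str, out: Path, **kwargs) -> dict:
    """Run the wrapper and return its status dict, raising on a non-zero exit."""
    proc = subprocess.run(
        build_command(prompt, out, **kwargs),
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # The script reports failures on stderr with an "image_gen:" prefix.
        raise RuntimeError(proc.stderr.strip() or "image_gen.py failed")
    return parse_status(proc.stdout)
```

Calling `generate("fire mage cast 2x3 sheet", Path("raw-sheet.png"))` would need an API key and the matching SDK installed; the two pure helpers can be exercised without either.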