Agentic ComfyUI image & video generation — self-evolving skills, human in the loop
Zongxia Li* · Dawei Liu* · Fuxiao Liu · Yuhang Zhou · Xiyang Wu · Jingxi Chen · Jing Xie · Xiaomin Wu · Lichao Sun
ComfyClaw is an agentic harness that drives an unmodified ComfyUI runtime from a panel inside ComfyUI itself. You type a prompt, and the agent builds or improves the workflow as typed graph edits, renders a candidate, and uses a region-level VLM verifier to turn visual failures into targeted repairs. Successful and failed trajectories are distilled into a progressively disclosed skill library that grows across runs, so workflow competence accumulates instead of being rediscovered on every prompt.
📄 This is the reference implementation of the paper An Agentic Harness for Skill-Evolving Image Generation Workflows (2026). For benchmark tables, method figures, and the qualitative study, see docs/RESULTS.md.
The ComfyClaw panel lives inside ComfyUI — the agent reasons, edits the graph, validates, and renders, all on the live canvas.
- Generate from inside ComfyUI — type a prompt, click Generate, watch the agent work. No terminal interaction needed.
- Build or improve — construct a whole workflow from scratch, or iterate on the one already on your canvas, with nodes appearing one-by-one as it builds.
- Manual / Auto / Co-pilot modes — single pass, full VLM self-optimization loop, or VLM scoring with human accept-or-override per iteration.
- Live scoreboard — every iteration shows a score, verifier critique, and an "Accept now" button to stop early.
- Human-in-the-loop — thumbs up/down, comments, and opt-in skill evolution directly from the panel after generation.
- Self-evolving skills — reusable lessons are distilled from good and bad runs and (with your approval) committed to a growing skill library.
- Any agent backend — LiteLLM (Anthropic, OpenAI, Gemini, Ollama, 100+
  providers) or a signed-in CLI agent (claude, codex, gemini).
ComfyClaw is managed with uv. The flow is:
install the package, install the bundled ComfyUI plugin once, then run the
ComfyClaw server alongside ComfyUI.
| Requirement | Notes |
|---|---|
| Python 3.10+ | 3.12+ recommended |
| ComfyUI | Desktop app, local checkout, or a deployed server reachable over HTTP |
| A model in ComfyUI | ComfyClaw builds workflows; ComfyUI still needs the referenced checkpoints / LoRAs / VAEs |
| An agent backend | A LiteLLM provider key, local Ollama, or a signed-in CLI backend (claude, codex, gemini) |
git clone https://github.com/Moms-Organic-Agent-Lab/comfyclaw.git
cd comfyclaw
uv sync --extra sync   # add --extra all for video support
cp .env.example .env
$EDITOR .env

Set COMFYUI_ADDR, optionally COMFYUI_DIR, and either a LiteLLM provider key
or a CLI backend. .env is loaded automatically; CLI flags override it.
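
The precedence described above (a CLI flag wins over the .env value) can be sketched as follows. This is a minimal illustration under stated assumptions, not ComfyClaw's actual loader: `parse_env_file` and `resolve_setting` are hypothetical names, and the sample address is only the usual local ComfyUI address.

```python
from typing import Mapping, Optional


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(
    cli_value: Optional[str],
    env: Mapping[str, str],
    key: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the CLI value if given, else the .env value, else the default."""
    if cli_value is not None:
        return cli_value
    return env.get(key, default)


# Hypothetical .env contents; COMFYUI_DIR is left unset on purpose.
env = parse_env_file("COMFYUI_ADDR=127.0.0.1:8188\n# COMFYUI_DIR is optional\n")
from_env = resolve_setting(None, env, "COMFYUI_ADDR")
from_cli = resolve_setting("comfyui.example.com:8188", env, "COMFYUI_ADDR")
```

Here `from_env` falls back to the file value, while `from_cli` reflects what passing `--comfyui-addr` on the command line would do.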
uv run comfyclaw install-node # local ComfyUI app/checkout
uv run comfyclaw install-node --comfyui-dir /path/to/ComfyUI

Restart ComfyUI after this step. For a remote/deployed ComfyUI, copy the
directory printed by uv run comfyclaw node-path into that server's
custom_nodes/ComfyClaw-Sync/ and restart it.
uv run comfyclaw doctor # optional pre-flight check
uv run comfyclaw serve

Open ComfyUI (usually http://127.0.0.1:8188). The ComfyClaw panel appears
in the UI — enter a prompt, choose Scratch or Improve, and click
Generate.
uv run comfyclaw serve --comfyui-addr comfyui.example.com:8188

The browser must reach the ComfyClaw WebSocket port (default 8765); use an SSH
tunnel or reverse proxy if needed. If you cannot install the plugin remotely,
use CLI mode (below) against --comfyui-addr.
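
Before setting up a tunnel or proxy, it can help to check whether the port is reachable at all from the machine running the browser. The helper below is a generic TCP check written for illustration (`port_is_reachable` is a hypothetical name, not part of ComfyClaw). It only confirms that something accepts connections on the port; it does not perform a WebSocket handshake.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # 8765 is the default ComfyClaw WebSocket port mentioned above;
    # replace the host with the machine running `comfyclaw serve`.
    print(port_is_reachable("127.0.0.1", 8765))
```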
See docs/USAGE.md for remote networking, panel controls, and
troubleshooting, and docs/LOCAL_LLM_AND_MODELS.md
for local vLLM, Wan2.2 video, and Qwen-Image setup.
Pick the level of automation in the panel (or with --mode):
- Manual — one pass, no verifier.
- Auto — full VLM-driven self-optimization loop.
- Co-pilot — VLM scores each iteration; you accept or override.
Co-pilot mode: each iteration emits a live score card and verifier critique; refine in chat or click Accept now to stop early.
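
The difference between the three modes can be sketched as a single control loop. This is an illustrative sketch, not ComfyClaw's harness: `run_loop`, `Verdict`, the mode strings, the score scale, and the `target_score` stopping rule for Auto are all assumptions made for the example. The `render` and `verify` callables stand in for the ComfyUI render and the VLM verifier.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    score: float   # verifier score; the 0..1 scale is an assumption
    critique: str  # text the next repair attempt can act on


def run_loop(
    mode: str,
    render: Callable[[Optional[str]], str],
    verify: Callable[[str], Verdict],
    max_iterations: int = 3,
    accept: Optional[Callable[[str, Verdict], bool]] = None,
    target_score: float = 0.9,
) -> tuple[str, list[Verdict]]:
    """Return the final image and the verdict history for the chosen mode."""
    if mode not in {"manual", "auto", "copilot"}:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "copilot" and accept is None:
        raise ValueError("copilot mode needs an accept callback")

    history: list[Verdict] = []
    image = render(None)  # first pass has no critique to repair
    if mode == "manual":
        return image, history  # one pass, no verifier

    for i in range(max_iterations):
        verdict = verify(image)
        history.append(verdict)
        if mode == "copilot":
            done = accept(image, verdict)  # the human decides
        else:
            done = verdict.score >= target_score  # auto: score threshold
        if done or i == max_iterations - 1:
            break
        image = render(verdict.critique)  # targeted repair from the critique
    return image, history
```

In this sketch the "Accept now" button corresponds to the `accept` callback returning `True`.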
After a generation, the panel can ask for your feedback: it shows the rendered image alongside the VLM's region-level pass/fail checks and detail score. Mark it thumbs-up / thumbs-down, add a comment, and choose whether the case should feed skill evolution.
Human feedback: review the generated image, the VLM score, and the requirement-level checks before rating it.
ComfyClaw's skills follow the Agent Skills spec:
each skill is a directory with a SKILL.md (YAML frontmatter + body).
Progressive disclosure keeps context lean — only name + description
appear at startup, and the agent calls read_skill("name") to load the full
instructions on demand. Manage them in the panel's Skills tab (toggle,
import from folder / .zip / git URL) — imports persist under
~/.comfyclaw/skills/.
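
Progressive disclosure as described above can be sketched like this. The code is a simplified illustration, not ComfyClaw's loader: `SkillIndex` and `parse_frontmatter` are hypothetical names, and the frontmatter parser only handles flat `key: value` lines rather than full YAML.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (metadata, body); flat `key: value` lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"').strip("'")
    return {}, text  # no closing delimiter: treat everything as body


class SkillIndex:
    """Lists name + description up front; loads a body only when asked."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def _skill_files(self) -> list[Path]:
        return sorted(self.root.glob("*/SKILL.md"))

    def summaries(self) -> list[dict[str, str]]:
        """The lean view shown to the agent at startup."""
        out = []
        for skill_md in self._skill_files():
            meta, _ = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
            out.append({
                "name": meta.get("name", skill_md.parent.name),
                "description": meta.get("description", ""),
            })
        return out

    def read_skill(self, name: str) -> str:
        """Return the full instructions for one skill, on demand."""
        for skill_md in self._skill_files():
            meta, body = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
            if meta.get("name", skill_md.parent.name) == name:
                return body
        raise KeyError(name)
```

Pointing `SkillIndex` at a skills directory and calling `summaries()` yields only the short entries; the longer body enters context only after a `read_skill` call.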
After a verified run, ComfyClaw distills reusable lessons from good and bad cases and proposes a new or updated skill. By default the proposal is shown for your review before anything is written; approved skills join your user skill library and are reloaded immediately.
Skill-evolution review: inspect the proposed skill, its rationale and
evidence, and the draft SKILL.md before approving.
A full skill guide (built-in skills, authoring custom skills, the panel
browser) is in docs/USAGE.md.
uv run comfyclaw run --prompt "a red fox at dawn, photorealistic, DSLR" --iterations 3
uv run comfyclaw run --workflow my_workflow_api.json --prompt "make it a rainy neon street"
uv run comfyclaw dry-run --prompt "build a portrait workflow"

Outputs are saved under ./comfyclaw_output/ unless --output-dir is set. Run
uv run comfyclaw <command> --help for the full flag list.
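
The `--workflow` example above passes a workflow JSON file; its filename suggests ComfyUI's API export format, in which each node id maps to a `class_type` and an `inputs` dict, and a link is a `[source_node_id, output_index]` pair. Assuming that format, the idea of improving a workflow through typed graph edits might look like the sketch below. The edit classes and `apply_edit` are hypothetical and are not ComfyClaw's actual tool schema.

```python
import copy
from dataclasses import dataclass
from typing import Any, Union

Workflow = dict[str, dict[str, Any]]


@dataclass(frozen=True)
class AddNode:
    node_id: str
    class_type: str
    inputs: dict[str, Any]


@dataclass(frozen=True)
class SetInput:
    node_id: str
    name: str
    value: Any


@dataclass(frozen=True)
class Connect:
    src_id: str
    src_output: int
    dst_id: str
    dst_input: str


Edit = Union[AddNode, SetInput, Connect]


def apply_edit(workflow: Workflow, edit: Edit) -> Workflow:
    """Return a new workflow with one edit applied; the input is not mutated."""
    wf = copy.deepcopy(workflow)
    if isinstance(edit, AddNode):
        if edit.node_id in wf:
            raise ValueError(f"node {edit.node_id} already exists")
        wf[edit.node_id] = {"class_type": edit.class_type, "inputs": dict(edit.inputs)}
    elif isinstance(edit, SetInput):
        wf[edit.node_id]["inputs"][edit.name] = edit.value
    elif isinstance(edit, Connect):
        if edit.src_id not in wf:
            raise KeyError(edit.src_id)
        wf[edit.dst_id]["inputs"][edit.dst_input] = [edit.src_id, edit.src_output]
    else:
        raise TypeError(f"unsupported edit: {edit!r}")
    return wf


# A two-node fragment: a checkpoint loader feeding a text encoder.
base: Workflow = {
    "4": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "model.safetensors"}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a street", "clip": ["4", 1]}},
}
edited = apply_edit(base, SetInput("6", "text", "a rainy neon street"))
```

Keeping each edit as a small typed value makes it straightforward to validate, log, or replay a change before anything is rendered.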
| Doc | Contents |
|---|---|
| docs/USAGE.md | Panel controls, deployed ComfyUI, CLI backends, skills, troubleshooting |
| docs/ARCHITECTURE.md | Code map: harness loop, agent tools, backends, verifier, skills, sync protocol |
| docs/LOCAL_LLM_AND_MODELS.md | Local vLLM, video, and Qwen-Image setup |
| docs/RESULTS.md | Benchmark tables, method figures, qualitative study |
| docs/REPRODUCING.md | Step-by-step reproducibility guide |
If you use ComfyClaw in academic work, please cite the paper:
@article{li2026comfyclaw,
title = {An Agentic Harness for Skill-Evolving Image Generation Workflows},
author = {Li, Zongxia and Liu, Dawei and Chen, Jingxi and Wu, Xiyang and
Liu, Fuxiao and Zhou, Yuhang and Xie, Jing and Wu, Xiaomin and
Sun, Lichao},
journal = {TBD},
year = {2026},
note = {Software available at \url{https://github.com/Moms-Organic-Agent-Lab/comfyclaw}}
}

Machine-readable metadata is in CITATION.cff. Please update the
BibTeX entry with the final venue / DOI once the paper is posted.
ComfyClaw is released under the GNU General Public License v3.0, matching
ComfyUI's license (ComfyClaw is a
plugin / derivative work). See LICENSE for the full text. The
bundled skill-creator skill is Apache-2.0; see the notice at the end of
LICENSE.



