Fix WebSocket play/training paths + live-play gallery + gitignore hygiene by aifriend · Pull Request #2 · aifriend/atari57-sandbox

aifriend · 2026-05-17T12:34:21Z

Five commits picking up where PR #1 left off — three orthogonal improvements that came out of actually running the console end-to-end against the bundled checkpoints.

Summary

Fix: make WebSocket play + TRAIN 5K actually work (`d2f7a2b`)

Three real bugs that blocked the demo path the UI advertises:

1. NumPy 2.0 removed `np.bool8`, but gym 0.25.2 (pinned here) still references it in `passive_env_checker.py` during `env.step`. Restored the alias at the top of `deep_rl_zoo/gym_env.py` so every entry point (`run_atari` / `run_classic` / `eval_agent` / `frontend.stream_eval`) benefits. A defensive shim is kept at the top of `stream_eval.py` for any future caller that imports gym before `deep_rl_zoo`.
2. PER-DQN checkpoint mismatch: `frontend/stream_eval.py` loaded `PER-DQN_Pong_4.ckpt` with `agent_name="DQN"`, but the file is stamped `"PER-DQN"` (matches `deep_rl_zoo/prioritized_dqn/run_atari.py:96`), so ▶ Play silently failed for PER-DQN. The loader now uses `"PER-DQN"`.
3. No DQN factory in `ALGO_FACTORIES`: any `DQN_<game>.ckpt` (including newly TRAIN-spawned ones) couldn't be replayed. Added `_build_dqn`.
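
The alias fix in the first item can be illustrated with a minimal sketch. The helper name `restore_alias` and the stand-in namespace are assumptions for illustration, not the code in `deep_rl_zoo/gym_env.py`; the PR only states that the `np.bool8` alias was restored before gym needs it.

```python
import types


def restore_alias(module, alias, target):
    """Bind module.<alias> to module.<target> when the alias is missing.

    Returns True if the alias was added, False if it already existed.
    """
    if hasattr(module, alias):
        return False
    setattr(module, alias, getattr(module, target))
    return True


# Stand-in for a NumPy 2.x module, which has bool_ but no bool8.
fake_numpy = types.SimpleNamespace(bool_=bool)
patched = restore_alias(fake_numpy, "bool8", "bool_")
print(patched, fake_numpy.bool8 is fake_numpy.bool_)
```

Against the real library the call would presumably be `restore_alias(np, "bool8", "bool_")` after `import numpy as np` and before `import gym`, so that gym's checker finds the name when it looks it up.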

Verified end-to-end by capturing all 4 bundled agents playing live and then training + replaying a fresh DQN/Breakout checkpoint via the same /api/training/start → /api/eval/stream path the UI uses.
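
The checkpoint-name mismatch and the missing factory both come down to keeping names consistent between a checkpoint and the code that loads it. The sketch below is a hypothetical reconstruction: only the names `ALGO_FACTORIES`, `_build_dqn`, and `_build_prioritized_dqn` come from this PR. The function bodies, the dictionary layout, and the helpers `algo_from_checkpoint`, `check_agent_name`, and `build_for` are assumptions, and `stamped_name` stands in for the `agent_name` read from the checkpoint file.

```python
from pathlib import Path


def _build_dqn():
    # Placeholder: the real factory would construct the network and agent.
    return {"agent_name": "DQN"}


def _build_prioritized_dqn():
    # Placeholder carrying the stamp the PR says the bundled file uses.
    return {"agent_name": "PER-DQN"}


# Assumed layout: checkpoint filename prefix -> factory.
ALGO_FACTORIES = {
    "DQN": _build_dqn,
    "PER-DQN": _build_prioritized_dqn,
}


def algo_from_checkpoint(path):
    """Take the algorithm prefix from names like PER-DQN_Pong_4.ckpt."""
    return Path(path).name.split("_", 1)[0]


def check_agent_name(expected, stamped):
    """Reject a checkpoint whose stamped agent_name differs from the loader's."""
    if expected != stamped:
        raise ValueError(f'agent_name "{stamped}" and "{expected}" mismatch')


def build_for(path, stamped_name):
    """Look up the factory for a checkpoint and validate its stamped name."""
    algo = algo_from_checkpoint(path)
    if algo not in ALGO_FACTORIES:
        raise KeyError(f"no factory registered for {algo!r}")
    agent = ALGO_FACTORIES[algo]()
    check_agent_name(agent["agent_name"], stamped_name)
    return agent
```

With a registry like this, a missing entry fails loudly with a `KeyError` instead of silently, and a loader that expects `"DQN"` for a file stamped `"PER-DQN"` is rejected before any weights are restored.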

Demo: live-play gallery (`9bb6cb9`)

screenshots/live_play/ now contains a self-contained gallery of real ALE frames from the live console paths:

- 5 trained agents (4 bundled + 1 freshly TRAIN-spawned DQN/Breakout) captured via the WebSocket.
- 6 random-policy renders across visually distinct games (SpaceInvaders, MsPacman, Asteroids, Q*bert, Seaquest, Boxing), captured directly via `deep_rl_zoo.gym_env.create_atari_environment`.

Each game has both a 5-frame montage PNG (static, with score progression) and an animated 100–120 frame GIF (motion). gallery.html embeds all of them in a single phosphor-styled page — open in any browser to see every agent and game in one place.

Captures use the same env.unwrapped.render(mode="rgb_array") path the WebSocket eval emits, so what's shown is frame-accurate to what the production UI renders.
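
A 5-frame montage drawn from a 100 to 120 frame capture needs some rule for choosing frames. The PR does not say which rule was used; the sketch below assumes evenly spaced sampling that always includes the first and last frame, and `montage_indices` is a hypothetical helper name.

```python
def montage_indices(n_frames, k=5):
    """Return k frame indices spread evenly over range(n_frames)."""
    if k < 2 or n_frames < k:
        raise ValueError("need k >= 2 and at least k frames")
    last = n_frames - 1
    # Integer arithmetic keeps the result deterministic (no float rounding).
    return [i * last // (k - 1) for i in range(k)]


print(montage_indices(100))
print(montage_indices(120))
```

The returned indices would then select frames from the captured `rgb_array` sequence before they are tiled into a single PNG.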

Chore: clean `git status` for everyone (`1ff25e5`, `a6e11cb`, `c86edda`)

Every push and PR was leaving behind the same handful of untracked / modified files, polluting reviews and risking accidental staging. Root cause was an incomplete .gitignore plus a tracked file that the training subprocess auto-appends to.

Now ignored:

- `.claude/`: Claude Code worktree config (per-session)
- `.cursorindexingignore`: Cursor IDE metadata
- `runs/_training_*.log`: per-job stdout from `frontend/server.py`
- `.venv/` tightened to `.venv`: a trailing-slash pattern matches only a real directory, so the bare-name pattern is needed to catch both real directories and symlinks to a shared parent venv

Untracked (files stay on disk locally, historical snapshots remain in git history):

logs/*_atari_results.csv — all 6 per-algorithm result CSVs that deep_rl_zoo.<algo>.run_atari appends to on every iteration. The dqn one was already dirtying the tree after this session's TRAIN; the other 5 (iqn, per_dqn, ppo_rnd, r2d2, rainbow) would have done the same the moment those algos got trained. Collapsed into a glob so any future algorithm's results file is covered with no further work.

Verified: git check-ignore confirms every previously-leaking path now resolves to an entry in .gitignore, and the working tree is clean after a full TRAIN run.
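
The `git check-ignore` verification can be scripted. This is a sketch, assuming `git` is on `PATH` and the script runs inside the repository; `parse_check_ignore_line` and `ignored_by` are hypothetical helpers, and the sample line (including its line number and job name) is made up for illustration. With `-v`, git prints `<source>:<linenum>:<pattern>`, a tab, and then the path.

```python
import subprocess


def parse_check_ignore_line(line):
    """Split one line of `git check-ignore -v` output into its fields."""
    rule, _, path = line.rstrip("\n").partition("\t")
    source, linenum, pattern = rule.split(":", 2)
    return {"source": source, "line": int(linenum), "pattern": pattern, "path": path}


def ignored_by(paths, cwd="."):
    """Map each ignored path to the pattern that matched it.

    Paths that no ignore rule matches are absent from the result.
    """
    proc = subprocess.run(
        ["git", "check-ignore", "-v", *paths],
        cwd=cwd, capture_output=True, text=True,
    )
    # Exit status 1 only means "nothing was ignored"; anything else is an error.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    parsed = [parse_check_ignore_line(l) for l in proc.stdout.splitlines()]
    return {p["path"]: p["pattern"] for p in parsed}


# Illustrative output line; the line number 12 is invented.
sample = ".gitignore:12:runs/_training_*.log\truns/_training_job1.log"
print(parse_check_ignore_line(sample))
```

Note that `git check-ignore` consults the index by default, so a file that is still tracked is not reported as ignored until it has been removed from tracking.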

Changes

| File group | What |
| --- | --- |
| `deep_rl_zoo/gym_env.py` | NumPy `bool8` shim |
| `frontend/stream_eval.py` | Defensive shim + PER-DQN `agent_name` fix + new DQN factory |
| `screenshots/live_play/` | 21 new files: montages, GIFs, `gallery.html`, `summary.txt` |
| `.gitignore` | +13 / −1; covers leakers and per-algo result CSVs |
| `logs/*_atari_results.csv` | 6 files removed from tracking (kept on disk locally) |

Test plan

- Run `./start.sh`, then click ▶ Play with each bundled checkpoint (Rainbow/Pong, PER-DQN/Pong, IQN/Pong, PPO-RND/MontezumaRevenge). All four should stream frames; previously IQN/Rainbow/PPO-RND failed on `np.bool8` and PER-DQN failed on the `agent_name` mismatch.
- Click ▶ TRAIN 5K on DQN/Pong (or any other algo). The subprocess should complete with returncode 0 and write a checkpoint; previously it failed on `np.bool8` at `train_env.reset()`.
- Open `screenshots/live_play/gallery.html` in a browser. All 11 cards should render and the GIFs should animate.
- Run a TRAIN, then `git status`. The working tree should stay clean.

🤖 Generated with Claude Code

Three real bugs that blocked the bundled-checkpoint demo and the TRAIN 5K button:

1. NumPy 2.0 removed np.bool8, but gym 0.25.2 (pinned by this repo) still references it in passive_env_checker.py during env.step. Restored the alias at the top of deep_rl_zoo/gym_env.py so every entry point (run_atari / run_classic / eval_agent / frontend.stream_eval) benefits. Shim is also kept defensively at the top of stream_eval.py for any future caller that imports gym before deep_rl_zoo.
2. The bundled PER-DQN_Pong_4.ckpt is stamped agent_name="PER-DQN" (matches deep_rl_zoo/prioritized_dqn/run_atari.py:96), but frontend/stream_eval.py loaded it with agent_name="DQN", causing PyTorchCheckpoint.restore to reject the file with 'agent_name "PER-DQN" and "DQN" mismatch.' ▶ Play silently failed for PER-DQN. Fixed to use "PER-DQN".
3. The vanilla DQN algorithm had no factory in ALGO_FACTORIES, so any DQN_<game>.ckpt (including newly TRAIN-spawned ones) could not be replayed. Added _build_dqn matching the same shape as _build_prioritized_dqn.

Verified end-to-end by capturing all 4 bundled agents playing live and then training + replaying a fresh DQN/Breakout checkpoint via the same /api/training/start -> /api/eval/stream path the UI uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

screenshots/live_play/ now contains a self-contained gallery of real ALE frames captured from the live console paths:

- 5 trained agents (4 bundled + 1 freshly TRAIN-spawned DQN/Breakout) captured via the /api/eval/stream WebSocket
- 6 random-policy renders across visually distinct games (SpaceInvaders, MsPacman, Asteroids, Q*bert, Seaquest, Boxing), captured directly via deep_rl_zoo.gym_env.create_atari_environment

Each game has both a 5-frame montage PNG (static summary with score progression) and an animated 100-120 frame GIF (motion). gallery.html embeds all of them in a single phosphor-styled page so reviewers can see every agent and game in one place; open it in any browser.

Captures use the same env.unwrapped.render(mode="rgb_array") path the WebSocket eval emits, so what's shown is frame-accurate to what the production UI renders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After each push/PR, git status always showed the same handful of untracked files, polluting reviews and risking accidental staging. Root cause: the .gitignore was missing entries for files that get created automatically by everyday tooling.

Now ignored:

- .claude/: Claude Code worktree config (per-session, not part of the project)
- .cursorindexingignore: Cursor IDE metadata
- runs/_training_*.log: per-job stdout written by frontend/server.py on /api/training/start (the tensorboard pattern already excludes events.out.tfevents.*, but the FastAPI subprocess logs were leaking)

Also tightened:

- .venv/ -> .venv: a trailing-slash pattern only matches a real directory; a symlink to a shared parent venv (a common dev convenience) slipped through. The bare-name pattern covers both cases.

Verified: git check-ignore confirms all five previously-leaking paths now resolve to an entry in this file.

Note: logs/dqn_atari_results.csv is a tracked file that the training subprocess appends to on every run, so it will keep showing as "modified" after any TRAIN. That's a separate decision (either untrack it with git rm --cached or accept it as research output) and is left to a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

deep_rl_zoo.dqn.run_atari appends a row to this CSV on every iteration, so any TRAIN run left the working tree dirty. Untracked (file stays on disk locally) and added to .gitignore so future trainings don't pollute git status.

The previously-tracked rows (10 iterations of 100K-step DQN/Pong training) are preserved in git history if anyone needs them; only fresh clones after this commit lose the snapshot.

The other five logs/*_atari_results.csv files (iqn, per_dqn, ppo_rnd, r2d2, rainbow) remain tracked for now. They have the same dynamic and will need the same treatment whenever their algos get trained; left for a separate decision so this commit doesn't quietly drop the historical snapshots for algos not exercised this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Every deep_rl_zoo.<algo>.run_atari appends a row to its own logs/<algo>_atari_results.csv on each training iteration. Five of these were still tracked (iqn, per_dqn, ppo_rnd, r2d2, rainbow) and would dirty the working tree the moment their algorithm gets trained, the same dynamic that already bit us with dqn.

Untracked all five (files stay on disk locally) and collapsed the .gitignore entry into a glob (logs/*_atari_results.csv) so this also covers any future algorithm's results file with no further work. Historical snapshots remain in git history; this only removes them from fresh clones going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aifriend and others added 5 commits May 17, 2026 14:19

aifriend merged commit cc21a6e into main May 17, 2026
1 check passed

aifriend deleted the claude/romantic-mcnulty-bbe665 branch May 17, 2026 12:38

aifriend mentioned this pull request May 17, 2026

chore: ignore .specstory/ #3

Merged

2 tasks
