Fix WebSocket play/training paths + live-play gallery + gitignore hygiene #2

Merged
aifriend merged 5 commits into main from claude/romantic-mcnulty-bbe665
May 17, 2026

Conversation

@aifriend
Owner

Five commits picking up where PR #1 left off — three orthogonal improvements that came out of actually running the console end-to-end against the bundled checkpoints.

Summary

Fix: make WebSocket play + TRAIN 5K actually work (d2f7a2b)

Three real bugs that blocked the demo path the UI advertises:

  1. NumPy 2.0 removed np.bool8, but gym 0.25.2 (pinned here) still references it in passive_env_checker.py during env.step. Restored the alias at the top of deep_rl_zoo/gym_env.py so every entry point (run_atari / run_classic / eval_agent / frontend.stream_eval) benefits. Defensive shim kept at the top of stream_eval.py for any future caller that imports gym before deep_rl_zoo.
  2. PER-DQN checkpoint mismatch: frontend/stream_eval.py loaded PER-DQN_Pong_4.ckpt with agent_name="DQN", but the file is stamped "PER-DQN" (matches deep_rl_zoo/prioritized_dqn/run_atari.py:96). ▶ Play silently failed for PER-DQN. Fixed.
  3. No DQN factory in ALGO_FACTORIES — any DQN_<game>.ckpt (including newly TRAIN-spawned ones) couldn't be replayed. Added _build_dqn.
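
Item 1 can be pictured with a minimal sketch of such a compatibility shim. The helper name `restore_bool8` and the stand-in namespaces are illustrative assumptions, not the code in deep_rl_zoo/gym_env.py; the only facts taken from the description are that NumPy 2.0 no longer exposes `np.bool8` and that gym 0.25.2 still looks it up.

```python
from types import SimpleNamespace


def restore_bool8(np_module):
    """Re-add the ``bool8`` alias when the given NumPy-like module lacks it.

    Hypothetical helper: the real shim may simply assign the alias inline.
    """
    if not hasattr(np_module, "bool8"):
        # np.bool_ is the scalar type that np.bool8 used to alias.
        np_module.bool8 = np_module.bool_
    return np_module


# Stand-in for a NumPy >= 2.0 module, so the sketch runs without NumPy installed.
numpy2_like = SimpleNamespace(bool_=bool)
restore_bool8(numpy2_like)
assert numpy2_like.bool8 is numpy2_like.bool_

# A module that still carries the alias is left untouched.
legacy_alias = object()
numpy1_like = SimpleNamespace(bool_=bool, bool8=legacy_alias)
restore_bool8(numpy1_like)
assert numpy1_like.bool8 is legacy_alias
```

In a real module this would presumably be called as `restore_bool8(np)` right after `import numpy as np`, so the alias exists before any environment is stepped.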

Verified end-to-end by capturing all 4 bundled agents playing live and then training + replaying a fresh DQN/Breakout checkpoint via the same /api/training/start -> /api/eval/stream path the UI uses.
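
Items 2 and 3 share a shape that a small sketch may make easier to see: a checkpoint can only be replayed if its algorithm has a registered factory and the name the loader expects equals the name stamped in the file. Everything below is hypothetical (the decorator, the dict-based "agents", the `<algo>_<game>...ckpt` filename parsing); only the names `ALGO_FACTORIES`, `_build_dqn`, `_build_prioritized_dqn` and the wording of the mismatch error come from this PR.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry; the real ALGO_FACTORIES may hold different callables.
ALGO_FACTORIES: Dict[str, Callable[[str], dict]] = {}


def register(algo: str):
    """Decorator that records a factory under its algorithm name."""
    def wrap(fn):
        ALGO_FACTORIES[algo] = fn
        return fn
    return wrap


@register("DQN")
def _build_dqn(game: str) -> dict:
    return {"agent_name": "DQN", "game": game}


@register("PER-DQN")
def _build_prioritized_dqn(game: str) -> dict:
    # The bug in item 2 corresponds to returning "DQN" here.
    return {"agent_name": "PER-DQN", "game": game}


def split_checkpoint_name(filename: str) -> Tuple[str, str]:
    """Assumed naming scheme: 'PER-DQN_Pong_4.ckpt' -> ('PER-DQN', 'Pong')."""
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("_")
    return parts[0], parts[1]


def check_agent_name(stamped: str, expected: str) -> None:
    """Reject a checkpoint whose stamped name differs from the loader's."""
    if stamped != expected:
        raise RuntimeError(f'agent_name "{stamped}" and "{expected}" mismatch.')


def build_for_checkpoint(filename: str, stamped_name: str) -> dict:
    algo, game = split_checkpoint_name(filename)
    if algo not in ALGO_FACTORIES:
        # Item 3: without a "DQN" entry, DQN checkpoints stop here.
        raise KeyError(f"no factory registered for {algo!r}")
    agent = ALGO_FACTORIES[algo](game)
    check_agent_name(stamped_name, agent["agent_name"])
    return agent
```

With both factories registered, `build_for_checkpoint("PER-DQN_Pong_4.ckpt", "PER-DQN")` succeeds, while passing a stamped name of `"DQN"` for the same file raises the mismatch error.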

Demo: live-play gallery (9bb6cb9)

screenshots/live_play/ now contains a self-contained gallery of real ALE frames from the live console paths:

  • 5 trained agents (4 bundled + 1 freshly TRAIN-spawned DQN/Breakout) captured via the WebSocket.
  • 6 random-policy renders across visually distinct games: SpaceInvaders, MsPacman, Asteroids, Q*bert, Seaquest, Boxing — captured directly via deep_rl_zoo.gym_env.create_atari_environment.

Each game has both a 5-frame montage PNG (static, with score progression) and an animated 100–120 frame GIF (motion). gallery.html embeds all of them in a single phosphor-styled page — open in any browser to see every agent and game in one place.

Captures use the same env.unwrapped.render(mode="rgb_array") path the WebSocket eval emits, so what's shown is frame-accurate to what the production UI renders.
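
How the five montage frames were picked from each 100–120 frame capture is not stated here. One plausible approach, sketched under that assumption, is to take evenly spaced indices that always include the first and last frame; `montage_indices` is a hypothetical helper, not part of the repository.

```python
from typing import List


def montage_indices(n_frames: int, k: int = 5) -> List[int]:
    """Return k evenly spaced frame indices covering 0 .. n_frames - 1.

    Integer arithmetic keeps the result deterministic (no float rounding).
    """
    if k < 2 or n_frames < k:
        raise ValueError("need at least 2 picks and at least k frames")
    return [(i * (n_frames - 1)) // (k - 1) for i in range(k)]


# Stand-in for captured RGB frames; real frames would be image arrays.
frames = [f"frame-{i}" for i in range(120)]
picked = [frames[i] for i in montage_indices(len(frames))]
assert picked[0] == "frame-0" and picked[-1] == "frame-119"
```

The selected frames could then be concatenated side by side into the static PNG, while the full sequence feeds the GIF.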

Chore: clean git status for everyone (1ff25e5, a6e11cb, c86edda)

Every push and PR was leaving behind the same handful of untracked / modified files, polluting reviews and risking accidental staging. Root cause was an incomplete .gitignore plus a tracked file that the training subprocess auto-appends to.

Now ignored:

  • .claude/ — Claude Code worktree config (per-session)
  • .cursorindexingignore — Cursor IDE metadata
  • runs/_training_*.log — per-job stdout from frontend/server.py
  • .venv/ -> .venv, so the bare-name pattern catches both real directories and symlinks to a shared parent venv

Untracked (files stay on disk locally, historical snapshots remain in git history):

  • logs/*_atari_results.csv — all 6 per-algorithm result CSVs that deep_rl_zoo.<algo>.run_atari appends to on every iteration. The dqn one was already dirtying the tree after this session's TRAIN; the other 5 (iqn, per_dqn, ppo_rnd, r2d2, rainbow) would have done the same the moment those algos got trained. Collapsed into a glob so any future algorithm's results file is covered with no further work.

Verified: git check-ignore confirms every previously-leaking path now resolves to an entry in .gitignore, and the working tree is clean after a full TRAIN run.
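
The check above can be repeated with a small wrapper around `git check-ignore`. This is a sketch rather than the script used for this PR; the function names are assumptions. It relies on the documented exit codes of `git check-ignore -q` (0 when the path is ignored, 1 when it is not).

```python
import subprocess
from typing import Iterable, List


def is_ignored(path: str, repo_dir: str = ".") -> bool:
    """Return True if `git check-ignore` reports `path` as ignored in `repo_dir`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "check-ignore", "-q", "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    # Any other exit status means git itself failed (e.g. not a repository).
    raise RuntimeError(result.stderr.strip() or "git check-ignore failed")


def leaking_paths(paths: Iterable[str], repo_dir: str = ".") -> List[str]:
    """Return the subset of `paths` that no ignore rule covers."""
    return [p for p in paths if not is_ignored(p, repo_dir)]
```

Calling `leaking_paths([".cursorindexingignore", "logs/dqn_atari_results.csv"])` from the repository root should return an empty list once the new entries are in place. Note that `git check-ignore` skips tracked files by default, so the CSVs only report as ignored after they have been removed from the index.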

Changes

| File group | What |
|---|---|
| deep_rl_zoo/gym_env.py | NumPy bool8 shim |
| frontend/stream_eval.py | Defensive shim + PER-DQN agent_name fix + new DQN factory |
| screenshots/live_play/ | 21 new files: montages, GIFs, gallery.html, summary.txt |
| .gitignore | +13 / −1; covers leakers and per-algo result CSVs |
| logs/*_atari_results.csv | 6 files removed from tracking (kept on disk locally) |

Test plan

  • ./start.sh, then click ▶ Play with each bundled checkpoint (Rainbow/Pong, PER-DQN/Pong, IQN/Pong, PPO-RND/MontezumaRevenge) — all four should stream frames; previously IQN/Rainbow/PPO-RND failed on np.bool8 and PER-DQN failed on the agent_name mismatch
  • ▶ TRAIN 5K on DQN/Pong (or any other algo) — the subprocess should complete with returncode 0 and write a checkpoint; previously it failed on np.bool8 at train_env.reset()
  • Open screenshots/live_play/gallery.html in a browser — all 11 cards render, GIFs animate
  • Run a TRAIN, then git status — working tree should stay clean

🤖 Generated with Claude Code

aifriend and others added 5 commits May 17, 2026 14:19
Three real bugs that blocked the bundled-checkpoint demo and the
TRAIN 5K button:

1. NumPy 2.0 removed np.bool8, but gym 0.25.2 (pinned by this repo)
   still references it in passive_env_checker.py during env.step.
   Restored the alias at the top of deep_rl_zoo/gym_env.py so every
   entry point (run_atari / run_classic / eval_agent /
   frontend.stream_eval) benefits. Shim is also kept defensively at
   the top of stream_eval.py for any future caller that imports gym
   before deep_rl_zoo.

2. The bundled PER-DQN_Pong_4.ckpt is stamped agent_name="PER-DQN"
   (matches deep_rl_zoo/prioritized_dqn/run_atari.py:96), but
   frontend/stream_eval.py loaded it with agent_name="DQN", causing
   PyTorchCheckpoint.restore to reject the file with
   'agent_name "PER-DQN" and "DQN" mismatch.' ▶ Play silently failed
   for PER-DQN. Fixed to use "PER-DQN".

3. The vanilla DQN algorithm had no factory in ALGO_FACTORIES, so
   any DQN_<game>.ckpt (including newly TRAIN-spawned ones) could
   not be replayed. Added _build_dqn matching the same shape as
   _build_prioritized_dqn.

Verified end-to-end by capturing all 4 bundled agents playing live
and then training + replaying a fresh DQN/Breakout checkpoint via
the same /api/training/start -> /api/eval/stream path the UI uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
screenshots/live_play/ now contains a self-contained gallery of real
ALE frames captured from the live console paths:

  - 5 trained agents (4 bundled + 1 freshly TRAIN-spawned DQN/Breakout)
    captured via the /api/eval/stream WebSocket
  - 6 random-policy renders across visually distinct games:
    SpaceInvaders, MsPacman, Asteroids, Q*bert, Seaquest, Boxing —
    captured directly via deep_rl_zoo.gym_env.create_atari_environment

Each game has both a 5-frame montage PNG (static summary with score
progression) and an animated 100-120 frame GIF (motion). gallery.html
embeds all of them in a single phosphor-styled page so reviewers can
see every agent and game in one place — open in any browser.

Captures use the same env.unwrapped.render(mode="rgb_array") path the
WebSocket eval emits, so what's shown is frame-accurate to what the
production UI renders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After each push/PR, git status always showed the same handful of
untracked files, polluting reviews and risking accidental staging.
Root cause: the .gitignore was missing entries for files that get
created automatically by everyday tooling.

Now ignored:
  - .claude/                — Claude Code worktree config (per-session,
                              not part of the project)
  - .cursorindexingignore   — Cursor IDE metadata
  - runs/_training_*.log    — per-job stdout written by
                              frontend/server.py on /api/training/start
                              (the tensorboard pattern already excludes
                              events.out.tfevents.*, but the FastAPI
                              subprocess logs were leaking)

Also tightened:
  - .venv/  ->  .venv       — a trailing-slash pattern only matches a
                              real directory; a symlink to a shared
                              parent venv (a common dev convenience)
                              slipped through. The bare-name pattern
                              covers both cases.

Verified: git check-ignore confirms all five previously-leaking paths
now resolve to an entry in this file.

Note: logs/dqn_atari_results.csv is a tracked file that the training
subprocess appends to on every run, so it will keep showing as
"modified" after any TRAIN. That's a separate decision — either
untrack it (git rm --cached) or accept it as research output — and
is left to a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
deep_rl_zoo.dqn.run_atari appends a row to this CSV on every iteration,
so any TRAIN run left the working tree dirty. Untracked (file stays on
disk locally) and added to .gitignore so future trainings don't pollute
git status.

The previously-tracked rows (10 iterations of 100K-step DQN/Pong
training) are preserved in git history if anyone needs them — only
fresh clones after this commit lose the snapshot.

The other five logs/*_atari_results.csv files (iqn, per_dqn, ppo_rnd,
r2d2, rainbow) remain tracked for now. They have the same dynamic and
will need the same treatment whenever their algos get trained; left
for a separate decision so this commit doesn't quietly drop the
historical snapshots for algos not exercised this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every deep_rl_zoo.<algo>.run_atari appends a row to its own
logs/<algo>_atari_results.csv on each training iteration. Five of
these were still tracked (iqn, per_dqn, ppo_rnd, r2d2, rainbow) and
would dirty the working tree the moment their algorithm gets trained
— same dynamic that already bit us with dqn.

Untracked all five (files stay on disk locally) and collapsed the
.gitignore entry into a glob (logs/*_atari_results.csv) so this also
covers any future algorithm's results file with no further work.

Historical snapshots remain in git history; this only removes them
from fresh clones going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aifriend merged commit cc21a6e into main May 17, 2026
1 check passed
@aifriend deleted the claude/romantic-mcnulty-bbe665 branch May 17, 2026 12:38
@aifriend mentioned this pull request May 17, 2026
2 tasks