Fix WebSocket play/training paths + live-play gallery + gitignore hygiene #2
Merged
Three real bugs that blocked the bundled-checkpoint demo and the TRAIN 5K button:

1. NumPy 2.0 removed np.bool8, but gym 0.25.2 (pinned by this repo) still
   references it in passive_env_checker.py during env.step. Restored the
   alias at the top of deep_rl_zoo/gym_env.py so every entry point
   (run_atari / run_classic / eval_agent / frontend.stream_eval) benefits.
   Shim is also kept defensively at the top of stream_eval.py for any
   future caller that imports gym before deep_rl_zoo.

2. The bundled PER-DQN_Pong_4.ckpt is stamped agent_name="PER-DQN"
   (matches deep_rl_zoo/prioritized_dqn/run_atari.py:96), but
   frontend/stream_eval.py loaded it with agent_name="DQN", causing
   PyTorchCheckpoint.restore to reject the file with
   'agent_name "PER-DQN" and "DQN" mismatch.' ▶ Play silently failed
   for PER-DQN. Fixed to use "PER-DQN".

3. The vanilla DQN algorithm had no factory in ALGO_FACTORIES, so any
   DQN_<game>.ckpt (including newly TRAIN-spawned ones) could not be
   replayed. Added _build_dqn matching the same shape as
   _build_prioritized_dqn.

Verified end-to-end by capturing all 4 bundled agents playing live and
then training + replaying a fresh DQN/Breakout checkpoint via the same
/api/training/start -> /api/eval/stream path the UI uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
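For readers who have not seen the np.bool8 workaround from item 1, a minimal sketch of such a compatibility shim is shown here. It is an illustration of the general technique, not a copy of the code in deep_rl_zoo/gym_env.py, and it assumes the shim runs before gym is imported.

```python
import numpy as np

# NumPy 2.0 removed the np.bool8 alias, while older gym releases still
# look it up at step time. Restore it only when the module does not
# already define it, so installs that still ship the alias are untouched.
# Checking the module dict (rather than hasattr) avoids going through any
# module-level __getattr__ hook that might emit a deprecation warning.
if "bool8" not in vars(np):
    np.bool8 = np.bool_

# From here on, code that references np.bool8 sees the regular bool scalar type.
flag = np.bool8(True)
```

Because the alias points at np.bool_, isinstance checks written against np.bool8 keep behaving the same way as checks against np.bool_.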
screenshots/live_play/ now contains a self-contained gallery of real
ALE frames captured from the live console paths:
- 5 trained agents (4 bundled + 1 freshly TRAIN-spawned DQN/Breakout)
captured via the /api/eval/stream WebSocket
- 6 random-policy renders across visually distinct games:
SpaceInvaders, MsPacman, Asteroids, Q*bert, Seaquest, Boxing —
captured directly via deep_rl_zoo.gym_env.create_atari_environment
Each game has both a 5-frame montage PNG (static summary with score
progression) and an animated 100-120 frame GIF (motion). gallery.html
embeds all of them in a single phosphor-styled page so reviewers can
see every agent and game in one place — open in any browser.
Captures use the same env.unwrapped.render(mode="rgb_array") path the
WebSocket eval emits, so what's shown is frame-accurate to what the
production UI renders.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After each push/PR, git status always showed the same handful of
untracked files, polluting reviews and risking accidental staging.
Root cause: the .gitignore was missing entries for files that get
created automatically by everyday tooling.
Now ignored:
- .claude/               Claude Code worktree config (per-session,
                         not part of the project)
- .cursorindexingignore  Cursor IDE metadata
- runs/_training_*.log   per-job stdout written by
                         frontend/server.py on /api/training/start
                         (the tensorboard pattern already excludes
                         events.out.tfevents.*, but the FastAPI
                         subprocess logs were leaking)
Also tightened:
- .venv/ -> .venv        a trailing-slash pattern only matches a
                         real directory; a symlink to a shared
                         parent venv (a common dev convenience)
                         slipped through. The bare-name pattern
                         covers both cases.
Verified: git check-ignore confirms all five previously-leaking paths
now resolve to an entry in this file.
Note: logs/dqn_atari_results.csv is a tracked file that the training
subprocess appends to on every run, so it will keep showing as
"modified" after any TRAIN. That's a separate decision — either
untrack it (git rm --cached) or accept it as research output — and
is left to a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
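The git check-ignore verification described in this commit could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the parser assumes the documented verbose output shape of `git check-ignore -v` (source, line number, pattern, then a tab and the path), and it does not handle quoted paths or sources that themselves contain a colon.

```python
import subprocess
from typing import Dict, List, NamedTuple, Optional


class IgnoreMatch(NamedTuple):
    source: str   # file holding the matching pattern, e.g. ".gitignore"
    line: int     # line number of the pattern inside that file
    pattern: str  # the pattern text itself
    path: str     # the path that was queried


def parse_check_ignore_line(line: str) -> Optional[IgnoreMatch]:
    """Parse one verbose output line: <source>:<linenum>:<pattern><TAB><path>.

    Returns None for lines that report no matching pattern (with
    --non-matching, git prints empty fields before the tab).
    """
    rule, sep, path = line.rstrip("\n").partition("\t")
    if not sep:
        return None
    parts = rule.split(":", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        return None
    return IgnoreMatch(parts[0], int(parts[1]), parts[2], path)


def check_ignored(paths: List[str], repo: str = ".") -> Dict[str, Optional[IgnoreMatch]]:
    """Ask git which ignore rule, if any, covers each path (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo, "check-ignore", "-v", "-n", "--", *paths],
        capture_output=True, text=True, check=False,
    )
    report: Dict[str, Optional[IgnoreMatch]] = {p: None for p in paths}
    for raw in result.stdout.splitlines():
        match = parse_check_ignore_line(raw)
        if match is not None:
            report[match.path] = match
    return report


# Example call (paths are illustrative, run from inside the repository):
#   check_ignored([".claude", ".cursorindexingignore", ".venv"])
```

Any path that maps to None in the returned report is not covered by an ignore rule and would still show up in git status.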
deep_rl_zoo.dqn.run_atari appends a row to logs/dqn_atari_results.csv on every iteration, so any TRAIN run left the working tree dirty. Untracked (file stays on disk locally) and added to .gitignore so future trainings don't pollute git status.

The previously-tracked rows (10 iterations of 100K-step DQN/Pong training) are preserved in git history if anyone needs them; only fresh clones after this commit lose the snapshot.

The other five logs/*_atari_results.csv files (iqn, per_dqn, ppo_rnd, r2d2, rainbow) remain tracked for now. They have the same dynamic and will need the same treatment whenever their algos get trained; left for a separate decision so this commit doesn't quietly drop the historical snapshots for algos not exercised this session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Every deep_rl_zoo.<algo>.run_atari appends a row to its own logs/<algo>_atari_results.csv on each training iteration. Five of these were still tracked (iqn, per_dqn, ppo_rnd, r2d2, rainbow) and would dirty the working tree the moment their algorithm gets trained, the same dynamic that already bit us with dqn.

Untracked all five (files stay on disk locally) and collapsed the .gitignore entry into a glob (logs/*_atari_results.csv) so this also covers any future algorithm's results file with no further work.

Historical snapshots remain in git history; this only removes them from fresh clones going forward.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
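To illustrate which files the collapsed glob is meant to cover, here is a small sketch that approximates the match in Python. It is not git's own matcher: fnmatch lets `*` cross directory separators while gitignore does not, so the hypothetical helper below compares the directory part exactly and applies the wildcard to the file name only. The algorithm names come from the commit messages above; the unrelated file name is made up for the example.

```python
import posixpath
from fnmatch import fnmatchcase

PATTERN = "logs/*_atari_results.csv"
ALGOS = ["dqn", "iqn", "per_dqn", "ppo_rnd", "r2d2", "rainbow"]


def covered_by_glob(path: str, pattern: str = PATTERN) -> bool:
    """Approximate a gitignore match for a pattern with one directory level."""
    pattern_dir, pattern_name = posixpath.split(pattern)
    path_dir, path_name = posixpath.split(path)
    # The directory must match exactly; only the file name may use the wildcard.
    return path_dir == pattern_dir and fnmatchcase(path_name, pattern_name)


results = {f"logs/{algo}_atari_results.csv": None for algo in ALGOS}
for candidate in results:
    results[candidate] = covered_by_glob(candidate)

# Files elsewhere, or with a different suffix, are not covered.
unrelated = covered_by_glob("logs/notes.txt")
nested = covered_by_glob("logs/archive/dqn_atari_results.csv")
```

All six per-algorithm CSVs map to True in this approximation, while the two counterexamples map to False.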
Five commits picking up where PR #1 left off — three orthogonal improvements that came out of actually running the console end-to-end against the bundled checkpoints.
Summary
Fix: make WebSocket play + TRAIN 5K actually work (d2f7a2b)

Three real bugs that blocked the demo path the UI advertises:

- NumPy 2.0 removed np.bool8, but gym 0.25.2 (pinned here) still references it in passive_env_checker.py during env.step. Restored the alias at the top of deep_rl_zoo/gym_env.py so every entry point (run_atari / run_classic / eval_agent / frontend.stream_eval) benefits. Defensive shim kept at the top of stream_eval.py for any future caller that imports gym before deep_rl_zoo.
- frontend/stream_eval.py loaded PER-DQN_Pong_4.ckpt with agent_name="DQN", but the file is stamped "PER-DQN" (matches deep_rl_zoo/prioritized_dqn/run_atari.py:96). ▶ Play silently failed for PER-DQN. Fixed.
- Vanilla DQN had no factory in ALGO_FACTORIES, so any DQN_<game>.ckpt (including newly TRAIN-spawned ones) couldn't be replayed. Added _build_dqn.

Verified end-to-end by capturing all 4 bundled agents playing live and then training + replaying a fresh DQN/Breakout checkpoint via the same /api/training/start → /api/eval/stream path the UI uses.

Demo: live-play gallery (9bb6cb9)

screenshots/live_play/ now contains a self-contained gallery of real ALE frames from the live console paths:

- 5 trained agents (4 bundled + 1 freshly TRAIN-spawned DQN/Breakout), captured via the /api/eval/stream WebSocket.
- 6 random-policy renders across visually distinct games (SpaceInvaders, MsPacman, Asteroids, Q*bert, Seaquest, Boxing), captured directly via deep_rl_zoo.gym_env.create_atari_environment.

Each game has both a 5-frame montage PNG (static, with score progression) and an animated 100–120 frame GIF (motion). gallery.html embeds all of them in a single phosphor-styled page; open it in any browser to see every agent and game in one place.

Captures use the same env.unwrapped.render(mode="rgb_array") path the WebSocket eval emits, so what's shown is frame-accurate to what the production UI renders.

Chore: clean git status for everyone (1ff25e5, a6e11cb, c86edda)

Every push and PR was leaving behind the same handful of untracked / modified files, polluting reviews and risking accidental staging. Root cause was an incomplete .gitignore plus a tracked file that the training subprocess auto-appends to.

Now ignored:

- .claude/: Claude Code worktree config (per-session)
- .cursorindexingignore: Cursor IDE metadata
- runs/_training_*.log: per-job stdout from frontend/server.py
- .venv/ → .venv so the bare-name pattern catches both real directories and symlinks to a shared parent venv

Untracked (files stay on disk locally, historical snapshots remain in git history):

- logs/*_atari_results.csv: all 6 per-algorithm result CSVs that deep_rl_zoo.<algo>.run_atari appends to on every iteration. The dqn one was already dirtying the tree after this session's TRAIN; the other 5 (iqn, per_dqn, ppo_rnd, r2d2, rainbow) would have done the same the moment those algos got trained. Collapsed into a glob so any future algorithm's results file is covered with no further work.

Verified: git check-ignore confirms every previously-leaking path now resolves to an entry in .gitignore, and the working tree is clean after a full TRAIN run.

Changes

- deep_rl_zoo/gym_env.py: bool8 shim
- frontend/stream_eval.py
- screenshots/live_play/: gallery.html, summary.txt
- .gitignore
- logs/*_atari_results.csv

Test plan

- ./start.sh, then click ▶ Play with each bundled checkpoint (Rainbow/Pong, PER-DQN/Pong, IQN/Pong, PPO-RND/MontezumaRevenge): all four should stream frames; previously IQN/Rainbow/PPO-RND failed on np.bool8 and PER-DQN failed on the agent_name mismatch
- … np.bool8 at train_env.reset()
- Open screenshots/live_play/gallery.html in a browser: all 11 cards render, GIFs animate
- git status: working tree should stay clean

🤖 Generated with Claude Code