Merged
246 changes: 246 additions & 0 deletions USER_GUIDE.md
@@ -0,0 +1,246 @@
# Atari57 Research Console — User Guide

A hands-on guide to using the live research console. If you want the API, the architecture, or the internals, read [frontend/README.md](frontend/README.md). This document only covers what you, the operator, see and click.

---

## 1. Launch

From the repo root:

```bash
./start.sh
```

The script kills anything on port 8000, starts the FastAPI sidecar, polls `/api/health`, and opens the browser the moment the server is up. Stop it with `Ctrl+C` — that tears down the server, the browser-poller, and any subprocess it spawned.
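The health-check wait that `start.sh` performs can be sketched in Python. This is a minimal illustration, not the script's actual code: the function name `wait_for_health`, the timeout, and the polling interval are assumptions; only port 8000 and the `/api/health` path come from this guide.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url, timeout_s=30.0, interval_s=0.25, probe=None):
    """Poll `url` until it answers HTTP 200 or `timeout_s` elapses.

    `probe` is an optional callable(url) -> bool, injectable for testing.
    """

    def http_probe(target):
        # Any connection error or non-200 reply means "not up yet".
        try:
            with urllib.request.urlopen(target, timeout=1.0) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or http_probe
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_for_health("http://127.0.0.1:8000/api/health")` returns `True` once the sidecar answers, at which point it is safe to open the browser yourself.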

The page is a single 1600×1000 console that scales proportionally to your window — resize freely, nothing gets clipped.
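As a worked example of the proportional scaling, the sketch below reproduces the arithmetic of `fitApp()` in `frontend/app.js`: the scale factor is the smaller of the two window-to-design ratios, which is why nothing is clipped. The Python function names are illustrative only.

```python
DESIGN_W, DESIGN_H = 1600, 1000  # fixed console size stated in this guide


def app_scale(window_w, window_h):
    """Scale factor that fits the 1600x1000 console inside the window."""
    return min(window_w / DESIGN_W, window_h / DESIGN_H)


def scaled_size(window_w, window_h):
    """On-screen size of the console after scaling."""
    s = app_scale(window_w, window_h)
    return DESIGN_W * s, DESIGN_H * s
```

For an 800×1000 window the width is the limiting dimension, so the scale is 0.5 and the console renders at 800×500.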

---

## 2. The screen at a glance

```
┌──────────── TOP BAR ────────────────────────────────────────────────┐
│ ATARI57 │ EXP │ ALGO │ GAME │ SEED │ DEVICE │ FPS │ UPTIME │ TB     │
├─────────┬───────────────────────────────────────┬───────────────────┤
│ LEFT    │ CENTER                                │ RIGHT             │
│         │                                       │                   │
│ search  │ ┌── viewport (canvas) ──┐             │ 03 RUN STATUS     │
│         │ │ agent plays here      │             │ 04 HYPERPARAMS    │
│ 01      │ └───────────────────────┘             │ 05 NETWORK        │
│ AGENTS  │ transport: ◀ ▶ ▶ ↻ scrubber speed     │ 06 COMPARISON     │
│ (algos) │                                       │ 07 EVENT LOG      │
│         │ 08 chart         │ 09 action panel    │                   │
│ 02      │ (episode return) │ (action dist)      │                   │
│ 57 GAMES│                  │                    │                   │
│         │                  │                    │                   │
├─────────┴───────────────────────────────────────┴───────────────────┤
│ TRAIN status │ CKPT │ REPLAY │ GPU │ ticker (live tail) │ pytorch   │
└──────────── BOTTOM BAR ─────────────────────────────────────────────┘
```

Three columns. The **left** chooses what to run, the **center** is where the agent plays and the chart updates, the **right** is the inspector (mostly informational).

---

## 3. Choose what to run (left column)

**Panel 01 · AGENTS** — list of all 20 deep_rl_zoo algorithms grouped by family (policy / value / distributional). The number on each row is the count of bundled checkpoints for that algorithm. Click an algorithm to select it; the selection drives the top-bar `ALGO` cell and the right-side network diagram.

**Panel 02 · ATARI · 57 GAMES** — every Atari 2600 game. Games with at least one bundled checkpoint are highlighted (today: **Pong**, **MontezumaRevenge**). Click a game to select it; the top-bar `GAME` cell and the chart's title update.

> The bundled checkpoints today are **IQN/Pong**, **PER-DQN/Pong**, **Rainbow/Pong**, **PPO-RND/MontezumaRevenge**. ▶ Play needs a bundled checkpoint that matches your `(algo, game)` pair. ▶ TRAIN works for any pair.

The **search** strip at the top of the left column is decorative — it shows the currently active algorithm filter.

---

## 4. Watch an agent play (▶ Play)

1. Pick `(algo, game)` in the left column where a bundled checkpoint exists. Easiest: **Rainbow + Pong**.
2. Hit the **▶ Play** button on the transport bar (just below the viewport).
3. A WebSocket opens, a `frontend/stream_eval.py` subprocess starts, and the canvas begins receiving real ALE frames at the agent's chosen frame stride (default every 2nd frame).

While playing:

- **Viewport HUD** — top-left shows `P1 AGENT` score, top-right `CPU` score, and the bottom row shows `FRAME`, `EPISODE`, and last `REWARD`.
- **Action distribution** (bottom-right of the center column) — bars update with the agent's real action choices for this game.
- **Step counter / ε / γ** — under the transport bar, just informational.

Click ▶ again to pause/stop the stream. Closing the tab also tears the subprocess down.

The **transport bar** has step-back, step-forward, and reset buttons, and a **scrubber** that lets you jump within the recorded episode buffer (most recent eval). The speed selector (¼× ½× 1× 4× ∞) controls playback rate when you're scrubbing — it does not change the trained agent's behavior.

---

## 5. Train a model (▶ TRAIN 5K)

The big secondary button **▶ TRAIN 5K** spawns a real subprocess:

```bash
python -m deep_rl_zoo.<algo>.run_atari --num_train_steps=5000 --num_eval_steps=500
```

Workflow:

1. Pick `(algo, game)` in the left column. Any pair works — you don't need a bundled checkpoint.
2. Click **▶ TRAIN 5K**. A new entry appears in the **Event Log** (panel 07, right column) with a job id and live tail of the subprocess stdout.
3. The bottom bar's **TRAIN** cell flips to `running`. Wait for it to flip back to `exited` (or `failed`).
4. When the run exits, the **chart** automatically re-fetches tensorboard scalars from `runs/` so the new `episode_return` series appears.

Cancel a running job via the API (`POST /api/training/jobs/<id>/stop`) or by closing `start.sh` (which SIGTERMs everything).
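If you prefer scripting the cancel call, here is a minimal sketch using only the standard library. The helper name `build_stop_request` and the sample job id are hypothetical; the endpoint path is the one given above, and the empty request body is an assumption.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address used elsewhere in this guide


def build_stop_request(job_id, base_url=BASE_URL):
    """Build (but do not send) the POST that asks the server to stop a job."""
    safe_id = urllib.parse.quote(str(job_id), safe="")
    url = f"{base_url}/api/training/jobs/{safe_id}/stop"
    # Assumption: the endpoint needs no request body.
    return urllib.request.Request(url, data=b"", method="POST")
```

Send it with `urllib.request.urlopen(build_stop_request(job_id))`, using the job id shown in the Event Log.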

> **5k steps is a smoke run, not real training.** Real training is millions of steps. Use this button to verify the pipeline; for actual research, drive `deep_rl_zoo` from the CLI (see the main README §4) and the chart will pick up the new scalars next time you select that game.

---

## 6. Compare every bundled agent (▶ COMPARE)

The **▶ COMPARE** button runs a 5000-step eval against **every bundled checkpoint** in parallel (~10s on M1 CPU) and renders the actual mean returns in **panel 06 · AGENT COMPARISON** (right column), sorted descending.

Use this when you've added a new checkpoint to `checkpoints/` and want to see where it lands relative to the others. The action is non-blocking — the rest of the UI stays responsive while the parallel evals run.

`cmp-status` in the panel header shows `eval: idle` / `running` / `done`.
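To make the panel's ordering concrete, the sketch below summarizes per-episode returns into mean / max / min rows and sorts them by mean, descending. It illustrates the idea only and is not the server's code; `summarize` and the input shape are assumptions.

```python
from statistics import mean


def summarize(returns_by_agent):
    """Map {(algo, game): [episode returns]} to rows sorted by mean, descending."""
    rows = [
        {
            "algo": algo,
            "game": game,
            "mean": mean(returns),
            "max": max(returns),
            "min": min(returns),
        }
        for (algo, game), returns in returns_by_agent.items()
    ]
    return sorted(rows, key=lambda row: row["mean"], reverse=True)
```

Each resulting row corresponds to one line of panel 06: algorithm, game, then the three return statistics.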

---

## 7. Replay a saved game (▶ REPLAY)

The **▶ REPLAY** button overlays the most recent self-play MP4 from `recordings/` above the canvas. MP4s come from CLI eval runs:

```bash
python -m deep_rl_zoo.iqn.eval_agent --environment_name=Pong \
  --load_checkpoint_file=./checkpoints/IQN_Pong_2.ckpt \
  --num_iterations=1 --num_eval_steps=2000 --nouse_tensorboard
```

The WebSocket play path (▶ Play) does **not** write MP4s — it streams frames directly to the browser. So if ▶ REPLAY says "no recordings", run an eval from the CLI first, or wait until a TRAIN run finishes (each iteration writes one).

---

## 8. The right-column inspector

Five stacked panels. Most are informational; the only interactive one today is the comparison panel (driven by ▶ COMPARE).

**03 · RUN STATUS** — current step / return / eta and a key-value list of the active run's structural choices (replay type, actors/learners, batch size, target sync, distribution atoms, etc.). Reads `state.selected` after you click in the left column.

**04 · HYPERPARAMETERS** — `learning_rate`, `discount γ`, `ε-greedy`, `replay α/β`, `n-step`, `batch_size`, `target_period`, `grad_clip`, `frame_stack`. The sliders are **read-only display** in this build. To change values, pass them via the API:

```bash
curl -X POST http://127.0.0.1:8000/api/training/start \
  -H 'Content-Type: application/json' -d '{
    "algo": "rainbow", "game": "Pong", "num_train_steps": 10000,
    "extra_args": ["--learning_rate=0.0005", "--discount=0.99"]
  }'
```

The chart will pick up the new run automatically.
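The same request can be built from Python. This is a sketch that mirrors the `curl` call above field for field; the helper name `build_start_request` is hypothetical, and no fields beyond those shown in the example are assumed to exist.

```python
import json
import urllib.request


def build_start_request(algo, game, num_train_steps, extra_args=(),
                        base_url="http://127.0.0.1:8000"):
    """Build (but do not send) the JSON POST for /api/training/start."""
    payload = {
        "algo": algo,
        "game": game,
        "num_train_steps": num_train_steps,
        "extra_args": list(extra_args),
    }
    return urllib.request.Request(
        f"{base_url}/api/training/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the result to `urllib.request.urlopen(...)` to start the job, for example with `extra_args=["--learning_rate=0.0005", "--discount=0.99"]`.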

**05 · NETWORK** — ASCII diagram of a network architecture. It is decorative: it shows a typical Rainbow-dueling tree, not the network that is actually loaded.

**06 · AGENT COMPARISON** — populated by ▶ COMPARE. Each row is `algorithm · game · mean / max / min episode return` from a parallel eval.

**07 · EVENT LOG** — live tail of the most recent training run. Each line is `[timestamp] [job_id] <stdout>`. Use this to watch a TRAIN 5K run progress in real time.
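If you copy event-log lines out for analysis, a small parser for the `[timestamp] [job_id] <stdout>` shape might look like the sketch below. The exact timestamp and job-id formats are not specified in this guide, so the pattern only assumes two bracketed fields followed by free text; `parse_event_line` is a hypothetical helper.

```python
import re

# Two bracketed fields, then the rest of the line as the stdout text.
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]*)\]\s+\[(?P<job_id>[^\]]*)\]\s?(?P<text>.*)$"
)


def parse_event_line(line):
    """Return a dict with timestamp, job_id, and text, or None if no match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Lines that do not follow the two-bracket shape return `None`, so you can filter them out safely.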

---

## 9. The chart (center, bottom-left)

Five tabs across the top: **EPISODE RETURN** · **TD-LOSS** · **VALUE** · **ENTROPY** · **FPS**. Only the **EPISODE RETURN** tab is wired to real data (the other four are visual only, see §12); it reads scalars from `runs/<name>/` via `tbparse`.

When the chart can't find a run for the selected `(algo, game)`, it shows the upstream sample run for that game (if one exists). Otherwise it shows an empty plot.

The legend is for visual reference; what's actually drawn is the single series for the selected pair.

---

## 10. Top bar and bottom bar

**Top bar** (left to right):

- `EXP` — current experiment id (synthesized from your selection).
- `ALGO` — selected algorithm.
- `GAME` — selected game.
- `SEED` / `DEVICE` — cosmetic; runs are seeded from the API request body, device is whatever PyTorch picks.
- `FPS` — viewport render rate, not training rate.
- `UPTIME` — wall-clock since the page loaded.
- `TENSORBOARD :6006` — port reminder. Run `tensorboard --logdir=./runs` to launch it.

**Bottom bar:**

- `TRAIN` — `idle` / `running` / `exited` / `failed`.
- `CKPT` — the last checkpoint relevant to your selection.
- `REPLAY` — replay-buffer pressure (cosmetic in eval mode).
- `GPU` — device tag.
- **Ticker** — scrolling tail of recent training events.
- `PYTORCH` — installed torch version.

---

## 11. Common workflows

**A) Quickest demo — watch a trained agent win at Pong**

1. Click **Rainbow** in the algo rail.
2. Click **Pong** in the game grid.
3. Hit **▶ Play**.
4. Watch the agent score 14–6 over a few minutes.

**B) Smoke-test the training pipeline**

1. Click any `(algo, game)` pair — e.g. **DQN + Breakout**.
2. Hit **▶ TRAIN 5K**.
3. Watch the event log for the job id and tail.
4. When `TRAIN` flips to `exited`, the chart updates with your new `episode_return` series.

**C) See where a new checkpoint stands**

1. Drop a `.ckpt` file into `checkpoints/`.
2. Refresh the page.
3. Hit **▶ COMPARE**. Wait ~10s.
4. Read the sorted results in panel 06.

**D) Replay an old run**

1. From the CLI: `python -m deep_rl_zoo.<algo>.eval_agent ...` to write an MP4 under `recordings/`.
2. In the UI, hit **▶ REPLAY** to watch the most recent one.

---

## 12. What's static (don't be fooled)

The original prototype shipped with placeholder content. These cells are still cosmetic in this build — knowing this saves you from chasing ghost values:

- Top-bar `SEED 0x2A1F` and `DEVICE mps:0` — placeholders.
- Hyperparameter sliders (panel 04) — read-only display, not editable.
- Network architecture diagram (panel 05) — generic Rainbow-dueling tree, not the loaded model.
- Bottom-bar `REPLAY 874,332 / 1,000,000` — decorative.
- The `chart-tabs` other than EPISODE RETURN are visual only; EPISODE RETURN is the only tab wired to data.

Everything else — left column selectors, the four ▶ buttons, the chart, the comparison panel, the event log, the top-bar `EXP/ALGO/GAME` cells, and the bottom-bar `TRAIN/CKPT` cells — reflects real backend state.

---

## 13. Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| ▶ Play does nothing | No bundled checkpoint for the selected `(algo, game)`. | Pick a pair from §3, or click any pair and use ▶ TRAIN 5K instead. |
| ▶ REPLAY says no recording | The WebSocket play path doesn't write MP4s. | Run an eval from the CLI (see §7), or wait for a TRAIN run to finish. |
| Chart is empty | No `runs/<name>/` for the selected pair. | Run ▶ TRAIN 5K once for that pair; the chart re-fetches when the job exits. |
| Event log is silent | No training started yet. | Hit ▶ TRAIN 5K. |
| Page won't load | Port 8000 already in use, or venv not set up. | `start.sh` clears port 8000 automatically. If the page still fails to load, run `./setup.sh` first, then `./start.sh` again. |
| Browser didn't auto-open | `NO_BROWSER=1` is set, or the browser-poller couldn't reach `/api/health` in time. | Open `http://127.0.0.1:8000` manually. |
| Layout looks tiny | The window is too small — the console scales proportionally. | Resize the window larger; nothing is clipped, just shrunk. |

---

## 14. Where to look next

- **Add a checkpoint to the bundle** — drop the `.ckpt` into `checkpoints/`. The catalog (`frontend/catalog.py`) infers `(algo, game)` from the filename pattern `<Algo>_<Game>_<n>.ckpt`.
- **Train with custom hyperparameters** — see §11 in the main README, or the `extra_args` example in §8 above.
- **Hook a new algorithm to the WebSocket play path** — add a factory in `frontend/stream_eval.py` (`ALGO_FACTORIES`).
- **Wire a new UI panel** — add markup in `frontend/index.html`, fetch logic in `frontend/app.js`, and (if needed) a new endpoint in `frontend/server.py`. Tests live in `frontend/test_server.py`.
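To check that a new checkpoint's filename will be recognized, the `<Algo>_<Game>_<n>.ckpt` pattern can be approximated as below. This is not the code in `frontend/catalog.py`; it is a sketch that assumes the game name contains no underscores and that `<n>` is a non-negative integer.

```python
import re
from pathlib import Path

# Last two underscore-separated fields are <Game> and <n>; the rest is <Algo>.
CKPT_RE = re.compile(r"^(?P<algo>.+)_(?P<game>[^_]+)_(?P<n>\d+)\.ckpt$")


def parse_checkpoint_name(path):
    """Return (algo, game, n) for a matching filename, else None."""
    match = CKPT_RE.match(Path(path).name)
    if match is None:
        return None
    return match.group("algo"), match.group("game"), int(match.group("n"))
```

For the bundled file referenced in §7, `parse_checkpoint_name("./checkpoints/IQN_Pong_2.ckpt")` yields `("IQN", "Pong", 2)`.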

For the deep dive on `deep_rl_zoo` itself — algorithms, network architectures, distributed orchestration — see [UPSTREAM_README.md](UPSTREAM_README.md).
17 changes: 17 additions & 0 deletions frontend/app.js
@@ -924,9 +924,26 @@
    host.appendChild(compare);
  }

+ /* ───────── responsive scale-to-fit ───────── */
+
+ function fitApp() {
+   const app = document.getElementById("app");
+   if (!app) return;
+   const sw = window.innerWidth / 1600;
+   const sh = window.innerHeight / 1000;
+   const scale = Math.min(sw, sh);
+   app.style.setProperty("--app-scale", String(scale));
+ }
+
+ function wireFitApp() {
+   fitApp();
+   window.addEventListener("resize", fitApp);
+ }
+
  /* ───────── init ───────── */

  async function init() {
+   wireFitApp();
    await loadCatalog();
    renderAlgoRail();
    renderGameGrid();
17 changes: 13 additions & 4 deletions frontend/index.html
@@ -3,7 +3,7 @@
  <head>
  <meta charset="utf-8" />
  <title>ATARI57 // Research Sandbox</title>
- <meta name="viewport" content="width=1600" />
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=VT323&family=JetBrains+Mono:wght@300;400;500;700&family=Space+Grotesk:wght@400;500;700&display=swap" rel="stylesheet">
@@ -45,8 +45,15 @@
    font-size: 12px;
    line-height: 1.45;
    letter-spacing: 0.01em;
-   min-height: 100vh;
-   overflow-x: auto;
+ }
+ html, body {
+   width: 100vw;
+   height: 100vh;
+   overflow: hidden;
+ }
+ body {
+   display: grid;
+   place-items: center;
  }

  /* CRT scanlines + vignette over the whole app */
@@ -98,8 +105,10 @@
    min-width: 1600px;
    max-width: 1600px;
    height: 1000px;
-   margin: 0 auto;
    overflow: hidden;
+   transform: scale(var(--app-scale, 1));
+   transform-origin: center center;
+   flex-shrink: 0;
  }
  .bar {
    border-bottom: 1px solid var(--line);