OpenGym Copilot

Try it!

https://opengym-copilot.onrender.com/

Introduction

OpenGym Copilot is a real-time Gymnasium environment viewer for RL experimentation. It separates training from rollout playback, exposes reward shaping controls in the UI, and lets you inspect individual reward terms while agents are learning.

The project is aimed at a workflow where you can:

train a policy
watch rollouts live
tune reward terms without rewriting the environment
compare saved rollout sessions afterward

Current Features

Live rollout streaming over WebSocket from the FastAPI backend
Separate Training Workspace and Rollout Workspace in each viewer
Reward shaping wrapper around Gymnasium environments
Editable reward weights from the frontend sidebar
Reward breakdown logging for rollout episodes and training episodes
Environment-specific reward templates for classic control, Box2D, and MuJoCo envs
Saved rollout export to {cwd}/rollouts/*.json
Saved rollout loading into a new read-only rollout viewer window
Live model upload and model selection for rollout playback
Random rollout policy by default unless a model is explicitly loaded
Clean stop-training flow through the backend callback path
Training progress polling and live training ticks
Training graph term selector and rollout graph term selector
Per-term reward plots for both training and rollout
Training hyperparameter popup before training starts

Reward Tuning

The backend no longer treats reward as one opaque scalar. Rewards are wrapped and split into named terms such as:

native
survival bonuses
position penalties
angle penalties
control penalties
task-specific shaping terms

Those terms are:

logged separately
rendered in the reward sidebar
editable through reward weights in the frontend
applied to both rollout and training environments

Reward configuration is stored per run_id, so different rollout windows can use different reward settings.
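
The idea above can be sketched as a thin wrapper that splits the reward into named terms and applies per-run weights. This is a minimal illustration, not the project's actual wrapper: RewardConfig, CONFIGS, RewardBreakdownWrapper, the term function signature, and the info keys are all hypothetical names. It follows the Gymnasium step() return shape (obs, reward, terminated, truncated, info) but avoids importing Gymnasium so it runs standalone.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical term signature: (observation, action, native_reward) -> raw value.
TermFn = Callable[[Any, Any, float], float]


@dataclass
class RewardConfig:
    """Per-run reward weights (illustrative, not the project's schema)."""
    weights: Dict[str, float] = field(default_factory=dict)


# One config per run_id, so separate rollout windows can use different settings.
CONFIGS: Dict[str, RewardConfig] = {}


class RewardBreakdownWrapper:
    """Wraps any object exposing a Gymnasium-style step()."""

    def __init__(self, env: Any, terms: Dict[str, TermFn], run_id: str):
        self.env = env
        self.terms = terms
        self.run_id = run_id

    def step(self, action: Any):
        obs, native, terminated, truncated, info = self.env.step(action)
        config = CONFIGS.setdefault(self.run_id, RewardConfig())

        # Raw values: the environment's own reward plus each shaping term.
        raw = {"native": float(native)}
        for name, fn in self.terms.items():
            raw[name] = float(fn(obs, action, native))

        # A missing weight defaults to 1.0, so an untouched config keeps the raw sum.
        shaped = {k: config.weights.get(k, 1.0) * v for k, v in raw.items()}

        info = dict(info)
        info["reward_raw"] = raw
        info["reward_shaped"] = shaped
        return obs, sum(shaped.values()), terminated, truncated, info


class ConstantEnv:
    """Stand-in environment for the demo: fixed observation, reward 1.0."""

    def step(self, action: Any):
        return [0.5], 1.0, False, False, {}


if __name__ == "__main__":
    CONFIGS["run-a"] = RewardConfig(weights={"position_penalty": 2.0})
    env = RewardBreakdownWrapper(
        ConstantEnv(),
        {"position_penalty": lambda obs, action, native: -abs(obs[0])},
        run_id="run-a",
    )
    _, total, _, _, info = env.step(0)
    print(total, info["reward_shaped"])
```

Because the weights are looked up on every step, editing CONFIGS[run_id] between steps changes the shaped reward without rebuilding the environment, which is the behavior the sidebar controls rely on.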

Training Controls

Starting training opens a popup that lets you configure:

training output filename inside the managed models/ storage
device (cpu or cuda)
learning rate
learning rate schedule
- constant
- linear
- cosine
PPO hyperparameters such as:
- n_steps
- batch_size
- n_epochs
- gamma
- gae_lambda
- clip_range
- ent_coef
- vf_coef
- max_grad_norm
model size
- small
- medium
- large

These values are sent to the backend and used when constructing the PPO model.
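
As a rough sketch of how the three learning rate schedules could be expressed: Stable-Baselines3 accepts either a float or a callable of progress_remaining (1.0 at the start of training, 0.0 at the end) for learning_rate. The helper name make_lr_schedule and the exact cosine formula are assumptions for illustration; the backend may define its schedules differently.

```python
import math
from typing import Callable


def make_lr_schedule(kind: str, base_lr: float) -> Callable[[float], float]:
    """Return a function of progress_remaining (1.0 at start, 0.0 at end)."""
    if kind == "constant":
        return lambda progress_remaining: base_lr
    if kind == "linear":
        # Decays linearly from base_lr to 0 over the run.
        return lambda progress_remaining: base_lr * progress_remaining
    if kind == "cosine":
        # Half-cosine decay from base_lr to 0 (assumed form, no warmup).
        return lambda progress_remaining: base_lr * 0.5 * (
            1.0 + math.cos(math.pi * (1.0 - progress_remaining))
        )
    raise ValueError(f"unknown schedule: {kind!r}")


if __name__ == "__main__":
    for kind in ("constant", "linear", "cosine"):
        schedule = make_lr_schedule(kind, 3e-4)
        print(kind, [schedule(p) for p in (1.0, 0.5, 0.0)])
```

The returned callable can be passed directly as the learning_rate argument when constructing the PPO model.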

Visualization

Training Graphs

The training viewer can display:

eval_reward
reward_mean
latest episodic reward
reward breakdown totals
individual reward breakdown terms

You can switch between training metrics with:

the < and > graph navigation buttons
the graph dropdown beside the plot

The training graph keeps full history and compresses into a fixed-width chart instead of discarding old points.
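
One way to fit a full history into a fixed-width chart is bucket averaging. The sketch below shows the idea in Python for consistency with the backend; the real charting happens in the browser, and both the function name compress_history and the choice of averaging (rather than, say, min/max sampling) are assumptions.

```python
from typing import List, Sequence


def compress_history(values: Sequence[float], width: int) -> List[float]:
    """Average `values` into at most `width` ordered buckets."""
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(values)
    if n <= width:
        # Short histories are shown point for point.
        return list(values)
    compressed = []
    for i in range(width):
        # Integer bucket boundaries cover every point exactly once.
        start = i * n // width
        end = (i + 1) * n // width
        bucket = values[start:end]
        compressed.append(sum(bucket) / len(bucket))
    return compressed


if __name__ == "__main__":
    print(compress_history(list(range(10)), 5))
```

Every original point contributes to exactly one bucket, so early training history stays visible at reduced resolution instead of scrolling off the chart.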

Rollout Graphs

The rollout viewer can display:

total rollout reward
breakdown totals
native reward
per-term shaped reward values
per-term raw reward values when available

Rollout graphs also support:

< and > graph navigation
direct graph selection from the dropdown

Saved Rollouts

Rollouts can be saved from the frontend and written to:

{cwd}/rollouts

Saved rollout JSON files can be loaded back into the app. Loading a saved rollout opens a new viewer window instead of overwriting the current live rollout.
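
For comparing sessions outside the app, the saved files can be read with the standard library. This sketch assumes only what is stated above (JSON files under a rollouts directory); the schema of each file is not documented here, so the contents are returned as parsed JSON without interpretation. The function name load_saved_rollouts is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_saved_rollouts(root: Optional[Path] = None) -> Dict[str, Any]:
    """Map each rollouts/*.json filename to its parsed contents."""
    rollout_dir = (root or Path.cwd()) / "rollouts"
    sessions: Dict[str, Any] = {}
    for path in sorted(rollout_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            sessions[path.name] = json.load(handle)
    return sessions


if __name__ == "__main__":
    for name in load_saved_rollouts():
        print(name)
```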

Tech Stack

Frontend: React + Vite + Chart.js
Backend: FastAPI + Gymnasium
RL: Stable-Baselines3 PPO
Visualization: WebSocket event streaming + browser-side charting and frame playback

Getting Started

Backend

Create an environment with the required Python packages, then start FastAPI:

pip install -r requirements.txt
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend

cd opengym-frontend
npm install
npm run dev

The dev frontend uses the Vite proxy, so HTTP requests and WebSocket traffic still land on the FastAPI backend at localhost:8000.

Production Website Deployment

This app cannot be deployed on GitHub Pages alone because training, uploads, saved rollouts, and WebSocket streaming all require a live Python backend. The workable path is:

Push this repo to GitHub.
Deploy the repo to a container host such as Render, Railway, Fly.io, or a VM.
Let FastAPI serve the built React frontend and the API from the same origin.

Render via GitHub

This repo now includes:

Dockerfile for a single-container production build
render.yaml for a Render web service with a persistent disk
requirements.txt for the Python backend

Steps:

Create a new GitHub repo and push this project.
In Render, choose New + -> Blueprint and point it at the GitHub repo.
Render will detect render.yaml, build the Docker image, and deploy the app.
Open the generated https://...onrender.com URL. The React app should load from the FastAPI server.

The official demo is hosted at https://opengym-copilot.onrender.com/.

Local Production-Style Run

Build the frontend once:

cd opengym-frontend
npm install
npm run build

Then run only the backend from the repo root:

uvicorn main:app --host 0.0.0.0 --port 8000

FastAPI will serve:

the API endpoints
the WebSocket endpoint at /ws/rollout
the built frontend at /

Environment Variables

OPEN_GYM_DATA_DIR: storage root for models/, rollouts/, and progress_bar.log
CORS_ALLOW_ORIGINS: comma-separated origins for split frontend/backend deployments
VITE_API_BASE_URL: optional frontend override for API base URL
VITE_WS_BASE_URL: optional frontend override for WebSocket base URL
VITE_DEV_BACKEND_URL: optional Vite dev proxy target
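
A sketch of how the two backend variables might be read follows. The variable names come from the list above; the helper names, the fallback to the current directory when OPEN_GYM_DATA_DIR is unset, and the handling of blank entries are assumptions.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional


def parse_cors_origins(raw: str) -> List[str]:
    """Split a comma-separated origin list, dropping blanks and whitespace."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def storage_layout(env: Optional[Mapping[str, str]] = None) -> Dict[str, Path]:
    """Locate models/, rollouts/, and progress_bar.log under the data dir."""
    env = os.environ if env is None else env
    # Assumption: fall back to the current directory when the variable is unset.
    root = Path(env.get("OPEN_GYM_DATA_DIR", "."))
    return {
        "models": root / "models",
        "rollouts": root / "rollouts",
        "progress_log": root / "progress_bar.log",
    }


if __name__ == "__main__":
    print(parse_cors_origins(os.environ.get("CORS_ALLOW_ORIGINS", "")))
    print(storage_layout())
```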

Operational Notes

Public hosting is most practical for lightweight CPU environments. MuJoCo and large multi-user training loads need a stronger host, often a GPU-backed machine.
Render starter instances sleep when idle and are not a good fit for serious training throughput.
Model files and saved rollouts are now constrained to the app-managed storage directories, which is necessary for an internet-facing deployment.
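
Constraining user-supplied filenames to a managed directory usually comes down to resolving the path and checking that it still sits under the storage root. The following is a generic sketch of that check, not the project's code; resolve_in_storage is a hypothetical name.

```python
from pathlib import Path


def resolve_in_storage(storage_root: Path, filename: str) -> Path:
    """Resolve filename under storage_root, refusing anything outside it."""
    root = storage_root.resolve()
    # Joining with an absolute filename replaces root entirely; the check
    # below rejects that case as well as "../" traversal.
    candidate = (root / filename).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"{filename!r} escapes the storage directory") from None
    return candidate


if __name__ == "__main__":
    print(resolve_in_storage(Path("models"), "ppo_cartpole.zip"))
```

Resolving before comparing matters: a plain string prefix check would accept names such as models/../main.py.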

Demo

Notes

Training preview and rollout playback are intentionally separated in the backend.
Rollout playback stays random by default until a model is explicitly loaded.
Reward shaping support depends on the environment templates defined in train_backend_reward_tuning/.
