MK2112/dreamStride

DreamStride

Understanding and reconstructing the walking robot dog experiment from the DayDreamer paper by Wu et al. (2022) within a simulated WeBots environment.

Related Papers

Dreamer

This implementation was adapted from the official DayDreamer repository and Adityabingi/Dreamer.

DayDreamer learns a world model and an actor-critic behavior to train robots from small amounts of experience in the real world, without using simulators. At a high level, DayDreamer consists of two processes. The actor process interacts with the environment and stores experiences in the replay buffer. The learner process samples data from the replay buffer to train the world model, and then uses imagined predictions of the world model to train the behavior.

To learn from proprioceptive and visual inputs alike, the world model fuses the sensory inputs of the same time step together into a compact discrete representation. A recurrent neural network predicts the sequence of these representations given actions. From the resulting recurrent states and representations, DayDreamer reconstructs its inputs and predicts rewards and episode ends.
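The fusion-and-prediction step described above can be sketched as follows. This is a toy illustration only, not the actual DayDreamer architecture: all dimensions, the one-hot quantization, and the tanh recurrence are simplified stand-ins for the real encoder and recurrent state-space model.

```python
import numpy as np

# Toy dimensions; the real model uses learned encoders and larger latents.
PROPRIO, IMG_FEAT, N_CLASSES, STATE, ACT = 12, 32, 8, 16, 4

rng = np.random.default_rng(0)
P = rng.standard_normal((PROPRIO + IMG_FEAT, N_CLASSES)) * 0.1  # fusion projection
W = rng.standard_normal((STATE, STATE + N_CLASSES + ACT)) * 0.1  # recurrent weights

def fuse(proprio, image_feat):
    """Fuse same-timestep sensory inputs into a compact discrete (one-hot) code."""
    logits = np.concatenate([proprio, image_feat]) @ P
    code = np.zeros(N_CLASSES)
    code[np.argmax(logits)] = 1.0
    return code

def recurrent_step(state, code, action):
    """Predict the next recurrent state from the current state, code, and action."""
    return np.tanh(W @ np.concatenate([state, code, action]))

# Roll the model forward over a short sequence of fused observations.
state = np.zeros(STATE)
for _ in range(5):
    code = fuse(rng.standard_normal(PROPRIO), rng.standard_normal(IMG_FEAT))
    state = recurrent_step(state, code, rng.standard_normal(ACT))
```

From states like these, the real model additionally reconstructs its inputs and predicts rewards and episode ends via learned decoder heads, which are omitted here.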

Given the world model, the actor-critic learns farsighted behaviors using on-policy reinforcement learning purely inside the representation space of the world model.
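The two-process structure described above can be sketched in miniature. This is an illustrative skeleton, not the repository's actual code: the environment, policy, and training step are dummies, and in the real system the two processes run concurrently.

```python
import random
from collections import deque

class ReplayBuffer:
    """Shared store: the actor appends experience, the learner samples from it."""
    def __init__(self, capacity=10_000):
        self.steps = deque(maxlen=capacity)

    def add(self, obs, action, reward, done):
        self.steps.append((obs, action, reward, done))

    def sample(self, batch_size):
        return random.sample(self.steps, batch_size)

def actor_process(env_step, policy, buffer, n_steps):
    """Interact with the environment and store experiences in the replay buffer."""
    obs = 0.0
    for _ in range(n_steps):
        action = policy(obs)
        obs, reward, done = env_step(action)
        buffer.add(obs, action, reward, done)

def learner_process(buffer, batch_size):
    """Sample data to train the world model (the training itself is omitted)."""
    return buffer.sample(batch_size)

# Dummy environment and policy stand in for WeBots and the actor network.
buffer = ReplayBuffer()
actor_process(lambda a: (a + 1.0, 1.0, False), lambda o: o * 0.5, buffer, 32)
batch = learner_process(buffer, 8)
```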


Setup

  1. Install WeBots.
  2. Create a venv or Conda environment. Use a Python version compatible with WeBots (this project was written with Python 3.9.0).
  3. Install the required packages: pip install -r requirements.txt.
  4. This project was written in PyCharm. If you use PyCharm as well, open the project there and set the Python interpreter to the one created in step 2.
  5. Follow the PyCharm-specific instructions in the WeBots documentation.
    • The WeBots controller file is located at ./spotcontroller/spot_controller.py.

Training

  1. Open WeBots and load the ./SimulationEnv/worlds/DayDreamerWorld.wbt world file.
  2. Within WeBots' components tree, navigate to the spot robot and:
    • Ensure the robot's property controller is set to <extern>
    • Ensure the robot's property supervisor is set to true
  3. Keep WeBots open and running.
  4. Open PyCharm and run the ./spotcontroller/spot_controller.py file. If everything is set up correctly, you should see the message [+] Spot Controller is alive. Waiting for connections... appear.
  5. While the controller runs, within PyCharm, open a terminal and:
    • Run training: python dreamer.py --env spot-walk --algo Dreamerv2 --exp spot-webots --train
    • Run evaluation: python dreamer.py --env spot-walk --algo Dreamerv2 --exp spot-webots --evaluate
    • If you have a CUDA-compatible GPU, you can add the --gpu flag to the command to use it.

You can list supported environment prefixes via: python dreamer.py --list-envs

You can also select a different JSON config file via: python dreamer.py --config path/to/config.json ...

For training progress visualization, you can additionally enable TensorBoard logging (scalars + videos) via: python dreamer.py --tensorboard ...

Implementation Details

This implementation aims to integrate the operation of the DayDreamer algorithm with the WeBots platform in a performant and modular manner. The resulting architecture is described below.

A core part of the training adaptation for WeBots and the Spot robot therein was realized in ./spotcontroller/spot_controller.py. This file is responsible for communicating with the WeBots simulation environment, acting out the actions, perceiving the observations, and providing the rewards to the DayDreamer algorithm. Communication between DayDreamer and WeBots is realized through a double socket connection for modularity and extensibility.

The counterpart to ./spotcontroller/spot_controller.py is spot_wrapper.py.
It receives actions derived by the algorithm, formats and packages them, and sends them to the controller.
In turn, it also receives observations and rewards, formats them, and forwards them to the algorithm.
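The action/observation exchange over the socket connection needs some message framing. The repository's actual wire format is not documented here, so the following is only one plausible sketch: length-prefixed JSON messages, with `pack_message`/`unpack_message` as hypothetical helper names.

```python
import json
import struct

def pack_message(payload: dict) -> bytes:
    """Serialize a dict as length-prefixed JSON (4-byte big-endian length header)."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack(">I", len(body)) + body

def unpack_message(data: bytes):
    """Parse one framed message; return (payload, remaining bytes)."""
    (length,) = struct.unpack(">I", data[:4])
    body, rest = data[4 : 4 + length], data[4 + length :]
    return json.loads(body.decode("utf-8")), rest

# Round trip: an action message followed by an observation/reward reply,
# as they might appear back-to-back in a TCP byte stream.
stream = pack_message({"action": [0.1, -0.2]}) + pack_message({"reward": 1.5})
msg1, rest = unpack_message(stream)
msg2, _ = unpack_message(rest)
```

Length prefixing matters because TCP is a byte stream: without it, a receiver cannot tell where one message ends and the next begins.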

Environment creation is centralized in env_factory.py. It selects an environment based on the --env prefix (e.g. spot-* for the WeBots Spot integration via spot_wrapper.py, and walker-* for DeepMind Control Suite tasks via env_wrapper.DeepMindControl) and then applies common wrappers (action repeat, action normalization, and a time limit).
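The prefix-based dispatch in env_factory.py can be sketched as follows. The class names here are stand-ins, not the repository's actual wrappers, and the common wrappers are only indicated by a comment.

```python
class SpotEnv:
    """Stand-in for the WeBots Spot wrapper (spot_wrapper.py)."""
    def __init__(self, task):
        self.task = task

class DMCEnv:
    """Stand-in for the DeepMind Control wrapper (env_wrapper.DeepMindControl)."""
    def __init__(self, task):
        self.task = task

def make_env(name: str):
    """Dispatch on the --env prefix; the suffix names the task."""
    prefix, _, task = name.partition("-")
    if prefix == "spot":
        env = SpotEnv(task)
    elif prefix == "walker":
        env = DMCEnv(task)
    else:
        raise ValueError(f"unknown environment prefix: {prefix}")
    # Common wrappers (action repeat, normalization, time limit) would wrap env here.
    return env

env = make_env("spot-walk")
```

Centralizing creation this way keeps dreamer.py agnostic of the backend: adding a new simulator only requires a new prefix branch and wrapper.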

Additional modifications have been made throughout the implementation, but, in general, spot_controller.py, spot_wrapper.py, and dreamer.py are the main files that can serve as good points of entry.

An example WeBots world for the Spot robot is located at ./SimulationEnv/worlds/DayDreamerWorld.wbt.

Folder Overview

  • ./data will contain training logs and model checkpoints.
  • ./SimulationEnv contains WeBots world files and other directly WeBots-related files.
  • ./spotcontroller contains the WeBots Spot robot controller and a tester file, dummy_backend.py.

About

DayDreamer making a robot dog walk in Webots
