Implementation of REINFORCE and Advantage Actor-Critic (A2C)
- Python 3.6
- PyTorch
- OpenAI Gym
- numpy
- scipy
- pybox2d
- gym[box2d]
- matplotlib
- pyglet
- h5py
Please follow the instructions on the homework handout.
python reinforce.py --num-episodes <num_episodes> --lr <lr>
hw3_part1_plotter.py
For example,
python reinforce.py --num-episodes 50000 --lr 5e-4
The above runs the REINFORCE algorithm for 50000 training episodes. After every 200 training episodes, it prints the average test reward (over 100 test episodes) to the console. This output can be redirected to a file and plotted with hw3_part1_plotter.py.
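The train-and-evaluate schedule described above can be sketched roughly as follows. This is only an illustration, not the code in reinforce.py: `run_schedule`, `train_one_episode`, and `run_test_episode` are hypothetical names, and the stand-in callbacks return random rewards rather than interacting with a Gym environment.

```python
import random


def run_schedule(num_episodes, eval_every, test_episodes,
                 train_one_episode, run_test_episode):
    """Train for num_episodes; every eval_every episodes, print the
    mean reward over test_episodes test episodes."""
    log = []
    for episode in range(1, num_episodes + 1):
        train_one_episode()
        if episode % eval_every == 0:
            rewards = [run_test_episode() for _ in range(test_episodes)]
            mean_reward = sum(rewards) / len(rewards)
            log.append((episode, mean_reward))
            print(f"episode {episode}: mean test reward {mean_reward:.2f}")
    return log


# Hypothetical stand-ins for the real training and evaluation code.
rng = random.Random(0)
log = run_schedule(
    num_episodes=1000,
    eval_every=200,
    test_episodes=100,
    train_one_episode=lambda: None,
    run_test_episode=lambda: rng.uniform(-200.0, 0.0),
)
```

With `--num-episodes 50000` and an evaluation every 200 episodes, a schedule like this would produce 250 evaluation points.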
python a2c.py --num-episodes <num_episodes> --lr <lr> --critic-lr <critic_lr> --n <n>
hw3_part2_plotter.py
For example,
python a2c.py --num-episodes 50000 --lr 5e-4 --critic-lr 1e-4 --n 20
The above runs the advantage actor-critic (A2C) algorithm for 50000 training episodes. After every 500 training episodes, it prints the average test reward (over 100 test episodes) to the console. This output can be redirected to a file and plotted with hw3_part2_plotter.py.
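This README does not say what `--n` controls. Assuming it is the number of steps in an N-step return (a common choice for A2C; check the homework handout), the critic targets could be computed roughly as in the sketch below. The function name `n_step_targets` and the `gamma` default are hypothetical and not taken from a2c.py.

```python
def n_step_targets(rewards, values, n, gamma=0.99):
    """N-step bootstrapped returns for one finished episode.

    rewards[t] is the reward at step t and values[t] is the critic's
    estimate V(s_t). States past the end of the episode have value 0.
    """
    T = len(rewards)
    targets = []
    for t in range(T):
        end = min(t + n, T)
        # Discounted sum of up to n real rewards starting at step t.
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        # Bootstrap from the critic if the episode continues past t + n.
        if t + n < T:
            g += gamma ** n * values[t + n]
        targets.append(g)
    return targets


rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5]
targets = n_step_targets(rewards, values, n=2, gamma=1.0)
# The advantage weights the actor's log-probability gradient.
advantages = [g - v for g, v in zip(targets, values)]
```

Under this assumption, a larger `n` relies more on observed rewards and less on the critic's estimate; once `n` is at least the episode length, the targets reduce to the Monte Carlo returns used by REINFORCE.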