Simulator failure when training Tailstorm with k=2 #21

@pkel

Observed the following error after running `make train-online` on commit 247c11f.

## Environment (before vectorization) ##
Tailstorm with k=2, constant rewards, and optimal sub-block selection; SSZ'16-like attack space; α=0.25 attacker
public_blocks: 0
private_blocks: 0
diff_blocks: 0
public_votes: 1
private_votes_inclusive: 2
private_votes_exclusive: 1
public_depth: 0
private_depth_inclusive: 1
private_depth_exclusive: 1
event: 2
Actions: (0) Adopt_Prolong | (1) Override_Prolong | (2) Match_Prolong | (3) Wait_Prolong | (4) Adopt_Proceed | (5) Override_Proceed | (6) Match_Proceed | (7) Wait_Proceed
## Training ##
Using cpu device
-----------------------------------
| rollout/           |            |
|    ep_len_mean     | 248        |
|    ep_rew_mean     | 0.63059205 |
| time/              |            |
|    fps             | 10568      |
|    iterations      | 1          |
|    time_elapsed    | 23         |
|    total_timesteps | 245760     |
-----------------------------------
Process ForkServerProcess-20:
Traceback (most recent call last):
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 29, in _worker
    observation, reward, done, info = env.step(data)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 208, in step
    obs, reward, done, was_info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 184, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 159, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/wrappers.py", line 84, in step
    obs, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/gym/wrappers/order_enforcing.py", line 11, in step
    observation, reward, done, info = self.env.step(action)
  File "/home/patrik/devel/cpr/python/gym/cpr_gym/envs.py", line 47, in step
    obs, r, d, i = engine.step(self.ocaml_env, a)
  File "ocaml/gym/bridge.ml", line 105, in Dune__exe__Bridge.(fun):105
  File "ocaml/gym/engine.ml", line 183, in Dune__exe__Engine.of_module.step:183
  File "ocaml/protocols/tailstorm_ssz.ml", line 293, in Cpr_protocols__Tailstorm_ssz.Make.Agent.apply:293
  File "ocaml/protocols/tailstorm.ml", line 519, in Cpr_protocols__Tailstorm.Make.Honest.next_summary':519
  File "ocaml/protocols/tailstorm.ml", line 415, in Cpr_protocols__Tailstorm.Make.Honest.optimal_quorum:415
  File "ocaml/protocols/combinatorics.ml", line 17, in Cpr_protocols__Combinatorics.n_choose_k:17
ValueError: (Division_by_zero)
Traceback (most recent call last):
  File "/home/patrik/devel/cpr/python/train/ppo.py", line 315, in <module>
    model.learn(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/ppo/ppo.py", line 314, in learn
    return super().learn(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 251, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 185, in collect_rollouts
    if callback.on_step() is False:
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 192, in _on_step
    continue_training = callback.on_step() and continue_training
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 88, in on_step
    return self._on_step()
  File "/home/patrik/devel/cpr/python/train/ppo.py", line 232, in _on_step
    r = super()._on_step()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/callbacks.py", line 435, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/evaluation.py", line 87, in evaluate_policy
    observations, rewards, dones, infos = env.step(actions)
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 162, in step
    return self.step_wait()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/vec_monitor.py", line 76, in step_wait
    obs, rewards, dones, infos = self.venv.step_wait()
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in step_wait
    results = [remote.recv() for remote in self.remotes]
  File "/home/patrik/devel/cpr/_venv/lib64/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in <listcomp>
    results = [remote.recv() for remote in self.remotes]
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib64/python3.9/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
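For context on the `Division_by_zero` raised from `n_choose_k`, here is a hedged sketch (in Python, for illustration only — this is *not* the actual code in ocaml/protocols/combinatorics.ml) of one way an integer binomial-coefficient helper can hit exactly this error: a factorial-based formula on fixed-width native integers silently wraps, and once the denominator wraps to zero the integer division raises. The 64-bit wraparound below is an assumption made for the demo; OCaml native ints are 63-bit, but the mechanism is the same.

```python
def fact_wrapped(n: int, bits: int = 64) -> int:
    """Factorial with fixed-width wraparound, mimicking native machine ints.

    For n >= 66 the accumulated power of two exceeds the word size,
    so the result wraps to exactly 0.
    """
    mask = (1 << bits) - 1
    acc = 1
    for i in range(2, n + 1):
        acc = (acc * i) & mask
    return acc


def n_choose_k_naive(n: int, k: int) -> int:
    # Factorial-based formula: the denominator wraps to 0 for large
    # k or n - k, raising ZeroDivisionError (OCaml: Division_by_zero).
    return fact_wrapped(n) // (fact_wrapped(k) * fact_wrapped(n - k))


def n_choose_k_safe(n: int, k: int) -> int:
    # Multiplicative formulation: no factorials, every intermediate
    # division is exact, and out-of-range arguments return 0 instead
    # of producing a zero denominator.
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)
    acc = 1
    for i in range(1, k + 1):
        acc = acc * (n - k + i) // i
    return acc
```

Whether the real bug in combinatorics.ml:17 is overflow, a zero denominator from out-of-range arguments, or something else entirely needs checking against the source; the multiplicative form above is just one overflow-resistant alternative.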

Labels: bug (Something isn't working)