On line 233 of ddpg_vec.py, you use
next_action_batch = self.select_action(next_state_batch.view(-1, self.obs_dim), action_noise=self.train_noise)
This means the next action is selected with the current actor instead of the target actor, which differs from the DDPG paper. Was this a deliberate design choice, and if so, what is the reason for it?
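To make the difference concrete, here is a minimal sketch of the two ways to compute the critic's TD target. Everything in it is a hypothetical toy stand-in (`actor`, `target_actor`, `target_critic`, `td_target`, and the scalar "networks" are my own illustration, not code from ddpg_vec.py); it only shows which policy supplies the next action.

```python
# Toy, hypothetical stand-ins: none of these names come from ddpg_vec.py.
GAMMA = 0.99  # discount factor (assumed value)

def actor(state):
    """Current policy mu(s), a toy scalar function."""
    return 0.5 * state

def target_actor(state):
    """Slowly updated copy mu'(s), a toy scalar function."""
    return 0.4 * state

def target_critic(state, action):
    """Target Q-function Q'(s, a), a toy scalar function."""
    return state + 2.0 * action

def td_target(reward, next_state, done, policy):
    """y = r + gamma * (1 - done) * Q'(s', policy(s'))."""
    next_action = policy(next_state)
    return reward + GAMMA * (1.0 - done) * target_critic(next_state, next_action)

# DDPG paper: the next action comes from the target actor, a' = mu'(s').
y_paper = td_target(1.0, 2.0, 0.0, target_actor)

# What line 233 appears to do: the next action comes from the current actor.
y_current = td_target(1.0, 2.0, 0.0, actor)
```

In this toy setup the two targets differ whenever the current actor and the target actor disagree on the next action, which is why I am asking whether the choice is intentional.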