Hello! I noticed that 'reward' is pushed twice in step() in block 4, but sample() in block 11 gets 'done' in the same place. I am a little confused by this. I also don't understand why you multiply by (1.0 - done) in the value learning in block 11. Could you please explain? Thank you in advance!