Doubt about the implementation of the emitter-receiver scheme #119

@rmwor1d

Hi there, I'm confused about the concrete communication timing in the emitter-receiver scheme implemented in deepbots. In Webots, it takes one basic timestep to deliver a message from an emitter to a receiver. This means that the action $a_{t}$ chosen by the supervisor for state $s_{t}$ is delivered to the robot in timeslot $t+1$, the new state (observation) caused by $a_{t}$ is updated and emitted to the supervisor in timeslot $t+2$, and it finally reaches the supervisor as $s^{\prime}$ in timeslot $t+3$.

Based on this, the transitions saved for RL training in the deepbots tutorials look like $(s_{t}, a_{t}, r_{t}, s_{t+1})$, but the action that actually acted on state $s_{t}$ (the action the robot really executed) is more like $a_{t-3}$. $a_{t-3}$ and $a_{t}$ are different actions, even though the timestep is on the scale of milliseconds.
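To make the timing concrete, here is a minimal toy simulation of the delay I'm describing. It does not use any Webots or deepbots API: `DelayedChannel` and `simulate` are hypothetical names I made up, and the model simply assumes that (1) every message arrives one timestep after it is sent, and (2) the effect of an applied action becomes observable one timestep after the robot applies it. Actions are labelled by the step at which the supervisor issued them.

```python
class DelayedChannel:
    """Toy channel: a message sent in step t can only be received in step t+1."""

    def __init__(self):
        self._in_flight = None   # sent during the current step
        self._delivered = None   # sent during the previous step

    def send(self, msg):
        self._in_flight = msg

    def tick(self):
        # Advance one timestep: last step's message becomes receivable.
        self._delivered, self._in_flight = self._in_flight, None

    def receive(self):
        msg, self._delivered = self._delivered, None
        return msg


def simulate(num_steps):
    """Return (t, cause) pairs: at step t the supervisor sees an observation
    caused by the action issued at step `cause` (None = no action yet)."""
    to_robot = DelayedChannel()
    to_supervisor = DelayedChannel()
    applied = None  # action the robot applied in the previous step
    log = []
    for t in range(num_steps):
        to_robot.tick()
        to_supervisor.tick()

        # Supervisor: read the latest observation, then emit action a_t.
        cause = to_supervisor.receive()
        log.append((t, cause))
        to_robot.send(t)

        # Robot: the observation reflects the action applied one step ago
        # (assumption 2), then the newly received action is applied.
        visible_effect = applied
        received = to_robot.receive()
        if received is not None:
            applied = received
        to_supervisor.send(visible_effect)
    return log


if __name__ == "__main__":
    for t, cause in simulate(6):
        print(f"supervisor step {t}: observation caused by action {cause}")
```

Under these assumptions, the supervisor sees no action-dependent observation during steps 0 to 2, and from step 3 onward the observation at step $t$ is the one caused by the action issued at step $t-3$. If the real deepbots loop steps the simulation differently, the lag would change accordingly.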

To be honest, my question may not be very clear. I'd appreciate it if someone could correct me or clear up my doubt. Thanks a lot!

My doubt is somewhat related to this issue.

Labels: bug, question
