Doubt about the implementation of the emitter-receiver scheme #119

Hi there, I'm confused about the concrete communication timing in the emitter-receiver scheme implemented in deepbots. In Webots, it takes one basic timestep to transmit and deliver a message from an emitter to a receiver. This means the action $a_{t}$ chosen by the supervisor from state $s_{t}$ is delivered to the robot in timestep $t+1$, the new state (observation) caused by $a_{t}$ is updated and emitted to the supervisor in timestep $t+2$, and it finally appears in the supervisor as $s^{\prime}$ in timestep $t+3$.

Based on this, the transitions saved for RL training in the deepbots tutorials look like $(s_{t}, a_{t}, r_{t}, s_{t+1})$, but the action that actually acted on state $s_{t}$ (that is, the action the robot really executed) would be something like $a_{t-3}$. The actions $a_{t-3}$ and $a_{t}$ are not the same, even though the timestep is on the order of milliseconds.
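To make the timing I have in mind concrete, here is a minimal toy sketch in plain Python. It does not use the Webots or deepbots APIs: `DelayedChannel` and `simulate` are hypothetical names, and the one-step delivery delay and the "emit first, then apply the newly received action" ordering are my assumptions about how the scheme behaves, not confirmed behaviour. Each action is labelled with the step at which the supervisor sent it, and the log records which action caused the observation the supervisor reads at each step.

```python
class DelayedChannel:
    """Toy channel: a message sent during step t is readable at step t+1."""

    def __init__(self):
        self._in_flight = None    # sent during the current step
        self._deliverable = None  # sent during the previous step

    def send(self, msg):
        self._in_flight = msg

    def receive(self):
        msg, self._deliverable = self._deliverable, None
        return msg

    def tick(self):
        # Advance one basic timestep: in-flight messages become readable.
        self._deliverable, self._in_flight = self._in_flight, None


def simulate(n_steps):
    """Return (step, index of the action reflected in the observation read)."""
    to_robot = DelayedChannel()
    to_supervisor = DelayedChannel()
    applied_last_step = None  # action the robot applied in an earlier step
    log = []
    for t in range(n_steps):
        # Supervisor: read the latest observation, then send action a_t.
        log.append((t, to_supervisor.receive()))
        to_robot.send(t)

        # Robot (assumed ordering): emit an observation that reflects the
        # action applied in the previous step, then apply the new action.
        to_supervisor.send(applied_last_step)
        received = to_robot.receive()
        if received is not None:
            applied_last_step = received

        to_robot.tick()
        to_supervisor.tick()
    return log


if __name__ == "__main__":
    for step, cause in simulate(6):
        print(f"step {step}: observation reflects action {cause}")
```

Under these assumptions, the observation read at step $t$ reflects $a_{t-3}$ and the first three steps carry no action effect at all, which is the mismatch with the stored $(s_{t}, a_{t}, r_{t}, s_{t+1})$ tuples that I am asking about.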

To be honest, my question may not be entirely clear. I'd appreciate it if someone could correct me or clear up my confusion. Thanks a lot!

**My question is related to [this issue](https://github.com/aidudezzz/deepbots/issues/80#issue-829047303)**
