Why is stochastic rounding used in the forward pass?

Many thanks for the great work!

My understanding is that deterministic round-to-nearest-even is usually applied in the forward pass for the best accuracy, while stochastic rounding (SR) is applied in the backward pass to avoid quantization bias. However, in your paper and implementation, SR is applied in both the forward and backward passes. Is there a reason for that?
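
To make my understanding concrete, here is a minimal sketch of the two rounding modes in plain Python. It is not taken from your implementation: the function names, the grid spacing `step`, and the test value are my own assumptions, chosen only to illustrate the bias argument.

```python
import math
import random


def round_nearest_even(x: float, step: float) -> float:
    """Deterministic rounding to the nearest multiple of `step`.

    Python's built-in round() breaks ties toward the even integer.
    """
    return round(x / step) * step


def round_stochastic(x: float, step: float, rng: random.Random) -> float:
    """Round up with probability equal to the fractional part, else down.

    The expected value of the result equals x, so the rounding is unbiased.
    """
    scaled = x / step
    low = math.floor(scaled)
    frac = scaled - low
    return (low + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)  # fixed seed so the run is reproducible
x, step, n = 0.3, 1.0, 100_000

# Round-to-nearest always maps 0.3 to 0.0, so the error never averages out.
rtn_mean = sum(round_nearest_even(x, step) for _ in range(n)) / n

# Stochastic rounding returns 1.0 about 30% of the time, so the mean approaches 0.3.
sr_mean = sum(round_stochastic(x, step, rng) for _ in range(n)) / n

print(f"round-to-nearest mean: {rtn_mean}")
print(f"stochastic mean:       {sr_mean:.4f}")
```

In this toy setting, round-to-nearest gives the smaller error for any single value but a systematic error on average, whereas SR is noisier per value but unbiased in expectation. That is why I expected it only in the backward pass.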

Kind regards