Hi, thank you for your very interesting work!
While reading your code, I came across something that confused me and I’d like to ask for clarification.
In the rollout generation logic, when do_search is False, the code assigns UIDs as:
batch.non_tensor_batch["uid"] = np.array(
    [str(uuid.uuid4()) for _ in range(len(batch.batch))],
    dtype=object,
)
However, when do_search is True, it instead uses:
batch.non_tensor_batch["uid"] = batch.non_tensor_batch["index"].copy()
I’m curious about the design rationale behind using two different UID assignment strategies here.
Why do we generate random UUIDs in the non-search case, but reuse the existing index field when do_search is enabled?
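To make the difference concrete, here is a minimal sketch of how I understand the two strategies, using plain NumPy instead of the real batch object. The `index` values and the `group_sizes` helper are made up for illustration, and I am assuming that `index` can contain repeated values (for example, several rollouts of the same prompt), which may not match the actual data.

```python
import uuid

import numpy as np

# Hypothetical stand-in for batch.non_tensor_batch["index"]; values are made up.
# Assumption: the same index can appear on more than one row.
index = np.array([0, 0, 1, 1], dtype=object)

# Strategy used when do_search is False: a fresh random UUID per row.
uid_random = np.array(
    [str(uuid.uuid4()) for _ in range(len(index))],
    dtype=object,
)

# Strategy used when do_search is True: reuse the existing index values.
uid_from_index = index.copy()


def group_sizes(uids):
    """Count how many rows share each uid and return the sorted group sizes."""
    sizes = {}
    for u in uids:
        sizes[u] = sizes.get(u, 0) + 1
    return sorted(sizes.values())


print(group_sizes(uid_random))      # each row ends up in its own group
print(group_sizes(uid_from_index))  # rows with equal index share a group
```

If that assumption holds, anything downstream that groups samples by `uid` would see different group structures in the two cases, which is the part I would like to understand.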
Any insight into this difference would be greatly appreciated. Thank you again for sharing your work!