Persist live orchestrator trajectories #48
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75854dcb8b
    reward_label=reward_label,
)
self.trace_store.record_lineage(packet.run_id, manifest, lineage)
self.trace_store.record_trajectory(
Avoid storing baseline runs as advisor trajectories
run() now records a trajectory unconditionally, even when routing_decision.arm is "baseline" and no advice was injected into the executor. On that A/B-test path, the persisted trajectory still pairs primary_advice with the observed outcome and reward, so downstream consumers of advisor_trajectories can misinterpret off-policy baseline behavior as advisor-attributed training evidence whenever advisor_fraction < 1.0. Either gate trajectory persistence to advisor-arm runs, or persist routing metadata in the trajectory so baseline samples can be filtered out.
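A minimal sketch of both suggested fixes, not the repository's actual code. The names TrajectoryRecord, InMemoryTraceStore, persist_trajectory, advisor_attributed, and the skip_baseline flag are hypothetical stand-ins. The only value taken from the PR is the "baseline" arm label, and any other arm is assumed to mean advice was injected.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TrajectoryRecord:
    """Hypothetical stand-in for the persisted trajectory row."""
    run_id: str
    primary_advice: str
    reward: float
    arm: str               # routing metadata kept alongside the sample
    advice_injected: bool  # False for baseline-arm runs


@dataclass
class InMemoryTraceStore:
    """Hypothetical store; the real trace_store API may differ."""
    trajectories: List[TrajectoryRecord] = field(default_factory=list)

    def record_trajectory(self, record: TrajectoryRecord) -> None:
        self.trajectories.append(record)


def persist_trajectory(
    store: InMemoryTraceStore,
    run_id: str,
    arm: str,
    primary_advice: str,
    reward: float,
    *,
    skip_baseline: bool = True,
) -> Optional[TrajectoryRecord]:
    """Persist a trajectory, either gated to non-baseline arms or tagged."""
    # Assumption: any arm other than "baseline" received the advice.
    advice_injected = arm != "baseline"
    if skip_baseline and not advice_injected:
        # Option 1: do not store baseline runs as advisor trajectories.
        return None
    # Option 2: store the run, but carry routing metadata for filtering.
    record = TrajectoryRecord(run_id, primary_advice, reward, arm, advice_injected)
    store.record_trajectory(record)
    return record


def advisor_attributed(records: List[TrajectoryRecord]) -> List[TrajectoryRecord]:
    """Keep only samples where advice actually reached the executor."""
    return [r for r in records if r.advice_injected]
```

With skip_baseline=True the baseline run is never written. With skip_baseline=False it is written but tagged, so a training consumer can call advisor_attributed() while A/B analysis can still read both arms.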
AdvisorTrajectory records for live orchestrator runs, including packet/advice, executor observations, verifier hints, final outcome, and reward evidence.