Evaluation uses params before training update

Looking at the training loop in `rec_mappo.py` (and other algorithms):

```python
learner_output = learn(learner_state)

# Evaluation reads learner_state (the input to learn), not learner_output.learner_state (its output)
trained_params = unreplicate_batch_dim(learner_state.params.actor_params)
eval_metrics = evaluator(trained_params, ...)

learner_state = learner_output.learner_state  # Update for next iteration
```

The evaluation runs on `learner_state.params` (the params before `learn()` was called), not `learner_output.learner_state.params` (the params after training).

This means:
- First eval uses random/initial params (no training has happened yet)
- Every eval is one `learn()` call behind the most recently trained params
- Final eval doesn't reflect the last training update (unless absolute metrics is enabled)

Is this intentional? It seems like we'd want to evaluate the params we just trained, not the ones from the previous iteration.
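
To make the off-by-one concrete, here is a minimal toy sketch. It is not the repository's code: `LearnerState`, `LearnerOutput`, `learn`, and `run` below are hypothetical stand-ins, and `params` is just a counter of how many updates have been applied, so the value recorded at each eval shows how much training those params have seen.

```python
from typing import List, NamedTuple


class LearnerState(NamedTuple):
    # Hypothetical stand-in: counts how many training updates were applied.
    params: int


class LearnerOutput(NamedTuple):
    learner_state: LearnerState


def learn(state: LearnerState) -> LearnerOutput:
    # Toy "training": one update per call.
    return LearnerOutput(LearnerState(params=state.params + 1))


def run(num_evals: int, evaluate_after_update: bool) -> List[int]:
    """Return the update count of the params seen by each evaluation."""
    learner_state = LearnerState(params=0)
    evaluated = []
    for _ in range(num_evals):
        learner_output = learn(learner_state)
        if evaluate_after_update:
            # Possible fix: read params from the output of learn().
            source = learner_output.learner_state
        else:
            # Ordering described in this issue: read params from the input.
            source = learner_state
        evaluated.append(source.params)
        learner_state = learner_output.learner_state
    return evaluated


current_order = run(3, evaluate_after_update=False)   # [0, 1, 2]
proposed_order = run(3, evaluate_after_update=True)   # [1, 2, 3]
```

With the current ordering the first eval sees 0 updates and the params after the third update are never evaluated. If the behaviour is not intentional, reading the actor params from `learner_output.learner_state` before calling the evaluator would likely be enough, assuming nothing else relies on the pre-update params.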

Evaluation uses params before training update #1204
