Include default values in resolved task args by rasmusfaber · Pull Request #2020 · UKGovernmentBEIS/inspect_ai

rasmusfaber · 2025-06-17T09:19:23Z

This PR contains:

In #1976 the eval-file header was changed to also include default task arguments. But resolved tasks still just uses the provided task args. That means that the task_identifiers are no longer calculated on the same values when tasks have default arguments.

What is the current behavior? (You can also link to an open issue here)

When you run an evalset fix a task with default arguments, the task_args are not correctly validated against the task_args in the .eval-file causing an error like:

ERROR: Existing log file '2025-06-17T07-52-18+00-00_xxx_Vq7LEiypYtvWQLot22hjSG.eval' in log_dir is not associated with a task passed to eval_set (you must run eval_set in a fresh log directory)

What is the new behavior?

The evalset is correctly detected as being the same.

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

When generating the data for the task_args (and model_roles) hash the code called Pydantic to_json() with exclude_none=True. But the input to to_json() was a dict. exclude_none only works on Pydantic types, so the None values was included anyway.

I changed that to conform with what I think the code was supposed to do, but that means that eval-files with None task_args now gets a different hash.

Alternatively, we can just retain the old behavior.

Other information:

jjallaire · 2025-06-17T13:26:55Z

Thanks for this! I have merged an alternate fix here: #2023

The reason for the more conservative fix is that people often use non-serializable defaulted arguments (e.g. a class of some sort) and if try to restore these into resolved args the task won't work. So we are just being as sparing as possible with restoration to make as many scenarios work as possible (best practice is to make all of your task params serializable but people don't always do this).

rasmusfaber · 2025-06-17T13:34:27Z

Thanks for this! I have merged an alternate fix here: #2023

The reason for the more conservative fix is that people often use non-serializable defaulted arguments (e.g. a class of some sort) and if try to restore these into resolved args the task won't work. So we are just being as sparing as possible with restoration to make as many scenarios work as possible (best practice is to make all of your task params serializable but people don't always do this).

Thanks. That seems like a better fix 👍

rasmusfaber added 2 commits June 17, 2025 11:06

Include default values in resolved task args

8dfcb62

Ruff

5fe4060

rasmusfaber mentioned this pull request Jun 17, 2025

Evals fail with Existing log file xx.eval in log_dir is not associated with a task passed to eval_set METR/inspect-metr-task-bridge#246

Closed

typing

0ffb2fa

jjallaire closed this Jun 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include default values in resolved task args#2020

Include default values in resolved task args#2020
rasmusfaber wants to merge 3 commits into
UKGovernmentBEIS:mainfrom
rasmusfaber:fix_task_args

rasmusfaber commented Jun 17, 2025

Uh oh!

jjallaire commented Jun 17, 2025

Uh oh!

rasmusfaber commented Jun 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rasmusfaber commented Jun 17, 2025

This PR contains:

What is the current behavior? (You can also link to an open issue here)

What is the new behavior?

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

Other information:

Uh oh!

jjallaire commented Jun 17, 2025

Uh oh!

rasmusfaber commented Jun 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants