
test: cover eval_set bundling after retries #4052

Draft
deepujain wants to merge 1 commit into UKGovernmentBEIS:main from deepujain:issue-2795-eval-set-bundle-retry

@deepujain (Contributor)

This PR contains:

  • [ ] New features
  • [ ] Changes to dev-tools e.g. CI config / github tooling
  • [ ] Docs
  • [ ] Bug fixes
  • [ ] Code refactor
  • [x] Tests

What is the current behavior? (You can also link to an open issue here)

Issue #2795 reports that eval_set(..., bundle_dir=...) can attempt to create the output bundle on every retry. With the default bundle_overwrite=False, the second bundle creation would fail because an earlier attempt has already created the output directory.
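A minimal sketch of the failure mode described in the issue, assuming a simplified bundling step that copies a log directory and refuses to touch an existing target when overwrite is disabled. `write_bundle`, `BundleExistsError`, and `eval_set_naive` are hypothetical names for illustration only; this is not inspect_ai's implementation.

```python
import os
import shutil
import tempfile


class BundleExistsError(Exception):
    """Raised when the bundle directory exists and overwrite is disabled."""


def write_bundle(log_dir: str, bundle_dir: str, overwrite: bool = False) -> None:
    # Hypothetical stand-in for the bundling step: copy the logs into
    # bundle_dir, refusing to replace an existing directory unless
    # overwrite is enabled (mirroring bundle_overwrite=False).
    if os.path.exists(bundle_dir):
        if not overwrite:
            raise BundleExistsError(bundle_dir)
        shutil.rmtree(bundle_dir)
    shutil.copytree(log_dir, bundle_dir)


def eval_set_naive(log_dir: str, bundle_dir: str, attempts: int) -> None:
    # Problematic pattern: the bundle is written after *every* attempt,
    # so the second attempt collides with the first attempt's output.
    for _ in range(attempts):
        # ... a (hypothetical) eval batch would run here ...
        write_bundle(log_dir, bundle_dir, overwrite=False)


with tempfile.TemporaryDirectory() as root:
    logs = os.path.join(root, "logs")
    os.makedirs(logs)
    bundle = os.path.join(root, "bundle")
    try:
        eval_set_naive(logs, bundle, attempts=2)
        collided = False
    except BundleExistsError:
        collided = True

print(collided)  # True
```

With a single attempt the sketch succeeds; the collision only appears once bundling runs a second time against the same directory.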

The existing test coverage only verifies bundling when all logs are already complete; it does not exercise an eval set that retries and writes to a real bundle output directory.

What is the new behavior?

  • Adds a focused regression test for eval_set() batch retries with bundle_dir.
  • The test makes the first eval-set attempt fail, lets the retry succeed, and writes a real bundle with bundle_overwrite=False.
  • The assertions verify the bundle exists after completion and contains only the successful eval log, so a duplicate bundle creation during retry would fail the test.
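The shape of the scenario the new test checks can be sketched as a self-contained simulation, assuming a simplified model in which only a successful attempt leaves a log behind and bundling runs once, after the retry loop finishes. `write_bundle`, `run_attempt`, and `eval_set_bundle_once` are hypothetical helpers; the real test drives inspect_ai's `eval_set()` and is not reproduced here.

```python
import os
import shutil
import tempfile


def write_bundle(log_dir: str, bundle_dir: str, overwrite: bool = False) -> None:
    # Hypothetical bundling step; refuses to replace an existing bundle
    # unless overwrite is enabled.
    if os.path.exists(bundle_dir):
        if not overwrite:
            raise FileExistsError(bundle_dir)
        shutil.rmtree(bundle_dir)
    shutil.copytree(log_dir, bundle_dir)


def run_attempt(log_dir: str, attempt: int) -> None:
    # Hypothetical eval batch: attempt 0 fails, later attempts succeed.
    # Simplification: only a successful attempt writes a log file.
    if attempt == 0:
        raise RuntimeError("simulated batch failure")
    with open(os.path.join(log_dir, f"attempt-{attempt}.eval"), "w") as f:
        f.write("success")


def eval_set_bundle_once(log_dir: str, bundle_dir: str, retry_attempts: int) -> bool:
    for attempt in range(retry_attempts + 1):
        try:
            run_attempt(log_dir, attempt)
            break  # success: stop retrying
        except RuntimeError:
            continue  # failure: try again if attempts remain
    else:
        return False  # every attempt failed, so nothing is bundled
    # Bundle exactly once, after the retry loop has finished.
    write_bundle(log_dir, bundle_dir, overwrite=False)
    return True


with tempfile.TemporaryDirectory() as root:
    logs = os.path.join(root, "logs")
    bundle = os.path.join(root, "bundle")
    os.makedirs(logs)
    ok = eval_set_bundle_once(logs, bundle, retry_attempts=1)
    bundled = sorted(os.listdir(bundle))

print(ok, bundled)  # True ['attempt-1.eval']
```

Moving the bundling call out of the loop is what makes `overwrite=False` safe here: the target directory is created at most once per run.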

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No. This is test-only coverage for the existing eval-set bundling behavior.

Other information:

Validation:

  • Targeted:
    • uv run pytest tests/test_eval_set.py::test_eval_set_bundle_after_retries_once -v
      • Passed. Exercises the reported retry path directly: first batch attempt fails, retry succeeds, and the bundle is created once with overwrite disabled.
    • uv run pytest tests/test_eval_set.py::test_eval_set_bundle_when_all_complete tests/test_eval_set.py::test_eval_set_bundle_after_retries_once -v
      • Passed. Confirms the existing all-complete bundle behavior and the new retry regression coverage both hold together.
  • Regression:
    • uv run make check
      • Passed. Runs ruff, formatting checks, and mypy across source and tests.
    • uv run make test
      • Passed. Full pytest suite completed successfully with 7,921 collected tests.

CI/CD coverage expected:

  • The standard Build workflow should run ruff, mypy, package inspection, and pytest on Python 3.10 and 3.11.
  • The Build Log Viewer workflow should still run schema/type and dist validation; this PR does not touch viewer assets.
  • No sandbox tool files or version gates are affected.

Closes #2795 ("eval_set retries each create a new bundle").