feat: add user-defined custom CV split scripts #444

Merged
nictru merged 11 commits into development from custom-splits on Jun 29, 2026

nictru commented Jun 26, 2026

Collaborator

Summary

  • Add create_splits(response_data) custom split scripts for advanced CV setups (issue #407, "Allow custom split creation procedures")
  • Validate produced splits using existing test_mode semantics (LPO, LCO, LDO, LTO)
  • Wire through experiment loop, main CLI, and make-cv-pkls while keeping the existing fold-dict training contract
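
A minimal sketch of what such a split script could look like, assuming leave-cell-line-out (LCO) semantics. The thread does not show the real shape of response_data or the keys of the fold dict, so the (cell_line, drug) tuples, the "train"/"test" keys, the n_folds and seed arguments, and the check_lco helper are all illustrative assumptions, not the drevalpy API:

```python
import random
from collections import defaultdict

# Assumed stand-in for the response data: (cell_line, drug) pairs.
Pair = tuple[str, str]


def create_splits(response_data: list[Pair], n_folds: int = 3, seed: int = 0) -> list[dict[str, list[Pair]]]:
    """Build LCO folds: every cell line lands in exactly one test set."""
    by_cell_line: dict[str, list[Pair]] = defaultdict(list)
    for pair in response_data:
        by_cell_line[pair[0]].append(pair)

    cell_lines = sorted(by_cell_line)
    random.Random(seed).shuffle(cell_lines)

    folds = []
    for k in range(n_folds):
        test_lines = set(cell_lines[k::n_folds])
        folds.append(
            {
                "train": [p for cl in cell_lines if cl not in test_lines for p in by_cell_line[cl]],
                "test": [p for cl in cell_lines if cl in test_lines for p in by_cell_line[cl]],
            }
        )
    return folds


def check_lco(folds: list[dict[str, list[Pair]]]) -> None:
    """Raise if any cell line appears in both train and test of a fold (hypothetical validator)."""
    for i, fold in enumerate(folds):
        overlap = {cl for cl, _ in fold["train"]} & {cl for cl, _ in fold["test"]}
        if overlap:
            raise ValueError(f"fold {i}: cell lines in both train and test: {sorted(overlap)}")


data = [(f"CL{i}", f"D{j}") for i in range(6) for j in range(4)]
folds = create_splits(data)
check_lco(folds)  # passes silently when the folds respect LCO
```

Grouping by cell line before assigning folds is what keeps the split consistent with LCO; an LDO variant would group by the drug element instead.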

Test plan

  • pytest tests/datasets/test_custom_splits.py
  • CI green
  • Manual smoke: drevalpy make-cv-pkls --custom_splitter_path examples/custom_split_lco_fraction.py ...

nictru added 5 commits June 26, 2026 13:34
Allow advanced users to supply create_splits(response_data) scripts validated by test_mode semantics, wired through the experiment loop, CLI, and make-cv-pkls while preserving the existing fold-dict training contract.
Add sphinx docstrings, replace Protocol with Callable, fix isort/black formatting, and document tests so CI pre-commit passes.
Satisfy flake8-darglint DAR101 for pytest fixture parameters.
Expose pipeline split settings via a frozen CustomSplitParams dataclass passed as the second argument to create_splits(response_data, params).
Document private validation helpers with sphinx param/returns/raises sections.
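
To illustrate the two-argument form, here is a sketch of a frozen settings dataclass passed to create_splits. The fields of the real CustomSplitParams are not listed in this thread, so the SplitParams class and its n_cv_splits and random_state fields are hypothetical stand-ins, and the response data is reduced to a list of row indices:

```python
import random
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SplitParams:
    """Hypothetical stand-in for CustomSplitParams; the real fields may differ."""

    n_cv_splits: int
    random_state: int


def create_splits(response_data: list[int], params: SplitParams) -> list[dict[str, list[int]]]:
    """Shuffle rows with the given seed and deal them into params.n_cv_splits test sets."""
    rows = list(response_data)
    random.Random(params.random_state).shuffle(rows)
    folds = []
    for k in range(params.n_cv_splits):
        test = rows[k :: params.n_cv_splits]
        held_out = set(test)
        folds.append({"train": [r for r in rows if r not in held_out], "test": test})
    return folds


params = SplitParams(n_cv_splits=4, random_state=0)
folds = create_splits(list(range(10)), params)

# A frozen dataclass rejects attribute assignment, so a split script
# can read the pipeline settings but cannot change them.
try:
    params.n_cv_splits = 2  # type: ignore[misc]
    is_frozen = False
except FrozenInstanceError:
    is_frozen = True
```

Freezing the dataclass makes the settings read-only from the script's point of view, which is presumably the motivation for that choice.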
nictru marked this pull request as ready for review June 27, 2026 12:54
nictru requested a review from JudithBernett June 27, 2026 12:54
Comment thread drevalpy/visualization/utils.py Outdated
- rf"{result_dir_str}/{dataset}/"
- r"(LPO|LCO|LDO|LTO)/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"
+ rf"{result_dir_str}/{re.escape(dataset)}/"
+ r"[^/]+/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"

Collaborator

I think there are some downstream assumptions on this (e.g., line 102 in the same file), but we have to rework the validation, and I don't like the path-splitting/string-matching stuff anyway. I am wondering if we should also use your manifests for the non-custom splits and always parse them to determine the test mode etc.
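
For reference, the effect of the two patterns in the hunk above can be checked in isolation. This is only a sketch: the dataset name, the directory names, and the use of re.search are assumptions made for illustration, and the name contains a "+" so that the effect of re.escape shows up:

```python
import re

result_dir_str = "results"
dataset = "CTRPv2+GDSC"  # hypothetical name; '+' is a regex metacharacter

# Pattern with the dataset name interpolated unescaped and a fixed set of test-mode directories.
old = re.compile(
    rf"{result_dir_str}/{dataset}/"
    r"(LPO|LCO|LDO|LTO)/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"
)
# Pattern with the dataset name escaped and any single directory name accepted at that level.
new = re.compile(
    rf"{result_dir_str}/{re.escape(dataset)}/"
    r"[^/]+/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"
)

# Hypothetical result paths: one under a built-in test mode, one under a custom split label.
builtin_path = "results/CTRPv2+GDSC/LCO/SomeModel/predictions/fold_0.csv"
custom_path = "results/CTRPv2+GDSC/lco_fraction/SomeModel/predictions/fold_0.csv"

for label, path in [("built-in", builtin_path), ("custom", custom_path)]:
    print(label, "old:", bool(old.search(path)), "new:", bool(new.search(path)))
```

With these inputs the unescaped pattern matches neither path, because "2+" is read as a quantifier rather than a literal, while the escaped, relaxed pattern matches both. The relaxed pattern no longer says which test mode a directory stands for, which is the gap the manifests would fill.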

@PascalIversen

Collaborator

Thanks, it looks very good! I left a comment, but would also be okay with a wontfix of that, because the viz is to be reworked anyway

nictru added 2 commits June 29, 2026 15:12
Move built-in and external split logic into drevalpy.datasets.splits with a shared create_and_record_splits path, JSON manifests that record split_label vs test_mode, and explicit result discovery so reports resolve the semantic test mode correctly.
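
A rough sketch of the manifest idea: store the free-form split label and the semantic test mode side by side, so reports can look up the test mode instead of inferring it from directory names. The file name, key names, and helper functions below are assumptions for illustration and are not taken from drevalpy.datasets.splits:

```python
import json
import tempfile
from pathlib import Path


def write_manifest(path: Path, split_label: str, test_mode: str) -> None:
    """Record the directory label and the semantic test mode together (hypothetical layout)."""
    manifest = {
        "split_label": split_label,  # free-form, e.g. the name of a custom split script
        "test_mode": test_mode,  # one of LPO, LCO, LDO, LTO
    }
    path.write_text(json.dumps(manifest, indent=2))


def read_test_mode(path: Path) -> str:
    """Resolve the test mode from the manifest rather than from the path."""
    return json.loads(path.read_text())["test_mode"]


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "split_manifest.json"
    write_manifest(manifest_path, split_label="lco_fraction", test_mode="LCO")
    resolved = read_test_mode(manifest_path)

print(resolved)
```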

nictru commented Jun 29, 2026

Collaborator Author

I adjusted to use the same manifest creation for both built-in and custom splitters, and cleaned up the visualization a bit. Hope it's clean enough now

nictru added 3 commits June 29, 2026 15:15
Add missing docstrings and isort fixes for the split provider refactor, and sync with development.
Make tests.datasets a proper package for mypy, skip row-overlap validation on trusted built-in splits, and keep external split validation unchanged.
nictru requested a review from PascalIversen June 29, 2026 14:15
Comment thread drevalpy/datasets/splits/manifest.py Outdated
@PascalIversen

Collaborator

works for me! looks great!! :) so would approve after that mini fix I commented! Thanks!!

Use len(splits) instead of the requested params value so external scripts that return a different fold count are reflected accurately.
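
The point of this fix can be shown with a toy case, assuming an external script that returns a different number of folds than the pipeline requested. The script body and the "n_cv_splits" manifest key are hypothetical:

```python
requested_n_cv_splits = 5  # value requested through the pipeline settings


def create_splits(response_data: list[int]) -> list[dict[str, list[int]]]:
    """Hypothetical external script that always returns three folds."""
    return [
        {
            "train": [r for r in response_data if r % 3 != k],
            "test": [r for r in response_data if r % 3 == k],
        }
        for k in range(3)
    ]


splits = create_splits(list(range(9)))

# Record what was actually produced, not what was requested.
manifest = {"n_cv_splits": len(splits)}
print(manifest)
```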

PascalIversen left a comment

Collaborator

looks good!!!

nictru merged commit 07ff16d into development Jun 29, 2026
26 checks passed

2 participants