feat: add user-defined custom CV split scripts by nictru · Pull Request #444 · daisybio/drevalpy

nictru · 2026-06-26T11:35:52Z

Summary

Add custom split scripts that define create_splits(response_data) for advanced CV setups (issue #407, "Allow custom split creation procedures")
Validate produced splits using existing test_mode semantics (LPO, LCO, LDO, LTO)
Wire through experiment loop, main CLI, and make-cv-pkls while keeping the existing fold-dict training contract
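To make the idea of a user-supplied split script concrete, here is a minimal sketch. It assumes that response_data can be treated as a list of (cell_line, drug, response) rows and that each fold is a dict with "train" and "test" row-index lists. Both are illustrative assumptions; the real response object and the fold-dict keys used by drevalpy are not shown in this PR. The n_folds argument is likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the response data: (cell_line, drug, response) rows.
Row = tuple[str, str, float]


def create_splits(response_data: list[Row], n_folds: int = 2) -> list[dict[str, list[int]]]:
    """Assign whole cell lines to folds so no cell line is in both train and test."""
    rows_by_cell_line: dict[str, list[int]] = defaultdict(list)
    for idx, (cell_line, _drug, _response) in enumerate(response_data):
        rows_by_cell_line[cell_line].append(idx)

    # Deterministic round-robin assignment of sorted cell lines to folds.
    test_rows: list[list[int]] = [[] for _ in range(n_folds)]
    for i, cell_line in enumerate(sorted(rows_by_cell_line)):
        test_rows[i % n_folds].extend(rows_by_cell_line[cell_line])

    all_rows = set(range(len(response_data)))
    return [
        {"train": sorted(all_rows - set(test)), "test": sorted(test)}
        for test in test_rows
    ]


demo = [("A549", "d1", 0.1), ("A549", "d2", 0.2), ("HeLa", "d1", 0.3), ("MCF7", "d2", 0.4)]
folds = create_splits(demo)
```

Grouping by cell line before assigning folds is what keeps the sketch compatible with an LCO-style check: rows of one cell line always travel together.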

Test plan

pytest tests/datasets/test_custom_splits.py
CI green
Manual smoke: drevalpy make-cv-pkls --custom_splitter_path examples/custom_split_lco_fraction.py ...

PascalIversen · 2026-06-28T12:37:43Z

-        rf"{result_dir_str}/{dataset}/"
-        r"(LPO|LCO|LDO|LTO)/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"
+        rf"{result_dir_str}/{re.escape(dataset)}/"
+        r"[^/]+/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"


I think there are some downstream assumptions about this (e.g., line 102 in the same file), but we have to rework the validation anyway, and I don't like the path-splitting/string-matching approach in the first place. I am wondering whether we should also use your manifests for the non-custom splits and always parse them to determine the test mode etc.
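The effect of the two regex changes in the diff above can be checked in isolation. This is a minimal sketch: the result directory "results", the dataset name "Toy.v1", the split label "my_split" and the file names are hypothetical, and build_pattern is an illustrative wrapper. Only the pattern itself is taken from the diff.

```python
import re


def build_pattern(result_dir_str: str, dataset: str) -> re.Pattern:
    """Pattern from the diff: escaped dataset name, any split-label directory."""
    return re.compile(
        rf"{result_dir_str}/{re.escape(dataset)}/"
        r"[^/]+/[^/]+/(predictions|cross_study|randomization|robustness)/.*\.csv$"
    )


pattern = build_pattern("results", "Toy.v1")

# A custom split label is accepted, not only LPO/LCO/LDO/LTO.
custom_ok = bool(pattern.search("results/Toy.v1/my_split/ModelA/predictions/fold_0.csv"))

# With re.escape, "." in the dataset name no longer matches an arbitrary character.
lookalike_ok = bool(pattern.search("results/ToyXv1/LCO/ModelA/predictions/fold_0.csv"))
```

Because the directory name no longer has to be one of the four test modes, anything that previously read the test mode out of the path needs another source for it, which is the downstream concern raised in the comment.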

PascalIversen · 2026-06-28T12:39:18Z

Thanks, it looks very good! I left a comment, but I would also be okay with a wontfix on that, because the viz is to be reworked anyway.

Move built-in and external split logic into drevalpy.datasets.splits with a shared create_and_record_splits path, JSON manifests that record split_label vs test_mode, and explicit result discovery so reports resolve the semantic test mode correctly.
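A manifest that separates the directory label from the semantic test mode could look roughly like the sketch below. The field names split_label and test_mode come from the commit message; the file name split_manifest.json and both helper functions are hypothetical and are not the API of drevalpy.datasets.splits.

```python
import json
import tempfile
from pathlib import Path

MANIFEST_NAME = "split_manifest.json"  # hypothetical file name


def write_manifest(split_dir: Path, split_label: str, test_mode: str) -> Path:
    """Record the directory label and the semantic test mode side by side."""
    path = split_dir / MANIFEST_NAME
    path.write_text(json.dumps({"split_label": split_label, "test_mode": test_mode}, indent=2))
    return path


def resolve_test_mode(split_dir: Path) -> str:
    """Read the test mode from the manifest instead of parsing the directory name."""
    manifest = json.loads((split_dir / MANIFEST_NAME).read_text())
    return manifest["test_mode"]


with tempfile.TemporaryDirectory() as tmp:
    split_dir = Path(tmp) / "lco_fraction"  # custom label, not a test mode
    split_dir.mkdir()
    write_manifest(split_dir, split_label="lco_fraction", test_mode="LCO")
    resolved = resolve_test_mode(split_dir)
```

Reading the mode from the manifest means a report can treat a directory called lco_fraction as LCO without any string matching on the path.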

nictru · 2026-06-29T13:15:03Z

I adjusted it to use the same manifest creation for both built-in and custom splitters, and cleaned up the visualization a bit. Hope it's clean enough now.

PascalIversen · 2026-06-29T14:59:55Z

works for me! looks great!! :) so would approve after that mini fix I commented! Thanks!!

PascalIversen

looks good!!!

nictru added 5 commits June 26, 2026 13:34

feat: add user-defined custom CV split scripts

f2e9cd3

Allow advanced users to supply create_splits(response_data) scripts validated by test_mode semantics, wired through the experiment loop, CLI, and make-cv-pkls while preserving the existing fold-dict training contract.

fix: satisfy pre-commit lint for custom split changes

5c3921a

Add sphinx docstrings, replace Protocol with Callable, fix isort/black formatting, and document tests so CI pre-commit passes.

fix: document tmp_path in custom split test docstrings

28bb116

Satisfy flake8-darglint DAR101 for pytest fixture parameters.

feat: pass CustomSplitParams to user split scripts

44a4670

Expose pipeline split settings via a frozen CustomSplitParams dataclass passed as the second argument to create_splits(response_data, params).
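The two-argument form can be sketched as follows. The class name CustomSplitParams and the create_splits(response_data, params) signature come from the commit message; the fields n_cv_splits, test_mode and random_state, the list-of-strings response data and the fold-dict keys are assumptions made for illustration only.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class CustomSplitParams:
    """Illustrative stand-in; the real field names are not shown in this PR."""

    n_cv_splits: int
    test_mode: str
    random_state: int = 42


def create_splits(response_data: list[str], params: CustomSplitParams) -> list[dict[str, list[int]]]:
    """Toy splitter: rows whose index is congruent to k form the test set of fold k."""
    indices = list(range(len(response_data)))
    return [
        {
            "train": [i for i in indices if i % params.n_cv_splits != k],
            "test": [i for i in indices if i % params.n_cv_splits == k],
        }
        for k in range(params.n_cv_splits)
    ]


params = CustomSplitParams(n_cv_splits=2, test_mode="LPO")
splits = create_splits(["r0", "r1", "r2", "r3"], params)

# frozen=True makes the settings read-only for the user script.
try:
    params.n_cv_splits = 5  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

A frozen dataclass lets the pipeline hand its settings to untrusted user code without the risk that the script changes them in place.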

docs: add docstrings to custom_splits internal helpers

3daab58

Document private validation helpers with sphinx param/returns/raises sections.

nictru marked this pull request as ready for review June 27, 2026 12:54

nictru requested a review from JudithBernett June 27, 2026 12:54

PascalIversen reviewed Jun 28, 2026

nictru added 2 commits June 29, 2026 15:12

Merge branch 'development' into custom-splits

36540d2

nictru added 3 commits June 29, 2026 15:15

Fix pre-commit lint and merge development into custom-splits.

654cd0d

Add missing docstrings and isort fixes for the split provider refactor, and sync with development.

Fix CI failures for split provider refactor.

ee05c03

Make tests.datasets a proper package for mypy, skip row-overlap validation on trusted built-in splits, and keep external split validation unchanged.
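The trusted/untrusted distinction can be sketched as a flag on the validator. The function name, the trusted parameter and the "train"/"test" fold keys are assumptions for illustration; the PR only states that the row-overlap check is skipped for built-in splits and kept for external ones.

```python
def validate_splits(splits: list[dict[str, list[int]]], trusted: bool = False) -> None:
    """Reject folds whose train and test row indices overlap, unless trusted."""
    if trusted:
        # Built-in splitters are treated as trusted, so the check is skipped.
        return
    for fold_idx, fold in enumerate(splits):
        overlap = set(fold["train"]) & set(fold["test"])
        if overlap:
            raise ValueError(f"Fold {fold_idx}: rows in both train and test: {sorted(overlap)}")


bad = [{"train": [0, 1, 2], "test": [2, 3]}]
validate_splits(bad, trusted=True)  # no error: check skipped

try:
    validate_splits(bad)
    message = ""
except ValueError as exc:
    message = str(exc)
```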

Add package docstrings for tests.datasets mypy layout.

7f10271

nictru requested a review from PascalIversen June 29, 2026 14:15

PascalIversen reviewed Jun 29, 2026

Comment thread drevalpy/datasets/splits/manifest.py Outdated

Record actual split count in manifest n_cv_splits.

d62354e

Use len(splits) instead of the requested params value so external scripts that return a different fold count are reflected accurately.
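A small sketch of the difference, assuming a hypothetical build_manifest helper and fold format; only the n_cv_splits field name and the use of len(splits) are taken from the commit.

```python
def build_manifest(splits: list[dict], test_mode: str) -> dict:
    """Record the number of folds actually produced, not the number requested."""
    return {"test_mode": test_mode, "n_cv_splits": len(splits)}


requested_n_cv_splits = 5  # what the pipeline asked for

# An external script is free to return fewer (or more) folds than requested.
splits = [
    {"train": [1, 2], "test": [0]},
    {"train": [0, 2], "test": [1]},
    {"train": [0, 1], "test": [2]},
]

manifest = build_manifest(splits, test_mode="LPO")
```

Here the manifest reports 3 folds even though 5 were requested, so anything that later iterates over the folds sees the true count.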

PascalIversen approved these changes Jun 29, 2026

nictru merged commit 07ff16d into development Jun 29, 2026
26 checks passed

nictru merged 11 commits into development from custom-splits
