
Implement CPM-aware threshold tuning with fold-level statistic reuse #44

@psychelzh

Description


Objective

Implement a CPM-aware tuning path that reuses fold-level computation across threshold candidates.

Why

Threshold sweeps are a core CPM workload. The current generic tuning path recomputes a full fold fit for each threshold value and pays substantial framework overhead. CPM should be able to reuse fold-local statistics and make threshold tuning much cheaper.
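The reuse opportunity is easy to see in a few lines: for a fixed analysis split, the edge-behavior correlations do not depend on the threshold candidate, so a sweep only re-applies a cutoff to a cached statistic. A minimal sketch (variable names are illustrative, not cpmr API):

```r
set.seed(1)
n_obs <- 50; n_edges <- 200
edges <- matrix(rnorm(n_obs * n_edges), n_obs, n_edges)  # subjects x edges
behav <- rnorm(n_obs)                                    # behavioral measure

# Fold-level statistic: depends only on the analysis data, never on the
# threshold candidate, so it can be computed once per fold.
r <- cor(edges, behav)

# Sweeping candidates only re-applies a cutoff to the cached correlations;
# no fold refit is needed.
thresh_levels <- c(0.10, 0.20, 0.30)
masks <- lapply(thresh_levels, function(th) abs(r) >= th)
```

Each additional candidate costs one vectorized comparison instead of a full fold fit, which is where the bulk of the expected speedup comes from.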

Scope

  • Design a tuning runner that stays inside cpmr rather than routing through tune::tune_grid().
  • Reuse per-fold artifacts wherever valid, for example edge-behavior association statistics computed from the analysis split.
  • Support multiple thresh_level values under a fixed thresh_method.
  • Keep leakage safety explicit and testable.
  • Benchmark against:
    • repeated native fits without reuse
    • the current tidymodels tuning path
  • Ensure the design composes cleanly with nested CV work tracked in Follow-up: Nested CV for model selection safety #31.
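One possible shape for such a runner, sketched end to end with illustrative names (none of these helpers exist in cpmr) and a deliberately simplified single-network CPM without the usual positive/negative edge split: correlations are computed once per fold on the analysis split, every `thresh_level` candidate reuses them, and assessment rows are only touched at evaluation time, which keeps the leakage boundary explicit.

```r
# Hypothetical runner sketch; not the cpmr API.
cpm_tune_thresholds <- function(edges, behav, folds, thresh_levels) {
  lapply(folds, function(test_idx) {
    train_idx <- setdiff(seq_len(nrow(edges)), test_idx)

    # Fold-level statistic: computed once per fold on the analysis split.
    r <- cor(edges[train_idx, ], behav[train_idx])

    sapply(thresh_levels, function(level) {
      # Cheap re-threshold of the cached statistic; no refit of the fold.
      mask <- as.vector(abs(r) >= level)
      if (!any(mask)) return(NA_real_)

      # Network strength and linear fit on analysis rows only.
      strength <- rowSums(edges[, mask, drop = FALSE])
      train_df <- data.frame(y = behav[train_idx], s = strength[train_idx])
      fit <- lm(y ~ s, data = train_df)

      # Assessment rows enter only here, after all tuning decisions.
      pred <- predict(fit, newdata = data.frame(s = strength[test_idx]))
      cor(pred, behav[test_idx])
    })
  })
}

set.seed(42)
n <- 60; p <- 100
edges <- matrix(rnorm(n * p), n, p)
behav <- rnorm(n)
folds <- split(seq_len(n), rep(1:3, length.out = n))
res <- cpm_tune_thresholds(edges, behav, folds, c(0.1, 0.2, 0.3))
```

Structuring the loop as fold-outer, threshold-inner is what makes the reuse fall out naturally; it is also the shape that should compose with an outer nested-CV loop, since each outer split simply supplies its own `folds`.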

Open questions

  • How much reuse is possible for alpha and sparsity thresholds respectively?
  • Which preprocessing choices must invalidate cached fold artifacts?
  • Should the first implementation focus only on threshold tuning, or also prepare for broader CPM strategy comparisons?

Non-goals

  • A general-purpose hyperparameter optimization framework.
  • Mirroring every feature of tune.
  • Optimizing the tidymodels adapter enough to erase framework overhead.

Exit criteria

  • A design for reusable fold artifacts is accepted.
  • Benchmarks show a meaningful speedup on threshold grids.
  • Tests/documentation cover the leakage-safe reuse assumptions.
