Modular components stack with models/components split#443

Draft
nictru wants to merge 32 commits into development from modularity

@nictru nictru commented Jun 24, 2026

Collaborator

Summary

  • Introduce drevalpy.components for featurizers, predictors, registries, and declarative ModelConfig (Pydantic v2).
  • Keep legacy experiment APIs on drevalpy.models (MODEL_FACTORY, DRPModel subclasses, import-compat shims).
  • Move orchestration (factory, config I/O, zoo, ComposedModel, bridge) to drevalpy.models only — no re-export shims in components.
  • Fix XGBoost OpenMP segfault on macOS for MultiViewXGBoost.
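
To illustrate the registry idea from the first bullet, here is a minimal sketch of a name-to-class registry. Every name in it (`Registry`, `register`, `build`, `FEATURIZERS`, `IdentityFeaturizer`) is hypothetical and not taken from drevalpy; the actual registries in drevalpy.components may be structured differently.

```python
from typing import Any, Callable, Dict, Type


class Registry:
    """Maps string names to classes so configs can refer to components by name."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: Dict[str, Type[Any]] = {}

    def register(self, name: str) -> Callable[[Type[Any]], Type[Any]]:
        """Return a class decorator that stores the class under ``name``."""

        def decorator(cls: Type[Any]) -> Type[Any]:
            if name in self._entries:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._entries[name] = cls
            return cls

        return decorator

    def build(self, name: str, **params: Any) -> Any:
        """Instantiate the class registered under ``name``."""
        try:
            cls = self._entries[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind}: {name!r}") from None
        return cls(**params)


FEATURIZERS = Registry("featurizer")  # hypothetical registry instance


@FEATURIZERS.register("identity")
class IdentityFeaturizer:
    """Toy featurizer that returns entity IDs unchanged."""

    def transform(self, ids: list) -> list:
        return list(ids)
```

A declarative config can then carry only the string `"identity"`, and a semantic check such as "is this name registered?" stays separate from schema validation.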

Test plan

  • Local Nox: pre-commit, mypy, tests (433), typeguard
  • CI green on GitHub Actions
  • Legacy import compat tests pass
  • Baseline and literature model smoke tests pass

nictru added 30 commits June 29, 2026 18:32
Introduce featurizer/predictor registries with cell_line/drug terminology, delegate sklearn and naive DRPModel classes via ComponentDRPBridge, and fix single-drug models for cell-line-only feature paths.
Stabilize the modular core with public drevalpy.components exports, serialization hooks for save/load bridges, a built-in model zoo, and external extension registration without changing legacy MODEL_FACTORY workflows.
Move DRP bridge code to the models layer, extract shared feature helpers into drevalpy.data, and split native vs legacy component registration while keeping MODEL_FACTORY and import compatibility.
Relocate literature implementations under components/predictors/literature, wire MODEL_FACTORY through public_models, add native structured predictors and zoo entries, and keep drevalpy.models as compatibility shims.
Keeps models/baselines import paths stable while baselines live next to other predictors for the component bridge and MODEL_FACTORY.
Move Lightning metrics mixin into components with model shims, point predictor
implementations at data.features/preprocessing, fix circular imports via lazy
MODEL_FACTORY, and replace public_models parity tests with factory-focused suites.
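
One common way to make a factory lazy, and so break an import cycle, is to store import paths and resolve them on first access. The sketch below assumes that approach; `LazyFactory` and the `"module:Class"` target format are hypothetical, and standard-library classes stand in for model classes. It is not necessarily how MODEL_FACTORY is implemented here.

```python
import importlib
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class LazyFactory(Mapping):
    """Mapping from model name to class that imports each class on first access."""

    def __init__(self, targets: Dict[str, str]) -> None:
        # targets maps a model name to "module.path:ClassName"
        self._targets = dict(targets)
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._cache:
            module_path, _, attr = self._targets[name].partition(":")
            module = importlib.import_module(module_path)
            self._cache[name] = getattr(module, attr)
        return self._cache[name]

    def __contains__(self, name: object) -> bool:
        # Membership checks must not trigger an import.
        return name in self._targets

    def __iter__(self) -> Iterator[str]:
        return iter(self._targets)

    def __len__(self) -> int:
        return len(self._targets)


# Stdlib classes stand in for model classes so the sketch runs anywhere.
FACTORY = LazyFactory(
    {"ordered": "collections:OrderedDict", "fraction": "fractions:Fraction"}
)
```

Because nothing is imported until a key is looked up, the module that defines the factory can itself be imported by the modules it points at.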
Avoid re-fetching nf-core test archives on every pytest session when local toy data is already present.
Poetry is the canonical lockfile; uv.lock and training artifacts are local noise.
Relocate factory, zoo, and composed-model logic under drevalpy.models with thin components wrappers, add legacy literature import shims, compatibility tests, and remove duplicate zoo YAMLs.
Import zoo APIs from drevalpy.models.zoo directly while keeping package-level re-exports on drevalpy.components.
Set conservative native-thread defaults before importing XGBoost so the predictor remains stable when PyTorch/OpenMP is already loaded in the same process.
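
A minimal sketch of the pattern this commit describes: set thread-count environment variables before the native library is imported. The commit does not say which variables it sets, so the set below is an assumption, and `apply_native_thread_defaults` is a hypothetical helper name.

```python
import os
from typing import MutableMapping

# Assumed variable set; the commit does not list which variables it touches.
_THREAD_ENV_DEFAULTS = {
    "OMP_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
}


def apply_native_thread_defaults(env: MutableMapping[str, str] = os.environ) -> None:
    """Set conservative thread counts without overriding explicit user settings."""
    for key, value in _THREAD_ENV_DEFAULTS.items():
        env.setdefault(key, value)


apply_native_thread_defaults()
# Only import the native library after the defaults are in place, e.g.:
# import xgboost
```

Using `setdefault` keeps any value the user exported explicitly, so the defaults only apply when nothing else has been configured.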
Apply isort formatting, scoped flake8/mypy configuration for the new components stack, and small import and test fixes so local and GitHub Actions checks pass.
Remove package-wide mypy overrides and flake8 docstring exemptions, fix
types via state helpers and impl delegation, and narrow flake8 ignores to
literature impl and method-level docstrings only.
Replace dataclass config models and manual YAML coercion with validated
Pydantic schemas while keeping semantic registry checks in validate().
Delete re-export modules under drevalpy.components and trim the package
public API to featurizers, predictors, registries, and config schema only.
Keeps one implementation per file for easier navigation while preserving registry names and behavior.
…eaturizers.

Introduces compact featurizer config syntax, per-omics cell-line featurizers, and a unified wrapper that supports both dense concatenation and structured block outputs.
Keep generic PCA at the cell_line level since it applies to any view, not a single omics layer.
Parse `+` in model recipe triples into concatFeaturizers configs and add drug-side concat support so both featurizer slots can be built from compact strings.
…tity to name.

Parse featurizer lists as concatFeaturizers, accept bare predictor strings and one-key hyperparameter mappings, and rewrite zoo entries to the shorter form.
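
The compact forms named in this commit could be normalized roughly as follows. This is a sketch under assumptions: the output keys `name`, `params`, and `children` and both function names are hypothetical; only the `concatFeaturizers` name comes from the commit message.

```python
from typing import Any, Dict, List, Union

CompactSpec = Union[str, Dict[str, Any]]


def normalize_component(spec: CompactSpec) -> Dict[str, Any]:
    """Expand a bare name or a one-key mapping into an explicit name/params dict."""
    if isinstance(spec, str):
        return {"name": spec, "params": {}}
    if isinstance(spec, dict) and len(spec) == 1:
        name, params = next(iter(spec.items()))
        return {"name": name, "params": dict(params or {})}
    raise ValueError(f"expected a name or a one-key mapping, got {spec!r}")


def normalize_featurizer(spec: Union[CompactSpec, List[CompactSpec]]) -> Dict[str, Any]:
    """Treat a list of featurizer specs as a single concatFeaturizers config."""
    if isinstance(spec, list):
        # "children" is a hypothetical key name for the concatenated parts.
        return {
            "name": "concatFeaturizers",
            "children": [normalize_component(child) for child in spec],
        }
    return normalize_component(spec)
```

With this shape, `"ridge"` and `{"ridge": {"alpha": 1.0}}` both resolve to the same explicit structure, which is what lets zoo entries use the shorter form.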
Load presets from drevalpy/models/zoo/*.yaml directly instead of the entries subdirectory.
Eagerly materialize concat child featurizers so sklearn and literature models round-trip preprocessing state correctly, and map legacy drug/cell-line view names to modular featurizers.
Relocate ModelConfig, featurizer/predictor config types, and validation under drevalpy.models so orchestration stays in the models layer and components no longer re-exports config symbols.
…nce.

Route literature and baseline wrappers through composed featurizers and predictors, replace PairContext with identity/tissue featurizers, unify save/load via component state, and add parity tests enforcing factory and zoo invariants.
…ces.

Move DRPModel and literature engines to class-level default hyperparameters
resolved from ModelConfig, add missing sklearn and literature search spaces,
and extend merged tuning keys to support indexed concat featurizer children.
Expose a thin wrapper over ModelConfig.from_spec and ComponentDRPBridge so
callers can build named models from cellLine:drug:predictor recipe strings.
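
As a rough illustration of the `cellLine:drug:predictor` recipe format, including the `+` concatenation syntax from an earlier commit, a parser could look like the sketch below. `parse_recipe`, the output keys, and the featurizer names in the usage example are all hypothetical; the real entry point is the wrapper over `ModelConfig.from_spec`, whose signature is not shown in this PR description.

```python
from typing import Dict, List, Union


def _split_concat(slot: str) -> List[str]:
    """Split one featurizer slot on '+' into its component names."""
    names = [name.strip() for name in slot.split("+")]
    if not all(names):
        raise ValueError(f"empty featurizer name in {slot!r}")
    return names


def parse_recipe(recipe: str) -> Dict[str, Union[str, List[str]]]:
    """Parse 'cellLine:drug:predictor' into featurizer lists and a predictor name."""
    parts = [part.strip() for part in recipe.split(":")]
    if len(parts) != 3:
        raise ValueError(f"expected cellLine:drug:predictor, got {recipe!r}")
    cell_line, drug, predictor = parts
    if not predictor:
        raise ValueError(f"missing predictor in {recipe!r}")
    return {
        "cell_line": _split_concat(cell_line),
        "drug": _split_concat(drug),
        "predictor": predictor,
    }
```

For example, `parse_recipe("gene_expression+methylation:fingerprints:ridge")` yields two cell-line featurizer names, one drug featurizer name, and the predictor name `ridge`.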
…ning.

Replace grid-search hpam_tune_raytune with component-space sampling, add
experiment controls for trial count and resources, and wire wandb trial logging.
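
To show the difference from grid search, here is a library-free sketch of sampling a fixed number of trials from a search space. It uses plain `random` as a stand-in and is not the Ray Tune integration itself; `sample_trials`, the key names, and the parameter ranges are hypothetical.

```python
import random
from typing import Any, Callable, Dict, List

# Each entry draws one value for a hyperparameter from the given RNG.
SearchSpace = Dict[str, Callable[[random.Random], Any]]


def sample_trials(space: SearchSpace, n_trials: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Draw n_trials independent configurations instead of enumerating a grid."""
    rng = random.Random(seed)
    return [{key: draw(rng) for key, draw in space.items()} for _ in range(n_trials)]


# Hypothetical search space with prefixed keys.
space: SearchSpace = {
    "predictor.alpha": lambda rng: 10 ** rng.uniform(-3, 1),  # log-uniform
    "predictor.fit_intercept": lambda rng: rng.choice([True, False]),
}
```

The trial count is then an explicit experiment control: `sample_trials(space, n_trials=20)` costs 20 fits regardless of how many hyperparameters the space contains.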
Registry-driven entity_id_only detection avoids falling back to fingerprints or gene expression when identity featurizers are configured, and routes construct_model and zoo preset loading through ID-only CSV loaders.
nictru added 2 commits June 29, 2026 18:35
…meters.

Strip the full featurizer prefix when applying merged structured defaults so construct_model recipes with tunable featurizers no longer pass dotted keys to featurizer constructors.
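
The fix can be pictured as selecting the keys under a component's prefix and removing that prefix before the values reach a constructor. The helper below is a sketch under assumptions: `params_for_component` and the dotted key layout (including the indexed child form) are hypothetical and may not match the actual key format.

```python
from typing import Any, Dict


def params_for_component(flat: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Select keys under ``prefix`` and return them with the full prefix removed."""
    marker = prefix + "."
    return {
        key[len(marker):]: value
        for key, value in flat.items()
        if key.startswith(marker)
    }


# Hypothetical merged hyperparameters with dotted, prefixed keys.
flat = {
    "cell_line_featurizer.0.n_components": 50,
    "cell_line_featurizer.1.scale": True,
    "predictor.alpha": 0.1,
}
```

Here `params_for_component(flat, "cell_line_featurizer.0")` returns `{"n_components": 50}`, so the constructor receives a plain keyword rather than a dotted key.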
…adapters.

Centralize flat-to-config conversion in drp_hyperparameters, make search_space apply only canonical prefixed keys, and run construct_model and Ray/Optuna trials config-first while preserving development-style build_model and factory APIs.