Modular components stack with models/components split #443
Draft
nictru wants to merge 32 commits into
Introduce featurizer/predictor registries with cell_line/drug terminology, delegate sklearn and naive DRPModel classes via ComponentDRPBridge, and fix single-drug models for cell-line-only feature paths.
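The featurizer/predictor registries can be pictured as name-to-class tables filled by a decorator. The following is a minimal sketch, assuming a simple string-keyed design; `Registry`, `register`, `build`, `FEATURIZERS`, `PREDICTORS`, and the example classes are hypothetical names, not drevalpy's actual API.

```python
from typing import Any, Callable, Dict, List, Type


class Registry:
    """Map string names to component classes (hypothetical sketch)."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._classes: Dict[str, Type[Any]] = {}

    def register(self, name: str) -> Callable[[Type[Any]], Type[Any]]:
        """Return a class decorator that records the class under `name`."""

        def decorator(cls: Type[Any]) -> Type[Any]:
            if name in self._classes:
                raise ValueError(f"{self.kind} {name!r} is already registered")
            self._classes[name] = cls
            return cls

        return decorator

    def build(self, name: str, **kwargs: Any) -> Any:
        """Instantiate the class registered under `name` with keyword arguments."""
        try:
            cls = self._classes[name]
        except KeyError:
            raise KeyError(f"unknown {self.kind} {name!r}; known: {self.names()}") from None
        return cls(**kwargs)

    def names(self) -> List[str]:
        return sorted(self._classes)


# Separate registries keep cell_line/drug featurizers apart from predictors.
FEATURIZERS = Registry("featurizer")
PREDICTORS = Registry("predictor")


@FEATURIZERS.register("cell_line.gene_expression")
class GeneExpressionFeaturizer:
    def __init__(self, log_transform: bool = True) -> None:
        self.log_transform = log_transform


@PREDICTORS.register("MeanPredictor")
class MeanPredictor:
    def __init__(self) -> None:
        self.mean_ = None
```

Keeping the two registries separate lets a recipe refer to each slot by name while construction stays in one place.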
Stabilize the modular core with public drevalpy.components exports, serialization hooks for save/load bridges, a built-in model zoo, and external extension registration without changing legacy MODEL_FACTORY workflows.
Move DRP bridge code to the models layer, extract shared feature helpers into drevalpy.data, and split native vs legacy component registration while keeping MODEL_FACTORY and import compatibility.
Relocate literature implementations under components/predictors/literature, wire MODEL_FACTORY through public_models, add native structured predictors and zoo entries, and keep drevalpy.models as compatibility shims.
Keeps models/baselines import paths stable while baselines live next to other predictors for the component bridge and MODEL_FACTORY.
Move Lightning metrics mixin into components with model shims, point predictor implementations at data.features/preprocessing, fix circular imports via lazy MODEL_FACTORY, and replace public_models parity tests with factory-focused suites.
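Deferring the lookup of factory entries until first access is one way to break an import cycle between a factory module and the classes it lists. A minimal sketch, assuming entries are stored as `"module:Attribute"` strings; `LazyFactory` and `MODEL_FACTORY_SKETCH` are hypothetical names, and the standard-library targets stand in for model classes. A module-level `__getattr__` (PEP 562) achieves a similar effect.

```python
import importlib
from collections.abc import Mapping
from typing import Any, Dict, Iterator


class LazyFactory(Mapping):
    """Read-only name -> class mapping that imports each target on first access."""

    def __init__(self, targets: Dict[str, str]) -> None:
        # targets maps a public name to "package.module:AttributeName".
        self._targets = dict(targets)
        self._cache: Dict[str, Any] = {}

    def __getitem__(self, name: str) -> Any:
        if name not in self._cache:
            # Unknown names raise KeyError here, which Mapping.__contains__ relies on.
            module_path, _, attr = self._targets[name].partition(":")
            module = importlib.import_module(module_path)  # deferred import
            self._cache[name] = getattr(module, attr)
        return self._cache[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._targets)

    def __len__(self) -> int:
        return len(self._targets)


# Hypothetical stand-in targets; a real factory would point at model classes.
MODEL_FACTORY_SKETCH = LazyFactory({
    "OrderedDict": "collections:OrderedDict",
    "Fraction": "fractions:Fraction",
})
```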
Avoid re-fetching nf-core test archives on every pytest session when local toy data is already present.
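Skipping the download when toy data already exists reduces to a marker check before fetching. A hypothetical helper, assuming the presence of one marker path signals a complete local copy; `ensure_toy_data`, `marker`, and `fetch` are illustrative names, and in a test suite the helper would typically be called from a session-scoped pytest fixture.

```python
from pathlib import Path
from typing import Callable


def ensure_toy_data(data_dir: Path, marker: str, fetch: Callable[[Path], None]) -> bool:
    """Fetch test data only if `marker` is missing under `data_dir`.

    Returns True if a download was performed, False if local data was reused.
    """
    target = data_dir / marker
    if target.exists():
        return False  # local toy data already present; skip the network
    data_dir.mkdir(parents=True, exist_ok=True)
    fetch(data_dir)
    if not target.exists():
        raise FileNotFoundError(f"fetch did not produce {target}")
    return True
```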
Poetry's lockfile (poetry.lock) is canonical; uv.lock and training artifacts are local noise.
Relocate factory, zoo, and composed-model logic under drevalpy.models with thin components wrappers, add legacy literature import shims, compatibility tests, and remove duplicate zoo YAMLs.
Import zoo APIs from drevalpy.models.zoo directly while keeping package-level re-exports on drevalpy.components.
Set conservative native-thread defaults before importing XGBoost so the predictor remains stable when PyTorch/OpenMP is already loaded in the same process.
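The ordering matters because OpenMP reads thread-count variables such as `OMP_NUM_THREADS` when its runtime initializes, so they must be in place before the native library loads. A minimal sketch, assuming `OMP_NUM_THREADS=1` as the conservative default (the commit does not say which variables or values drevalpy uses); `set_conservative_thread_defaults` is a hypothetical name, and values the user already exported are left untouched.

```python
import os
from typing import Dict, Mapping

# Hypothetical defaults; the exact variables and values are an assumption.
CONSERVATIVE_THREAD_ENV: Mapping[str, str] = {"OMP_NUM_THREADS": "1"}


def set_conservative_thread_defaults(
    env: Mapping[str, str] = CONSERVATIVE_THREAD_ENV,
) -> Dict[str, str]:
    """Set thread-count variables unless the user already set them.

    Returns only the variables this call actually applied.
    """
    applied: Dict[str, str] = {}
    for key, value in env.items():
        if key not in os.environ:
            os.environ[key] = value
            applied[key] = value
    return applied


set_conservative_thread_defaults()
# import xgboost  # import the native library only after the defaults are in place
```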
Apply isort formatting, scoped flake8/mypy configuration for the new components stack, and small import and test fixes so local and GitHub Actions checks pass.
Remove package-wide mypy overrides and flake8 docstring exemptions, fix types via state helpers and impl delegation, and narrow flake8 ignores to literature impl and method-level docstrings only.
Replace dataclass config models and manual YAML coercion with validated Pydantic schemas while keeping semantic registry checks in validate().
Delete re-export modules under drevalpy.components and trim the package public API to featurizers, predictors, registries, and config schema only.
Keeps one implementation per file for easier navigation while preserving registry names and behavior.
…eaturizers. Introduces compact featurizer config syntax, per-omics cell-line featurizers, and a unified wrapper that supports both dense concatenation and structured block outputs.
Keep generic PCA at the cell_line level since it applies to any view, not a single omics layer.
Parse `+` in model recipe triples into concatFeaturizers configs and add drug-side concat support so both featurizer slots can be built from compact strings.
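Recipe parsing can be sketched as a split on `:` into the three slots, followed by a split on `+` within each featurizer slot. This assumes the `cellLine:drug:predictor` order named elsewhere in this PR; the output dictionary layout, the `children` key, `parse_recipe`, and the featurizer names in the example are hypothetical, while `concatFeaturizers` is the config name used in the commit messages.

```python
from typing import Any, Dict


def parse_featurizer_slot(slot: str) -> Dict[str, Any]:
    """Turn one featurizer slot into a config; `a+b` becomes a concat config."""
    parts = [p.strip() for p in slot.split("+")]
    if any(not p for p in parts):
        raise ValueError(f"empty featurizer name in {slot!r}")
    if len(parts) == 1:
        return {"name": parts[0]}
    # Hypothetical shape: a concat node whose children keep their recipe order.
    return {"name": "concatFeaturizers", "children": [{"name": p} for p in parts]}


def parse_recipe(recipe: str) -> Dict[str, Any]:
    """Split a cellLine:drug:predictor recipe into featurizer and predictor configs."""
    fields = recipe.split(":")
    if len(fields) != 3:
        raise ValueError(f"expected 'cellLine:drug:predictor', got {recipe!r}")
    cell_line, drug, predictor = (f.strip() for f in fields)
    if not predictor:
        raise ValueError("predictor name is empty")
    return {
        "cell_line": parse_featurizer_slot(cell_line),
        "drug": parse_featurizer_slot(drug),
        "predictor": {"name": predictor},
    }
```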
…tity to name. Parse featurizer lists as concatFeaturizers, accept bare predictor strings and one-key hyperparameter mappings, and rewrite zoo entries to the shorter form.
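Accepting both predictor forms comes down to normalizing either one to a `(name, hyperparameters)` pair. A minimal sketch; `normalize_predictor` and the predictor names are hypothetical, and treating a `None` value as empty hyperparameters is an assumption aimed at YAML entries written as `RandomForest:` with no body.

```python
from typing import Any, Dict, Mapping, Tuple, Union

PredictorSpec = Union[str, Mapping[str, Any]]


def normalize_predictor(spec: PredictorSpec) -> Tuple[str, Dict[str, Any]]:
    """Return (name, hyperparameters) for either accepted spec form.

    "RandomForest"                          -> ("RandomForest", {})
    {"RandomForest": {"n_estimators": 200}} -> ("RandomForest", {"n_estimators": 200})
    """
    if isinstance(spec, str):
        return spec, {}
    if isinstance(spec, Mapping) and len(spec) == 1:
        ((name, params),) = spec.items()
        if params is None:  # a YAML key with no value parses to None
            params = {}
        if not isinstance(params, Mapping):
            raise TypeError(f"hyperparameters for {name!r} must be a mapping")
        return str(name), dict(params)
    raise ValueError(f"predictor spec must be a string or a one-key mapping, got {spec!r}")
```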
Load presets from drevalpy/models/zoo/*.yaml directly instead of the entries subdirectory.
Eagerly materialize concat child featurizers so sklearn and literature models round-trip preprocessing state correctly, and map legacy drug/cell-line view names to modular featurizers.
Relocate ModelConfig, featurizer/predictor config types, and validation under drevalpy.models so orchestration stays in the models layer and components no longer re-exports config symbols.
…nce. Route literature and baseline wrappers through composed featurizers and predictors, replace PairContext with identity/tissue featurizers, unify save/load via component state, and add parity tests enforcing factory and zoo invariants.
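Unifying save/load through component state means each component exposes a serializable snapshot of its fitted preprocessing and can be rebuilt from it. A minimal sketch, assuming JSON-serializable state; `MeanCenteringFeaturizer`, `get_state`, `from_state`, `save_component`, and `load_component` are hypothetical names, not drevalpy's interface.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


class MeanCenteringFeaturizer:
    """Toy featurizer whose fitted preprocessing state can be saved and restored."""

    def __init__(self) -> None:
        self.means: Optional[List[float]] = None

    def fit(self, rows: List[List[float]]) -> "MeanCenteringFeaturizer":
        n = len(rows)
        self.means = [sum(col) / n for col in zip(*rows)]  # per-column mean
        return self

    def transform(self, rows: List[List[float]]) -> List[List[float]]:
        if self.means is None:
            raise RuntimeError("fit() must be called before transform()")
        return [[x - m for x, m in zip(row, self.means)] for row in rows]

    def get_state(self) -> Dict[str, Any]:
        return {"type": type(self).__name__, "means": self.means}

    @classmethod
    def from_state(cls, state: Dict[str, Any]) -> "MeanCenteringFeaturizer":
        if state.get("type") != cls.__name__:
            raise ValueError(f"state belongs to {state.get('type')!r}, not {cls.__name__!r}")
        obj = cls()
        obj.means = state["means"]
        return obj


def save_component(component: Any, path: Path) -> None:
    path.write_text(json.dumps(component.get_state()))


def load_component(cls: Any, path: Path) -> Any:
    return cls.from_state(json.loads(path.read_text()))
```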
…ces. Move DRPModel and literature engines to class-level default hyperparameters resolved from ModelConfig, add missing sklearn and literature search spaces, and extend merged tuning keys to support indexed concat featurizer children.
Expose a thin wrapper over ModelConfig.from_spec and ComponentDRPBridge so callers can build named models from cellLine:drug:predictor recipe strings.
…ning. Replace grid-search hpam_tune_raytune with component-space sampling, add experiment controls for trial count and resources, and wire wandb trial logging.
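Sampling replaces grid enumeration by drawing a fixed number of independent configurations, so the trial count becomes an explicit experiment control. A minimal sketch using Python's `random` module rather than Ray Tune or Optuna; the search-space format (lists for categorical choices, `("uniform" | "loguniform", low, high)` tuples for ranges), the dotted key names, and `sample_trials` are hypothetical.

```python
import math
import random
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical format: a list means "choose one"; a 3-tuple tagged
# "uniform" or "loguniform" means a continuous range [low, high].
Dimension = Union[Sequence[Any], Tuple[str, float, float]]


def sample_config(space: Dict[str, Dimension], rng: random.Random) -> Dict[str, Any]:
    config: Dict[str, Any] = {}
    for key, dim in space.items():
        if isinstance(dim, tuple) and len(dim) == 3 and dim[0] in ("uniform", "loguniform"):
            kind, low, high = dim
            if kind == "uniform":
                config[key] = rng.uniform(low, high)
            else:
                # Sample uniformly in log space so each decade is equally likely.
                config[key] = math.exp(rng.uniform(math.log(low), math.log(high)))
        else:
            config[key] = rng.choice(list(dim))
    return config


def sample_trials(space: Dict[str, Dimension], n_trials: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Draw `n_trials` independent configurations instead of enumerating a grid."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)  # seeded, so a given experiment is reproducible
    return [sample_config(space, rng) for _ in range(n_trials)]
```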
Registry-driven entity_id_only detection avoids falling back to fingerprints or gene expression when identity featurizers are configured, and routes construct_model and zoo preset loading through ID-only CSV loaders.
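Registry-driven detection can be read as: consult per-featurizer metadata and report ID-only only when every configured featurizer needs nothing beyond entity identifiers. A minimal sketch; the metadata layout, the registry contents, and the featurizer names are hypothetical, and raising on unknown names (instead of falling back to fingerprints or gene expression) is a design assumption.

```python
from typing import Dict, Iterable

# Hypothetical registry metadata: which featurizers need only entity IDs.
FEATURIZER_METADATA: Dict[str, Dict[str, bool]] = {
    "cell_line.identity": {"entity_id_only": True},
    "drug.identity": {"entity_id_only": True},
    "cell_line.tissue": {"entity_id_only": False},
    "drug.fingerprints": {"entity_id_only": False},
}


def entity_id_only(
    featurizers: Iterable[str],
    metadata: Dict[str, Dict[str, bool]] = FEATURIZER_METADATA,
) -> bool:
    """True when every configured featurizer needs nothing beyond entity IDs.

    Unknown names raise instead of silently falling back to another view.
    """
    names = list(featurizers)
    if not names:
        raise ValueError("no featurizers configured")
    for name in names:
        if name not in metadata:
            raise KeyError(f"unregistered featurizer {name!r}")
    return all(metadata[n]["entity_id_only"] for n in names)
```

When the function returns `True`, a caller could route data loading through a CSV that lists only cell-line and drug IDs.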
…meters. Strip the full featurizer prefix when applying merged structured defaults so construct_model recipes with tunable featurizers no longer pass dotted keys to featurizer constructors.
…adapters. Centralize flat-to-config conversion in drp_hyperparameters, make search_space apply only canonical prefixed keys, and run construct_model and Ray/Optuna trials config-first while preserving development-style build_model and factory APIs.
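Flat-to-config conversion can be sketched as grouping keys by a canonical root prefix, optionally followed by a numeric index for a concat featurizer child, and stripping that full prefix before the parameter reaches a constructor. The `cell_line`/`drug`/`predictor` roots mirror the terminology of this PR, but the exact key scheme, the `children` layout, and `flat_to_config` are hypothetical.

```python
from typing import Any, Dict

# Hypothetical canonical roots; the real key scheme in drevalpy may differ.
CANONICAL_ROOTS = ("cell_line", "drug", "predictor")


def flat_to_config(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Group flat 'root[.index].param' keys into per-component kwargs.

    'predictor.alpha'          -> config['predictor']['alpha']
    'cell_line.1.n_components' -> config['cell_line']['children'][1]['n_components']
    Keys without a canonical prefix are rejected rather than passed through,
    so no dotted key ever reaches a component constructor.
    """
    config: Dict[str, Any] = {}
    for key, value in flat.items():
        root, sep, rest = key.partition(".")
        if root not in CANONICAL_ROOTS or not sep:
            raise KeyError(f"non-canonical hyperparameter key {key!r}")
        node = config.setdefault(root, {})
        head, sep2, tail = rest.partition(".")
        if head.isdigit() and sep2:
            # Indexed child of a concat featurizer: strip 'root.index.' entirely.
            node = node.setdefault("children", {}).setdefault(int(head), {})
            rest = tail
        if not rest or "." in rest or rest.isdigit():
            raise KeyError(f"malformed hyperparameter key {key!r}")
        node[rest] = value
    return config
```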
Summary
- `drevalpy.components` for featurizers, predictors, registries, and declarative `ModelConfig` (Pydantic v2).
- `drevalpy.models` (`MODEL_FACTORY`, `DRPModel` subclasses, import-compat shims).
- … `ComposedModel`, bridge) to `drevalpy.models` only; no re-export shims in `components`.
- … `MultiViewXGBoost`.

Test plan