Implement deferred imports#552
Draft
smcolby wants to merge 4 commits into
Draft
Conversation
… load Baseline: import openadmet.models.registries = 6.702s After: import openadmet.models.registries = 0.111s (60x faster) Phase 1 - Split model_base.py: - Create architecture/lightning_model_base.py isolating all torch/lightning imports - Strip model_base.py of torch/lightning/joblib top-level imports - Add PEP 562 module __getattr__ for lazy LightningModelBase re-export - Defer joblib inside save()/load() method bodies - Result: architecture/model_base.py 1.652s -> 0.070s Phase 2 - Deferred estimator class imports: - Replace mod_class: ClassVar[type] = SomeClass with _get_estimator_class() classmethod in all concrete architecture modules (xgboost, catboost, lgbm, rf, svm, tabpfn, dummy) - Remove all top-level 3rd-party imports from these modules - Move _METRIC_TO_LOSS dict initialization inside chemprop build() - Result: each arch module 2-3s -> ~0.1s Phase 3 - Deferred imports in features/split/trainer/eval: - feature_base.py: move molfeat/torch imports to TYPE_CHECKING block - features/chemprop.py: remove self-import bug; defer all chemprop/torch/sklearn imports inside featurize() and _vendor_build_dataloader() - split/scaffold.py: defer splito and sklearn.model_selection inside split() - split/cluster.py: defer useful_rdkit_utils, datamol, molfeat, KMeans inside split(); remove unused GroupShuffleSplit import - trainer/lightning.py: defer torch and lightning imports inside build()/train() - eval/regression.py: defer wandb, scipy.stats, sklearn.metrics, seaborn inside their respective usage methods; convert _metrics class var to _base_metrics() classmethod; fix cross_validation.py to not import removed module-level names - eval/eval_base.py: defer scipy.stats.bootstrap inside stat_and_bootstrap() - Result: registries 6.702s -> 3.548s Phase 4 - Lazy registry loading: - Create _registry_loader.py with idempotent load_group()/load_all() functions using importlib.import_module; zero heavy imports at module level - Rewrite registries.py to only import base registry objects and expose load_all() - Add load_group() call to each get_*_class() function for on-demand loading - Update anvil/specification.py and anvil/workflow_base.py to import load_all instead of wildcard-importing registries - Result: import openadmet.models.registries 6.702s -> 0.111s (60x faster) Before/after summary: import openadmet.models.registries: 6.702s -> 0.111s architecture/model_base.py: 1.652s -> 0.070s architecture/xgboost.py: 2.123s -> 0.099s architecture/chemprop.py: 3.083s -> 0.101s split/cluster.py: 3.524s -> 0.331s split/scaffold.py: 1.476s -> 0.330s trainer/lightning.py: 1.653s -> 0.069s eval/regression.py: 1.582s -> 0.326s registries + load_all(): N/A -> 3.727s (same real cost, deferred) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
for more information, see https://pre-commit.ci
- load_group(): add Parameters section with valid group keys - load_all(): expand summary line - get_mod_class(): add full Parameters/Returns/Raises sections - get_featurizer_class(): add full Parameters/Returns/Raises sections - get_ensemble_class(): add full Parameters/Returns/Raises sections - RegressionEvaluator._base_metrics(): add Returns section All 5 D413 blank-line-after-section violations auto-fixed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
get_transform_class() was the only get_*_class() function not calling load_group() before the registry lookup. This caused 'ImputeTransform not found in transform catalogue' in integration tests because the transforms group was never eagerly loaded under the new lazy registry. Also adds the missing Raises section to the docstring. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Importing
openadmet.models.registrieswas paying the full cost of every 3rd-party library in the package (~6.7s cold) regardless of which components were actually needed. This PR replaces the "import everything at once" paradigm with deferred/lazy imports across all four component groups (models, featurizers, splitters, trainers/evaluators). Resolves #550 .Results
import openadmet.models.registriesimport openadmet.modelsregistries + load_all()architecture/model_base.pyarchitecture/chemprop.pysplit/cluster.pytrainer/lightning.pyeval/regression.pyChanges
Split
model_base.pyLightningModuleBaseandLightningModelBaseinto a newarchitecture/lightning_model_base.py. Alltorch/lightningimports are isolated there.model_base.pyuses PEP 562 module__getattr__to lazily re-export the Lightning classes, preserving backward-compatiblefrom model_base import LightningModelBasewithout paying the torch cost at import time.joblibinsidesave()/load().Deferred estimator class imports
mod_class: ClassVar[type] = SomeThirdPartyClasswith a_get_estimator_class()classmethod in every concrete pickleable model (xgboost,catboost,lgbm,rf,svm,tabpfn,dummy). Each classmethod contains a local import that fires only at firstbuild()call._METRIC_TO_LOSSdict from module level intochemprop.build().Deferred imports in features / split / trainer / eval
features/feature_base.py: heavymolfeat/torchimports moved toTYPE_CHECKINGblock;from __future__ import annotationsadded.features/chemprop.py: removed a self-import bug (lines importing from itself); allchemprop/torch/sklearnimports deferred insidefeaturize()and_vendor_build_dataloader().split/scaffold.py:splitoandsklearn.model_selection.train_test_splitdeferred inside eachsplit()method.split/cluster.py:useful_rdkit_utils,datamol,molfeat,KMeansdeferred insidesplit(); removed unusedGroupShuffleSplitimport.trainer/lightning.py:torch,lightning, and all callbacks/loggers deferred insidebuild()/train().eval/regression.py:wandbdeferred insideif self.use_wandb:blocks;scipy.stats,sklearn.metrics, andseaborndeferred by converting the class-level_metricsdict into a_base_metrics()classmethod and moving plot imports inside plot methods.eval/eval_base.py:scipy.stats.bootstrapdeferred insidestat_and_bootstrap().eval/cross_validation.py: stopped importing removed module-level names fromregression.py;wrap_ktau/wrap_spearmanrnow do local imports.Lazy registry loading
openadmet/models/_registry_loader.pyexposesload_group(name)andload_all()(both idempotent), usingimportlib.import_module. Zero heavy imports at module level.registries.pyrewritten to only import the six registry objects (models,featurizers,splitters,trainers,evaluators,ensemblers) plus re-exportload_all.get_*_class()function now callsload_group()before the registry lookup, so any single-component usage auto-loads only what it needs.anvil/specification.pyandanvil/workflow_base.pyupdated toimport load_allinstead offrom registries import *.Quality Assurance & AI Policy
To maintain project quality and respect maintainer bandwidth, please confirm the following:
Status
Developers Certificate of Origin
Note to Contributors: We reserve the right to close PRs without review if they appear to lack human validation or do not meet the quality standards described in our
CONTRIBUTING.md.