Implement deferred imports by smcolby · Pull Request #552 · OpenADMET/openadmet-models

smcolby · 2026-05-18T16:20:45Z

Description

Importing openadmet.models.registries paid the full import cost of every third-party library in the package (~6.7s cold), regardless of which components were actually needed. This PR replaces the "import everything at once" approach with deferred (lazy) imports across all four component groups (models, featurizers, splitters, trainers/evaluators). Resolves #550.

Results

Benchmark	Before	After	Speedup
`import openadmet.models.registries`	6.702s	0.111s	60×
`import openadmet.models`	0.044s	0.017s	2.6×
`registries + load_all()`	6.702s	3.727s	1.8× (+ fully deferrable)
`architecture/model_base.py`	1.652s	0.070s	24×
`architecture/chemprop.py`	3.083s	0.101s	30×
`split/cluster.py`	3.524s	0.331s	11×
`trainer/lightning.py`	1.653s	0.069s	24×
`eval/regression.py`	1.582s	0.326s	4.9×
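
The PR text does not say how these timings were collected. One plausible way to take a cold-import measurement is to start a fresh interpreter for every run, so the target module is never already cached. The sketch below does that; `time_cold_import` is a hypothetical helper, and `json` is only a stand-in for `openadmet.models.registries`.

```python
import subprocess
import sys
import time


def time_cold_import(module: str, repeats: int = 3) -> float:
    """Return the best wall-clock time (seconds) to import `module` in a fresh interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new process starts without the target module cached in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in module (assumption); substitute "openadmet.models.registries"
# in an environment where the package is installed.
elapsed = time_cold_import("json")
print(f"import json: {elapsed:.3f}s")
```

The figure includes interpreter startup, so it is an upper bound on the import itself. For a per-module breakdown, `python -X importtime -c "import <module>"` prints cumulative import times to stderr.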

Changes

Split model_base.py

Extracted LightningModuleBase and LightningModelBase into a new architecture/lightning_model_base.py. All torch / lightning imports are isolated there.
model_base.py uses PEP 562 module __getattr__ to lazily re-export the Lightning classes, preserving backward-compatible from model_base import LightningModelBase without paying the torch cost at import time.
Deferred joblib inside save() / load().
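
A minimal sketch of the PEP 562 pattern described above, not the PR's actual code. The module name `demo_model_base` and the `_LAZY_EXPORTS` table are hypothetical, and `fractions.Fraction` stands in for the torch-backed `LightningModelBase` so the example runs without heavy dependencies.

```python
import importlib
import sys
import types

# Hypothetical table: public name -> (module that defines it, attribute name).
# fractions.Fraction is a stdlib stand-in for the real Lightning class.
_LAZY_EXPORTS = {
    "LightningModelBase": ("fractions", "Fraction"),
}


def _lazy_getattr(name):
    """PEP 562 hook: called only when normal module attribute lookup fails."""
    try:
        module_name, attr = _LAZY_EXPORTS[name]
    except KeyError:
        raise AttributeError(
            f"module 'demo_model_base' has no attribute {name!r}"
        ) from None
    value = getattr(importlib.import_module(module_name), attr)
    # Cache on the module so the hook is not hit again for this name.
    setattr(demo_model_base, name, value)
    return value


# Build the stand-in module in memory; in a real package, __getattr__ would
# simply be a function defined at the top level of model_base.py.
demo_model_base = types.ModuleType("demo_model_base")
demo_model_base.__getattr__ = _lazy_getattr
sys.modules["demo_model_base"] = demo_model_base

# The backing module is imported here, on first access, not before.
from demo_model_base import LightningModelBase

print(LightningModelBase)
```

Because the hook only fires for names missing from the module namespace, existing `from model_base import LightningModelBase` call sites keep working while the import cost moves to first use.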

Deferred estimator class imports

Replaced mod_class: ClassVar[type] = SomeThirdPartyClass with a _get_estimator_class() classmethod in every concrete pickleable model (xgboost, catboost, lgbm, rf, svm, tabpfn, dummy). Each classmethod contains a local import that runs only on the first build() call.
Moved _METRIC_TO_LOSS dict from module level into chemprop.build().
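
The change from a class-level `mod_class` attribute to a `_get_estimator_class()` classmethod can be sketched as follows. Only `_get_estimator_class()` and `build()` are named in the PR; `DemoPickleableBase`, `DemoModel`, and the constructor signature are assumptions, and `statistics.NormalDist` stands in for a third-party estimator such as an XGBoost regressor.

```python
class DemoPickleableBase:
    """Hypothetical base class; the real base class API is not shown in the PR."""

    def __init__(self, **params):
        self.params = params
        self.estimator = None

    @classmethod
    def _get_estimator_class(cls) -> type:
        raise NotImplementedError

    def build(self):
        # The estimator class is resolved here, so its import runs on first build().
        if self.estimator is None:
            self.estimator = self._get_estimator_class()(**self.params)
        return self.estimator


class DemoModel(DemoPickleableBase):
    # Before: mod_class: ClassVar[type] = SomeThirdPartyClass (needs a top-level import)
    @classmethod
    def _get_estimator_class(cls) -> type:
        # Local import: nothing is loaded when this module is merely imported.
        from statistics import NormalDist  # stdlib stand-in for a heavy estimator

        return NormalDist


model = DemoModel(mu=1.0, sigma=2.0)
estimator = model.build()
print(estimator)
```

Repeated local imports are cheap after the first one, since Python resolves them from `sys.modules`.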

Deferred imports in features / split / trainer / eval

features/feature_base.py: heavy molfeat / torch imports moved to TYPE_CHECKING block; from __future__ import annotations added.
features/chemprop.py: removed a self-import bug (lines importing from itself); all chemprop / torch / sklearn imports deferred inside featurize() and _vendor_build_dataloader().
split/scaffold.py: splito and sklearn.model_selection.train_test_split deferred inside each split() method.
split/cluster.py: useful_rdkit_utils, datamol, molfeat, KMeans deferred inside split(); removed unused GroupShuffleSplit import.
trainer/lightning.py: torch, lightning, and all callbacks/loggers deferred inside build() / train().
eval/regression.py: wandb deferred inside if self.use_wandb: blocks; scipy.stats, sklearn.metrics, and seaborn deferred by converting the class-level _metrics dict into a _base_metrics() classmethod and moving plot imports inside plot methods.
eval/eval_base.py: scipy.stats.bootstrap deferred inside stat_and_bootstrap().
eval/cross_validation.py: stopped importing removed module-level names from regression.py; wrap_ktau / wrap_spearmanr now do local imports.
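
Two of the techniques listed above, a `TYPE_CHECKING`-only import and a class-level dict turned into a classmethod with local imports, can be sketched as below. This is an illustration under stated assumptions: `DemoEvaluator` and `to_decimal` are hypothetical, stdlib modules stand in for scipy / sklearn / molfeat, and quoted annotations are used where the PR uses `from __future__ import annotations`.

```python
from typing import TYPE_CHECKING, Callable, Dict, List

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime.
    from decimal import Decimal  # stand-in for a heavy type (e.g. a torch class)


class DemoEvaluator:
    # Before: _metrics = {...} at class level, which forces the metric
    # libraries to be imported as soon as the module is imported.
    @classmethod
    def _base_metrics(cls) -> Dict[str, Callable]:
        # Local import: paid only when metrics are actually requested.
        import statistics  # stand-in for scipy.stats / sklearn.metrics

        return {"mean": statistics.fmean, "median": statistics.median}

    def evaluate(self, values: List[float]) -> Dict[str, float]:
        return {name: fn(values) for name, fn in self._base_metrics().items()}


def to_decimal(x: float) -> "Decimal":
    # The quoted annotation is never evaluated, so the runtime import
    # can stay inside the function body.
    from decimal import Decimal

    return Decimal(str(x))


print(DemoEvaluator().evaluate([1.0, 2.0, 6.0]))
print(to_decimal(1.5))
```

One consequence of this design is that code which previously imported module-level names (as `eval/cross_validation.py` did) has to switch to its own local imports or call the classmethod.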

Lazy registry loading

New openadmet/models/_registry_loader.py exposes load_group(name) and load_all() (both idempotent), using importlib.import_module. Zero heavy imports at module level.
registries.py rewritten to only import the six registry objects (models, featurizers, splitters, trainers, evaluators, ensemblers) plus re-export load_all.
Every get_*_class() function now calls load_group() before the registry lookup, so any single-component usage auto-loads only what it needs.
anvil/specification.py and anvil/workflow_base.py updated to import load_all instead of from registries import *.

Quality Assurance & AI Policy

To maintain project quality and respect maintainer bandwidth, please confirm the following:

Manual Verification: I have manually reviewed and tested the code in this PR.
AI-Assisted Content: If AI tools were used (e.g., Copilot, ChatGPT), I have personally verified the logic, edge cases, and compliance with the existing codebase. I confirm the code is not a "blind" AI generation.
Minimal Review: I believe this PR is in a state that requires minimal intervention or correction from maintainers.
Scoped Change: This PR addresses a single, well-scoped issue rather than multiple unrelated changes.

Status

Ready to go (Checking this signals to maintainers that the PR is ready for final review)

Developers Certificate of Origin

I certify that this contribution is covered by the MIT License here and the Developer Certificate of Origin at https://developercertificate.org/.

Note to Contributors: We reserve the right to close PRs without review if they appear to lack human validation or do not meet the quality standards described in our CONTRIBUTING.md.

… load

Baseline: import openadmet.models.registries = 6.702s
After: import openadmet.models.registries = 0.111s (60x faster)

Phase 1 - Split model_base.py:
- Create architecture/lightning_model_base.py isolating all torch/lightning imports
- Strip model_base.py of torch/lightning/joblib top-level imports
- Add PEP 562 module __getattr__ for lazy LightningModelBase re-export
- Defer joblib inside save()/load() method bodies
- Result: architecture/model_base.py 1.652s -> 0.070s

Phase 2 - Deferred estimator class imports:
- Replace mod_class: ClassVar[type] = SomeClass with _get_estimator_class() classmethod in all concrete architecture modules (xgboost, catboost, lgbm, rf, svm, tabpfn, dummy)
- Remove all top-level 3rd-party imports from these modules
- Move _METRIC_TO_LOSS dict initialization inside chemprop build()
- Result: each arch module 2-3s -> ~0.1s

Phase 3 - Deferred imports in features/split/trainer/eval:
- feature_base.py: move molfeat/torch imports to TYPE_CHECKING block
- features/chemprop.py: remove self-import bug; defer all chemprop/torch/sklearn imports inside featurize() and _vendor_build_dataloader()
- split/scaffold.py: defer splito and sklearn.model_selection inside split()
- split/cluster.py: defer useful_rdkit_utils, datamol, molfeat, KMeans inside split(); remove unused GroupShuffleSplit import
- trainer/lightning.py: defer torch and lightning imports inside build()/train()
- eval/regression.py: defer wandb, scipy.stats, sklearn.metrics, seaborn inside their respective usage methods; convert _metrics class var to _base_metrics() classmethod; fix cross_validation.py to not import removed module-level names
- eval/eval_base.py: defer scipy.stats.bootstrap inside stat_and_bootstrap()
- Result: registries 6.702s -> 3.548s

Phase 4 - Lazy registry loading:
- Create _registry_loader.py with idempotent load_group()/load_all() functions using importlib.import_module; zero heavy imports at module level
- Rewrite registries.py to only import base registry objects and expose load_all()
- Add load_group() call to each get_*_class() function for on-demand loading
- Update anvil/specification.py and anvil/workflow_base.py to import load_all instead of wildcard-importing registries
- Result: import openadmet.models.registries 6.702s -> 0.111s (60x faster)

Before/after summary:
import openadmet.models.registries: 6.702s -> 0.111s
architecture/model_base.py: 1.652s -> 0.070s
architecture/xgboost.py: 2.123s -> 0.099s
architecture/chemprop.py: 3.083s -> 0.101s
split/cluster.py: 3.524s -> 0.331s
split/scaffold.py: 1.476s -> 0.330s
trainer/lightning.py: 1.653s -> 0.069s
eval/regression.py: 1.582s -> 0.326s
registries + load_all(): N/A -> 3.727s (same real cost, deferred)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

for more information, see https://pre-commit.ci

- load_group(): add Parameters section with valid group keys
- load_all(): expand summary line
- get_mod_class(): add full Parameters/Returns/Raises sections
- get_featurizer_class(): add full Parameters/Returns/Raises sections
- get_ensemble_class(): add full Parameters/Returns/Raises sections
- RegressionEvaluator._base_metrics(): add Returns section

All 5 D413 blank-line-after-section violations auto-fixed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

get_transform_class() was the only get_*_class() function not calling load_group() before the registry lookup. This caused 'ImputeTransform not found in transform catalogue' in integration tests because the transforms group was never eagerly loaded under the new lazy registry. Also adds the missing Raises section to the docstring.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

smcolby and others added 4 commits May 17, 2026 16:30

[pre-commit.ci] auto fixes from pre-commit.com hooks

238add4

for more information, see https://pre-commit.ci

smcolby self-assigned this May 18, 2026

smcolby wants to merge 4 commits into main from enh/550/deferred-imports
