Skip to content

[ENH] Improve sentrux metrics #519

@smcolby

Description

@smcolby

More an FYI than anything, was just curious what the assessment would yield.

Prompt (Claude Sonnet 4.6 [High]): Generate a plan to improve code quality metrics reported by sentrux, a tool you have MCP access to, for the openadmet-models package.


Plan: Improve Sentrux Code Quality Metrics for openadmet-models

Current State

Quality signal: 6181 / 10000 (geometric mean of 5 component scores)

Metric Score Raw Notes
modularity 4726 0.209 🔴 BOTTLENECK — 130/181 edges cross-package (72%)
acyclicity 5000 1 🟡 Moderate
equality 5573 0.443 🟡 Several files far above average size
depth 8000 2 🟢 Good layering
redundancy 8563 0.143 🟢 Low duplication

Root Cause Analysis

Cross-package dependency graph

__root__ (registries.py, drivers.py, log.py)
  └─ imports: active_learning, architecture, eval, features, split, trainer, transforms

active_learning → architecture
anvil → __root__, active_learning, architecture, eval, features, split, trainer, transforms
architecture → __root__
cli → anvil, comparison, inference
eval → __root__, trainer          ← VIOLATION: eval should not depend on trainer
inference → active_learning, anvil, features
trainer → __root__, architecture

Key violations & issues

  1. evaltrainer (avoidable layer violation)

    • eval/cross_validation.py line 25: from openadmet.models.trainer.lightning import LightningTrainer
    • Eval logic should use the abstract TrainerBase, not a concrete trainer implementation
    • Fix: replace LightningTrainer type reference with TrainerBase
  2. drivers.py at root level creates fan-out

    • DriverType enum sits in openadmet.models.drivers (root package)
    • Imported by architecture/, eval/, trainer/, anvil/ — causing 4 distinct cross-package edges
    • Fix: move drivers.py into architecture/ where it semantically belongs; update all importers
  3. Oversized files hurt equality score

    File Lines Problem
    comparison/posthoc.py 1126 Mixes statistics, S3 I/O, PDF generation, and plotting
    anvil/workflow.py 933 Monolithic workflow orchestration
    eval/cross_validation.py 828 Cross-validation + multi-task + plotting
    anvil/specification.py 811 14 Pydantic model classes in one file
    eval/regression.py 717 Regression metrics + plotting utilities
    active_learning/committee.py 612 Committee logic + multiple strategy classes
  4. No .sentrux/rules.toml — architectural constraints not enforced

Proposed Changes

Phase 1 — Fix layer violations (→ modularity)

1a. eval/cross_validation.py: remove LightningTrainer import

  • Replace the concrete LightningTrainer type annotation and constructor call (lines 25, 533, 632) with the abstract TrainerBase from trainer/trainer_base.py
  • The cross_validation code only needs to call trainer.train() — it doesn't need to know the implementation

1b. Move drivers.py into architecture/

  • Move openadmet/models/drivers.pyopenadmet/models/architecture/drivers.py
  • Update all import sites: architecture/model_base.py, eval/cross_validation.py, trainer/lightning.py, anvil/workflow.py, anvil/specification.py
  • This converts the cross-package edges architecture→__root__, eval→__root__, trainer→__root__ into edges to architecture, which is already a dependency for all of them
  • Removes the __root__ package from the cross-package graph for DriverType

Phase 2 — Split oversized files (→ equality)

2a. Split eval/regression.py (717 lines)

  • Extract plotting helpers into eval/regression_plots.py
  • Keep metric evaluator classes in eval/regression.py
  • Update cross_validation.py and registries.py imports

2b. Split anvil/specification.py (811 lines, 14 classes)

  • anvil/data_spec.pyDataSpec, Metadata, data-related config models
  • anvil/model_spec.pyModelSpec, FeatureSpec, TrainerSpec
  • anvil/eval_spec.pyEvalSpec, CrossValidationSpec, split-related config
  • anvil/specification.py — keep only top-level AnvilSpecification that assembles the above
  • Update anvil/workflow_base.py, anvil/workflow.py, and registries.py imports

2c. Split comparison/posthoc.py (1126 lines)

  • comparison/posthoc_stats.py — statistical test logic
  • comparison/posthoc_io.py — S3 loading and serialization
  • comparison/posthoc_plots.py — seaborn/matplotlib visualization
  • comparison/posthoc.pyPostHocComparison orchestrator that delegates to above
  • Update comparison/__init__.py and CLI imports

Phase 3 — Add architectural rules (→ enforcement)

3a. Create .sentrux/rules.toml

[rules]
# Eval layer must not depend on concrete trainer implementations
[[rules.forbidden]]
from = "openadmet.models.eval"
to   = "openadmet.models.trainer"
reason = "Use TrainerBase abstractions instead of concrete trainers"

# CLI is the only allowed entry point to orchestration
[[rules.forbidden]]
from = "openadmet.models.architecture"
to   = "openadmet.models.anvil"
reason = "Architecture layer must not depend on orchestration layer"

# Base classes must not import from implementations
[[rules.forbidden]]
from = "openadmet.models.architecture.model_base"
to   = "openadmet.models.architecture.chemprop"
reason = "Base must not depend on implementations"

Phase 4 — Minor housekeeping (→ redundancy / acyclicity)

4a. Fix duplicate import in features/pairwise.py

  • Remove one of the two from collections.abc import Sequence lines

4b. Audit registries.py wildcard imports

  • Replace from module import * with explicit named imports where __all__ is defined and the list is manageable
  • For modules where wildcard is truly necessary (side-effect registration), add explicit __all__ to each implementation module

Expected Metric Improvements

Change Affected Metric Mechanism
Remove eval→trainer edge modularity ↑ Eliminates one cross-package edge type
Move drivers.py into architecture/ modularity ↑ Removes __root__ from cross-package graph
Split regression.py, specification.py, posthoc.py equality ↑ Reduces variance in file/module sizes
.sentrux/rules.toml prevents regression Enforces boundaries going forward
Fix duplicate import redundancy slight ↑ Removes one redundant symbol

Implementation Order

Dependencies:

  • Phase 1b (drivers.py move) must complete before Phase 3 rule can be written correctly
  • Phase 2b (specification.py split) affects workflow_base.py and workflow.py — do before running tests
  • Phase 3 can be done independently at any time
  • Phase 4 items are fully independent

Suggested order: 4a → 1a → 1b → 3a → 2a → 2b → 2c

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions