Move files into the organization required #5

Open
Kaftow wants to merge 4 commits into Open-Book-Genome-Project:main from Kaftow:refactor/move-shared-kernel
Kaftow commented Apr 22, 2026

This PR refactors the subject migration pipeline by extracting the shared classification logic out of scripts/migrate_subjects.py into reusable modules.

What changed

  • Added core/json_loader.py to centralize loading of mapping JSON files from scripts/mappings/
  • Added rule_engine/normalization.py for shared normalization helpers and regex-based detection of reading levels and classification codes
  • Added core/subject_classifier.py as the reusable core for subject classification at both subject and work level
  • Simplified scripts/migrate_subjects.py so it acts primarily as the CLI / orchestration layer instead of owning all classification logic
  • Updated scripts/README.md to document the new internal module structure
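
As a rough illustration of what centralizing the mapping I/O could look like, here is a minimal sketch. The function name `load_mapping`, the `DEFAULT_MAPPINGS_DIR` constant, and the one-object-per-file assumption are hypothetical and are not taken from the actual core/json_loader.py.

```python
import json
from pathlib import Path

# Assumption: mappings live under scripts/mappings/, as the PR description says.
# The constant name itself is hypothetical.
DEFAULT_MAPPINGS_DIR = Path("scripts") / "mappings"


def load_mapping(name: str, mappings_dir: Path = DEFAULT_MAPPINGS_DIR) -> dict:
    """Load one mapping file by base name, e.g. "genres" -> genres.json.

    Hypothetical helper: assumes each mapping file holds a single JSON object.
    """
    path = Path(mappings_dir) / f"{name}.json"
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError(f"{path} must contain a JSON object at the top level")
    return data
```

Keeping the directory as a parameter means callers outside the CLI, including tests, can point the loader at a temporary directory instead of the repository layout.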

Why

Previously, the migration script contained mapping loading, normalization utilities, and classification logic all in one file. This made the code harder to reuse, test, and extend. By moving those responsibilities into dedicated modules, the branch makes the classifier easier to maintain and prepares the codebase for reuse outside the CLI script.

Impact

This is primarily a structural refactor. The intended classification behavior remains the same, but the implementation is now split into clearer layers:

  • data loading
  • normalization / low-level detection
  • reusable classification logic
  • CLI entry point
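
The layering above might be sketched roughly as follows. This is an illustration only: the regex patterns, the function names, and the returned labels are assumptions, since the PR description does not show the actual rules.

```python
import re
import unicodedata

# Hypothetical patterns; the real regexes used by the project are not shown in this PR.
READING_LEVEL_RE = re.compile(r"^(?:grade|level)\s+(\d{1,2})$")
DEWEY_LIKE_RE = re.compile(r"^\d{3}(?:\.\d+)?$")


def normalize(subject: str) -> str:
    """Normalization layer: Unicode-normalize, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKC", subject)
    return " ".join(text.lower().split())


def classify_subject(subject: str, mapping: dict[str, str]) -> tuple[str, str]:
    """Classification layer: return (label, normalized subject)."""
    key = normalize(subject)
    if READING_LEVEL_RE.match(key):
        return ("reading_level", key)
    if DEWEY_LIKE_RE.match(key):
        return ("classification_code", key)
    if key in mapping:
        return (mapping[key], key)
    return ("unclassified", key)


def classify_work(subjects: list[str], mapping: dict[str, str]) -> list[tuple[str, str]]:
    """Work-level classification: classify every subject attached to one work."""
    return [classify_subject(s, mapping) for s in subjects]


if __name__ == "__main__":
    # CLI layer: orchestration only, no classification rules of its own.
    demo_mapping = {"science fiction": "genre"}
    for label, key in classify_work(["Science  Fiction", "Grade 3", "813.54"], demo_mapping):
        print(f"{key}: {label}")
```

Because the helpers are pure functions that take the mapping as an argument, they can be tested without touching the filesystem or the CLI.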

Kaftow commented Apr 23, 2026

The CLI remains the entry point, but the reusable tagging logic now lives in a package-style layout:

  • tagging/__init__.py for the public re-export surface
  • tagging/engine.py for the TypedTagger implementation
  • tagging/normalization.py for pure normalization and classification helpers
  • tagging/json_loader.py for mapping JSON I/O
  • tagging/models.py for typed result structures
