Grouped by work‑stream; tick ☑︎ when done.
- Create Git repo; enable GitHub Actions template.
- Define editable Poetry environment (
pyproject.toml). - Pre‑commit hooks: ruff, black, isort, mdformat.
- Data schema →
datamodels.py(Pydantic) for Variant, AssayResult, RoundMetadata. - Data manager with SQLite / DuckDB backend; CRUD, versioning.
- Embedding cache with mmap HDF5; script
embed.py(batch, multiprocess). - Config system via
yaml+ env overrides (OmegaConf).
- Ridge / MLP head using
sklearn/torch.nn. - Adapter‑LoRA utility to unfreeze ≤ 2 layers (
lora_r=4). - RL policy (A2C): action = {pos, aa}, mask illegal edits.
- Acquisition strategies module: UCB, EI, Thompson, diversity‑penalized.
-
controller.pyorchestrating propose → assay → fit cycle. - Temperature & batch‑size scheduling CLI flags.
- CLI (Typer):
plm propose,plm learn,plm benchmark. - Python API:
from plm import Framework; Framework().fit_round(...). - Dashboard (optional): Streamlit view of embedding UMAP & acquisition scores.
- Port Science 2024 & Nat Commun 2025 datasets (convert to internal schema).
-
scripts/benchmark.pyto generate metrics table & figures. - Unit tests (> 80 % coverage) for head training & acquisition math.
- FP16 / Int8 quantization toggle (
bitsandbytes). - CPU‑only training path; profiling with
torch.profiler. - Automated runtime budget check (< 2 h on 8‑core Intel i7).
- MkDocs site with tutorial notebooks ("0‑shot", "1‑round", RL demo).
- API reference auto‑generated via
mkdocstrings. - Contribution guide & code‑of‑conduct.
- Semantic versioning; generate changelog with
towncrier. - Build & publish wheels to TestPyPI → PyPI.
- Dockerfile (CPU slim) for reproducible runs.
- Multi‑objective optimization (e.g., stability × activity via Pareto front).
- GPT‑4o‑driven candidate filtering (natural‑language constraints).
- Fine‑tune on metagenomic unlabeled sequences (self‑distillation).