WIP: feat(metrics): implementation of metric tasks by project-defiant · Pull Request #134 · opentargets/pts

project-defiant · 2026-05-20T13:12:55Z

Motivation

We need a lightweight, reproducible way to compute and record key counts for every
dataset produced by the pipeline — total rows, distinct values, group distributions —
so we can track data quality and catch regressions across releases. This PR introduces
the pts.metrics framework and a standalone metrics.yaml config that covers all 32
datasets.

What's new

pts.metrics — metric definition framework

A Polars-based framework for declaring, computing, and serialising dataset metrics.

Metric (abstract) — Pydantic base; subclasses declare config fields and implement
compute(df). run() guards against empty DataFrames and wraps with debug logging.
MetricResult (abstract) — to_unified_record() serialises metric-specific fields as a
JSON blob into a flat UnifiedMetricRecord, which is written as one Parquet row per
metric.
Built-in types — count, distinct_count, grouped_count, grouped_count_explode,
grouped_sum.
Custom types — l2g_significant_gene (genes with L2G score ≥ threshold).
MetricType enum — member name = YAML type field, value = implementer class.
MetricType.load(cfg) is the single constructor used throughout; adding a new metric
type is a one-liner in the enum.
Null keys are preserved as their own group in all grouped metrics (no silent data
loss).
Parquet is read with per-metric column projection to minimise memory use; recursive
**/*.parquet glob supports nested partitions.

CollectMetrics — otter task

CollectMetricsSpec validates metric definitions at config-load time.
Metrics are run concurrently via ThreadPoolExecutor — one thread per metric, so
parquet reads and Polars computations overlap.
Output is a single Parquet file with one row per metric, stamped with release, run,
dataset, source, and destination.

Config

metrics.yaml — standalone otter config covering all 32 datasets. Run any step with
pts -s -c metrics.yaml.
config.yaml — collect_metrics steps added alongside the existing transform steps.

Testing

Unit tests for EmptyDatasetError, abstract-class enforcement, spec validation,
scratchpad key checks, YAML config validity, end-to-end Parquet output, relative path
resolution, and bad-column error propagation.
Doctests in count.py and grouped.py covering single- and multi-column cases with null
groups.
All 356 tests pass. All 32 metrics.yaml steps verified against real data.

Notable design decisions

MetricType enum over a registry dict — adding a new type is one line; no separate
registration step.
field_serializer on CollectMetricsSpec.metrics — necessary because Pydantic v2
serialises list[Metric] using only base-class fields, stripping subclass fields during
otter's internal model_dump round-trip.
ThreadPoolExecutor over asyncio — Polars has no async API; threading gives real
concurrency without the complexity of wrapping sync code in asyncio.to_thread.
pts CLI, not otter — the otter CLI only registers built-in task types; pts also
registers pts.tasks, which is where CollectMetrics lives.

Add standalone pts.metrics framework with abstract base classes and four concrete metric types: CountMetric, DistinctCountMetric, DistributionMetric, GroupedCountMetric, GroupedSumMetric. All 31 tests pass (TDD — tests written before implementation). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

DiseaseByTherapeuticAreaMetric explodes the therapeuticAreas array and counts diseases per area. L2GSignificantGeneMetric counts distinct genes with score >= threshold. All 6 custom-metric tests pass (TDD). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Reads parquet from dataset_path, computes each metric, stamps release/run via model_copy, and writes JSON to metrics_root/{dataset}/{name}.json. All 4 runner tests pass; 41/41 across the full metrics suite. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

TransformSpec gains a metrics: list[Metric] field validated at construction time via _load_metric (discriminated union for standard types, importlib for custom). Transform.run() calls MetricRunner after the transformer completes when metrics are configured. Unknown types raise ValueError early. 46/46 tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add release/run to scratchpad and metrics declarations to the disease_efo transform step. Target, drug, association, and GWAS datasets are pyspark steps and need separate pyspark task integration before metrics can be added to those config entries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Allows running metric collection on already-produced datasets without re-running a transformer. Reads release/run from scratchpad. 4 TDD tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nfig MetricRunner now reads per-metric with scan_parquet + column projection, dropping memory usage from full-dataset loads (3-17 GB OOM) to only the needed columns. Each Metric declares required_columns; empty list = row count only. Also adds metrics.yaml and scripts/collect_metrics.py for standalone collection. Verified across 32 datasets, 62 JSON files. 50 tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e 9 metrics - Add UnifiedMetricRecord with flat envelope + result JSON blob so all metrics share a single parquet-compatible JSONL schema per dataset - Add pts.metrics.loader with MetricType enum and load_metric dispatcher - Add type: Literal[...] to each Metric subclass and model_dump() override on both Specs so otter's double-parse in TaskRegistry.build() round-trips cleanly - Add CollectMetrics task (standalone metrics-only pipeline step) - Complete Issue 9: add count_by_datasource to all 20 evidence datasets in config.yaml and metrics.yaml; rewrite metrics.yaml as proper otter config - Remove scripts/collect_metrics.py (superseded by pts -c metrics.yaml -s <step>) - Trim 28 redundant tests: pydantic-default assertions, duplicate loader tests, and overlapping unified-output tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

CollectMetrics now handles all post-transform metric collection, so the metrics field, validator, model_dump override, and MetricRunner call in Transform are redundant. Also removes the inline metrics block from the transform disease_efo config step and deletes the now-obsolete test file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Distribution metrics are not needed in the current pipeline. Removes the distribution.py module, its MetricType/loader registration, all 52 distribution blocks from config.yaml and metrics.yaml, and the test file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- GroupedCountExplodeMetric: explodes list columns in group_by then counts per group; replaces DiseaseByTherapeuticAreaMetric custom class - CollectMetrics.__init__ now validates that 'release' and 'run' are present in the scratchpad and raises ValueError early if either is absent - metrics.yaml scratchpad requires explicit release/run values at run time - config.yaml gains a collect_metrics disease step using the new type - Removed pts.metrics.custom.disease (no longer needed) - Parametrized noisy validation and empty-result tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- test_metrics_base: 3→1 test (parametrize Metric + subclass abstractness) - test_metrics_count: 6→5 tests (parametrize row-count cases, drop isinstance) - test_metrics_grouped: 9→7 tests (merge same-data pairs for count and explode) - test_metrics_custom: 4→2 tests (parametrize L2G value cases) - test_collect_metrics: unify _make_context helpers, drop _run_task wrapper Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- D102: add missing docstrings to GroupedCountExplodeMetric.required_columns and .compute - PT006: use tuple form for pytest.mark.parametrize first arg in test_metrics_count and test_metrics_custom - I001: sort imports in test_metrics_base and test_metrics_grouped Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_record() release, run, dataset, source, destination are envelope fields that belong to the runner, not the result. Removing them from MetricResult eliminates the empty-string placeholders that compute() methods were forced to pass. MetricRunner now supplies these directly to to_unified_record() instead of using model_copy(). No empty strings can leak through. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

total_count (row count) is now always prepended automatically by CollectMetricsSpec — no need to declare it in config.yaml or metrics.yaml. Removed all 32 explicit total_count entries from both configs; datasets that only had total_count now omit the metrics key entirely. Uses a model_validator(mode='after') so the injection fires even when metrics defaults to []. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…emove L2G custom metric - Add `filter_expr: str | None` to `Metric` base class; applied via `pl.sql_expr()` in `run()` before `compute()`, with all columns loaded when set - Add `filter_expr` column to `UnifiedMetricRecord` and thread it through `to_unified_record()` - Support SQL expressions with AS aliases in `group_by` lists for all grouped metrics (e.g. `"concat('gwas-', studyType) as label"`); resolved via `_resolve_group_by()` - Remove `required_columns` from grouped metrics (always read all columns) - Delete `L2GSignificantGeneMetric` and the `custom/` directory; replace its YAML entry with a `distinct_count` + `filter_expr: "score >= 0.5"` on both config.yaml and metrics.yaml - Fix GCS path compatibility in `CollectMetrics.run()`: drop `pathlib.Path` wrapping so `gs://` URIs are not mangled; extract dataset name with string operations - Consolidate tests: move spec validation and filter_expr cases to doctests in source modules; remove obsolete `test_bad_column_raises` (tests Polars, not our logic) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

SzymonSzyszkowski and others added 27 commits May 20, 2026 12:18

feat(metrics): implement MetricRunner (Issue 7)

b1d6bf6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(tasks): add CollectMetrics task for metrics-only pipeline steps

6847d3c

Allows running metric collection on already-produced datasets without re-running a transformer. Reads release/run from scratchpad. 4 TDD tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: config update

692d19d

feat(metrics): doctests

197da9d

feat(config): align fields

4b5545d

feat(metrics): align docs

ea0e5e7

feat(metrics): change class order

2af0292

feat(metrics): remove filter

2cc3fb1

feat(metrics): write with polars

61c1159

feat(metrics): simplify dispatcher

c62c002

feat(metrics): ensure write with make subfolders

288f4a2

fix(metrics): allow for spec serialization

c06fdf4

feat(metrics): threading

e67fab9

chore: bump version

44eb06d

project-defiant changed the title ~~Metrics~~ feat(metrics): implementation of metric tasks May 21, 2026

project-defiant marked this pull request as ready for review May 21, 2026 00:14

SzymonSzyszkowski and others added 4 commits May 21, 2026 10:43

fix(metrics): swap Path to str for input and output

c8ea627

feat(metrics): add expression filter

6bfbb30

chore(metrics): simplify expressions

80e5147

project-defiant changed the title ~~feat(metrics): implementation of metric tasks~~ WIP: feat(metrics): implementation of metric tasks May 21, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

WIP: feat(metrics): implementation of metric tasks#134

WIP: feat(metrics): implementation of metric tasks#134
project-defiant wants to merge 32 commits into
mainfrom
metrics

project-defiant commented May 20, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

project-defiant commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

What's new

CollectMetrics — otter task

Config

Testing

Notable design decisions

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

project-defiant commented May 20, 2026 •

edited

Loading