
Optimize set_layer_metrics batching#52

Open
izzet wants to merge 3 commits into llnl:develop from izzet:feature/set-layer-metrics-batch



izzet (Collaborator) commented Mar 6, 2026

This pull request refactors the set_layer_metrics method in analyzer.py to improve performance and correctness when generating derived metric columns, and adds a new test suite to verify its behavior. The main changes include precomputing column types and numeric representations, building derived columns in-memory before appending, and introducing comprehensive tests for correctness and performance.

Refactor and performance improvements in metric computation

  • Precompute column type information (is_size_col, is_string_col) and numeric representations (numeric_cols) once per source column to avoid repeated computation and improve efficiency in set_layer_metrics.
  • Build all derived metric columns in-memory using a dictionary and append them to the DataFrame in a single operation, reducing fragmentation and improving performance.
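The batching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in analyzer.py: the function name add_derived_columns, the shape of its arguments (a dict mapping each metric to a callable that returns a boolean mask), and the "{metric}_{col}" naming scheme are all hypothetical. It shows per-source-column precomputation, one condition evaluation per metric, an in-memory dict of derived columns, and a single pd.concat at the end.

```python
import numpy as np
import pandas as pd


def add_derived_columns(df, conditions, string_cols):
    """Hypothetical sketch of batched derived-column creation.

    df:          source frame (not modified).
    conditions:  metric name -> callable returning a boolean mask for df (assumed shape).
    string_cols: names of source columns treated as strings.
    """
    source_cols = list(df.columns)

    # Precompute per-source-column info once, outside the metric loop.
    is_string_col = {col: col in string_cols for col in source_cols}
    numeric_cols = {
        col: pd.to_numeric(df[col], errors="coerce")
        for col in source_cols
        if not is_string_col[col]
    }

    derived = {}
    for metric, condition in conditions.items():
        # Evaluate each metric's condition exactly once.
        mask = np.asarray(condition(df), dtype=bool)
        for col in source_cols:
            name = f"{metric}_{col}"  # hypothetical naming scheme
            if is_string_col[col]:
                # Keep matching strings; use None for non-matching rows.
                values = df[col].to_numpy(dtype=object)
                derived[name] = pd.Series(
                    np.where(mask, values, None), index=df.index, dtype=object
                )
            else:
                # Reuse the precomputed numeric coercion; NaN where the mask is False.
                derived[name] = numeric_cols[col].where(mask)

    # Append every derived column with one concat instead of one insert per column.
    return pd.concat([df, pd.DataFrame(derived, index=df.index)], axis=1)
```

Collecting the columns in a dict first means pandas builds the new block(s) once, rather than growing the frame one column at a time, which is what leads to the "highly fragmented" slowdown on wide frames.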

Correctness and logic changes

  • Ensure that size-related derived columns are only created for metrics explicitly listed in size_derived_metrics, fixing previous logic that could produce unintended columns.
  • For string-derived columns, use None for non-matching rows so that downstream processing (e.g., unique_set_flatten) correctly skips missing values.
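The two correctness rules above can be illustrated with a small per-column helper. This is a sketch under assumptions: derived_column and unique_non_missing are hypothetical names, and unique_non_missing is only a stand-in for a downstream step such as unique_set_flatten, whose real implementation is not shown in this PR. The helper skips size columns for metrics not listed in size_derived_metrics and fills non-matching string rows with None.

```python
import numpy as np
import pandas as pd


def derived_column(values, mask, metric, is_size_col, is_string_col, size_derived_metrics):
    """Hypothetical helper: derived Series for one (metric, source column) pair, or None to skip it."""
    # Size columns get derived variants only for explicitly listed metrics.
    if is_size_col and metric not in size_derived_metrics:
        return None
    mask = np.asarray(mask, dtype=bool)
    if is_string_col:
        # None (rather than NaN or "") marks rows that do not match the condition.
        return pd.Series(
            np.where(mask, values.to_numpy(dtype=object), None),
            index=values.index,
            dtype=object,
        )
    # Numeric columns: NaN where the condition does not hold.
    return values.where(mask)


def unique_non_missing(series):
    """Hypothetical stand-in for a downstream step that skips missing values."""
    return {v for v in series if v is not None}
```

For example, with names ["a.h5", "b.h5", "a.h5"] and mask [True, False, True], the derived string column is ["a.h5", None, "a.h5"], and the downstream set contains only "a.h5".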

New tests for correctness and performance

  • Add tests/test_set_layer_metrics.py with correctness tests to verify column creation, value propagation, and missing value handling, as well as a performance smoke test to ensure efficient repeated calls.
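A repeat-call performance smoke test in this spirit might look roughly like the pytest-style sketch below. It does not reproduce tests/test_set_layer_metrics.py. The function append_batched is a hypothetical stand-in for the code under test, and the 5-second budget is an arbitrary, deliberately generous assumption. The test also turns pandas' PerformanceWarning into an error, so a regression to per-column inserts that fragments the frame would fail loudly.

```python
import time
import warnings

import pandas as pd


def append_batched(df, n_cols):
    """Hypothetical stand-in for the code under test: add n_cols derived columns in one concat."""
    derived = {f"d{i}": df["x"] * i for i in range(n_cols)}
    return pd.concat([df, pd.DataFrame(derived, index=df.index)], axis=1)


def test_repeated_calls_are_fast_and_unfragmented():
    df = pd.DataFrame({"x": range(1_000)})
    start = time.perf_counter()
    with warnings.catch_warnings():
        # Fail if pandas warns that the frame is highly fragmented.
        warnings.simplefilter("error", pd.errors.PerformanceWarning)
        for _ in range(20):
            out = append_batched(df, 200)
    elapsed = time.perf_counter() - start
    assert out.shape == (1_000, 201)
    # Hypothetical, generous budget: a smoke test, not a benchmark.
    assert elapsed < 5.0
```

Keeping the timing bound loose keeps such a test from being flaky on slow CI machines while still catching gross regressions.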

izzet self-assigned this Mar 6, 2026
izzet added the enhancement (New feature or request) label Mar 6, 2026

Copilot AI left a comment


Pull request overview

Refactors Analyzer.set_layer_metrics to reduce repeated per-column work and mitigate DataFrame fragmentation during derived-metric column generation, and adds a dedicated test module to validate expected behavior.

Changes:

  • Precomputes column classification and numeric coercions, and evaluates each derived-metric condition once per metric.
  • Builds all derived columns in-memory and appends them to hlm in a single concat.
  • Adds tests/test_set_layer_metrics.py covering correctness plus a basic performance smoke loop.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
--- | ---
python/dftracer/analyzer/analyzer.py | Refactors set_layer_metrics to batch derived-column creation and reduce repeated evaluation/coercion.
tests/test_set_layer_metrics.py | Introduces tests for derived-column correctness and a repeat-call perf smoke test.


