Add view-scoped additional metrics and console reporting #50

Merged
hariharan-devarajan merged 1 commit into llnl:develop from izzet:feature/console-additional-metrics-output
Mar 5, 2026

@izzet (Collaborator) commented Mar 5, 2026

This pull request adds support for handling and displaying additional metrics in the analyzer output, with changes spanning configuration, processing, and presentation. It refactors the additional-metrics configuration to support per-view-type metrics, updates the analysis pipeline to process those metrics per view, and adds a summary table for additional metrics to the console output, as in the time-period example below.

                                                             Time Period Summary                                                              
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric                                                                                  ┃ Unit                ┃                      Value ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Job Time                                                                                │ seconds             │                   1060.279 │
│ Total Count                                                                             │ count               │                    142,462 │
│ Total Files                                                                             │ count               │                         29 │
│ Total Nodes                                                                             │ count               │                          1 │
│ Total Processes                                                                         │ count               │                         12 │
│ Training Count                                                                          │ count               │                          2 │
│ Checkpoint Count                                                                        │ count               │                          4 │
│ Data Loader Count                                                                       │ count               │                     70,270 │
│ Data Loader Fork Count                                                                  │ count               │                         20 │
│ Reader Count                                                                            │ count               │                     10,763 │
│ POSIX - All Count                                                                       │ count               │                      7,647 │
│ POSIX - All Size                                                                        │ MB                  │                 118004.580 │
│ POSIX - All Bandwidth                                                                   │ MB/s                │                   4737.244 │
│ POSIX - All Avg Transfer Size                                                           │ MB                  │                     15.431 │
│ POSIX - Reader Count                                                                    │ count               │                        183 │
│ POSIX - Reader Size                                                                     │ MB                  │                     24.349 │
│ POSIX - Reader Bandwidth                                                                │ MB/s                │                    483.476 │
│ POSIX - Reader Avg Transfer Size                                                        │ MB                  │                      0.133 │
│ POSIX - Checkpoint Count                                                                │ count               │                      7,354 │
│ POSIX - Checkpoint Size                                                                 │ MB                  │                 117976.232 │
│ POSIX - Checkpoint Bandwidth                                                            │ MB/s                │                   4769.542 │
│ POSIX - Checkpoint Avg Transfer Size                                                    │ MB                  │                     16.042 │
└─────────────────────────────────────────────────────────────────────────────────────────┴─────────────────────┴────────────────────────────┘
                                                        Time Period Additional Metrics                                                        
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Metric                               ┃ Unit          ┃                Non-null ┃              Min ┃               Mean ┃               Max ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ image_bw_mbps                        │ MB/s          │                     970 │            0.286 │             31.461 │            43.843 │
└──────────────────────────────────────┴───────────────┴─────────────────────────┴──────────────────┴────────────────────┴───────────────────┘

Configuration and Data Model Updates

  • Refactored the AnalyzerPresetConfig dataclass to allow specifying additional metrics as a dictionary mapping view types to their respective metrics, enabling more granular control over which metrics are tracked for each view type.
  • Updated the AnalyzerResultType dataclass to include an additional_metrics field, which stores the list of additional metrics for each view type.
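
A minimal sketch of the per-view mapping described above, not the actual dataclass from this PR. The class name `PresetConfigSketch`, its field names, the `metrics_for` helper, and the view-type keys `time_range` and `proc_name` are all assumptions made for illustration; only the metric name `image_bw_mbps` comes from the sample output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PresetConfigSketch:
    """Hypothetical stand-in for AnalyzerPresetConfig; field names are assumed."""

    name: str
    # Maps a view type to the additional metrics tracked for that view only.
    additional_metrics: Dict[str, List[str]] = field(default_factory=dict)

    def metrics_for(self, view_type: str) -> List[str]:
        # A view type without an entry has no additional metrics.
        return list(self.additional_metrics.get(view_type, []))


# Illustrative preset: only the (assumed) "time_range" view tracks a metric.
preset = PresetConfigSketch(
    name="example",
    additional_metrics={"time_range": ["image_bw_mbps"]},
)

print(preset.metrics_for("time_range"))
print(preset.metrics_for("proc_name"))
```

Keying the mapping by view type means a metric that is meaningful only in one view does not have to be computed or reported in the others.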

Analysis Pipeline Changes

  • Modified the _analyze_hlm and _process_flat_view methods in analyzer.py to pass and process additional metrics per view type, so that metrics are set and evaluated for each view type rather than globally.
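
The difference between global and per-view handling can be sketched as follows. This is an illustration under stated assumptions, not the code in analyzer.py: `process_views` and `fake_process_flat_view` are hypothetical helpers, and the view-type names are placeholders.

```python
from typing import Callable, Dict, List


def process_views(
    view_types: List[str],
    additional_metrics: Dict[str, List[str]],
    process_flat_view: Callable[[str, List[str]], dict],
) -> Dict[str, dict]:
    """Call the per-view processor with only that view's additional metrics."""
    results: Dict[str, dict] = {}
    for view_type in view_types:
        # Look up metrics by view type instead of passing one global list.
        view_metrics = additional_metrics.get(view_type, [])
        results[view_type] = process_flat_view(view_type, view_metrics)
    return results


def fake_process_flat_view(view_type: str, metrics: List[str]) -> dict:
    # Hypothetical processor that just records what it was given.
    return {"view_type": view_type, "additional_metrics": list(metrics)}


results = process_views(
    ["time_range", "proc_name"],
    {"time_range": ["image_bw_mbps"]},
    fake_process_flat_view,
)
print(results)
```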

Output and Presentation Enhancements

  • Added the _create_additional_metrics_table method to output.py, which generates a summary table for additional metrics, including unit conversion, non-null counts, and basic statistics (min, mean, max) for each metric. The table is displayed only when additional metrics are present for the current view.
  • Updated imports in output.py to support new unit handling and calculations for additional metrics.
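
The per-metric statistics shown in the table can be computed along these lines. This is a standard-library sketch, not the implementation in output.py: `summarize_metric` is a hypothetical helper, the unit is passed in as a plain string (unit conversion is not modeled), and the formatting only imitates the three-decimal style of the sample output.

```python
import math
from statistics import fmean
from typing import Optional, Sequence, Tuple


def summarize_metric(
    name: str,
    values: Sequence[Optional[float]],
    unit: str = "",
) -> Tuple[str, str, str, str, str, str]:
    """Build one row: metric, unit, non-null count, min, mean, max."""
    # Treat both None and NaN as missing values.
    present = [v for v in values if v is not None and not math.isnan(v)]
    if not present:
        # No usable values: report a zero count and placeholder statistics.
        return (name, unit, "0", "-", "-", "-")
    return (
        name,
        unit,
        f"{len(present):,}",
        f"{min(present):.3f}",
        f"{fmean(present):.3f}",
        f"{max(present):.3f}",
    )


# Illustrative values only; these are not data from the PR.
row = summarize_metric("image_bw_mbps", [0.5, None, 1.5, float("nan"), 4.0], unit="MB/s")
print(row)
```

Filtering missing values before computing statistics keeps the non-null count and the min, mean, and max consistent with each other.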

@izzet self-assigned this Mar 5, 2026
@izzet added the enhancement (New feature or request) label Mar 5, 2026

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@hariharan-devarajan (Member) left a comment

Looks good to me.

@hariharan-devarajan merged commit dc8c89f into llnl:develop Mar 5, 2026
4 checks passed