diff --git a/code/dashboard-2/README.md b/code/dashboard-2/README.md
index d7163127..662fb9e5 100644
--- a/code/dashboard-2/README.md
+++ b/code/dashboard-2/README.md
@@ -41,6 +41,10 @@ Connects to cluster REST APIs to display job analytics, node status, and resourc
 - GPU benchmark comparisons across clusters
 - Filterable by precision, GPU count, and test type
 
+## Screenshots
+
+See [screenshots/README.md](screenshots/README.md) for captioned screenshots of every page, generated with synthetic data.
+
 ## Architecture
 
 ```mermaid
diff --git a/code/dashboard-2/screenshots/01-cluster-selection.png b/code/dashboard-2/screenshots/01-cluster-selection.png
new file mode 100644
index 00000000..7abad48b
Binary files /dev/null and b/code/dashboard-2/screenshots/01-cluster-selection.png differ
diff --git a/code/dashboard-2/screenshots/02-cluster-overview.png b/code/dashboard-2/screenshots/02-cluster-overview.png
new file mode 100644
index 00000000..f390dbf2
Binary files /dev/null and b/code/dashboard-2/screenshots/02-cluster-overview.png differ
diff --git a/code/dashboard-2/screenshots/02b-cluster-overview-scroll-1.png b/code/dashboard-2/screenshots/02b-cluster-overview-scroll-1.png
new file mode 100644
index 00000000..8c756a57
Binary files /dev/null and b/code/dashboard-2/screenshots/02b-cluster-overview-scroll-1.png differ
diff --git a/code/dashboard-2/screenshots/02c-cluster-overview-scroll-2.png b/code/dashboard-2/screenshots/02c-cluster-overview-scroll-2.png
new file mode 100644
index 00000000..d1a532be
Binary files /dev/null and b/code/dashboard-2/screenshots/02c-cluster-overview-scroll-2.png differ
diff --git a/code/dashboard-2/screenshots/02d-cluster-overview-scroll-3.png b/code/dashboard-2/screenshots/02d-cluster-overview-scroll-3.png
new file mode 100644
index 00000000..04051b35
Binary files /dev/null and b/code/dashboard-2/screenshots/02d-cluster-overview-scroll-3.png differ
diff --git a/code/dashboard-2/screenshots/02e-cluster-overview-scroll-4.png b/code/dashboard-2/screenshots/02e-cluster-overview-scroll-4.png
new file mode 100644
index 00000000..97d0487e
Binary files /dev/null and b/code/dashboard-2/screenshots/02e-cluster-overview-scroll-4.png differ
diff --git a/code/dashboard-2/screenshots/03-nodes.png b/code/dashboard-2/screenshots/03-nodes.png
new file mode 100644
index 00000000..d1356cd5
Binary files /dev/null and b/code/dashboard-2/screenshots/03-nodes.png differ
diff --git a/code/dashboard-2/screenshots/04-partitions.png b/code/dashboard-2/screenshots/04-partitions.png
new file mode 100644
index 00000000..81cf2157
Binary files /dev/null and b/code/dashboard-2/screenshots/04-partitions.png differ
diff --git a/code/dashboard-2/screenshots/05-jobs.png b/code/dashboard-2/screenshots/05-jobs.png
new file mode 100644
index 00000000..97d5157a
Binary files /dev/null and b/code/dashboard-2/screenshots/05-jobs.png differ
diff --git a/code/dashboard-2/screenshots/06a-job-details-overview.png b/code/dashboard-2/screenshots/06a-job-details-overview.png
new file mode 100644
index 00000000..78dd0c7c
Binary files /dev/null and b/code/dashboard-2/screenshots/06a-job-details-overview.png differ
diff --git a/code/dashboard-2/screenshots/06b-job-details-performance.png b/code/dashboard-2/screenshots/06b-job-details-performance.png
new file mode 100644
index 00000000..c796fcf9
Binary files /dev/null and b/code/dashboard-2/screenshots/06b-job-details-performance.png differ
diff --git a/code/dashboard-2/screenshots/06c-job-details-timeline.png b/code/dashboard-2/screenshots/06c-job-details-timeline.png
new file mode 100644
index 00000000..b542cdce
Binary files /dev/null and b/code/dashboard-2/screenshots/06c-job-details-timeline.png differ
diff --git a/code/dashboard-2/screenshots/06d-job-details-gpu.png b/code/dashboard-2/screenshots/06d-job-details-gpu.png
new file mode 100644
index 00000000..59cc1198
Binary files /dev/null and b/code/dashboard-2/screenshots/06d-job-details-gpu.png differ
diff --git a/code/dashboard-2/screenshots/06e-job-details-process-tree.png b/code/dashboard-2/screenshots/06e-job-details-process-tree.png
new file mode 100644
index 00000000..2f3b7340
Binary files /dev/null and b/code/dashboard-2/screenshots/06e-job-details-process-tree.png differ
diff --git a/code/dashboard-2/screenshots/07a-job-query-form.png b/code/dashboard-2/screenshots/07a-job-query-form.png
new file mode 100644
index 00000000..ade7cd45
Binary files /dev/null and b/code/dashboard-2/screenshots/07a-job-query-form.png differ
diff --git a/code/dashboard-2/screenshots/07b-job-query-results.png b/code/dashboard-2/screenshots/07b-job-query-results.png
new file mode 100644
index 00000000..2d6ac5b0
Binary files /dev/null and b/code/dashboard-2/screenshots/07b-job-query-results.png differ
diff --git a/code/dashboard-2/screenshots/08-benchmarks.png b/code/dashboard-2/screenshots/08-benchmarks.png
new file mode 100644
index 00000000..d0f3ecfb
Binary files /dev/null and b/code/dashboard-2/screenshots/08-benchmarks.png differ
diff --git a/code/dashboard-2/screenshots/README.md b/code/dashboard-2/screenshots/README.md
new file mode 100644
index 00000000..b47295cd
--- /dev/null
+++ b/code/dashboard-2/screenshots/README.md
@@ -0,0 +1,97 @@
+# Dashboard Screenshots
+
+Screenshots of the NAIC Jobanalyzer Dashboard running with synthetic demo data.
+
+## Cluster Selection
+
+![Cluster Selection](01-cluster-selection.png)
+
+Browse and select from available HPC clusters. Each cluster can have independent authentication via OIDC/PKCE.
+
+## Cluster Overview
+
+![Cluster Overview](02-cluster-overview.png)
+
+Top-level cluster health: reporting nodes, total jobs, running/pending counts, and resource summaries.
+
+![Cluster Overview - Resource Charts](02b-cluster-overview-scroll-1.png)
+
+CPU and memory utilization timeseries across all nodes with interactive time range selection.
+
+![Cluster Overview - GPU and Job Trends](02c-cluster-overview-scroll-2.png)
+
+GPU utilization heatmap and job submission/completion trends over time.
+
+![Cluster Overview - Queue and Disk I/O](02d-cluster-overview-scroll-3.png)
+
+Queue wait time analysis by partition and disk I/O metrics.
+
+![Cluster Overview - Bottom](02e-cluster-overview-scroll-4.png)
+
+Additional cluster-wide metrics and resource distribution charts.
+
+## Nodes
+
+![Nodes](03-nodes.png)
+
+Full node inventory with sortable/filterable AG Grid table showing hostname, state, CPU/memory/GPU specs, and current utilization.
+
+## Partitions
+
+![Partitions](04-partitions.png)
+
+Split-panel partition browser with queue overview, node allocation, and GPU availability per partition.
+
+## Jobs
+
+![Jobs](05-jobs.png)
+
+Live jobs table with status badges, resource allocation, elapsed time, and quick navigation to job details.
+
+## Job Details
+
+### Overview Tab
+
+![Job Details - Overview](06a-job-details-overview.png)
+
+Job summary with SAcct data, resource allocation breakdown, and efficiency metrics.
+
+### Performance Metrics Tab
+
+![Job Details - Performance Metrics](06b-job-details-performance.png)
+
+CPU and memory efficiency gauges comparing allocated vs. used resources.
+
+### Resource Timeline Tab
+
+![Job Details - Resource Timeline](06c-job-details-timeline.png)
+
+Time-series charts for CPU utilization, memory usage, memory utilization percentage, and process count over the job's lifetime.
+
+### GPU Performance Tab
+
+![Job Details - GPU Performance](06d-job-details-gpu.png)
+
+Per-GPU utilization, memory usage, and performance metrics for jobs using GPU resources.
+
+### Process Tree Tab
+
+![Job Details - Process Tree](06e-job-details-process-tree.png)
+
+Interactive process tree visualization showing the hierarchy from SLURM step daemon through to individual training workers and data loaders.
+
+## Job Query
+
+![Job Query - Form](07a-job-query-form.png)
+
+Advanced job search form with filters for user, account, partition, job state, time range, and more.
+
+![Job Query - Results](07b-job-query-results.png)
+
+Query results displayed in a sortable AG Grid table with pagination.
+
+## Benchmarks
+
+![Benchmarks](08-benchmarks.png)
+
+GPU benchmark comparisons (A100 vs H100) across ML workloads, filterable by precision and GPU count.