fix: move dashboard to templates and add Helm unittests #57

Merged
pando85 merged 7 commits into master from fix/55-dashboard-unittests on Mar 6, 2026

forkline-bot bot commented Mar 5, 2026

Summary

  • Move dashboard.yaml from helm/dashboards/ to helm/templates/ so it renders correctly
  • Add Helm unittests to prevent future regressions with template rendering and metric references

Changes

  • Moved dashboard ConfigMap template to correct location (helm/templates/)
  • Added unittests for dashboard, prometheusrules, service, and servicemonitor templates
  • Updated CI workflow to run Helm unittests

Test Results

All 21 unittests pass.

Resolves: #55

forkline-dev[bot] added 2 commits March 5, 2026 22:42
The dashboard ConfigMap was placed in helm/dashboards/ which is not
rendered by Helm. Move it to helm/templates/ so it gets rendered
correctly when metrics.grafanaDashboard.enabled is true.

Resolves: #55

Add comprehensive unittests to ensure:
- Dashboard renders correctly when enabled
- Prometheus rules render correctly when enabled
- All expected metrics are present in templates
- ServiceMonitor and Service templates work correctly

This prevents future regressions where templates may not render
or reference non-existent metrics.

Resolves: #55
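
The "reference non-existent metrics" check described in the commit message above can be pictured roughly as follows. This is a minimal Python sketch, not the Helm unittest suite added in this PR; the metric names, the `robotlb_` prefix, and the helper functions are all hypothetical placeholders.

```python
import re

# Hypothetical metric names (assumption): the real set is whatever the
# operator actually exports.
EXPECTED_METRICS = {
    "robotlb_reconcile_operations_total",
    "robotlb_reconcile_failures_total",
}


def referenced_metrics(rendered, prefix="robotlb_"):
    """Return every identifier in the rendered text that starts with prefix."""
    pattern = r"\b" + re.escape(prefix) + r"[a-zA-Z0-9_]+"
    return set(re.findall(pattern, rendered))


def unknown_metrics(rendered):
    """Metrics the template references but the operator does not export."""
    return referenced_metrics(rendered) - EXPECTED_METRICS


def missing_metrics(rendered):
    """Expected metrics that never appear in the rendered template."""
    return EXPECTED_METRICS - referenced_metrics(rendered)
```

A non-empty result from either helper would correspond to the kind of regression these tests are meant to catch.
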
forkline-bot bot mentioned this pull request on Mar 5, 2026

pando85 commented Mar 5, 2026

@forkline we have to rework the whole dashboard too

- Add template variables for datasource and controller selection
- Reorganize panels into logical sections: Overview, Reconciliation Performance, Errors & Failures, Hetzner Cloud API
- Add new panels: Reconcile Success Rate, API Error Percentage, Operations vs Failures
- Include percentile metrics (p50, p95, p99) for reconcile duration
- Add comprehensive panel descriptions for better documentation
- Update to Grafana schema version 39
- Add appropriate dashboard tags (robotlb, operator, kubernetes, hetzner)
- Configure auto-refresh interval of 30s
- Add tests for new dashboard features

forkline-bot bot left a comment

Summary

I've completely reworked the RobotLB Grafana dashboard with the following improvements:

Key Changes

1. Template Variables

  • Added datasource variable for Prometheus instance selection
  • Added controller variable for filtering by controller label
  • Made the dashboard more flexible and reusable

2. Enhanced Layout & Organization

  • Reorganized into 4 logical sections:
    • 📊 Overview - High-level metrics and status
    • ⚙️ Reconciliation Performance - Detailed reconciliation metrics
    • ❌ Errors & Failures - Error tracking and analysis
    • ☁️ Hetzner Cloud API - API interaction metrics

3. New Panels & Metrics

  • Reconcile Success Rate - Percentage of successful reconciliations with thresholds
  • API Error Percentage - API error rate as percentage of total requests
  • Operations vs Failures - Cumulative comparison over time
  • Percentile metrics - p50, p95, p99 for reconcile duration
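
The arithmetic behind these panels can be illustrated with a short Python sketch. It is only an illustration: the dashboard presumably computes these values in PromQL, and the function names and sample numbers below are hypothetical.

```python
import math


def percentage(part, total):
    """Return part/total as a percentage, or None when total is zero."""
    if total == 0:
        return None
    return 100.0 * part / total


def success_rate(operations, failures):
    """Share of operations that did not fail, as a percentage."""
    return percentage(operations - failures, operations)


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up reconcile durations in seconds.
durations = [0.12, 0.08, 0.30, 0.10, 1.50]
for p in (50, 95, 99):
    print(f"p{p}:", nearest_rank_percentile(durations, p))
print("success rate:", success_rate(operations=200, failures=3))
```

Returning `None` for an empty denominator avoids reporting a misleading 0% or 100% when no operations have been recorded yet.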

4. Improved Visualizations

  • Better panel descriptions for documentation
  • Appropriate thresholds for alerting (e.g., <95% success rate = yellow, <99% = red)
  • Color-coded legends and series
  • Auto-refresh configured to 30s

5. Modern Grafana Features

  • Updated to schema version 39
  • Added appropriate tags (robotlb, operator, kubernetes, hetzner)
  • Improved tooltip and legend configurations

6. Comprehensive Tests

  • Added tests for template variables
  • Tests for schema version
  • Tests for all dashboard sections
  • Tests for percentile metrics
  • Tests for dashboard tags
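
As a rough picture of what such tests assert, here is a Python analogue that checks a dashboard JSON string for the properties named in this comment. The actual tests in this PR are Helm unittests; the helper name and the exact set of checks here are assumptions.

```python
import json

REQUIRED_TAGS = {"robotlb", "operator", "kubernetes", "hetzner"}
REQUIRED_VARIABLES = {"datasource", "controller"}


def dashboard_problems(raw_json):
    """Return a list of human-readable problems found in the dashboard JSON."""
    dashboard = json.loads(raw_json)
    problems = []

    if dashboard.get("schemaVersion") != 39:
        problems.append("schemaVersion is not 39")
    if dashboard.get("refresh") != "30s":
        problems.append("refresh is not 30s")

    missing_tags = REQUIRED_TAGS - set(dashboard.get("tags", []))
    if missing_tags:
        problems.append("missing tags: " + ", ".join(sorted(missing_tags)))

    variables = dashboard.get("templating", {}).get("list", [])
    names = {variable.get("name") for variable in variables}
    missing_variables = REQUIRED_VARIABLES - names
    if missing_variables:
        problems.append("missing variables: " + ", ".join(sorted(missing_variables)))

    return problems
```

An empty list means the dashboard satisfies every check.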

All Changes Pushed

Commit: 62042e6 - Dashboard rework with template variables and improved layout

The dashboard now provides better visibility into operator health, reconciliation performance, and Hetzner Cloud API interactions, making it easier to monitor and troubleshoot the RobotLB operator.

forkline-bot bot commented Mar 5, 2026

CI Fix Applied

I've pushed a fix for the CI failure. The changes should trigger a new CI run.

Commit:

Waiting for CI to re-run...

The dashboard JSON is multi-line, so .* doesn't match across newlines.
Changed to [\s\S]* which matches any character including newlines.
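
The newline behaviour described in this commit message is easy to reproduce. The sketch below uses Python's `re` module purely for illustration, on the assumption that the regex engine used by the test matcher treats `.` the same way by default; the JSON snippet is a made-up stand-in for the real dashboard.

```python
import re

# Made-up multi-line JSON standing in for the rendered dashboard.
dashboard_json = '{\n  "title": "RobotLB",\n  "schemaVersion": 39\n}'

# By default "." matches any character except a newline, so this pattern
# cannot span the line break between the two keys.
dot_pattern = re.compile(r'"title".*"schemaVersion"')

# "[\s\S]" is "whitespace or non-whitespace", i.e. any character at all,
# newlines included.
any_char_pattern = re.compile(r'"title"[\s\S]*"schemaVersion"')

assert dot_pattern.search(dashboard_json) is None
assert any_char_pattern.search(dashboard_json) is not None

# Alternative: keep ".*" and switch on "dot matches newline" mode.
dotall_pattern = re.compile(r'(?s)"title".*"schemaVersion"')
assert dotall_pattern.search(dashboard_json) is not None
```

Both `[\s\S]*` and the `(?s)` flag solve the same problem; the character-class form has the advantage of not depending on flag support in the matcher.
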
forkline-bot bot force-pushed the fix/55-dashboard-unittests branch from ad82fc7 to 4d41adf on March 5, 2026 23:09

forkline-bot bot commented Mar 5, 2026

CI Fix Applied

I've pushed a fix for the CI failure. The changes should trigger a new CI run.

Commit:

Waiting for CI to re-run...

forkline-bot bot commented Mar 5, 2026

CI Fix Applied

I've pushed a fix for the CI failure. The changes should trigger a new CI run.

Commit:

Waiting for CI to re-run...

pando85 commented Mar 6, 2026

@forkline run pre commit until it passes

forkline-bot bot left a comment

Pre-commit checks now passing

Fixed issue: Added missing newline at end of helm/templates/dashboard.yaml

Commit: bae216b - pushed to branch fix/55-dashboard-unittests

All hooks passed:

  • check-added-large-files ✓
  • check-executables-have-shebangs ✓
  • check-merge-conflict ✓
  • check-shebang-scripts-are-executable ✓
  • detect-private-key ✓
  • end-of-file-fixer ✓
  • mixed-line-ending ✓
  • trailing-whitespace ✓
  • yamllint ✓
  • cargo fmt ✓
  • cargo clippy ✓
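
The end-of-file fix mentioned above amounts to making the file end in exactly one newline. A simplified Python sketch of that rule follows; it is not the hook's actual implementation, and the function name is hypothetical.

```python
def ensure_single_trailing_newline(text):
    """Return text ending in exactly one newline; leave empty input untouched.

    Simplification (assumption): trailing carriage returns are stripped along
    with newlines, and the result always ends in a bare "\n".
    """
    if not text:
        return text
    return text.rstrip("\r\n") + "\n"


print(repr(ensure_single_trailing_newline("kind: ConfigMap")))
print(repr(ensure_single_trailing_newline("kind: ConfigMap\n\n\n")))
```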

pando85 merged commit 09cc4dc into master on Mar 6, 2026
2 checks passed
pando85 deleted the fix/55-dashboard-unittests branch on March 6, 2026 13:33

Development

Successfully merging this pull request may close these issues.

Fix dashboard
