fix: move dashboard to templates and add Helm unittests #57
The dashboard ConfigMap was placed in helm/dashboards/ which is not rendered by Helm. Move it to helm/templates/ so it gets rendered correctly when metrics.grafanaDashboard.enabled is true. Resolves: #55
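The fix depends on how Helm decides which chart files become manifests. The sketch below is a simplified model of that rule, not Helm's code: paths are relative to the chart root (`helm/`), and only files under `templates/` are rendered. Files whose names start with `_` are helper partials, and `NOTES.txt` is printed to the user rather than applied. The filename `dashboards/dashboard.yaml` is hypothetical, since the PR does not name the original file.

```python
from pathlib import PurePosixPath


def helm_renders(path: str) -> bool:
    """Approximate whether Helm emits a manifest for a chart-relative path.

    Simplified model: only files under templates/ (including subdirectories)
    are rendered; "_"-prefixed partials and NOTES.txt produce no manifest.
    Anything else, such as a dashboards/ directory, is packaged but not rendered.
    """
    parts = PurePosixPath(path).parts
    if not parts or parts[0] != "templates":
        return False
    name = parts[-1]
    return not name.startswith("_") and name != "NOTES.txt"


print(helm_renders("dashboards/dashboard.yaml"))  # the old location
print(helm_renders("templates/dashboard.yaml"))   # the new location
```

Another common pattern is to keep the raw JSON outside `templates/` and pull it into a template with `.Files.Get`. Moving the ConfigMap itself into `templates/` is the simpler of the two.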
Add comprehensive unittests to ensure:
- Dashboard renders correctly when enabled
- Prometheus rules render correctly when enabled
- All expected metrics are present in templates
- ServiceMonitor and Service templates work correctly

This prevents future regressions where templates fail to render or reference non-existent metrics. Resolves: #55
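The "non-existent metrics" check can be sketched outside Helm as a comparison between the metric names found in PromQL expressions and the set the operator actually exports. This is a minimal sketch under several assumptions: the `robotlb_` prefix and every metric name below are hypothetical, and the check treats `_bucket`/`_sum`/`_count` series as belonging to their histogram family.

```python
import re

# Hypothetical metric prefix; the operator's real metric names are not shown in this PR.
METRIC_RE = re.compile(r"\brobotlb_[a-zA-Z0-9_:]*")
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


def referenced_metrics(exprs):
    """Collect every prefixed metric name that appears in the given PromQL expressions."""
    names = set()
    for expr in exprs:
        names.update(METRIC_RE.findall(expr))
    return names


def base_name(name):
    """Map a histogram series (e.g. x_bucket) back to its metric family (x)."""
    for suffix in HISTOGRAM_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def unknown_metrics(exprs, exported):
    """Return referenced names that match neither an exported metric nor its family."""
    return {
        m for m in referenced_metrics(exprs)
        if m not in exported and base_name(m) not in exported
    }
```

A test would feed it the `expr` fields extracted from the rendered dashboard and rule templates, then assert that the result is empty.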
@forkline we have to rework the whole dashboard too.
- Add template variables for datasource and controller selection
- Reorganize panels into logical sections: Overview, Reconciliation Performance, Errors & Failures, Hetzner Cloud API
- Add new panels: Reconcile Success Rate, API Error Percentage, Operations vs Failures
- Include percentile metrics (p50, p95, p99) for reconcile duration
- Add comprehensive panel descriptions for better documentation
- Update to Grafana schema version 39
- Add appropriate dashboard tags (robotlb, operator, kubernetes, hetzner)
- Configure auto-refresh interval of 30s
- Add tests for new dashboard features
Summary
I've completely reworked the RobotLB Grafana dashboard with the following improvements:
Key Changes
1. Template Variables
- Added a datasource variable for Prometheus instance selection
- Added a controller variable for filtering by controller label
- Made the dashboard more flexible and reusable
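For reference, the two variables might look like the following in the dashboard's `templating` block, built here as a Python dict that serializes to Grafana JSON. This is a sketch, not the PR's actual JSON: the metric passed to `label_values` is a hypothetical name, and options such as `multi`, `includeAll` and `refresh` are illustrative choices.

```python
def template_variables(metric: str) -> dict:
    """Build a Grafana `templating` block with a datasource and a controller variable.

    `metric` is a hypothetical series whose `controller` label values populate
    the dropdown.
    """
    return {
        "list": [
            {
                # Lets the viewer pick any Prometheus data source.
                "name": "datasource",
                "type": "datasource",
                "query": "prometheus",
                "label": "Data source",
            },
            {
                # Query variable resolved against whichever data source is selected.
                "name": "controller",
                "type": "query",
                "datasource": {"type": "prometheus", "uid": "${datasource}"},
                "query": f"label_values({metric}, controller)",
                "includeAll": True,
                "multi": True,
                "refresh": 2,  # re-query when the time range changes
            },
        ]
    }
```

Panels would then reference the datasource as `${datasource}` and filter their series with a selector such as `{controller=~"$controller"}`.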
2. Enhanced Layout & Organization
- Reorganized into 4 logical sections:
- 📊 Overview - High-level metrics and status
- ⚙️ Reconciliation Performance - Detailed reconciliation metrics
- ❌ Errors & Failures - Error tracking and analysis
- ☁️ Hetzner Cloud API - API interaction metrics
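Grafana expresses sections like these as `row` panels placed on a 24-column grid through each panel's `gridPos`. The sketch below lays out rows and their panels two per line. The panel type (`timeseries` for everything) and the 12x8 panel size are assumptions made for illustration, not the dashboard's actual values.

```python
ROW_HEIGHT = 1
PANEL_W, PANEL_H = 12, 8  # assumed size: two panels per line on the 24-column grid


def build_panels(sections):
    """sections: list of (row_title, [panel_title, ...]); returns a flat panel list."""
    panels, y, next_id = [], 0, 1
    for row_title, titles in sections:
        # A full-width row header opens each section.
        panels.append({
            "id": next_id, "type": "row", "title": row_title, "collapsed": False,
            "gridPos": {"h": ROW_HEIGHT, "w": 24, "x": 0, "y": y}, "panels": [],
        })
        next_id += 1
        y += ROW_HEIGHT
        for i, title in enumerate(titles):
            panels.append({
                "id": next_id, "type": "timeseries", "title": title,
                "gridPos": {"h": PANEL_H, "w": PANEL_W, "x": (i % 2) * PANEL_W, "y": y},
            })
            next_id += 1
            if i % 2 == 1:  # second panel on this line: move down
                y += PANEL_H
        if len(titles) % 2 == 1:  # odd count leaves a half-filled line
            y += PANEL_H
    return panels
```

Keeping the rows uncollapsed means each section's panels sit at the top level of the `panels` array, directly after their row.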
3. New Panels & Metrics
- Reconcile Success Rate - Percentage of successful reconciliations with thresholds
- API Error Percentage - API error rate as percentage of total requests
- Operations vs Failures - Cumulative comparison over time
- Percentile metrics - p50, p95, p99 for reconcile duration
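The new panels reduce to two kinds of PromQL: a ratio of counter rates, for the success and error percentages, and `histogram_quantile` over a histogram's `_bucket` series, for the percentiles. The helpers below sketch how those expressions could be generated. All metric names are hypothetical, and the `5m` window is an assumption.

```python
SEL = '{controller=~"$controller"}'  # filter driven by the controller template variable


def ratio_percent(numerator: str, denominator: str, window: str = "5m") -> str:
    """Percentage of `numerator` events among `denominator` events over `window`."""
    return (f"100 * sum(rate({numerator}{SEL}[{window}])) "
            f"/ sum(rate({denominator}{SEL}[{window}]))")


def percentile_queries(histogram: str, quantiles=(0.5, 0.95, 0.99), window: str = "5m") -> dict:
    """One histogram_quantile expression per quantile, keyed p50/p95/p99."""
    return {
        f"p{round(q * 100)}":
            f"histogram_quantile({q}, sum by (le) (rate({histogram}_bucket{SEL}[{window}])))"
        for q in quantiles
    }
```

For example, the API error percentage would be `ratio_percent("<errors counter>", "<requests counter>")`, and the success rate would be `100 - (...)` applied to the failure ratio.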
4. Improved Visualizations
- Better panel descriptions for documentation
- Color thresholds that flag degraded states (e.g., success rate below 99% = yellow, below 95% = red)
- Color-coded legends and series
- Auto-refresh configured to 30s
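Grafana evaluates absolute thresholds as ascending steps: a base color, then each step's color applies once the value reaches that step. The sketch below mimics that evaluation for a success-rate panel. It assumes the intended mapping is red below 95%, yellow from 95% to 99%, and green at 99% and above.

```python
# Ascending steps as in a Grafana threshold config; the first entry is the base color.
SUCCESS_RATE_STEPS = [(None, "red"), (95.0, "yellow"), (99.0, "green")]


def threshold_color(value: float, steps=SUCCESS_RATE_STEPS) -> str:
    """Return the color of the highest step whose value is <= `value`."""
    color = steps[0][1]
    for limit, step_color in steps[1:]:
        if value >= limit:
            color = step_color
    return color
```

Because the steps are ordered, lower success rates can only move toward red, which avoids overlapping conditions such as "<95% yellow, <99% red".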
5. Modern Grafana Features
- Updated to schema version 39
- Added appropriate tags (robotlb, operator, kubernetes, hetzner)
- Improved tooltip and legend configurations
6. Comprehensive Tests
- Added tests for template variables
- Tests for schema version
- Tests for all dashboard sections
- Tests for percentile metrics
- Tests for dashboard tags
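The properties these tests cover can also be checked directly against the dashboard JSON. The sketch below is a Python equivalent of such assertions, not the PR's Helm unittest files. The field names (`schemaVersion`, `refresh`, `tags`, `templating.list`) are standard Grafana dashboard fields, and the expected values come from the summary above.

```python
import json

EXPECTED_TAGS = {"robotlb", "operator", "kubernetes", "hetzner"}
EXPECTED_VARIABLES = ("datasource", "controller")


def dashboard_problems(raw_json: str) -> list:
    """Return human-readable problems found in a dashboard JSON document."""
    dashboard = json.loads(raw_json)
    problems = []
    if dashboard.get("schemaVersion") != 39:
        problems.append(f"schemaVersion is {dashboard.get('schemaVersion')!r}, expected 39")
    if dashboard.get("refresh") != "30s":
        problems.append(f"refresh is {dashboard.get('refresh')!r}, expected '30s'")
    missing_tags = EXPECTED_TAGS - set(dashboard.get("tags", []))
    if missing_tags:
        problems.append(f"missing tags: {sorted(missing_tags)}")
    variables = {v.get("name") for v in dashboard.get("templating", {}).get("list", [])}
    for name in EXPECTED_VARIABLES:
        if name not in variables:
            problems.append(f"missing template variable: {name}")
    return problems
```

Returning a list rather than failing on the first assertion reports every missing property in a single run.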
All Changes Pushed
Commit: 62042e6 - Dashboard rework with template variables and improved layout
The dashboard now provides better visibility into operator health, reconciliation performance, and Hetzner Cloud API interactions, making it easier to monitor and troubleshoot the RobotLB operator.
✅ CI Fix Applied
I've pushed a fix for the CI failure. The changes should trigger a new CI run.
Commit: Waiting for CI to re-run...
The dashboard JSON spans multiple lines, so .* in the test regex doesn't match across newlines. Changed it to [\s\S]*, which matches any character, including newlines.
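The difference is easy to reproduce with Python's `re`, which treats `.` the same way as Go's RE2 here: `.` skips newlines unless the `s` (dotall) flag is set, while `[\s\S]` matches any character. The two-line string below is a stand-in for the rendered dashboard, not its real content.

```python
import re

# Stand-in for multi-line dashboard JSON embedded in the ConfigMap.
rendered = '{\n  "title": "RobotLB",\n  "schemaVersion": 39\n}'

dot_star = re.compile(r'"title".*"schemaVersion"')        # stops at the newline
any_char = re.compile(r'"title"[\s\S]*"schemaVersion"')   # crosses newlines
dotall = re.compile(r'(?s)"title".*"schemaVersion"')      # inline dotall flag

for name, pattern in [("dot_star", dot_star), ("any_char", any_char), ("dotall", dotall)]:
    print(name, bool(pattern.search(rendered)))
```

The inline `(?s)` flag is an alternative to `[\s\S]*`. It is also accepted by Go's regexp syntax.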
Updated from ad82fc7 to 4d41adf
@forkline run pre-commit until it passes.
✅ Pre-commit checks now passing
Fixed issue: Added missing newline at end of helm/templates/dashboard.yaml
Commit: bae216b - pushed to branch fix/55-dashboard-unittests
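The missing-newline failure comes from the `end-of-file-fixer` hook, which requires a file to be either empty or to end with exactly one newline. The function below approximates that rule on text. It is a sketch, not the hook's implementation: the real hook works on bytes, and this version normalizes the trailing line ending to `\n`.

```python
def fix_end_of_file(content: str) -> str:
    """Approximate end-of-file-fixer: empty stays empty, otherwise one trailing newline."""
    stripped = content.rstrip("\r\n")
    if not stripped:
        return ""  # whitespace-only-newline files become empty
    return stripped + "\n"


print(repr(fix_end_of_file("apiVersion: v1")))       # newline added
print(repr(fix_end_of_file("apiVersion: v1\n\n\n")))  # extra blank lines trimmed
```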
All hooks passed:
- check-added-large-files ✓
- check-executables-have-shebangs ✓
- check-merge-conflict ✓
- check-shebang-scripts-are-executable ✓
- detect-private-key ✓
- end-of-file-fixer ✓
- mixed-line-ending ✓
- trailing-whitespace ✓
- yamllint ✓
- cargo fmt ✓
- cargo clippy ✓
Test Results
All 21 unittests pass.
Resolves: #55