Home
observability-dashboard is the central nervous system of the Alex-CloudOps observability ecosystem. It aggregates telemetry, uptime, incident, and log intelligence data from four production-grade repositories into a single unified health dashboard with Power BI-ready exports.
Every serious CloudOps operation needs a single pane of glass — one place where the health of the entire infrastructure is visible, scored, and actionable.
observability-dashboard is that place.
It pulls live data from all four portfolio repositories, transforms and normalizes each data source, calculates component and ecosystem-wide health scores, and exports the results in three formats — including a structured Power BI dataset ready for executive-level visualization.
This is where the entire portfolio converges. This is the view a CloudOps engineer, SRE, or NOC manager actually needs.
| Page | Description |
|---|---|
| Architecture | System design, data flow, and component breakdown |
| Setup & Configuration | Environment setup and configuration reference |
| Usage Guide | Running the dashboard and interpreting outputs |
| Runbook | NOC-style procedures for ecosystem health response |
| Troubleshooting | Common issues, error messages, and fixes |
| Roadmap | Planned features including full Power BI integration |
| Property | Detail |
|---|---|
| Language | Python 3.x |
| Data Sources | 4 portfolio repositories |
| Health Levels | HEALTHY, DEGRADED, CRITICAL |
| Export Formats | JSON summary, CSV, Power BI dataset |
| Cloud Provider | AWS (Free Tier compatible) |
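The three health levels in the table above suggest a simple roll-up from component health to ecosystem health. The sketch below assumes a "worst component wins" rule; the `Health` enum, the `ecosystem_health` function, and the component keys are hypothetical names used for illustration, and the actual scoring logic in the repository may weight components differently.

```python
from enum import IntEnum


class Health(IntEnum):
    """Health levels ordered by severity so max() selects the worst one."""
    HEALTHY = 0
    DEGRADED = 1
    CRITICAL = 2


def ecosystem_health(components: dict[str, Health]) -> Health:
    """Roll component levels up to a single level (assumed rule: worst wins)."""
    if not components:
        raise ValueError("at least one component is required")
    return max(components.values())


# Illustrative component states; the keys are assumptions, not the real schema.
components = {
    "telemetry": Health.HEALTHY,
    "uptime": Health.DEGRADED,
    "incidents": Health.CRITICAL,
    "log_intel": Health.HEALTHY,
}
print(ecosystem_health(components).name)  # CRITICAL
```

Ordering the enum by severity keeps the roll-up to a single `max()` call and makes it easy to add an intermediate level later.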
| Repository | Data Type | Metrics |
|---|---|---|
| cloud-telemetry-agent | Infrastructure telemetry | CPU, memory, disk |
| synthetic-uptime-monitor | Endpoint uptime | Availability, response times |
| incident-alert-pipeline | Incident records | Severity counts, open/closed |
| log-intelligence-engine | Log intelligence | Error rates, health status |
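Because each repository reports a different kind of data, the dashboard has to normalize them into a common shape before scoring. The following is a minimal sketch of that idea, not the project's actual transformer: the `MetricRecord` dataclass, the `normalize_*` functions, and the raw field names (`cpu_percent`, `availability_percent`, and so on) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRecord:
    source: str   # repository the value came from
    metric: str   # normalized metric name
    value: float
    unit: str


def normalize_telemetry(raw: dict) -> list[MetricRecord]:
    """Flatten a hypothetical telemetry payload into common records."""
    src = "cloud-telemetry-agent"
    fields = (("cpu", "cpu_percent"),
              ("memory", "memory_percent"),
              ("disk", "disk_percent"))
    return [MetricRecord(src, name, float(raw[key]), "%") for name, key in fields]


def normalize_uptime(raw: dict) -> list[MetricRecord]:
    """Flatten a hypothetical uptime payload into common records."""
    src = "synthetic-uptime-monitor"
    return [
        MetricRecord(src, "uptime", float(raw["availability_percent"]), "%"),
        MetricRecord(src, "avg_response", float(raw["avg_response_ms"]), "ms"),
    ]


# Example payloads (made-up field names, values chosen for illustration).
records = (
    normalize_telemetry({"cpu_percent": 8.5, "memory_percent": 87.0, "disk_percent": 17.9})
    + normalize_uptime({"availability_percent": 80.0, "avg_response_ms": 353.9})
)
for record in records:
    print(f"{record.source:26} {record.metric:13} {record.value}{record.unit}")
```

A single record type lets the later summary and export stages treat every source the same way.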
cloud-telemetry-agent    ──┐
synthetic-uptime-monitor ──┤
incident-alert-pipeline  ──┤──▶ aggregator ──▶ transformer ──▶ summary ──▶ exporter
log-intelligence-engine  ──┘                                                  │
                                                                              ▼
                                                              dashboard_summary.json
                                                              dashboard_export.csv
                                                              powerbi_dataset.json
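The final exporter stage in the flow above writes three files. Here is one way that step could look using only the standard library. The file names come from the diagram; the `export_all` function, the layout of the `summary` dictionary, and the `{"rows": [...]}` shape of the Power BI file are assumptions for illustration, not the project's confirmed format.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_all(summary: dict, out_dir: Path) -> list[Path]:
    """Write the summary as JSON, CSV, and a flat row-based dataset."""
    out_dir.mkdir(parents=True, exist_ok=True)
    metrics = summary["metrics"]

    # 1. Full summary as JSON.
    summary_path = out_dir / "dashboard_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    # 2. One CSV row per metric.
    csv_path = out_dir / "dashboard_export.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["metric", "value"])
        writer.writerows(metrics.items())

    # 3. Flat list of rows (assumed shape for a Power BI import).
    rows = [{"metric": name, "value": value} for name, value in metrics.items()]
    powerbi_path = out_dir / "powerbi_dataset.json"
    powerbi_path.write_text(json.dumps({"rows": rows}, indent=2), encoding="utf-8")

    return [summary_path, csv_path, powerbi_path]


# Illustrative summary; the key names are assumptions.
summary = {
    "ecosystem_health": "CRITICAL",
    "metrics": {"cpu": 8.5, "memory": 87.0, "disk": 17.9, "uptime": 80.0},
}
with tempfile.TemporaryDirectory() as tmp:
    written = [path.name for path in export_all(summary, Path(tmp))]
print(written)
```

Writing to a temporary directory keeps the example free of side effects; a real run would point `out_dir` at the project's output folder.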
============================================================
🚨 ECOSYSTEM HEALTH: CRITICAL
============================================================
📊 Component Health:
Telemetry : ✅ HEALTHY
Uptime : ⚠️ DEGRADED
Log Intel : ✅ HEALTHY
📈 Key Metrics:
CPU : 8.5%
Memory : 87.0%
Disk : 17.9%
Uptime : 80.0%
Avg Response : 353.9ms
Log Errors : 0.17%
🚨 Incidents:
Open : 5
Critical : 3
High : 1
Medium : 1
Total : 5
============================================================
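The incident block in the sample output can be derived from a list of incident records with a simple count per severity. The sketch below is illustrative only: the record fields (`id`, `severity`, `status`), the incident IDs, and the `summarize_incidents` function are hypothetical, and the data is made up to match the counts shown above.

```python
from collections import Counter

# Made-up incident records; field names and IDs are assumptions.
incidents = [
    {"id": "INC-1", "severity": "CRITICAL", "status": "open"},
    {"id": "INC-2", "severity": "CRITICAL", "status": "open"},
    {"id": "INC-3", "severity": "CRITICAL", "status": "open"},
    {"id": "INC-4", "severity": "HIGH", "status": "open"},
    {"id": "INC-5", "severity": "MEDIUM", "status": "open"},
]


def summarize_incidents(records: list[dict]) -> dict[str, int]:
    """Count open incidents and incidents per severity level."""
    by_severity = Counter(record["severity"] for record in records)
    return {
        "open": sum(1 for record in records if record["status"] == "open"),
        "critical": by_severity["CRITICAL"],
        "high": by_severity["HIGH"],
        "medium": by_severity["MEDIUM"],
        "total": len(records),
    }


print(summarize_incidents(incidents))
# {'open': 5, 'critical': 3, 'high': 1, 'medium': 1, 'total': 5}
```

`Counter` returns 0 for severities that never occur, so the summary stays well-formed even when a level has no incidents.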
Individual monitoring tools tell you what's wrong with one thing. A unified dashboard tells you what's wrong with everything — and gives you the context to understand why.
- A CPU spike means more when you can see it alongside an open CRITICAL incident
- A degraded uptime score means more when log error rates are climbing simultaneously
- An executive doesn't need raw metrics — they need a health score and a trend
observability-dashboard provides all three.
The crown jewel of the Alex-CloudOps observability portfolio — where four production-grade repositories converge into a single pane of glass.