Alex-CloudOps edited this page Mar 9, 2026 · 2 revisions

observability-dashboard

The central nervous system of the Alex-CloudOps observability ecosystem: it aggregates telemetry, uptime, incident, and log intelligence data from four production-grade repositories into a single unified health dashboard with Power BI-ready exports.


What Is This?

Every serious CloudOps operation needs a single pane of glass — one place where the health of the entire infrastructure is visible, scored, and actionable.

observability-dashboard is that place.

It pulls live data from all four portfolio repositories, transforms and normalizes each data source, calculates component and ecosystem-wide health scores, and exports the results in three formats: a JSON summary, a CSV file, and a structured Power BI dataset ready for executive-level visualization.

This is where the entire portfolio converges. This is the view a CloudOps engineer, SRE, or NOC manager actually needs.


Wiki Navigation

| Page | Description |
| --- | --- |
| Architecture | System design, data flow, and component breakdown |
| Setup & Configuration | Environment setup and configuration reference |
| Usage Guide | Running the dashboard and interpreting outputs |
| Runbook | NOC-style procedures for ecosystem health response |
| Troubleshooting | Common issues, error messages, and fixes |
| Roadmap | Planned features including full Power BI integration |

Quick Stats

| Property | Detail |
| --- | --- |
| Language | Python 3.x |
| Data Sources | 4 portfolio repositories |
| Health Levels | HEALTHY, DEGRADED, CRITICAL |
| Export Formats | JSON summary, CSV, Power BI dataset |
| Cloud Provider | AWS (Free Tier compatible) |

Ecosystem Data Sources

| Repository | Data Type | Metrics |
| --- | --- | --- |
| cloud-telemetry-agent | Infrastructure telemetry | CPU, memory, disk |
| synthetic-uptime-monitor | Endpoint uptime | Availability, response times |
| incident-alert-pipeline | Incident records | Severity counts, open/closed |
| log-intelligence-engine | Log intelligence | Error rates, health status |
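
Because the four sources report differently shaped data, a normalization step has to map them onto a common record before anything can be scored. The following is a minimal sketch of that idea, not the project's actual transformer. The `MetricRecord` type, the `normalize_*` helpers, and the raw payload keys (`cpu_percent`, `availability_percent`, and so on) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRecord:
    """One normalized data point (hypothetical schema, not the project's)."""
    source: str   # repository the value came from
    metric: str   # short metric name, e.g. "cpu"
    value: float
    unit: str     # "%", "ms", or "count"


def normalize_telemetry(raw: dict) -> list[MetricRecord]:
    # Assumed payload keys; the real agent's field names may differ.
    src = "cloud-telemetry-agent"
    return [
        MetricRecord(src, "cpu", float(raw["cpu_percent"]), "%"),
        MetricRecord(src, "memory", float(raw["memory_percent"]), "%"),
        MetricRecord(src, "disk", float(raw["disk_percent"]), "%"),
    ]


def normalize_uptime(raw: dict) -> list[MetricRecord]:
    # Assumed payload keys; the real monitor's field names may differ.
    src = "synthetic-uptime-monitor"
    return [
        MetricRecord(src, "availability", float(raw["availability_percent"]), "%"),
        MetricRecord(src, "avg_response", float(raw["avg_response_ms"]), "ms"),
    ]


# Example input reusing the figures from the sample dashboard output.
records = normalize_telemetry(
    {"cpu_percent": 8.5, "memory_percent": 87.0, "disk_percent": 17.9}
) + normalize_uptime({"availability_percent": 80.0, "avg_response_ms": 353.9})

for r in records:
    print(f"{r.source:26} {r.metric:13} {r.value:7.1f} {r.unit}")
```

Keeping the unit on every record is one way to let a later export step stay generic, since percentages, milliseconds, and counts can then share one table.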

At a Glance

cloud-telemetry-agent    ──┐
synthetic-uptime-monitor ──┤
incident-alert-pipeline  ──┤──▶ aggregator ──▶ transformer ──▶ summary ──▶ exporter
log-intelligence-engine  ──┘                                                   │
                                                                                ▼
                                                               dashboard_summary.json
                                                               dashboard_export.csv
                                                               powerbi_dataset.json
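
The four stages in the diagram can be sketched as plain functions chained together. This is an illustrative outline under stated assumptions rather than the repository's code: the function names, the row layout (`source`, `metric`, `value`), and the shape of the Power BI file (a flat list of rows) are all hypothetical. Only the three output file names come from the diagram.

```python
import csv
import json
from pathlib import Path


def aggregate(sources: dict) -> dict:
    # Stage 1: collect the raw payload from each source. Here each source is
    # a zero-argument callable; real collectors would read files or call APIs.
    return {name: fetch() for name, fetch in sources.items()}


def transform(raw: dict) -> list[dict]:
    # Stage 2: flatten every payload into uniform {source, metric, value} rows.
    return [
        {"source": name, "metric": metric, "value": value}
        for name, payload in raw.items()
        for metric, value in payload.items()
    ]


def summarize(rows: list[dict]) -> dict:
    # Stage 3: build the summary document (health scoring is omitted here).
    return {"metric_count": len(rows), "metrics": rows}


def export(summary: dict, out_dir: Path) -> list[Path]:
    # Stage 4: write the three output files named in the diagram.
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = summary["metrics"]

    summary_path = out_dir / "dashboard_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    csv_path = out_dir / "dashboard_export.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["source", "metric", "value"])
        writer.writeheader()
        writer.writerows(rows)

    # Assumed layout: a flat list of rows, which loads easily as a table.
    powerbi_path = out_dir / "powerbi_dataset.json"
    powerbi_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")

    return [summary_path, csv_path, powerbi_path]


def run_pipeline(sources: dict, out_dir: Path) -> list[Path]:
    # Chain the stages in the same order as the diagram.
    return export(summarize(transform(aggregate(sources))), out_dir)
```

Calling `run_pipeline(sources, Path("output"))` with a dict of zero-argument callables would write all three files into `output/`. Keeping each stage a pure function of its input, with file I/O confined to `export`, makes the earlier stages easy to test in isolation.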

Sample Dashboard Output

============================================================
  🚨 ECOSYSTEM HEALTH: CRITICAL
============================================================

  📊 Component Health:
    Telemetry    : ✅ HEALTHY
    Uptime       : ⚠️ DEGRADED
    Log Intel    : ✅ HEALTHY

  📈 Key Metrics:
    CPU          : 8.5%
    Memory       : 87.0%
    Disk         : 17.9%
    Uptime       : 80.0%
    Avg Response : 353.9ms
    Log Errors   : 0.17%

  🚨 Incidents:
    Open         : 5
    Critical     : 3
    High         : 1
    Medium       : 1
    Total        : 5
============================================================
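
The sample above reports CRITICAL overall even though none of the three listed components is CRITICAL. One reading consistent with that output is that open critical incidents also feed the overall status. The sketch below illustrates a worst-of roll-up under that assumption. The `Health` enum, `incident_health`, and `ecosystem_health` are hypothetical names, and the project's real scoring rules may differ.

```python
from enum import IntEnum


class Health(IntEnum):
    # Ordered by severity so that max() picks the worst level.
    HEALTHY = 0
    DEGRADED = 1
    CRITICAL = 2


def incident_health(open_critical: int, open_high: int) -> Health:
    # Assumed rule: any open critical incident is CRITICAL,
    # any open high-severity incident is at least DEGRADED.
    if open_critical > 0:
        return Health.CRITICAL
    if open_high > 0:
        return Health.DEGRADED
    return Health.HEALTHY


def ecosystem_health(components: dict[str, Health]) -> Health:
    # Worst-of roll-up: the ecosystem is only as healthy as its worst part.
    return max(components.values(), default=Health.HEALTHY)


# Component levels and incident counts taken from the sample output.
components = {
    "telemetry": Health.HEALTHY,
    "uptime": Health.DEGRADED,
    "log_intel": Health.HEALTHY,
    "incidents": incident_health(open_critical=3, open_high=1),
}
overall = ecosystem_health(components)
print(f"ECOSYSTEM HEALTH: {overall.name}")
```

A worst-of rule is deliberately pessimistic: a single CRITICAL component cannot be averaged away by several healthy ones, which suits a NOC-style view.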

Why a Unified Dashboard?

Individual monitoring tools tell you what's wrong with one thing. A unified dashboard tells you what's wrong with everything — and gives you the context to understand why.

  • A CPU spike means more when you can see it alongside an open CRITICAL incident
  • A degraded uptime score means more when log error rates are climbing simultaneously
  • An executive doesn't need raw metrics — they need a health score and a trend

observability-dashboard provides all three.


The crown jewel of the Alex-CloudOps observability portfolio — where four production-grade repositories converge into a single pane of glass.
