Observability Platform

A full-stack observability and incident response platform built around metrics, logs, traces, alerting, runbooks, and chaos validation. The repo demonstrates how to move from passive monitoring to an active reliability practice, with clearer signals and faster response workflows.

Why this repo matters

Many monitoring demos stop at dashboards. This project goes further by tying together the three pillars of observability (metrics, logs, and traces) with SLI/SLO thinking, alert routing, runbooks, and fault injection.

What is included

instrumented application source under app/
Prometheus, Alertmanager, Grafana, ELK, and OpenTelemetry deployment assets
Grafana dashboards and alerting rules
chaos scripts for latency and error scenarios
runbooks and postmortem templates for operational response
local Docker assets plus Kubernetes manifests

Observability scope

Metrics: Prometheus and Grafana for collection, dashboards, and alerting
Logs: ELK stack for centralized log aggregation and search
Traces: OpenTelemetry and Jaeger-style tracing pipeline
Response: Alertmanager, runbooks, and postmortem templates
Validation: chaos scripts to test the system under failure conditions

Quick start

# Deploy the stack
kubectl apply -f k8s/app/
kubectl apply -f k8s/prometheus/
kubectl apply -f k8s/grafana/
kubectl apply -f k8s/elk/
kubectl apply -f k8s/otel-collector/

# Inject faults and observe behavior
./chaos-scripts/chaos-runner.sh latency
./chaos-scripts/chaos-runner.sh errors
./chaos-scripts/chaos-runner.sh reset
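
The contents of chaos-runner.sh are not shown in this README. As a rough illustration of what latency and errors modes could look like from the application's side, here is a minimal Python sketch. Everything in it (FaultConfig, handle_request, error_ratio, the default delay and error rate) is a hypothetical name or value chosen for the example, not something taken from the repo.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class FaultConfig:
    """Hypothetical fault settings; not taken from chaos-runner.sh."""
    mode: str = "reset"          # "latency", "errors", or "reset"
    delay_seconds: float = 0.05  # extra delay per request in latency mode
    error_rate: float = 0.5      # fraction of failing requests in errors mode


def handle_request(config: FaultConfig, rng: random.Random) -> int:
    """Return an HTTP-style status code, applying the configured fault."""
    if config.mode == "latency":
        time.sleep(config.delay_seconds)  # simulate a slow dependency
    elif config.mode == "errors" and rng.random() < config.error_rate:
        return 500  # simulate a failing dependency
    return 200


def error_ratio(config: FaultConfig, requests: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated requests that returned 500 under this config."""
    rng = random.Random(seed)  # seeded so repeated runs give the same ratio
    failures = sum(handle_request(config, rng) == 500 for _ in range(requests))
    return failures / requests
```

In a setup like this, the errors mode should show up as a rising error ratio on the dashboards, the latency mode as a shift in request duration, and reset should bring both back to baseline.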

Repository layout

.
|-- app/                  # instrumented application
|-- chaos-scripts/        # fault injection scripts
|-- dashboards/           # Grafana dashboard assets
|-- docker/               # local container assets
|-- k8s/                  # Kubernetes deployment manifests
|-- postmortem-templates/ # incident review templates
|-- runbooks/             # operational runbooks
|-- docs/                 # diagrams and supporting docs
`-- .github/              # validation workflows

What this demonstrates

end-to-end observability design across metrics, logs, and traces
operational maturity through alerting, runbooks, and postmortems
SLI/SLO-oriented monitoring instead of dashboard sprawl
validation of reliability assumptions through controlled chaos testing
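
To make the SLI/SLO point concrete, the sketch below shows the standard error-budget arithmetic that SLO-oriented alerting is usually built on. It is a generic illustration, not the alerting rules shipped in this repo; the function names and the 99% target used in the example are assumptions.

```python
def error_budget(slo_target: float) -> float:
    """Allowed failure fraction for an SLO target, e.g. 0.999 -> about 0.001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Ratio of the observed error rate to the error budget.

    A value of 1.0 means the budget is being spent exactly as fast as the
    SLO allows; higher values mean it will run out early.
    """
    if total <= 0:
        return 0.0  # no traffic in the window, so nothing was burned
    observed_error_ratio = failed / total
    return observed_error_ratio / error_budget(slo_target)
```

For example, with a 99% availability target, 50 failed requests out of 1000 give a burn rate of roughly 5, meaning the error budget is being consumed about five times faster than the SLO allows. Alerting on a value like this, rather than on raw error counts, is one way to keep alerts tied to user-facing objectives.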
