I specialize in building systems that don't just work: they scale, reason, and recover. My passion lies at the intersection of High-Performance APIs and Autonomous Observability. I thrive when designing complex backend architectures that transform raw telemetry into actionable intelligence. I enjoy UI work as well, so you will see a consistent UI theme across my projects :)
- Distributed Systems: Designing for resilience, consistency, and low latency.
- Observability Stack: Deep integration with the LGTM stack (Loki, Grafana, Tempo, Mimir).
- AI-Native Engineering: Building reasoning engines for automated Root Cause Analysis (RCA).
- Infrastructure as Code: Orchestrating scalable, cloud-native environments.
I am currently developing a suite of interconnected observability tools designed to eliminate the friction in modern SRE workflows.
The Control Plane. A unified platform for metrics, logs, traces, and alerts. It acts as the "Single Pane of Glass" for distributed systems, enforcing RBAC and multi-tenancy across the entire LGTM stack.
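
As a rough illustration of what enforcing RBAC and multi-tenancy in front of the LGTM stack can look like, here is a minimal sketch. The role names, the `User` type, and the `backend_headers` helper are all hypothetical and are not taken from the actual platform; the only fixed detail is `X-Scope-OrgID`, the tenant header that Loki, Tempo, and Mimir read.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission mapping, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "admin"},
}


@dataclass(frozen=True)
class User:
    name: str
    tenant: str
    role: str


def authorize(user: User, action: str) -> bool:
    """Return True if the user's role grants the requested action."""
    return action in ROLE_PERMISSIONS.get(user.role, set())


def backend_headers(user: User, action: str) -> dict[str, str]:
    """Build the headers for a request proxied to Loki, Tempo, or Mimir."""
    if not authorize(user, action):
        raise PermissionError(f"{user.name} may not {action}")
    # X-Scope-OrgID scopes the request to a single tenant in the backend.
    return {"X-Scope-OrgID": user.tenant}
```

The idea is that the tenant is derived from the authenticated user on the server side and never accepted from the client, so a user cannot query another tenant's data by editing a header.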
The Analyst. A Python-based reasoning engine that processes telemetry data to provide automated Root Cause Analysis (RCA), anomaly detection, and predictive forecasting.
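
To give a feel for the anomaly detection side, here is a minimal sketch of a rolling z-score detector. It is a generic textbook technique, not the engine's actual algorithm, and the function name, `window`, and `threshold` values are assumptions chosen for the example.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: request latencies in milliseconds with one obvious spike.
latencies = [100, 102, 98, 101, 99, 100, 500, 101]
print(zscore_anomalies(latencies))  # [6]
```

A trailing window keeps the baseline local, so a slow drift in latency does not get flagged the way a sudden spike does.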
The Messenger. An intelligent alerting and incident orchestration service. It manages the lifecycle of an alert—from the moment a threshold is crossed in Mimir to the final resolution note in Jira.
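
One way to model an alert lifecycle like this is a small state machine. The sketch below is illustrative only: the state names and allowed transitions are assumptions, and the Mimir and Jira integrations are deliberately left out.

```python
from enum import Enum


class AlertState(Enum):
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Hypothetical transition table; a real service may allow different moves.
ALLOWED = {
    AlertState.FIRING: {AlertState.ACKNOWLEDGED, AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.RESOLVED},
    AlertState.RESOLVED: {AlertState.CLOSED, AlertState.FIRING},
    AlertState.CLOSED: set(),
}


class Alert:
    def __init__(self, name):
        self.name = name
        self.state = AlertState.FIRING
        self.history = [AlertState.FIRING]

    def transition(self, new_state):
        """Move to new_state, rejecting moves the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed"
            )
        self.state = new_state
        self.history.append(new_state)
```

Keeping the transitions in one table makes the lifecycle easy to audit, and the `history` list gives a simple trail that could later feed a resolution note.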
| Category | Technologies |
|---|---|
| Languages | Python (FastAPI/Flask), TypeScript/JS, C++ |
| Data & Storage | PostgreSQL, Redis, VictoriaMetrics, Mimir, Loki |
| Infrastructure | Docker, Kubernetes, OpenTelemetry (OTLP), Envoy, NGINX |
| Observability | Prometheus, Grafana, Tempo, Alertmanager |
| DevOps | CI/CD Pipelines, Vault, Keycloak (OIDC) |
- CodeMasterPro: A specialized developer tooling platform designed to streamline the local development environment and improve engineering velocity. I explored the ralph-wiggum principle a while ago; have a look at the code.
- Observability over Monitoring: Don't just watch the dashboard; understand it.
- Clean Architecture: Write code that your future self can maintain. I know I am not perfect, but I want to be :) and I am continuously learning to write clean, maintainable code.
I'm always open to discussing distributed systems, backend performance, observability, or even AI. Don't talk to me about Linux, though; I already deal with it daily at work. I wouldn't call myself a Linux expert, but I definitely know a lot about it.
- LinkedIn: stefan-kumarasinghe
- Watch me build: YouTube Channel
