This Detailed Project Report outlines the technical architecture, design principles, integration patterns, and operational features of the SentinelOps AI dashboard.
SentinelOps AI is an automated Site Reliability Engineering (SRE) command center. The system intercepts simulated infrastructure and application outages, orchestrates a collaborative multi-agent LangGraph swarm to diagnose them, fetches matching recovery playbooks from a local RAG database, and automatically rolls back configuration commits to restore microservice stability.
The platform is split into two primary layers: a high-performance Python FastAPI backend and a responsive, glassmorphic React Vite frontend.

    ┌────────────────────────┐
    │   React Vite Client    │ ◄─── Persistent Vocal Uplink (Web Audio)
    └─────┬────────────▲─────┘
          │ HTTP       │ SSE (Server-Sent Events)
          ▼            │
    ┌──────────────────┴─────┐
    │     FastAPI Server     │
    └─────┬────────────▲─────┘
          │            │
          ▼            │ LangGraph Streams
    ┌──────────────┐   │
    │  SRE Agent   ├───┘
    │ Swarm Plexus │ ◄─── Semantic Runbook RAG Index
    └──────────────┘

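The SSE link between the server and the client in the diagram can be illustrated with a minimal sketch of how agent updates might be framed as Server-Sent Events. The event names (`agent_update`, `done`) and the payload fields are assumptions made for illustration; the project's actual event schema is not described in this report.

```python
import json
from typing import Iterable, Iterator


def format_sse(event: str, payload: dict) -> str:
    """Serialize one SSE frame: an event line, a data line, then a blank line."""
    # json.dumps emits a single line, so the payload fits in one "data:" field.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_agent_updates(updates: Iterable[tuple[str, dict]]) -> Iterator[str]:
    """Yield one frame per (agent, state) update, followed by a closing frame."""
    for agent, state in updates:
        # "agent_update" is a hypothetical event name, not taken from the project.
        yield format_sse("agent_update", {"agent": agent, **state})
    yield format_sse("done", {})


frames = list(stream_agent_updates([
    ("monitoring", {"status": "running"}),
    ("rca", {"status": "running"}),
]))
```

In a FastAPI route, a generator like this could be wrapped in `StreamingResponse(..., media_type="text/event-stream")`, and the browser could consume it with `EventSource`.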
- FastAPI Core: Handles incoming REST API telemetry triggers and serves SSE (Server-Sent Events) streams under the `/api/swarm/analyze` route.
- Incident Simulator: Maintains the system state machine, tracking active incident types, timestamp parameters, logs, and remediation outputs.
- LangGraph Multi-Agent Swarm: Coordinates execution between four cognitive nodes:
  - Monitoring Agent: Scans logs to detect threshold breaches.
  - RCA Agent: Runs git diff reviews.
  - RAG Retrieval Agent: References internal markdown databases.
  - Remediator Agent: Executes rollbacks.
- Local RAG Database: Built using TF-IDF matching over a curated list of SRE manuals and historic incident reports.
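As a rough illustration of TF-IDF matching over runbooks, the sketch below builds a small index with the standard library and ranks documents by cosine similarity. The runbook filenames, their text, and the `TfidfIndex` class are hypothetical; the project's actual index and weighting scheme may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Tiny TF-IDF index with cosine-similarity search (illustrative only)."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.names = list(docs)
        tokenized = [tokenize(docs[name]) for name in self.names]
        n = len(tokenized)
        # Document frequency: in how many documents each term appears.
        df = Counter(term for tokens in tokenized for term in set(tokens))
        # Smoothed IDF keeps terms found in every document at a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = [self._vectorize(tokens) for tokens in tokenized]

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        """Build an L2-normalized TF-IDF vector, ignoring unknown terms."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query: str, top_k: int = 1) -> list[tuple[str, float]]:
        """Return the top_k (name, cosine score) pairs for the query."""
        q = self._vectorize(tokenize(query))
        scores = [
            (name, sum(w * vec.get(t, 0.0) for t, w in q.items()))
            for name, vec in zip(self.names, self.vectors)
        ]
        return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]


# Hypothetical runbook snippets standing in for the curated SRE manuals.
runbooks = {
    "db_pool.md": "Connection pool exhaustion: close leaked database sockets and roll back the commit.",
    "log_disk.md": "Disk full from debug logs: truncate log files and re-enable log rotation.",
    "gateway.md": "Gateway 503 errors: fix the upstream cluster name in the route configuration.",
}
index = TfidfIndex(runbooks)
best_match = index.search("database sockets leaking from the pool")[0][0]
```

TF-IDF needs no embedding model or external service, which suits a local index over a small curated corpus.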
- React 18 & TypeScript: Strongly typed component layout built on top of TailwindCSS.
- Dynamic Theming Engine: Uses HSL color tokens mapped to root CSS variables to adjust font families, corner radii, and glow properties in real time.
- SVG Swarm Nexus: An interactive, lightweight SVG representation of the agent swarm. Illumination triggers align with SSE state updates.
- Hands-Free Audio Activation: Leverages Web Audio AnalyserNodes to register clap volume spikes and SpeechRecognition to monitor wake phrases.
The visual aesthetics change dynamically based on the active theme, adapting more than just colors. The system morphs font families, container border-radius metrics, and background animations:
| Theme | Aesthetic Target | Highlight Hue | Primary Font Stack | Panel Corners |
|---|---|---|---|---|
| Cyber Obsidian | Default SRE console | Neon Green | Inter / JetBrains Mono | 16px (Smooth) |
| Nebula Abyss | Space flight deck | Deep Violet | Orbitron / Space Grotesk | 28px (Extended) |
| Crimson Protocol | Critical alert console | Crimson Red | Share Tech Mono | 0px (Industrial Sharp) |
| Matrix Code | Terminal retro grid | Matrix Lime | DotGothic16 | 4px (Blocky) |
| Solar Flare | Aerospace monitoring | Solar Amber | Fira Code | 8px (Compact) |
To allow remote, hands-free activation of the operations uplink, a secondary monitoring system operates continuously in the browser thread.
The system captures microphone streams and connects them to a Web Audio ScriptProcessorNode:
- Computes the average frequency volume over a 1024-sample window.
- Registers a volume peak when the average amplitude spikes above 85 units.
- Debounces activation with a 1200 ms lock window to prevent duplicate triggers from echoes.
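The threshold-and-lock logic above can be sketched independently of the browser. The example is written in Python for consistency with the backend; the real detector runs in the frontend on Web Audio data. The `ClapDetector` class is hypothetical, and it assumes amplitude values on a 0 to 255 byte scale, which the report does not state.

```python
from typing import Optional


class ClapDetector:
    """Flags a clap when the window average exceeds a threshold, with a lock."""

    def __init__(self, threshold: float = 85.0, lock_ms: float = 1200.0) -> None:
        self.threshold = threshold
        self.lock_ms = lock_ms
        self._last_trigger_ms: Optional[float] = None

    def process(self, window: list[int], now_ms: float) -> bool:
        """Return True when this window counts as a new clap."""
        if not window:
            return False
        average = sum(window) / len(window)
        if average <= self.threshold:
            return False
        # Ignore peaks that arrive while the debounce lock is still active.
        locked = (
            self._last_trigger_ms is not None
            and now_ms - self._last_trigger_ms < self.lock_ms
        )
        if locked:
            return False
        self._last_trigger_ms = now_ms
        return True


detector = ClapDetector()
quiet = [10] * 1024   # assumed 0-255 byte-scale amplitudes
loud = [120] * 1024
events = [
    detector.process(quiet, 0),     # below threshold
    detector.process(loud, 100),    # first peak triggers
    detector.process(loud, 600),    # echo inside the 1200 ms lock
    detector.process(loud, 1400),   # lock has expired
]
```

Passing the timestamp in, rather than reading a clock inside the detector, keeps the debounce behavior easy to test.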
A low-resource background SpeechRecognition instance runs continuously:
- Transcripts are parsed in real-time.
- If any of the phrases "sentinel activate", "sentinel online", or "sentinel" is detected, the uplink is triggered.
- Auto-restart behavior is implemented in the `onend` callback, which checks `isListeningRef` and restarts recognition after browser-enforced silence timeouts.
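Wake-phrase matching on a transcript can be sketched as follows, again in Python for consistency with the backend rather than as the frontend's actual code. The helper names and the normalization rules (lowercasing, dropping punctuation, whole-word matching) are assumptions.

```python
import re
from typing import Optional

# Phrases taken from the list above.
WAKE_PHRASES = ("sentinel activate", "sentinel online", "sentinel")


def normalize(transcript: str) -> str:
    """Lowercase the transcript and drop punctuation between words."""
    return " ".join(re.findall(r"[a-z']+", transcript.lower()))


def detect_wake_phrase(transcript: str) -> Optional[str]:
    """Return the most specific wake phrase found as whole words, else None."""
    text = f" {normalize(transcript)} "
    # Check longer phrases first so "sentinel activate" wins over "sentinel".
    for phrase in sorted(WAKE_PHRASES, key=len, reverse=True):
        if f" {phrase} " in text:
            return phrase
    return None
```

Because the bare word "sentinel" is itself a wake phrase, any transcript containing it triggers the uplink; the longer phrases only change which match is reported.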
The system includes simulated incident templates designed to test the LangGraph reasoning team:
- Fault: Thread pool allocation leaks database sockets.
- RCA: Identifies missing `conn.close()` inside batch payment execution loops.
- Mitigation: Kills PostgreSQL orphaned processes and rolls back codebase commits.
- Fault: Global dictionary collects tokens without eviction limits.
- RCA: Finds missing TTL limits on `tokenCache` global allocations.
- Mitigation: Restarts containers and reverts token cache optimization branch.
- Fault: Gateway routes point to unregistered cluster names.
- RCA: Locates cluster registry name discrepancy.
- Mitigation: Points upstream target back to the valid service coordinate.
- Fault: Upgrade asserts environment keys without config bindings.
- RCA: Captures startup `KeyError` on `os.environ['REDIS_URL']`.
- Mitigation: Patches Helm specifications and triggers pod rollout.
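The failure mode in this scenario is easy to reproduce: indexing `os.environ` with a missing key raises `KeyError`. The sketch below shows the failing pattern next to a hypothetical startup check (`validate_required_env`) that reports missing variables instead; it is not the project's remediation code, and `APP_ENV` is an invented variable.

```python
import os


def read_redis_url_strict(env=os.environ) -> str:
    """Mirror the failing pattern: raises KeyError when REDIS_URL is unset."""
    return env["REDIS_URL"]


def validate_required_env(required, env=os.environ) -> list[str]:
    """Return the names of required variables that are missing or empty."""
    return [name for name in required if not env.get(name)]


# Simulated environment after an upgrade that lacks the config binding.
env_without_binding = {"APP_ENV": "staging"}

try:
    read_redis_url_strict(env_without_binding)
    startup_error = None
except KeyError as exc:
    startup_error = f"KeyError: {exc}"

missing = validate_required_env(["REDIS_URL", "APP_ENV"], env_without_binding)
```

A check like this at startup produces one clear message listing every missing binding, rather than a crash on the first one accessed.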
- Fault: Logger debug logs grow unchecked without active rotation.
- RCA: Locates disabled `LOG_ROTATE` parameter.
- Mitigation: Truncates raw log files, resets level to `INFO`, and re-activates rotation.
- Zero Emojis: System logs, terminal outputs, and reports contain zero casual emojis.
- Responsive Layout: Designed for seamless multi-column viewing on desktop and laptop screens.
- Persistent Choice: Selected theme preferences are saved to `localStorage` and persist automatically.
- TypeScript Integrity: Compiles cleanly with no remaining strict-mode compiler warnings.
Developed by Arjun R.