Autonomous Incident Investigation Platform
RootPilot is an AI-powered incident investigation platform designed for cloud-native and distributed systems.
The platform leverages observability pipelines, telemetry correlation, distributed tracing, and AI investigation workflows to analyze production incidents, identify probable root causes, and generate actionable remediation insights.
Modern distributed systems generate massive amounts of:
- logs
- traces
- metrics
- deployment events
- infrastructure telemetry
During incidents, engineers spend significant time manually:
- correlating failures
- reconstructing timelines
- tracing dependencies
- validating hypotheses
- identifying root causes
RootPilot aims to automate and accelerate this investigation process using AI-assisted operational intelligence.
Key capabilities:
- AI-powered root cause analysis
- Distributed telemetry correlation
- Incident timeline reconstruction
- Event-driven investigation workflows
- Cloud-native observability integration
- Provider-agnostic architecture
- Production-grade engineering patterns
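RootPilot is intentionally designed around three principles: provider abstraction, asynchronous event-driven communication, and built-in observability.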
Infrastructure providers are abstracted through interfaces to enable future extensibility.
Examples:
- RabbitMQ → Kafka
- OpenAI → Anthropic/Ollama
- Elasticsearch → alternative telemetry stores
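Here is a minimal sketch of interface-based abstraction, assuming Python's `typing.Protocol` is used for structural typing. `TelemetryStore`, `InMemoryTelemetryStore`, and `count_errors` are hypothetical names. An Elasticsearch-backed class would implement the same two methods, and the calling code would not change.

```python
from typing import Protocol


class TelemetryStore(Protocol):
    """Hypothetical interface that any telemetry backend must satisfy."""

    def index(self, service: str, document: dict) -> None: ...

    def search(self, service: str) -> list[dict]: ...


class InMemoryTelemetryStore:
    """Test double; satisfies TelemetryStore structurally, without inheritance."""

    def __init__(self) -> None:
        self._documents: dict[str, list[dict]] = {}

    def index(self, service: str, document: dict) -> None:
        self._documents.setdefault(service, []).append(document)

    def search(self, service: str) -> list[dict]:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._documents.get(service, []))


def count_errors(store: TelemetryStore, service: str) -> int:
    """Business logic depends only on the interface, not on a concrete backend."""
    return sum(1 for doc in store.search(service) if doc.get("level") == "ERROR")


if __name__ == "__main__":
    store = InMemoryTelemetryStore()
    store.index("payments", {"level": "ERROR", "msg": "gateway timeout"})
    store.index("payments", {"level": "INFO", "msg": "retrying"})
    print(count_errors(store, "payments"))
```

With `Protocol`, implementations do not need to inherit from a shared base class. That keeps provider adapters and test doubles decoupled from one another.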
Internal services communicate asynchronously through messaging infrastructure.
Initial provider:
- RabbitMQ
Future support:
- Kafka
- NATS
RootPilot itself is designed to be observable using:
- logs
- traces
- metrics
- health checks
OpenTelemetry integration is planned from the early stages of development.
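Health checks can be sketched as a set of named asynchronous probes whose results are combined into one status. The probe names, the `run_health_checks` function, and the `healthy`/`degraded` labels are hypothetical. Real probes would test dependencies such as the RabbitMQ connection or PostgreSQL, and the report could be served from a FastAPI endpoint.

```python
import asyncio
from collections.abc import Awaitable, Callable

Probe = Callable[[], Awaitable[bool]]


async def run_health_checks(probes: dict[str, Probe], timeout: float = 1.0) -> dict:
    """Run all probes concurrently; a probe that returns False, raises, or times out is failing."""

    async def guarded(probe: Probe) -> bool:
        try:
            return await asyncio.wait_for(probe(), timeout=timeout)
        except Exception:  # includes TimeoutError raised by wait_for
            return False

    names = list(probes)
    outcomes = await asyncio.gather(*(guarded(probes[name]) for name in names))
    checks = {name: "ok" if ok else "failing" for name, ok in zip(names, outcomes)}
    status = "healthy" if all(outcomes) else "degraded"
    return {"status": status, "checks": checks}


async def broker_probe() -> bool:
    return True  # stand-in for a real broker connectivity check


async def database_probe() -> bool:
    await asyncio.sleep(5)  # simulates a hung dependency that exceeds the timeout
    return True


if __name__ == "__main__":
    report = asyncio.run(
        run_health_checks({"broker": broker_probe, "database": database_probe}, timeout=0.1)
    )
    print(report)
```

Each probe has a timeout so that one hung dependency cannot block the whole health check.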
The platform is divided into services with the following responsibilities:
- telemetry collection and normalization
- contextual correlation and timeline reconstruction
- root cause analysis and remediation generation
- incident orchestration and lifecycle management
- API aggregation and external access
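Timeline reconstruction can be sketched as merging events from several sources into one chronological sequence, limited to a window before the incident. The `TimelineEntry` fields, the 30-minute default lookback, and `build_timeline` are assumptions for illustration. The sketch uses `heapq.merge`, which assumes each input list is already sorted by time.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(order=True, frozen=True)
class TimelineEntry:
    """Hypothetical normalized event; ordering compares the timestamp only."""

    timestamp: datetime
    source: str = field(compare=False)
    description: str = field(compare=False)


def build_timeline(
    sources: list[list[TimelineEntry]],
    incident_start: datetime,
    lookback: timedelta = timedelta(minutes=30),
) -> list[TimelineEntry]:
    """Merge per-source lists (each sorted by time) and keep events
    from the lookback window before the incident onward."""
    window_start = incident_start - lookback
    merged = heapq.merge(*sources)
    return [entry for entry in merged if entry.timestamp >= window_start]


if __name__ == "__main__":
    t0 = datetime(2024, 5, 1, 12, 0)
    deployments = [TimelineEntry(t0 - timedelta(minutes=10), "deploy", "payments rollout")]
    logs = [
        TimelineEntry(t0 - timedelta(hours=2), "log", "routine cache refresh"),
        TimelineEntry(t0 - timedelta(minutes=2), "log", "gateway timeout errors begin"),
    ]
    metrics = [TimelineEntry(t0, "metric", "error rate above threshold")]
    for entry in build_timeline([deployments, logs, metrics], incident_start=t0):
        print(entry.timestamp.time(), entry.source, entry.description)
```

Placing a deployment event next to the first errors in a single timeline is the kind of context that later root cause analysis can build on.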
Technology stack:
- Python
- FastAPI
- AsyncIO
- RabbitMQ
- OpenTelemetry
- Elasticsearch
- PostgreSQL
- LangGraph
- OpenAI APIs
- Docker
- Kubernetes
Repository structure:
services/
shared/
infrastructure/
docs/
scripts/
RootPilot targets Python 3.13.
The expected local version is defined in:
.python-version
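This small helper sketch checks whether the active interpreter matches the version pinned in `.python-version`. The helper name is hypothetical. It assumes the file's first line holds a purely numeric version such as `3.13` or `3.13.1`.

```python
import sys
from pathlib import Path


def matches_pinned_version(pinned: str, running: tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Return True if the running version starts with the pinned components,
    e.g. pinned "3.13" accepts 3.13.0 and 3.13.2 but not 3.12.7.
    Assumes a numeric version string; other formats raise ValueError."""
    wanted = tuple(int(part) for part in pinned.strip().split("."))
    return tuple(running[: len(wanted)]) == wanted


if __name__ == "__main__":
    path = Path(".python-version")
    lines = path.read_text().splitlines() if path.exists() else []
    if lines:
        pinned = lines[0]
        print(f"pinned={pinned} running={sys.version.split()[0]} match={matches_pinned_version(pinned)}")
    else:
        print(".python-version not found or empty; run this from the repository root")
```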
Project dependencies, development dependencies, and pytest discovery settings are defined in:
pyproject.toml
Create and activate a local virtual environment from the repository root, then install the project with its development dependencies (the activation command shown is for Windows PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
Run tests:
python -m pytest
Run only the shared configuration tests:
python -m pytest shared\config\tests
Project documentation:
- docs/vision.md
- docs/architecture.md
- docs/configuration.md
- docs/roadmap.md
- docs/project-context.md
RootPilot maintains Architecture Decision Records (ADRs) to document important architectural decisions and their tradeoffs.
Examples:
- monorepo strategy
- messaging system selection
- infrastructure abstraction patterns
- observability standards
ADRs are located in:
docs/ADRs/
RootPilot is currently in the foundational architecture phase.
Initial development priorities:
- repository structure
- infrastructure abstractions
- event-driven communication
- telemetry ingestion
- AI investigation workflows
Potential future capabilities:
- autonomous remediation
- Kubernetes diagnostics
- deployment impact analysis
- distributed tracing intelligence
- anomaly prediction
- multi-cluster observability
- AI-native operational intelligence
RootPilot is intentionally:
- backend-heavy
- infrastructure-oriented
- AI-assisted
- production-minded
The goal is to build a realistic engineering platform rather than a simple AI demo or chatbot wrapper.
License: Apache 2.0