

 
 


Open SRE — Build Your Own AI SRE Agents

The open-source framework that AI SRE agents are built on. Connect the tools you already run, define your own workflows, and let agents handle incident investigation and root cause analysis - your way, on your infrastructure.


Slack · Getting Started · Tracer Agent · Docs · FAQ · Security


Quick Start

git clone https://github.com/Tracer-Cloud/open-sre-agent
cd open-sre-agent
make install
make install-hooks
cp .env.example .env
# run opensre onboard to configure your local LLM provider
# and optionally validate/save Grafana, Datadog, Slack, AWS, GitHub MCP, and Sentry integrations
opensre onboard
make local-grafana-live

Choose a path:

  • 🏃 Local - Run Tracer locally with a live Grafana environment, no cloud infra needed
  • ☁️ Self-hosted - Deploy to your own infrastructure for continuous monitoring
  • 🔌 LangGraph / LlamaIndex - Use Tracer as an agent in your existing AI stack (see Agent Docs)

Why Tracer?

When something breaks in production, the pressure is immediate - but the evidence is scattered. Logs in Datadog. Metrics in Grafana. Runbooks in Notion. Context in Slack threads already 200 messages deep.

Tracer is the open-source answer to that chaos. It's an AI SRE agent that correlates signals across your entire stack, reasons through root cause, and surfaces a clear diagnosis - in the time it used to take just to find the right dashboard.

Unlike closed SRE platforms, Tracer is fully open source and self-hostable. No vendor lock-in. No black-box reasoning. You own the agent, the data, and the workflow.

Whether you're an SRE triaging a P0, a platform engineer building internal tooling, a developer who just got paged, or an EM trying to reduce MTTR - Tracer works for your whole team.

Built in the open. Trusted in production.


How Tracer Works


Investigation Workflow

When an alert fires, Tracer automatically:

  1. Fetches the alert context and correlated logs, metrics, and traces
  2. Reasons across your connected systems to identify anomalies
  3. Generates a structured investigation report with probable root cause
  4. Suggests next steps and, optionally, executes remediation actions
  5. Posts a summary directly to Slack or PagerDuty - no context switching needed
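
The five steps above can be read as a small pipeline. The following is a minimal sketch, not Tracer's actual implementation: every name in it (`Alert`, `Evidence`, `Report`, `investigate`, and the injected `fetch_signals` and `notify` callables) is hypothetical, and a trivial keyword match stands in for the LLM reasoning step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alert:
    name: str
    service: str


@dataclass
class Evidence:
    source: str  # e.g. "logs", "metrics", "traces"
    detail: str


@dataclass
class Report:
    alert: Alert
    probable_root_cause: str
    evidence: list[Evidence] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)


def investigate(
    alert: Alert,
    fetch_signals: Callable[[Alert], list[Evidence]],
    notify: Callable[[str], None],
) -> Report:
    # Step 1: fetch alert context and correlated signals.
    signals = fetch_signals(alert)
    # Step 2: flag anomalies (placeholder: keyword match instead of LLM reasoning).
    anomalies = [s for s in signals if "error" in s.detail.lower()]
    # Step 3: build a structured report whose conclusion links to its evidence.
    cause = anomalies[0].detail if anomalies else "undetermined"
    report = Report(alert, cause, evidence=anomalies)
    # Step 4: suggest next steps (remediation is left out of this sketch).
    if anomalies:
        report.next_steps.append(f"Inspect {anomalies[0].source} for {alert.service}")
    # Step 5: post a summary to a chat or paging channel.
    notify(f"[{alert.name}] probable root cause: {report.probable_root_cause}")
    return report
```

Because the signal fetcher and the notifier are passed in as plain callables, the same function can be exercised with in-memory fakes in a test or wired to real integrations.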

Capabilities

  • 🔍 Structured incident investigation - Correlated root-cause analysis across all your signals
  • 📋 Runbook-aware reasoning - Tracer reads your runbooks and applies them automatically
  • 🔮 Predictive failure detection - Catch emerging issues before they page you
  • 🔗 Evidence-backed root cause - Every conclusion is linked to the data behind it
  • 🤖 Full LLM flexibility - Bring your own model: OpenAI, Anthropic, and more

Integrations

Tracer integrates with the systems that power modern data platforms.

Category         Integrations
Data Platform    Apache Airflow · Apache Kafka · Apache Spark
Observability    Grafana · Datadog · CloudWatch · Sentry
Infrastructure   Kubernetes · AWS · GCP · Azure
Dev Tools        GitHub
Communication    Slack · PagerDuty

Design Principles

We've tried to be intentional about how Tracer is built, not just what it does.

  • Real-world testing over mocks - we're big fans of end-to-end testing against real environments, whether that's a local observability stack (Grafana, Prometheus) or actual cloud infrastructure. If it doesn't work in the real world, it doesn't count.
  • Show your work - every conclusion Tracer reaches should be traceable back to the signals that led there. No black boxes.
  • Bring your own everything - your LLM, your tools, your runbooks. Tracer fits around your stack, not the other way around.
  • Open by default - the code is yours to read, fork, and improve. We'd rather have a smaller, more trusted tool than a bigger, opaque one.
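
One common way to make the "bring your own everything" principle concrete is to have agent code depend on a small interface rather than on a specific vendor SDK. The sketch below uses Python's `typing.Protocol` to show the idea. It is an illustration under assumptions, not Tracer's API: `LLMProvider`, `EchoProvider`, and `summarize_incident` are hypothetical names.

```python
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface: anything with a matching complete() method qualifies."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for local testing; a real one would call a model API."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarize_incident(provider: LLMProvider, alert_name: str) -> str:
    # The agent code only sees the interface, so providers can be swapped freely.
    prompt = f"Summarize the likely cause of alert '{alert_name}'."
    return provider.complete(prompt)
```

Structural typing means a provider does not need to inherit from anything; a class that wraps a hosted or local model only has to expose the same `complete` method.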

Contributing

Tracer is community-built. Every integration, improvement, and bug fix makes it better for thousands of engineers. We actively review PRs and welcome contributors of all experience levels.

Good first issues carry the "good first issue" label. Ways to contribute:

  • 🐛 Report bugs or missing edge cases
  • 🔌 Add a new tool integration
  • 📖 Improve documentation or runbook examples
  • ⭐ Star the repo - it helps other engineers find Tracer

See CONTRIBUTING.md for the full guide.

Thanks go to these amazing people:

  • John Ellithorpe (jellithorpe)
  • Ayush Singhal (ayushsinghal90)
  • Vaibhav Upreti (VaibhavUpreti)
  • Maame Afua A.P Fordjour (Maame-codes)
  • paultracer
  • aliya-tracer
  • kylie-tracer
  • Gust-svg
  • vincenthus (davincios)
  • arnetracer
  • Kalio (iamkalio)
  • Zeel Desai (zeel2104)

Security

Tracer is designed with production environments in mind:

  • Raw log data is not stored beyond the investigation session
  • All LLM calls use structured, auditable prompts
  • Log transcripts stay local; by default they are never sent externally
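
The first guarantee above, that raw logs are not retained beyond the investigation session, can be illustrated with a context manager that clears its buffer on exit. This is an illustrative sketch only: `InvestigationSession` is a hypothetical class and says nothing about how Tracer actually stores data.

```python
class InvestigationSession:
    """Holds raw log lines in memory only for the lifetime of a `with` block."""

    def __init__(self) -> None:
        self._raw_logs: list[str] = []

    def __enter__(self) -> "InvestigationSession":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Drop raw data whether the investigation succeeded or raised.
        self._raw_logs.clear()

    def add_log(self, line: str) -> None:
        self._raw_logs.append(line)

    def error_count(self) -> int:
        # Derived, non-raw facts can be computed while the session is open.
        return sum("ERROR" in line for line in self._raw_logs)
```

Tying cleanup to `__exit__` means the buffer is emptied even when the investigation fails partway through, which is the property the bullet describes.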

See SECURITY.md for responsible disclosure.


License

Apache 2.0 - see LICENSE for details.
