The open-source framework that AI SRE agents are built on. Connect the tools you already run, define your own workflows, and let agents handle incident investigation and root cause analysis - your way, on your infrastructure.
git clone https://github.com/Tracer-Cloud/open-sre-agent
cd open-sre-agent
make install
make install-hooks
cp .env.example .env
# run opensre onboard to configure your local LLM provider
# and optionally validate/save Grafana, Datadog, Slack, AWS, GitHub MCP, and Sentry integrations
opensre onboard
make local-grafana-live

Choose a path:
- 🏃 Local - Run Tracer locally with a live Grafana environment, no cloud infra needed
- ☁️ Self-hosted - Deploy to your own infrastructure for continuous monitoring
- 🔌 LangGraph / LlamaIndex - Use Tracer as an agent in your existing AI stack (see Agent Docs)
When something breaks in production, the pressure is immediate - but the evidence is scattered. Logs in Datadog. Metrics in Grafana. Runbooks in Notion. Context in Slack threads already 200 messages deep.
Tracer is the open-source answer to that chaos. It's an AI SRE agent that correlates signals across your entire stack, reasons through root cause, and surfaces a clear diagnosis - in the time it used to take just to find the right dashboard.
Unlike closed SRE platforms, Tracer is fully open source and self-hostable. No vendor lock-in. No black-box reasoning. You own the agent, the data, and the workflow.
Whether you're an SRE triaging a P0, a platform engineer building internal tooling, a developer who just got paged, or an EM trying to reduce MTTR - Tracer works for your whole team.
Built in the open. Trusted in production.
When an alert fires, Tracer automatically:
- Fetches the alert context and correlated logs, metrics, and traces
- Reasons across your connected systems to identify anomalies
- Generates a structured investigation report with probable root cause
- Suggests next steps and, optionally, executes remediation actions
- Posts a summary directly to Slack or PagerDuty - no context switching needed
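
The steps above can be sketched as a small pipeline. This is an illustration only, not Tracer's actual implementation: `Alert`, `Report`, `investigate`, and the injected `fetch_logs` and `notify` callables are hypothetical names introduced here, and a plain keyword match stands in for the LLM reasoning step.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alert:
    # Hypothetical alert shape; real alert payloads vary by source.
    service: str
    message: str


@dataclass
class Report:
    alert: Alert
    anomalies: list[str] = field(default_factory=list)
    probable_root_cause: str = "unknown"
    next_steps: list[str] = field(default_factory=list)


def investigate(
    alert: Alert,
    fetch_logs: Callable[[str], list[str]],
    notify: Callable[[str], None],
) -> Report:
    """Run the fetch -> reason -> report -> suggest -> notify steps in order."""
    # 1. Fetch context correlated with the alert (only logs in this sketch).
    logs = fetch_logs(alert.service)

    # 2. "Reason" over the signals. A keyword match stands in for the LLM here.
    anomalies = [line for line in logs if "ERROR" in line]

    # 3. Build a structured report with a probable root cause.
    report = Report(alert=alert, anomalies=anomalies)
    if anomalies:
        report.probable_root_cause = anomalies[0]
        # 4. Suggest next steps (remediation is left out of this sketch).
        report.next_steps.append(f"Inspect recent deploys of {alert.service}")

    # 5. Post a short summary to a chat or paging channel.
    notify(f"[{alert.service}] probable root cause: {report.probable_root_cause}")
    return report


if __name__ == "__main__":
    fake_logs = {"checkout": ["INFO started", "ERROR db connection refused"]}
    sent: list[str] = []
    result = investigate(
        Alert(service="checkout", message="5xx rate high"),
        fetch_logs=lambda svc: fake_logs.get(svc, []),
        notify=sent.append,
    )
    print(result.probable_root_cause)
    print(sent[0])
```

Passing the fetcher and notifier in as callables keeps the sketch testable without any real Datadog, Grafana, or Slack connection.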
| 🔍 Structured incident investigation | Correlated root-cause analysis across all your signals |
| 📋 Runbook-aware reasoning | Tracer reads your runbooks and applies them automatically |
| 🔮 Predictive failure detection | Catch emerging issues before they page you |
| 🔗 Evidence-backed root cause | Every conclusion is linked to the data behind it |
| 🤖 Full LLM flexibility | Bring your own model - OpenAI, Anthropic, and more |
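
To make "evidence-backed root cause" concrete, here is one way such a link could be modeled. This is a sketch under assumptions, not Tracer's data model: `Evidence` and `Conclusion` are hypothetical types introduced for illustration, and the source labels are examples only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str      # e.g. "grafana" or "datadog" (illustrative labels)
    reference: str   # a query, dashboard link, or log line identifier
    excerpt: str     # the snippet of data that supports the claim


@dataclass(frozen=True)
class Conclusion:
    statement: str
    evidence: tuple[Evidence, ...]

    def __post_init__(self) -> None:
        # Reject conclusions that are not backed by at least one signal.
        if not self.evidence:
            raise ValueError("a conclusion needs at least one piece of evidence")

    def render(self) -> str:
        # One line for the claim, then one indented line per supporting signal.
        lines = [self.statement]
        lines += [f"  - [{e.source}] {e.reference}: {e.excerpt}" for e in self.evidence]
        return "\n".join(lines)
```

Refusing to construct a `Conclusion` without evidence means an unsupported claim fails at creation time rather than reaching a report.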
Tracer integrates with the systems that power modern data platforms.
We've tried to be intentional about how Tracer is built, not just what it does.
- Real-world testing over mocks - we're big fans of end-to-end testing against real environments, whether that's a local observability stack (Grafana, Prometheus) or actual cloud infrastructure. If it doesn't work in the real world, it doesn't count.
- Show your work - every conclusion Tracer reaches should be traceable back to the signals that led there. No black boxes.
- Bring your own everything - your LLM, your tools, your runbooks. Tracer fits around your stack, not the other way around.
- Open by default - the code is yours to read, fork, and improve. We'd rather have a smaller, more trusted tool than a bigger, opaque one.
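
The "bring your own everything" principle can be illustrated with a pluggable model interface. The names below (`LLMProvider`, `EchoProvider`, `summarize_incident`) are hypothetical and do not come from the Tracer codebase; this is a minimal sketch of the idea using Python's structural typing.

```python
from typing import Protocol


class LLMProvider(Protocol):
    # Hypothetical interface: any object with a matching `complete` method fits.
    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for local testing; makes no network calls."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


def summarize_incident(provider: LLMProvider, signals: list[str]) -> str:
    # The caller decides which provider to pass in; this function does not care.
    prompt = "Summarize these signals:\n" + "\n".join(signals)
    return provider.complete(prompt)
```

Because `Protocol` matches on method shape rather than inheritance, a wrapper around any hosted or local model can be swapped in without touching `summarize_incident`.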
Tracer is community-built. Every integration, improvement, and bug fix makes it better for thousands of engineers. We actively review PRs and welcome contributors of all experience levels.
Good first issues are labeled `good first issue`. Ways to contribute:
- 🐛 Report bugs or missing edge cases
- 🔌 Add a new tool integration
- 📖 Improve documentation or runbook examples
- ⭐ Star the repo - it helps other engineers find Tracer
See CONTRIBUTING.md for the full guide.
Thanks go to these amazing people:
| John Ellithorpe | Ayush Singhal | Vaibhav Upreti | Maame Afua A.P Fordjour | paultracer | aliya-tracer |
| kylie-tracer | Gust-svg | vincenthus | arnetracer | Kalio | Zeel Desai |
Tracer is designed with production environments in mind:
- Raw log data is not stored beyond the investigation session
- All LLM calls use structured, auditable prompts
- Log transcripts are kept locally - never sent externally by default
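
As a rough illustration of the first two points, the sketch below keeps raw log lines in memory for a single session and records prompts by template and input digest rather than by raw content. This is one possible reading of those guarantees, not a description of how Tracer implements them; `investigation_session` and `audit_record` are hypothetical helpers.

```python
import hashlib
import json
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def investigation_session() -> Iterator[list[str]]:
    """Hold raw log lines in memory only for the duration of one investigation."""
    raw_logs: list[str] = []
    try:
        yield raw_logs
    finally:
        # Drop raw data when the session ends, even if the investigation failed.
        raw_logs.clear()


def audit_record(template_id: str, fields: dict[str, str]) -> dict[str, str]:
    """Describe a prompt by template and a digest of its inputs, not raw content."""
    # Sorting keys makes the digest independent of dictionary insertion order.
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return {
        "template_id": template_id,
        "fields_sha256": hashlib.sha256(payload).hexdigest(),
    }
```

The digest lets an auditor confirm which inputs produced a given prompt without the audit trail itself retaining log contents.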
See SECURITY.md for responsible disclosure.
Apache 2.0 - see LICENSE for details.
