An autonomous AI-powered DevOps operating system that orchestrates CI/CD, PR reviews, site reliability, architecture decisions, and incident management—without waking up human engineers at 3 AM.
Traditional DevOps is broken. Engineers burn out from alert fatigue, PR reviews bottleneck releases, and infrastructure waste goes unnoticed. DevOps OS changes the game by deploying a swarm of specialized AI agents that work 24/7, each focused on a single DevOps domain.
Built with cutting-edge agentic AI architecture using LangGraph for orchestration and CrewAI for specialist execution, powered by Claude 3.5 Sonnet and GPT-4o.
graph TD
A[Webhook Events] --> B[FastAPI Gateway]
B --> C[LangGraph Orchestrator]
C --> D[CodeReview Agent]
C --> E[CI Monitor Agent]
C --> F[Infra Optimizer Agent]
C --> G[Incident Responder Agent]
C --> H[Documentation Agent]
C --> I[Security Audit Agent]
D --> J[GitHub PR Comments]
E --> K[Auto-Fix Proposals]
F --> L[AWS Cost Optimization]
G --> M[PagerDuty Auto-Resolve]
H --> N[Auto-Generated Docs]
I --> O[Security Reports]
| Component | Technology | Purpose |
|---|---|---|
| Orchestrator | LangGraph | Intelligent event routing and priority mapping |
| Specialist Agents | CrewAI | Domain-specific DevOps expertise |
| API Layer | FastAPI | High-performance webhook ingestion |
| LLM Backend | Claude 3.5 + GPT-4o | Reasoning and code analysis |
| Audit Trail | PostgreSQL | Complete system observability |
CodeReview Agent

- Mission: Meticulous PR analysis with a security-first mindset
- Capabilities:
  - SOLID principles verification
  - Security vulnerability detection
  - Code quality scoring
  - Auto-comments on GitHub PRs
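To make the "auto-comments on GitHub PRs" capability concrete, here is a minimal sketch of how a review finding could be turned into a GitHub REST API request. It uses the public issue-comments endpoint, which pull requests share. The function name `build_pr_comment_request` and the placeholder token are assumptions for illustration and are not taken from this repository's `tools.py`.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_pr_comment_request(
    repo: str, pr_number: int, body: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a request that comments on a pull request.

    Hypothetical helper: pull requests use the issue-comments endpoint
    of the GitHub REST API.
    """
    url = f"{GITHUB_API}/repos/{repo}/issues/{pr_number}/comments"
    payload = json.dumps({"body": body}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


request = build_pr_comment_request(
    "your-org/your-repo", 42, "Possible SQL injection in `query()`.", "<GITHUB_TOKEN>"
)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the sketch testable without network access or a real token.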
CI Monitor Agent

- Mission: Diagnose pipeline failures with surgical precision
- Capabilities:
  - Build log parsing and root cause analysis
  - Auto-fix PR generation
  - Flaky test detection
  - Performance regression alerts
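As an illustration of flaky test detection, one common heuristic treats a test as flaky when it both passed and failed on the same commit. The sketch below implements that heuristic only. The `RunResult` record and `find_flaky_tests` function are hypothetical names, and the agent in this project may use a different approach.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class RunResult(NamedTuple):
    """One execution of one test (hypothetical record shape)."""
    test_name: str
    commit_sha: str
    passed: bool


def find_flaky_tests(runs: Iterable[RunResult]) -> set[str]:
    """Return tests that both passed and failed on the same commit."""
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit_sha)].add(run.passed)
    # Mixed outcomes on identical code point at the test, not the change.
    return {name for (name, _sha), seen in outcomes.items() if seen == {True, False}}


runs = [
    RunResult("test_login", "abc123", True),
    RunResult("test_login", "abc123", False),    # same commit, different outcome
    RunResult("test_checkout", "abc123", False),
    RunResult("test_checkout", "def456", True),  # different commit: not flaky
]
print(find_flaky_tests(runs))
```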
Infra Optimizer Agent

- Mission: Eliminate cloud infrastructure waste (AWS today; Azure and GCP are on the roadmap)
- Capabilities:
  - EC2 right-sizing recommendations
  - Unused resource detection
  - Terraform IaC optimization proposals
  - Cost anomaly alerts
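A right-sizing recommendation can be reduced, in its simplest form, to comparing observed utilization against thresholds. The sketch below classifies an instance from CPU utilization samples. The thresholds and the function name `rightsizing_recommendation` are assumptions chosen for illustration, and a real recommendation would also weigh memory, network, and cost data.

```python
from statistics import mean

# Hypothetical thresholds (percent CPU); tune them to your workload.
LOW_AVERAGE = 10.0
LOW_PEAK = 40.0
HIGH_AVERAGE = 80.0


def rightsizing_recommendation(cpu_samples: list[float]) -> str:
    """Classify an instance from its CPU utilization samples (percent)."""
    if not cpu_samples:
        return "no-data"
    average = mean(cpu_samples)
    peak = max(cpu_samples)
    if average < LOW_AVERAGE and peak < LOW_PEAK:
        return "downsize"  # consistently idle, no bursts worth keeping capacity for
    if average > HIGH_AVERAGE:
        return "upsize"    # sustained pressure
    return "keep"


print(rightsizing_recommendation([3.0, 5.0, 8.0]))
print(rightsizing_recommendation([2.0, 3.0, 95.0]))
```

Checking the peak as well as the average avoids downsizing an instance that is mostly idle but bursts heavily.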
Incident Responder Agent

- Mission: P1 incident response in <5 minutes
- Capabilities:
  - PagerDuty auto-acknowledgement
  - Runbook execution
  - Auto-mitigation actions
  - Human escalation when needed
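The capabilities above imply a decision order: acknowledge, try a runbook, and hand over to a human when automation fails or the five-minute budget runs out. The following sketch encodes that order as a pure function. `IncidentState`, `next_action`, and the action names are hypothetical, and no PagerDuty API calls are made.

```python
from dataclasses import dataclass
from typing import Optional

RESPONSE_BUDGET_SECONDS = 5 * 60  # the "<5 minutes" target stated above


@dataclass
class IncidentState:
    """Hypothetical snapshot of one incident."""
    acknowledged: bool
    runbook_succeeded: Optional[bool]  # None: no runbook has finished yet
    seconds_since_trigger: float


def next_action(state: IncidentState) -> str:
    """Pick the next step for an incident (illustrative policy only)."""
    if not state.acknowledged:
        return "acknowledge"
    if state.runbook_succeeded is True:
        return "resolve"
    if state.runbook_succeeded is False:
        return "escalate_to_human"  # automation failed
    if state.seconds_since_trigger >= RESPONSE_BUDGET_SECONDS:
        return "escalate_to_human"  # budget exhausted
    return "run_runbook"


print(next_action(IncidentState(True, None, 45.0)))
```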
Documentation Agent

- Mission: Keep docs in sync with code
- Capabilities:
  - Auto-generate API docs post-merge
  - README updates
  - Architecture diagram maintenance
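For a sense of what auto-generated API docs can look like without an LLM in the loop, the sketch below renders one Markdown bullet per function from its signature and docstring using the standard `inspect` module. The `render_api_docs` helper and the `restart_service` example function are hypothetical, and the agent in this project presumably produces richer output.

```python
import inspect
from typing import Callable, Iterable


def render_api_docs(functions: Iterable[Callable]) -> str:
    """Render one Markdown bullet per function: name, signature, summary."""
    lines = []
    for func in functions:
        doc = inspect.getdoc(func) or "Undocumented."
        summary = doc.splitlines()[0]  # first docstring line only
        lines.append(f"- {func.__name__}{inspect.signature(func)}: {summary}")
    return "\n".join(lines)


def restart_service(name: str, force: bool = False) -> None:
    """Restart a named service."""


print(render_api_docs([restart_service]))
```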
Security Audit Agent

- Mission: Continuous security compliance
- Capabilities:
  - SAST/SCA scanning integration
  - CVE detection in dependencies
  - OWASP compliance checks
  - Secret detection in code
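Secret detection is often implemented, at its simplest, as pattern matching over source lines. The sketch below checks for two well-known token shapes (AWS access key IDs and classic GitHub personal access tokens). The `SECRET_PATTERNS` table and `find_secrets` function are illustrative assumptions, and dedicated scanners cover far more formats.

```python
import re

# Two well-known token shapes; real scanners cover many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_personal_access_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspected secret."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
sample = 'region = "us-east-1"\nkey = "AKIAIOSFODNN7EXAMPLE"\n'
print(find_secrets(sample))
```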
- Python 3.10+
- API keys for OpenAI and Anthropic
- Optional: GitHub, AWS, PagerDuty credentials for full functionality
# Clone the repository
git clone https://github.com/daniellopez882/DevOps-AI-Engineer-Agent.git
cd DevOps-AI-Engineer-Agent
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: .\venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit .env with your API keys
# Required: OPENAI_API_KEY, ANTHROPIC_API_KEY, DATABASE_URL
# Optional: GITHUB_TOKEN, AWS credentials, PAGERDUTY_API_KEY

# Start the FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000

📡 API Docs: Visit http://localhost:8000/docs for interactive Swagger UI
Fire a test event through the webhook:
curl -X POST http://localhost:8000/webhook \
-H "Content-Type: application/json" \
-d '{
"trigger_type": "pr_opened",
"repo": "your-org/your-repo",
"branch": "main",
"payload": {"pr_number": 42}
}'

Watch the terminal for real-time agent execution logs.
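The test event has four fields: `trigger_type`, `repo`, `branch`, and `payload`. As a rough sketch of how such a body could be validated before it reaches the orchestrator, the snippet below parses it into a dataclass using only the standard library. `WebhookEvent` and `parse_event` are hypothetical names, and treating the first three fields as mandatory is an assumption. The accepted trigger types are the ones listed in this README.

```python
import json
from dataclasses import dataclass, field
from typing import Any

# Trigger types documented in this README.
KNOWN_TRIGGERS = {
    "pr_opened", "ci_failed", "scheduled_infra",
    "pagerduty", "pr_merged", "scheduled_security",
}


@dataclass
class WebhookEvent:
    """Hypothetical in-memory form of a webhook body."""
    trigger_type: str
    repo: str
    branch: str
    payload: dict[str, Any] = field(default_factory=dict)


def parse_event(raw: str) -> WebhookEvent:
    """Parse and validate a JSON webhook body; raise ValueError if invalid."""
    data = json.loads(raw)
    missing = [key for key in ("trigger_type", "repo", "branch") if key not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if data["trigger_type"] not in KNOWN_TRIGGERS:
        raise ValueError(f"unknown trigger_type: {data['trigger_type']}")
    return WebhookEvent(
        data["trigger_type"], data["repo"], data["branch"], data.get("payload", {})
    )


raw = (
    '{"trigger_type": "pr_opened", "repo": "your-org/your-repo", '
    '"branch": "main", "payload": {"pr_number": 42}}'
)
event = parse_event(raw)
print(event.trigger_type, event.payload["pr_number"])
```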
| Trigger Type | Activated Agent | Priority | Response Time |
|---|---|---|---|
| `pr_opened` | CodeReviewAgent | P3 | ~30 seconds |
| `ci_failed` | CIMonitorAgent | P2 | ~15 seconds |
| `scheduled_infra` | InfraOptimizerAgent | P4 | Weekly |
| `pagerduty` | IncidentResponder | P1 | <5 minutes |
| `pr_merged` | DocumentationAgent | P3 | Post-merge |
| `scheduled_security` | SecurityAuditAgent | P4 | Weekly |
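The mapping in the table above can be pictured as a lookup from trigger type to agent and priority. The sketch below uses a plain dictionary as a stand-in for the LangGraph orchestrator, which is not reproduced here. The names `ROUTES`, `route`, and `order_by_priority` are assumptions for illustration.

```python
# Trigger type -> (agent name, priority), copied from the table above.
ROUTES: dict[str, tuple[str, int]] = {
    "pr_opened": ("CodeReviewAgent", 3),
    "ci_failed": ("CIMonitorAgent", 2),
    "scheduled_infra": ("InfraOptimizerAgent", 4),
    "pagerduty": ("IncidentResponder", 1),
    "pr_merged": ("DocumentationAgent", 3),
    "scheduled_security": ("SecurityAuditAgent", 4),
}


def route(trigger_type: str) -> tuple[str, int]:
    """Return (agent, priority) for a trigger, or raise ValueError."""
    try:
        return ROUTES[trigger_type]
    except KeyError:
        raise ValueError(f"no agent registered for trigger {trigger_type!r}") from None


def order_by_priority(trigger_types: list[str]) -> list[str]:
    """Sort pending triggers so P1 work is handled first (stable sort)."""
    return sorted(trigger_types, key=lambda trigger: route(trigger)[1])


print(route("ci_failed"))
print(order_by_priority(["pr_opened", "pagerduty", "ci_failed"]))
```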
| Layer | Technology |
|---|---|
| Orchestration | LangGraph, CrewAI |
| API Framework | FastAPI, Uvicorn |
| LLM Providers | Anthropic Claude 3.5, OpenAI GPT-4o |
| Integrations | GitHub API, AWS SDK, PagerDuty API |
| Database | PostgreSQL (audit trails) |
| Deployment | Docker, Docker Compose |
DevOps-AI-Engineer-Agent/
├── main.py # FastAPI entry point
├── agent_graph.py # LangGraph state machine
├── crew_agents.py # CrewAI agent definitions
├── tools.py # Integration tools (GitHub, AWS, PagerDuty)
├── tasks.py # Agent task specifications
├── devops_agent_prompts.py # Prompt engineering templates
├── llm_config.py # LLM configuration
├── database.py # PostgreSQL audit logging
├── .env.example # Environment template
├── docker-compose.yml # Container orchestration
└── requirements.txt # Python dependencies
- No credentials stored in code — All secrets via environment variables
- Audit trail logging — Every agent action tracked in PostgreSQL
- Human-in-the-loop — Critical actions require approval
- Rate limiting — API protection against abuse
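The README does not say how rate limiting is implemented. One common technique is a token bucket, sketched below purely as an illustration. The `TokenBucket` class, its parameters, and the injectable `clock` are assumptions and do not describe this project's actual mechanism.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the call may proceed."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Example: at most 2 requests in a burst, refilling 1 token per second.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
print(burst)
```

In a webhook handler, a rejected call would typically translate into an HTTP 429 response.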
# Build and run with Docker Compose
docker-compose up -d
# View logs
docker-compose logs -f

| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API credential | ✅ |
| `ANTHROPIC_API_KEY` | Anthropic API credential | ✅ |
| `GITHUB_TOKEN` | GitHub API token | ❌ |
| `AWS_ACCESS_KEY_ID` | AWS credential | ❌ |
| `AWS_SECRET_ACCESS_KEY` | AWS credential | ❌ |
| `PAGERDUTY_API_KEY` | PagerDuty credential | ❌ |
| `DATABASE_URL` | PostgreSQL connection string | ✅ |
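A startup check against the variables in the table above can catch a misconfigured environment early. The following is a minimal sketch under the assumption that an empty value counts as missing. `check_environment` is a hypothetical helper and is not part of this repository.

```python
import os
from typing import Mapping

# Names and required/optional split taken from the table above.
REQUIRED_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "DATABASE_URL")
OPTIONAL_VARS = (
    "GITHUB_TOKEN", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "PAGERDUTY_API_KEY",
)


def check_environment(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Report which required and optional variables are unset or empty."""
    return {
        "missing_required": [name for name in REQUIRED_VARS if not env.get(name)],
        "missing_optional": [name for name in OPTIONAL_VARS if not env.get(name)],
    }


# Example with an explicit mapping instead of the real process environment.
report = check_environment({"OPENAI_API_KEY": "sk-placeholder", "GITHUB_TOKEN": "placeholder"})
print(report["missing_required"])
```

A service could refuse to start when `missing_required` is non-empty and only log a warning for `missing_optional`.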
- Slack Integration — Real-time agent notifications
- Jira Sync — Auto-create tickets for actionable items
- Multi-Cloud Support — Azure, GCP cost optimization
- Custom Runbooks — User-defined incident response playbooks
- Analytics Dashboard — Grafana integration for agent metrics
- Learning Loop — Agent performance feedback and improvement
Contributions are welcome! This is an open-source project built with passion by the DevOps community.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License — see the LICENSE file for details.
Daniel Lopez
Expert AI Agent Engineer
Built with ❤️ using agentic AI architecture
- LangChain for LangGraph
- CrewAI for multi-agent orchestration
- FastAPI for the blazing-fast API framework
- Anthropic and OpenAI for LLM capabilities
🚀 DevOps OS — Where AI meets Infrastructure