IncidentOps — AI-Powered Incident Analysis Platform

Transform incident investigation from hours to seconds with AI-powered analysis, automatic pattern detection, and a growing knowledge base of resolutions.

Introduction

IncidentOps is an AI-powered incident analysis platform that helps QA and engineering teams answer "Why did this fail?" in seconds instead of hours.

Three ways to integrate:

  • SDK – Drop-in error reporting for Node.js applications
  • Express Middleware – Automatic error capture by registering a single error handler
  • REST API – Direct integration from any language or platform

When an error occurs, IncidentOps will:

  1. Capture the error with full context (stack trace, metadata)
  2. Fingerprint and deduplicate similar errors automatically
  3. Analyze using AI to identify root cause and suggest fixes
  4. Notify your team via n8n webhooks (Discord, Slack, email)
  5. Learn from resolutions to improve future suggestions

Key Features

  • AI-Powered Analysis – Uses Claude via Vercel AI Gateway to analyze errors and suggest root causes
  • Automatic Pattern Detection – Groups similar errors using intelligent fingerprinting
  • MCP Tools Integration – Model Context Protocol server provides AI with database access for deep analysis
  • TypeScript SDK – Zero-dependency client with Express middleware support
  • Real-time Dashboard – Next.js web interface to browse incidents and patterns
  • Docker Ready – One-command setup with Docker Compose
  • n8n Notifications – Flexible webhook integration for alerts
  • Open Source – Self-hosted; incidents are stored in your own PostgreSQL database (AI analysis calls the Vercel AI Gateway)

Quick Start (Docker)

The fastest way to get started is with Docker Compose.

Prerequisites

  • Docker with Docker Compose
  • A Vercel AI Gateway API key (used for AI analysis)

Steps

  1. Clone the repository

    git clone https://github.com/matiasvallejosdev/incident-ops.git
    cd incident-ops
  2. Configure environment variables

    cp .env.example .env

    Edit .env with your configuration:

    # Required
    API_KEY=your-secure-api-key-here
    AI_GATEWAY_API_KEY=vck_your_vercel_ai_gateway_key
    
    # Optional
    N8N_PASSWORD=your-n8n-password
    ALLOWED_ORIGINS=http://localhost:3001
  3. Start all services

    docker-compose up -d
  4. Access the applications

    • API: http://localhost:3000
    • Dashboard: http://localhost:3001
    • n8n: http://localhost:5678

  5. Report your first incident

    curl -X POST http://localhost:3000/api/incidents \
      -H "Content-Type: application/json" \
      -H "X-API-Key: your-secure-api-key-here" \
      -d '{
        "service": "my-service",
        "environment": "production",
        "severity": "high",
        "error": {
          "type": "DatabaseError",
          "message": "Connection refused to primary database"
        }
      }'
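
The same request can be sent from TypeScript. This is a minimal sketch that mirrors the curl call above using the global fetch available in Node.js 18+. The names `IncidentPayload`, `buildIncidentRequest`, and `reportIncident` are illustrative, and the response is assumed to be JSON, since its shape is not documented here.

```typescript
// Payload shape copied from the curl example; no other fields are assumed.
interface IncidentPayload {
  service: string;
  environment: string;
  severity: string;
  error: { type: string; message: string };
}

// Build the fetch arguments separately so they can be inspected without a network call.
function buildIncidentRequest(
  baseUrl: string,
  apiKey: string,
  payload: IncidentPayload,
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: `${baseUrl}/api/incidents`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
      body: JSON.stringify(payload),
    },
  };
}

// Send the incident. Assumption: the API answers with a JSON body on success.
async function reportIncident(
  baseUrl: string,
  apiKey: string,
  payload: IncidentPayload,
): Promise<unknown> {
  const { url, init } = buildIncidentRequest(baseUrl, apiKey, payload);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`IncidentOps API returned HTTP ${res.status}`);
  return res.json();
}
```

Call `reportIncident("http://localhost:3000", process.env.API_KEY ?? "", payload)` with the same values used in the curl example.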

Managing Docker Services

# Stop all services
docker-compose down

# View logs
docker-compose logs -f

# View logs for specific service
docker-compose logs -f api
docker-compose logs -f dashboard

# Restart services
docker-compose restart

# Rebuild after code changes
docker-compose up --build -d

# Stop and remove everything (including data)
docker-compose down -v

Local Development

For development without Docker:

Prerequisites

  • Node.js 20+
  • pnpm 9+
  • PostgreSQL 16

Steps

  1. Install dependencies

    pnpm install
  2. Configure environment

    cp .env.example .env
    # Edit .env with your database URL and API keys
  3. Setup database

    pnpm db:push
  4. Start development servers

    pnpm dev
  5. Run tests

    pnpm test

SDK Usage

Install the SDK in your Node.js application:

npm install @incidentops/sdk

Basic Usage

import { IncidentOps } from '@incidentops/sdk';

const incidents = new IncidentOps({
  apiUrl: 'http://localhost:3000',
  apiKey: 'your-api-key',
  service: 'my-service',
  environment: 'production',
});

// Report an error
try {
  await riskyOperation();
} catch (error) {
  await incidents.report(error, {
    severity: 'high',
    context: { userId: '123', operation: 'checkout' }
  });
}
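
If many call sites need the same try/catch, a small wrapper can remove the repetition. The sketch below relies only on the `report(error, options)` call shown above. The `Reporter` interface and `withIncidentReporting` helper are hypothetical and not part of the SDK.

```typescript
// Minimal structural type for the one SDK method used above; the real SDK may expose more.
interface Reporter {
  report(
    error: unknown,
    options: { severity: string; context?: Record<string, unknown> },
  ): Promise<unknown>;
}

// Runs `operation`; on failure, reports the error and rethrows so callers still see it.
async function withIncidentReporting<T>(
  reporter: Reporter,
  context: Record<string, unknown>,
  operation: () => Promise<T>,
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    try {
      await reporter.report(error, { severity: "high", context });
    } catch {
      // A reporting failure should not mask the original error.
    }
    throw error;
  }
}
```

Usage would look like `await withIncidentReporting(incidents, { operation: 'checkout' }, () => riskyOperation())`.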

Express Middleware

import express from 'express';
import { IncidentOps } from '@incidentops/sdk';
import { errorMiddleware } from '@incidentops/sdk/express';

const app = express();
const incidents = new IncidentOps({
  apiUrl: process.env.INCIDENTOPS_URL,
  apiKey: process.env.INCIDENTOPS_KEY,
  service: 'api-server',
});

// Your routes here...

// Add error middleware (must be last)
app.use(errorMiddleware(incidents, {
  severity: 'high',
  excludePaths: ['/health'],
  includeHeaders: false,
}));

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         Client Application                          │
│                    (SDK / REST API / Middleware)                    │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          Express API                                │
│                         (Port 3000)                                 │
├─────────────────────────────────────────────────────────────────────┤
│  • Incident ingestion & validation                                  │
│  • Fingerprint generation                                           │
│  • Pattern detection & grouping                                     │
│  • AI analysis orchestration                                        │
└─────────────────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
    ┌────────────┐      ┌─────────────┐      ┌─────────────┐
    │ PostgreSQL │      │  MCP Server │      │     n8n     │
    │   (5432)   │      │   (stdio)   │      │   (5678)    │
    └────────────┘      └─────────────┘      └─────────────┘
           │                    │                    │
           │                    ▼                    ▼
           │            ┌─────────────┐      ┌─────────────┐
           │            │ Vercel AI   │      │  Discord/   │
           │            │   Gateway   │      │   Slack     │
           │            └─────────────┘      └─────────────┘
           │
           ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Next.js Dashboard                            │
│                          (Port 3001)                                │
├─────────────────────────────────────────────────────────────────────┤
│  • Incident list & details                                          │
│  • Pattern visualization                                            │
│  • AI analysis results                                              │
└─────────────────────────────────────────────────────────────────────┘

Processing Pipeline

  1. Incident Received → Validate with Zod schema
  2. Fingerprint Generated → Normalize message, hash with SHA-256
  3. Pattern Matched → Find or create pattern, increment count
  4. Incident Stored → Save to PostgreSQL with Prisma
  5. Notification Sent → POST to n8n webhook (async)
  6. AI Analysis Triggered → MCP tools query database, Claude analyzes (async)
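
Steps 2 and 3 can be sketched as follows. The README only states that the message is normalized and hashed with SHA-256, so the normalization rules, the fields fed into the hash, and the in-memory `Map` (standing in for the PostgreSQL pattern table) are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Replace volatile tokens so messages that differ only in ids or numbers collapse together.
// These rules are illustrative assumptions, not the project's actual normalization.
function normalizeMessage(message: string): string {
  return message
    .toLowerCase()
    .replace(/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/g, "<uuid>")
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim();
}

// Hash service, error type, and normalized message into a hex SHA-256 digest.
function fingerprint(service: string, errorType: string, message: string): string {
  return createHash("sha256")
    .update(`${service}|${errorType}|${normalizeMessage(message)}`)
    .digest("hex");
}

interface Pattern {
  fingerprint: string;
  count: number;
  firstSeen: Date;
  lastSeen: Date;
}

// In-memory stand-in for "find or create pattern, increment count".
function matchPattern(
  patterns: Map<string, Pattern>,
  fp: string,
  now: Date = new Date(),
): Pattern {
  const existing = patterns.get(fp);
  if (existing) {
    existing.count += 1;
    existing.lastSeen = now;
    return existing;
  }
  const created: Pattern = { fingerprint: fp, count: 1, firstSeen: now, lastSeen: now };
  patterns.set(fp, created);
  return created;
}
```

Under these rules, two timeouts that differ only in their numeric values map to the same fingerprint and therefore to the same pattern.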

Tech Stack

| Component | Technology | Purpose |
| --- | --- | --- |
| Runtime | Node.js 20 | Server runtime |
| Language | TypeScript 5 | Type safety |
| API | Express 4 | HTTP server |
| Database | PostgreSQL 16 | Data storage |
| ORM | Prisma 6 | Database access |
| AI | Vercel AI SDK | Claude integration |
| MCP | @modelcontextprotocol/sdk | AI tool protocol |
| Frontend | Next.js 14 | React dashboard |
| Styling | Tailwind CSS 3 | UI styling |
| Monorepo | Turborepo + pnpm | Build system |
| Notifications | n8n | Workflow automation |

Project Structure

incident-ops/
├── apps/
│   ├── api/                 # Express backend API
│   │   ├── src/
│   │   │   ├── routes/      # API endpoints
│   │   │   ├── services/    # Business logic
│   │   │   ├── agent/       # AI analysis
│   │   │   └── middleware/  # Auth, errors
│   │   └── prisma/          # Database schema
│   │
│   ├── mcp-server/          # MCP tools for AI
│   │   └── src/tools/       # findSimilar, analyzeError, etc.
│   │
│   ├── web/                 # Next.js dashboard
│   │   └── src/app/         # Pages and components
│   │
│   └── test-sdk/            # SDK integration tests
│
├── packages/
│   └── sdk/                 # @incidentops/sdk
│       ├── client.ts        # Main SDK client
│       └── express.ts       # Express middleware
│
├── bruno/                   # API testing collection
├── docs/                    # Architecture documentation
└── docker-compose.yml       # Full stack setup

API Reference

Incidents

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/incidents | Create new incident |
| GET | /api/incidents | List incidents (with filters) |
| GET | /api/incidents/:id | Get incident details |
| POST | /api/incidents/:id/resolve | Mark as resolved |

Patterns

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/patterns | List all patterns |
| GET | /api/patterns/:id | Get pattern details |

Health

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /api/health | Health check (no auth) |

All endpoints except /api/health require the X-API-Key header.
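
The authentication rule can be captured in a thin helper. This sketch assumes JSON responses and sends no request body; whether the resolve endpoint expects a body is not documented here. `headersFor` and `callApi` are illustrative names.

```typescript
type Method = "GET" | "POST";

// Build headers for an endpoint listed above; only /api/health skips the API key.
function headersFor(path: string, apiKey: string): Record<string, string> {
  const headers: Record<string, string> = { Accept: "application/json" };
  if (path !== "/api/health") headers["X-API-Key"] = apiKey;
  return headers;
}

// Thin wrapper over the global fetch (Node.js 18+) for the documented endpoints.
async function callApi(
  baseUrl: string,
  apiKey: string,
  method: Method,
  path: string,
): Promise<unknown> {
  const res = await fetch(`${baseUrl}${path}`, { method, headers: headersFor(path, apiKey) });
  if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
  return res.json();
}

// Example calls (the incident id "abc123" is a placeholder):
//   await callApi("http://localhost:3000", key, "GET", "/api/patterns");
//   await callApi("http://localhost:3000", key, "POST", "/api/incidents/abc123/resolve");
```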

MCP Tools

The MCP server provides AI with these tools for analysis:

| Tool | Purpose |
| --- | --- |
| findSimilar | Find similar past incidents |
| getPatternHistory | Get pattern occurrence history |
| getServiceContext | Get service-specific context |
| analyzeError | Parse error type and category |
| assignIncident | Assign to team member |
| getResolutionHistory | Get past resolutions |
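
To give a sense of what such a tool returns, here is a hypothetical handler in the spirit of findSimilar. The real tool queries the database; this sketch filters an in-memory array instead. The argument names, the matching rule (equal fingerprints), and the `StoredIncident` shape are all assumptions. Only the result shape, a list of content items with a text entry, follows the MCP tool result convention.

```typescript
// Assumed record shape for illustration; the actual schema lives in the Prisma models.
interface StoredIncident {
  id: string;
  service: string;
  fingerprint: string;
  resolution?: string;
}

// MCP tool results carry a list of content items; a text item is the simplest form.
interface ToolResult {
  content: { type: "text"; text: string }[];
}

// Hypothetical handler: return past incidents that share a fingerprint, newest-first order not assumed.
function findSimilarHandler(
  incidents: StoredIncident[],
  args: { fingerprint: string; limit?: number },
): ToolResult {
  const matches = incidents
    .filter((incident) => incident.fingerprint === args.fingerprint)
    .slice(0, args.limit ?? 5);
  return { content: [{ type: "text", text: JSON.stringify(matches) }] };
}
```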

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| DATABASE_URL | Yes | PostgreSQL connection string |
| API_KEY | Yes | API authentication key |
| AI_GATEWAY_URL | Yes | Vercel AI Gateway URL |
| AI_GATEWAY_API_KEY | Yes | Vercel AI Gateway key |
| N8N_WEBHOOK_URL | No | n8n webhook for notifications |
| ALLOWED_ORIGINS | No | CORS allowed origins |
| PORT | No | API port (default: 3000) |
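
A startup check can fail fast when a required variable is missing. The sketch below follows the table above. The `Config` shape and `loadConfig` name are illustrative, and treating ALLOWED_ORIGINS as a comma-separated list is an assumption.

```typescript
const REQUIRED = ["DATABASE_URL", "API_KEY", "AI_GATEWAY_URL", "AI_GATEWAY_API_KEY"] as const;

interface Config {
  databaseUrl: string;
  apiKey: string;
  aiGatewayUrl: string;
  aiGatewayApiKey: string;
  n8nWebhookUrl?: string;
  allowedOrigins: string[];
  port: number;
}

// Validate an environment map and apply the documented default for PORT.
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const port = Number(env["PORT"] ?? "3000");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  return {
    databaseUrl: env["DATABASE_URL"]!,
    apiKey: env["API_KEY"]!,
    aiGatewayUrl: env["AI_GATEWAY_URL"]!,
    aiGatewayApiKey: env["AI_GATEWAY_API_KEY"]!,
    n8nWebhookUrl: env["N8N_WEBHOOK_URL"],
    // Assumption: multiple origins are separated by commas.
    allowedOrigins: (env["ALLOWED_ORIGINS"] ?? "")
      .split(",")
      .map((origin) => origin.trim())
      .filter(Boolean),
    port,
  };
}
```

In the API entry point this would be called once as `loadConfig(process.env)`.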

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes:
    • API: Edit files in apps/api/
    • Dashboard: Edit files in apps/web/
    • SDK: Edit files in packages/sdk/
    • MCP Tools: Edit files in apps/mcp-server/
  4. Run tests: pnpm test
  5. Run build: pnpm build
  6. Commit and push
  7. Open a Pull Request

Guidelines:

  • Include tests for new features
  • Update documentation if needed
  • Follow existing code style

License

This project is open source and available under the MIT License.


"The best incident is the one you prevent. The second best is the one you resolve in seconds."
