prabhakaran-jm/mergemind

MergeMind: AI-Powered Software Development Intelligence Platform

MergeMind is an AI-powered software development intelligence platform that turns GitLab merge request data into actionable insights for engineering teams. Built by a solo developer for the Fivetran Challenge, it demonstrates an end-to-end solution that combines custom data ingestion, cloud-native architecture, and LLM-based analysis.

🏆 Hackathon Submission Highlights

This project showcases a production-ready solution that exceeds all challenge requirements:

✅ Custom Fivetran Connector

  • Production-grade GitLab API connector with incremental sync capabilities
  • Dynamic project discovery and automated data pipeline
  • Event-driven architecture with Cloud Function triggers
  • Comprehensive error handling and monitoring

✅ Google Cloud Integration

  • BigQuery data warehouse with automated dbt transformations
  • Vertex AI integration for advanced AI capabilities
  • Cloud Run deployment with auto-scaling and monitoring
  • Event-driven pipeline eliminating batch processing delays

✅ Industry-Focused AI Application

  • AI Reviewer Suggester with multi-step reasoning
  • AI Risk Assessor with security vulnerability detection
  • AI Diff Summarizer with intelligent caching
  • Comprehensive insights for engineering teams

✅ Modern AI & Data Relevance

  • Multi-step LLM reasoning for complex analysis
  • Proactive security scanning and risk assessment
  • Real-time insights and recommendations
  • Scalable architecture supporting enterprise workloads

🚀 Key Features

🎯 Core AI Capabilities

  • Smart Reviewer Suggestions: Multi-step AI reasoning to recommend optimal reviewers based on expertise, workload, and availability
  • Intelligent Risk Assessment: Comprehensive risk analysis with security vulnerability detection, code pattern analysis, and complexity scoring
  • Automated Diff Summarization: AI-powered summaries with intelligent caching and commit-based invalidation
  • Real-time Insights: Live analysis of merge request pipeline with actionable recommendations
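
The multi-step reasoning used for reviewer suggestions can be pictured as a short chain of prompts in which each step feeds the next. The sketch below is illustrative only: `suggest_reviewers`, the `LLM` callable type, and the prompt wording are assumptions, not the project's actual code, and the model call is passed in as a plain function so the chain can be exercised without Vertex AI.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function that takes a prompt and returns the model's text.
LLM = Callable[[str], str]


def suggest_reviewers(mr_title: str, candidates: Dict[str, int], llm: LLM) -> List[str]:
    """Chain three prompts: expertise -> workload -> ranked synthesis.

    `candidates` maps a username to that person's number of open reviews.
    """
    # Step 1: ask which expertise the change needs.
    expertise = llm(f"List the expertise needed to review: {mr_title}")

    # Step 2: ask for a workload assessment of the candidates.
    load_lines = "\n".join(
        f"{name}: {count} open reviews" for name, count in sorted(candidates.items())
    )
    workload = llm(f"Assess reviewer workload:\n{load_lines}")

    # Step 3: combine both intermediate answers into a final ranking.
    ranking = llm(
        "Rank reviewers, one username per line.\n"
        f"Expertise needed: {expertise}\n"
        f"Workload: {workload}"
    )

    # Keep only known candidates, in the order the model returned them.
    names = [line.strip() for line in ranking.splitlines()]
    return [name for name in names if name in candidates]
```

Filtering the final answer against the known candidate list guards against the model returning a username that does not exist.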

🏗️ Production Architecture

  • Event-Driven Data Pipeline: Real-time processing with Fivetran → BigQuery → dbt → API
  • Cloud-Native Deployment: Google Cloud Run with auto-scaling, monitoring, and security
  • Modern Tech Stack: FastAPI backend, React frontend, Vertex AI integration
  • Comprehensive Monitoring: Prometheus, Grafana, custom exporters, and alerting

1. Custom Fivetran Connector for GitLab

A robust, production-grade Fivetran connector that extracts critical SDLC data from the GitLab API.

  • Incremental Syncs: Efficiently syncs merge requests using updated_after timestamps to minimize data transfer and API load.
  • Dynamic Project Discovery: Automatically discovers and syncs GitLab projects based on configurable naming patterns, making the connector highly scalable and low-maintenance.
  • Performance Optimized: Utilizes batching techniques (e.g., for fetching users) to avoid the N+1 problem and improve sync performance.
  • Automated dbt Integration: Triggers a dbt run upon successful data synchronization, enabling a fully automated, end-to-end data pipeline from extraction to transformation.
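
As a rough illustration of the incremental sync described above, the sketch below builds the query parameters for GitLab's merge request listing endpoint and advances a saved cursor after each batch. The function names and the `last_updated_at` state key are hypothetical; `updated_after`, `order_by`, `sort`, `per_page`, and `page` are parameters of the GitLab merge requests API. It assumes timestamps arrive in one consistent UTC ISO 8601 format, so plain string comparison orders them correctly.

```python
from typing import Dict, Iterable, Optional


def build_mr_params(state: Dict[str, str], page: int = 1, per_page: int = 100) -> Dict[str, str]:
    """Query parameters for one page of merge requests, oldest update first."""
    params = {
        "order_by": "updated_at",
        "sort": "asc",
        "per_page": str(per_page),
        "page": str(page),
    }
    cursor = state.get("last_updated_at")  # hypothetical state key
    if cursor:
        # Only fetch merge requests changed since the previous sync.
        params["updated_after"] = cursor
    return params


def advance_cursor(state: Dict[str, str], records: Iterable[dict]) -> Dict[str, str]:
    """Return a new state whose cursor is the newest `updated_at` seen so far."""
    newest: Optional[str] = state.get("last_updated_at")
    for record in records:
        ts = record.get("updated_at")
        if ts and (newest is None or ts > newest):
            newest = ts
    if newest is None:
        return dict(state)
    return {**state, "last_updated_at": newest}
```

On the first run the state is empty, so no `updated_after` filter is sent and the full history is fetched; later runs only request rows changed after the stored cursor.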

2. AI-Powered Intelligence Engine

At the core of MergeMind is a sophisticated AI engine built on Google Vertex AI that provides deep insights into every merge request.

  • AI Reviewer Suggester: A smart recommendation system that suggests the best possible reviewers for a merge request.

    • Multi-Step AI Reasoning: Uses a chain of LLM prompts to first analyze the required expertise, then analyze reviewer workload, and finally synthesize the results into a ranked list of suggestions.
    • Holistic Analysis: Considers not just technical expertise but also reviewer workload, availability, and fairness to provide practical and balanced recommendations.
  • AI Risk Assessor: A comprehensive risk analysis tool that provides both a quantitative score and qualitative feedback on every code change.

    • Multi-Vector Analysis: Decomposes "risk" into three key areas (Code Patterns, Security, and Complexity) and uses a dedicated LLM prompt to analyze each one.
    • Proactive Security Scanning: The security analysis prompt specifically instructs the LLM to look for common vulnerabilities like SQL Injection, XSS, and sensitive data exposure, acting as an automated security audit.
    • Tunable Weighted Scoring: Combines the scores from the three vectors into a single, weighted risk score, allowing the model to be tuned to organizational priorities.
  • AI Diff Summarizer: Automatically generates clear, concise summaries of merge requests.

    • Intelligent Caching: Features a smart caching mechanism that uses the commit SHA as part of the cache key, ensuring summaries are only regenerated when the code actually changes, saving time and cost.
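
The tunable weighted scoring can be read as a weighted average of the three per-vector scores. The sketch below shows that arithmetic; the weight values and the `combine_risk` name are assumptions chosen for illustration, since the README only states that the weights are tunable.

```python
from typing import Dict

# Hypothetical default weights; the actual values used by MergeMind are not documented here.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "code_patterns": 0.3,
    "security": 0.5,
    "complexity": 0.2,
}


def combine_risk(scores: Dict[str, float], weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-vector scores, normalized by the weight total."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[name] * scores[name] for name in weights)
    return round(weighted / total, 2)
```

Normalizing by the weight total means the weights do not have to sum to 1, so a team can raise the security weight without rebalancing the other two.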

🏗️ Technical Architecture

System Overview

MergeMind implements a modern, cloud-native architecture with event-driven data processing and AI-powered analytics:

graph TB
    subgraph "Data Sources"
        GL[GitLab API<br/>Projects, MRs, Users]
        FV[Fivetran Connector<br/>Custom GitLab Connector]
    end
    
    subgraph "Event-Driven Pipeline"
        CF[Cloud Function<br/>dbt-trigger-function]
        dbt[dbt Models<br/>Transformations]
    end
    
    subgraph "Data Warehouse"
        BQ_RAW[BigQuery Raw<br/>mergemind_raw]
        BQ_MODELED[BigQuery Modeled<br/>mergemind]
    end
    
    subgraph "AI Services Layer"
        VAI[Vertex AI<br/>Gemini 2.5 Flash Lite]
        RS[Reviewer Service<br/>AI Suggestions]
        RISK[Risk Service<br/>AI Risk Assessment]
        SUM[Summary Service<br/>AI Diff Summarization]
        INSIGHTS[AI Insights Service<br/>Comprehensive Analysis]
    end
    
    subgraph "API Layer"
        API[FastAPI Backend<br/>REST API]
        MR_ROUTER[MR Router<br/>Individual MR Operations]
        MRS_ROUTER[MRS Router<br/>MR Listings and Blockers]
        AI_ROUTER[AI Router<br/>AI Insights and Recommendations]
        HEALTH[Health Router<br/>Monitoring and Metrics]
    end
    
    subgraph "Frontend Layer"
        UI[React Frontend<br/>Modern Dashboard]
        DASH[AIDashboardCard<br/>Main Dashboard]
        INSIGHTS_UI[AIInsightsCard<br/>AI Analysis Display]
        RECS[AIRecommendationsCard<br/>Recommendations]
        BLOCKERS[BlockersCard<br/>Top Blockers]
    end
    
    subgraph "Infrastructure"
        GCP[Google Cloud Platform]
        RUN_API[Cloud Run API<br/>Backend Service]
        RUN_UI[Cloud Run UI<br/>Frontend Service]
        LB[Load Balancer<br/>Traffic Distribution]
        SECRETS[Secret Manager<br/>Credentials Storage]
    end
    
    %% Data Flow
    GL -->|API Calls| FV
    FV -->|Sync Data| BQ_RAW
    FV -->|Trigger| CF
    CF -->|Run Transformations| dbt
    dbt -->|Modeled Data| BQ_MODELED
    
    %% API Data Flow
    BQ_MODELED -->|Query Data| API
    API -->|AI Requests| VAI
    API -->|Service Calls| RS
    API -->|Service Calls| RISK
    API -->|Service Calls| SUM
    API -->|Service Calls| INSIGHTS
    
    %% Frontend Flow
    API -->|REST API| UI
    UI -->|Components| DASH
    UI -->|Components| INSIGHTS_UI
    UI -->|Components| RECS
    UI -->|Components| BLOCKERS
    
    %% Infrastructure
    API -->|Deploy| RUN_API
    UI -->|Deploy| RUN_UI
    RUN_API -->|Traffic| LB
    RUN_UI -->|Traffic| LB
    LB -->|Serve| GCP
    
    %% Security
    API -->|Credentials| SECRETS
    CF -->|Credentials| SECRETS
Core Components

1. Data Ingestion Layer

  • Fivetran Custom Connector: Production-grade GitLab API connector with incremental sync
  • Dynamic Project Discovery: Automatically discovers and syncs projects based on patterns
  • Event-Driven Triggers: Cloud Function integration for real-time dbt execution
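
Pattern-based project discovery can be sketched with shell-style wildcards. This is a minimal illustration, not the connector's code: `discover_projects` and the example project paths are hypothetical, and the connector may use a different pattern syntax.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def discover_projects(project_paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the project paths matching at least one wildcard pattern, sorted."""
    pattern_list = list(patterns)
    return sorted(
        path
        for path in project_paths
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if any(fnmatchcase(path, pattern) for pattern in pattern_list)
    )
```

For example, `discover_projects(["team/mergemind-api", "team/legacy-tool"], ["team/mergemind-*"])` keeps only the first path, so newly created projects that follow the naming convention are picked up without a configuration change.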

2. Data Processing Layer

  • BigQuery Data Warehouse: Scalable data storage with partitioning and clustering
  • dbt Transformations: Automated data modeling and business logic
  • Event-Driven Pipeline: Eliminates batch processing delays

3. AI Services Layer

  • Vertex AI Integration: Google's Gemini 2.5 Flash Lite for advanced reasoning
  • Multi-Step AI Reasoning: Complex analysis chains for reviewer suggestions
  • Intelligent Caching: Commit-based cache invalidation for optimal performance
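
Commit-based cache invalidation follows from putting the commit SHA into the cache key: a new commit produces a new key, so the old summary is simply never looked up again. The in-memory `SummaryCache` class below is a hypothetical sketch of that idea; the real service may use a different store and key layout.

```python
import hashlib
from typing import Callable, Dict


class SummaryCache:
    """In-memory cache keyed by project, merge request, and head commit SHA."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    @staticmethod
    def key(project_id: int, mr_id: int, commit_sha: str) -> str:
        # Hash the identifying fields so keys have a fixed length.
        raw = f"{project_id}:{mr_id}:{commit_sha}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_create(
        self, project_id: int, mr_id: int, commit_sha: str, generate: Callable[[], str]
    ) -> str:
        """Return the cached summary, calling `generate` only on a cache miss."""
        cache_key = self.key(project_id, mr_id, commit_sha)
        if cache_key not in self._store:
            self._store[cache_key] = generate()
        return self._store[cache_key]
```

Because `generate` is only invoked on a miss, repeated requests for an unchanged merge request do not trigger another LLM call.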

4. Application Layer

  • FastAPI Backend: High-performance Python API with comprehensive endpoints
  • React Frontend: Modern, responsive dashboard with real-time updates
  • Cloud Run Deployment: Serverless, auto-scaling infrastructure

5. Monitoring & Observability

  • Prometheus Metrics: Comprehensive application and business metrics
  • Grafana Dashboards: Real-time visualization and alerting
  • Custom Exporters: BigQuery, GitLab, and Vertex AI monitoring

🏃‍♂️ Quick Start

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Google Cloud Platform account
  • GitLab instance (self-hosted or GitLab.com)
  • Fivetran account (for data ingestion)

Local Development

  1. Clone the repository

    git clone https://github.com/prabhakaran-jm/mergemind.git
    cd mergemind
  2. Set up environment

    # Copy environment template
    cp .env.example .env
    
    # Edit configuration
    nano .env
  3. Install dependencies

    # Install API dependencies
    cd app/backend/fastapi_app
    pip install -r requirements.txt
    
    # Install UI dependencies
    cd ../../frontend/web
    npm install
  4. Start services

    # Start API (terminal 1)
    cd app/backend/fastapi_app
    uvicorn main:app --reload --port 8080
    
    # Start UI (terminal 2)
    cd app/frontend/web
    npm run dev
  5. Access the application

Detailed Architecture Diagrams

For comprehensive architecture documentation, see:

📦 Installation

Docker Compose (Recommended)

# Clone repository
git clone https://github.com/prabhakaran-jm/mergemind.git
cd mergemind

# Copy environment file
cp .env.example .env

# Edit configuration
nano .env

# Start services
docker-compose up -d

# Check status
docker-compose ps

Manual Installation

1. API Setup

cd app/backend/fastapi_app

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run tests
python run_tests.py

# Start development server
uvicorn main:app --reload --port 8080

2. UI Setup

cd app/frontend/web

# Install dependencies
npm install

# Start development server
npm run dev

# Build for production
npm run build

3. Data Pipeline Setup

cd warehouse/bigquery/dbt

# Install dbt
pip install dbt-bigquery

# Install dbt packages
dbt deps

# Run models
dbt run

# Run tests
dbt test

🔄 Event-Driven Data Pipeline

The MergeMind platform features a fully automated, event-driven data pipeline that processes GitLab data in near real time: each completed Fivetran sync triggers the downstream transformations.

Pipeline Flow

  1. GitLab Events → New merge requests, updates, or changes
  2. Fivetran Sync → Custom connector syncs data to BigQuery
  3. Cloud Function Trigger → Fivetran calls Cloud Function on sync completion
  4. dbt Transformations → Automated data modeling and transformations
  5. BigQuery Updates → Transformed data available for API consumption

Key Components

Fivetran Connector

  • Location: ingestion/fivetran_connector/
  • Features: Custom GitLab API connector with dbt trigger integration
  • Configuration: Environment variables for GitLab and Cloud Function URLs
  • Sync Frequency: Configurable (default: 1 hour)

Cloud Function (dbt Trigger)

  • Location: deploy/terraform/cloud_function/
  • Purpose: Triggers dbt runs when new data arrives
  • Runtime: Python 3.11 with dbt-core and dbt-bigquery
  • Timeout: 5 minutes (configurable)
  • Authentication: Bearer token for security
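
The bearer-token handshake between the connector and the dbt trigger function might look like the following sketch. Both helper names are hypothetical and the payload shape is an assumption; the URL and token would come from `DBT_TRIGGER_URL` and `DBT_TRIGGER_AUTH_TOKEN`. The request is only constructed here, not sent.

```python
import hmac
import json
import urllib.request
from typing import Optional


def build_trigger_request(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Connector side: build the authenticated POST that asks for a dbt run."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def is_authorized(auth_header: Optional[str], expected_token: str) -> bool:
    """Function side: check the Authorization header against the shared token."""
    prefix = "Bearer "
    if not auth_header or not auth_header.startswith(prefix):
        return False
    supplied = auth_header[len(prefix):]
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_token.encode("utf-8"))
```

A caller would pass the built request to `urllib.request.urlopen` with a timeout. The function side should reject the call before starting any dbt work when `is_authorized` returns `False`.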

dbt Models

  • Location: warehouse/bigquery/dbt/models/
  • Transformations: Raw data → Clean, modeled datasets
  • Output: mergemind dataset with business-ready views

Deployment

# Deploy infrastructure
cd deploy/terraform
terraform init
terraform plan
terraform apply

# Deploy Fivetran connector
cd ingestion/fivetran_connector
# Configure fivetran_config.json with your settings
# Deploy to Fivetran platform

# Test the pipeline
# Create a merge request in GitLab
# Monitor Fivetran sync logs
# Verify dbt transformations in BigQuery

Monitoring

  • Fivetran Logs: Monitor sync status and dbt trigger calls
  • Cloud Function Logs: Check dbt execution and errors
  • BigQuery: Verify data transformations and model updates
  • API Endpoints: Test data availability and freshness

⚙️ Configuration

Environment Variables

# GCP Configuration
GCP_PROJECT_ID=your-project-id
BQ_DATASET_RAW=mergemind_raw
BQ_DATASET_MODELED=mergemind
VERTEX_LOCATION=us-central1
VERTEX_AI_MODEL=gemini-2.5-flash-lite

# GitLab Configuration
GITLAB_BASE_URL=https://your-gitlab.com
GITLAB_TOKEN=glpat-your-token

# Event-Driven Pipeline Configuration
DBT_TRIGGER_URL=https://dbt-trigger-function-xxx-uc.a.run.app
DBT_TRIGGER_AUTH_TOKEN=your-secure-token-here

# API Configuration
API_BASE_URL=https://api.mergemind.com
LOG_LEVEL=INFO
ENVIRONMENT=production

# Security
SECRET_KEY=your-secret-key
ALLOWED_HOSTS=api.mergemind.com,mergemind.com
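
A service could load these variables into a typed settings object at startup and fail fast when something essential is missing. The sketch below is one possible approach, not the project's implementation: the `Settings` class, the choice of required variables, and the fallback defaults (taken from the example values above) are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    gcp_project_id: str
    bq_dataset_raw: str
    bq_dataset_modeled: str
    vertex_location: str
    vertex_ai_model: str
    gitlab_base_url: str
    gitlab_token: str
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, raising if required values are absent."""
    # Assumption: these three have no sensible default and must be provided.
    required = ["GCP_PROJECT_ID", "GITLAB_BASE_URL", "GITLAB_TOKEN"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        gcp_project_id=env["GCP_PROJECT_ID"],
        bq_dataset_raw=env.get("BQ_DATASET_RAW", "mergemind_raw"),
        bq_dataset_modeled=env.get("BQ_DATASET_MODELED", "mergemind"),
        vertex_location=env.get("VERTEX_LOCATION", "us-central1"),
        vertex_ai_model=env.get("VERTEX_AI_MODEL", "gemini-2.5-flash-lite"),
        gitlab_base_url=env["GITLAB_BASE_URL"].rstrip("/"),
        gitlab_token=env["GITLAB_TOKEN"],
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the loader easy to test with a plain dictionary.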

BigQuery Setup

-- Create datasets
CREATE SCHEMA `mergemind_raw`;
CREATE SCHEMA `mergemind`;

-- Create tables with schemas
CREATE TABLE `mergemind_raw.merge_requests` (
  mr_id INT64,
  project_id INT64,
  title STRING,
  description STRING,
  author_id INT64,
  state STRING,
  created_at TIMESTAMP,
  updated_at TIMESTAMP,
  additions INT64,
  deletions INT64,
  web_url STRING
);

CREATE TABLE `mergemind_raw.mr_notes` (
  id INT64,
  mr_id INT64,
  author_id INT64,
  note_type STRING,
  body STRING,
  created_at TIMESTAMP
);

CREATE TABLE `mergemind_raw.users` (
  user_id INT64,
  username STRING,
  name STRING,
  email STRING,
  state STRING,
  created_at TIMESTAMP
);

CREATE TABLE `mergemind_raw.projects` (
  project_id INT64,
  name STRING,
  description STRING,
  visibility STRING,
  created_at TIMESTAMP
);

CREATE TABLE `mergemind_raw.pipelines` (
  pipeline_id INT64,
  project_id INT64,
  status STRING,
  ref STRING,
  created_at TIMESTAMP,
  updated_at TIMESTAMP
);

dbt Models

cd warehouse/bigquery/dbt

# Install packages
dbt deps

# Run models
dbt run

# Test models
dbt test

# Generate documentation
dbt docs generate
dbt docs serve

📚 API Documentation

Base URL

  • Development: http://localhost:8080/api/v1
  • Production: https://api.mergemind.com/api/v1

Authentication

The MVP currently requires no authentication. Future versions will support API keys and OAuth.

Endpoints

Health Check

  • GET /healthz - Basic health check
  • GET /ready - Readiness check with dependency validation
  • GET /health/detailed - Comprehensive health check with metrics

Merge Requests

  • GET /mrs - List merge requests with risk analysis
  • GET /blockers/top - Get top blocking merge requests

Individual MR

  • GET /mr/{id}/context - Get comprehensive MR context
  • POST /mr/{id}/summary - Generate AI summary
  • GET /mr/{id}/reviewers - Get suggested reviewers
  • GET /mr/{id}/risk - Get risk analysis
  • GET /mr/{id}/stats - Get MR statistics

Metrics

  • GET /metrics - Get application metrics
  • GET /metrics/slo - Get SLO status and violations
  • POST /metrics/reset - Reset metrics (admin only)

Example Usage

# List open merge requests
curl "https://api.mergemind.com/api/v1/mrs?state=open&limit=20"

# Get MR context
curl "https://api.mergemind.com/api/v1/mr/123/context"

# Generate AI summary
curl -X POST "https://api.mergemind.com/api/v1/mr/123/summary"

# Get reviewer suggestions
curl "https://api.mergemind.com/api/v1/mr/123/reviewers"

# Get risk analysis
curl "https://api.mergemind.com/api/v1/mr/123/risk"
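
The same calls can be made from Python using only the standard library. This is a minimal sketch: `build_url` and `get_json` are hypothetical helpers, the base URL is the development one listed above, and the response is assumed to be JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"  # development base URL


def build_url(path: str, base_url: str = BASE_URL, **params) -> str:
    """Join the base URL and path, appending any query parameters."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(path: str, **params):
    """Issue a GET request and decode the JSON body (assumes a JSON response)."""
    with urllib.request.urlopen(build_url(path, **params), timeout=10) as response:
        return json.load(response)
```

With the API running locally, `get_json("/mrs", state="open", limit=20)` mirrors the first curl example, and `get_json("/mr/123/risk")` mirrors the last.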

For complete API documentation, see API Reference.

🚀 Deployment

Google Cloud Run (Recommended)

# Build and push Docker images
docker build -t gcr.io/your-project/mergemind-api:latest app/backend/
docker push gcr.io/your-project/mergemind-api:latest

docker build -t gcr.io/your-project/mergemind-ui:latest app/frontend/
docker push gcr.io/your-project/mergemind-ui:latest

# Deploy to Cloud Run
gcloud run deploy mergemind-api \
  --image gcr.io/your-project/mergemind-api:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 8080 \
  --memory 2Gi \
  --cpu 2 \
  --max-instances 10

gcloud run deploy mergemind-ui \
  --image gcr.io/your-project/mergemind-ui:latest \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --port 3000 \
  --memory 512Mi \
  --cpu 1 \
  --max-instances 5

Kubernetes

# Create cluster
gcloud container clusters create mergemind-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --machine-type e2-standard-4

# Deploy with Helm
helm install mergemind ./helm/mergemind \
  --set gcp.projectId=your-project-id \
  --set gitlab.baseUrl=https://your-gitlab.com

For detailed deployment instructions, see Production Deployment Guide.

📊 Monitoring

Health Checks

# Basic health check
curl "https://api.mergemind.com/api/v1/healthz"

# Detailed health check
curl "https://api.mergemind.com/api/v1/health/detailed"

# SLO status
curl "https://api.mergemind.com/api/v1/metrics/slo"

Metrics

The application exposes Prometheus-compatible metrics:

  • Request count and duration
  • Error rates and types
  • Business metrics (MR analysis, AI summaries)
  • External service health

Alerting

Configure alerts for:

  • High error rates (>1%)
  • High latency (P95 > 2s)
  • Service downtime
  • BigQuery quota exceeded
  • AI service failures
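
To make the first two thresholds concrete, the sketch below evaluates an error-rate and a P95 latency rule over a window of request data. In practice these rules would normally live in the Prometheus or Grafana alerting configuration; the function names and alert labels here are hypothetical, and the percentile uses the nearest-rank method.

```python
from typing import List, Sequence


def p95(latencies_s: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_s:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(latencies_s)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def evaluate_alerts(
    total_requests: int, failed_requests: int, latencies_s: Sequence[float]
) -> List[str]:
    """Return the names of the alert rules that fire for this window."""
    alerts: List[str] = []
    if total_requests and failed_requests / total_requests > 0.01:
        alerts.append("high_error_rate")  # error rate above 1%
    if latencies_s and p95(latencies_s) > 2.0:
        alerts.append("high_latency")  # P95 above 2 seconds
    return alerts
```

A single slow request in a window of twenty does not trip the latency rule, because P95 ignores the slowest 5% of samples.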

For comprehensive monitoring setup, see the monitoring folder documentation.

🔒 Security

Security Features

  • Input validation and sanitization
  • Rate limiting and DDoS protection
  • HTTPS enforcement
  • Security headers
  • Data encryption at rest and in transit
  • Access logging and audit trails
  • Dynamic configuration - No hardcoded secrets or project IDs
  • Environment-based secrets management - All sensitive data in environment variables
  • Secure file handling - Proper .gitignore and .dockerignore configuration

Compliance

  • GDPR compliance for data protection
  • SOC 2 Type II compliance
  • Security incident response plan
  • Regular security audits

For detailed security information, see the monitoring and infrastructure documentation.

🧪 Testing

Run Tests

# API tests
cd app/backend/fastapi_app
python run_tests.py

# Run specific test types
python run_tests.py --type unit
python run_tests.py --type integration

# Run with coverage
python run_tests.py --coverage

# UI tests
cd app/frontend/web
npm test
npm run test:coverage

Test Coverage

  • Unit Tests: Service layer components
  • Integration Tests: End-to-end workflows
  • API Tests: Endpoint functionality
  • Performance Tests: Load and stress testing

🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Code Style

  • Python: Black, isort, flake8
  • JavaScript: Prettier, ESLint
  • TypeScript: Strict mode enabled

👨‍💻 Developer

Solo Developer Project

  • Built from scratch for the Fivetran Challenge
  • Full-stack development: Frontend, Backend, AI Services, Infrastructure
  • Technologies: Python, React, Google Cloud, Vertex AI, BigQuery, Fivetran
  • Architecture: Event-driven pipeline with AI-powered analytics

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🗺️ Roadmap

✅ v1.0.0 (Current - Fivetran Challenge Submission)

  • ✅ Custom Fivetran connector for GitLab
  • ✅ Event-driven data pipeline with Cloud Functions
  • ✅ AI-powered merge request analysis
  • ✅ Risk assessment and reviewer suggestions
  • ✅ Automated diff summarization
  • ✅ Modern React dashboard
  • ✅ Production-ready infrastructure
  • ✅ Comprehensive monitoring and alerting

v1.1.0 (Q2 2024)

  • 🔄 Authentication and authorization
  • 🔄 Webhook notifications for real-time updates
  • 🔄 Bulk operations for MR management
  • 🔄 Advanced filtering and search capabilities
  • 🔄 Multi-environment support (dev/staging/prod)

v1.2.0 (Q3 2024)

  • 📋 Real-time collaboration features
  • 📋 Advanced analytics and reporting
  • 📋 Custom risk rules and thresholds
  • 📋 Team performance metrics
  • 📋 Integration with additional Git providers

v2.0.0 (Q4 2024)

  • 📋 Multi-repository and multi-organization support
  • 📋 Advanced AI models (GPT-4, Claude)
  • 📋 Enterprise SSO and RBAC
  • 📋 Self-hosted deployment options
  • 📋 Advanced workflow automation

🙏 Acknowledgments


MergeMind - Making merge request analysis intelligent and efficient. 🚀
