MergeMind is an AI-powered software development intelligence platform that transforms GitLab merge request data into actionable insights for engineering teams. Built for the Fivetran Challenge by a solo developer, it demonstrates a complete end-to-end solution combining custom data ingestion, cloud-native architecture, and advanced AI capabilities.
This project showcases a production-ready solution that exceeds all challenge requirements:
- Production-grade GitLab API connector with incremental sync capabilities
- Dynamic project discovery and automated data pipeline
- Event-driven architecture with Cloud Function triggers
- Comprehensive error handling and monitoring
- BigQuery data warehouse with automated dbt transformations
- Vertex AI integration for advanced AI capabilities
- Cloud Run deployment with auto-scaling and monitoring
- Event-driven pipeline eliminating batch processing delays
- AI Reviewer Suggester with multi-step reasoning
- AI Risk Assessor with security vulnerability detection
- AI Diff Summarizer with intelligent caching
- Comprehensive insights for engineering teams
- Multi-step LLM reasoning for complex analysis
- Proactive security scanning and risk assessment
- Real-time insights and recommendations
- Scalable architecture supporting enterprise workloads
- Smart Reviewer Suggestions: Multi-step AI reasoning to recommend optimal reviewers based on expertise, workload, and availability
- Intelligent Risk Assessment: Comprehensive risk analysis with security vulnerability detection, code pattern analysis, and complexity scoring
- Automated Diff Summarization: AI-powered summaries with intelligent caching and commit-based invalidation
- Real-time Insights: Live analysis of merge request pipeline with actionable recommendations
- Event-Driven Data Pipeline: Real-time processing with Fivetran → BigQuery → dbt → API
- Cloud-Native Deployment: Google Cloud Run with auto-scaling, monitoring, and security
- Modern Tech Stack: FastAPI backend, React frontend, Vertex AI integration
- Comprehensive Monitoring: Prometheus, Grafana, custom exporters, and alerting
A robust, production-grade Fivetran connector that extracts critical SDLC data from the GitLab API.
- Incremental Syncs: Efficiently syncs merge requests using updated_after timestamps to minimize data transfer and API load.
- Dynamic Project Discovery: Automatically discovers and syncs GitLab projects based on configurable naming patterns, making the connector highly scalable and low-maintenance.
- Performance Optimized: Utilizes batching techniques (e.g., for fetching users) to avoid the N+1 problem and improve sync performance.
- Automated dbt Integration: Triggers a dbt run upon successful data synchronization, enabling a fully automated, end-to-end data pipeline from extraction to transformation.
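The incremental sync described above can be sketched as follows. This is a minimal illustration, not the connector's actual code: the `fetch` callable, the state dictionary, and the `last_updated_at` key are assumptions made for the example; only the idea of filtering by an `updated_after` timestamp comes from the description above.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: given an `updated_after` timestamp, return merge requests
# (dicts with an ISO 8601 "updated_at" field) changed since that time.
FetchFn = Callable[[str], Iterable[Dict[str, str]]]

EPOCH = "1970-01-01T00:00:00Z"


def incremental_sync(state: Dict[str, str], fetch: FetchFn) -> List[Dict[str, str]]:
    """Fetch only records changed since the saved cursor, then advance the cursor."""
    cursor = state.get("last_updated_at", EPOCH)
    rows = list(fetch(cursor))
    if rows:
        # UTC timestamps in one fixed ISO 8601 format sort lexicographically.
        state["last_updated_at"] = max(row["updated_at"] for row in rows)
    return rows


# In-memory stand-in for the GitLab API (illustrative data only).
_DATA = [
    {"id": "1", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": "2", "updated_at": "2024-02-01T00:00:00Z"},
]


def fake_fetch(updated_after: str) -> List[Dict[str, str]]:
    return [row for row in _DATA if row["updated_at"] > updated_after]


state: Dict[str, str] = {}
first = incremental_sync(state, fake_fetch)   # first run sees both records
second = incremental_sync(state, fake_fetch)  # second run sees nothing new
```

A real connector would likely also upsert by primary key, so that a record delivered twice at the cursor boundary does no harm.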
At the core of MergeMind is a sophisticated AI engine built on Google Vertex AI that provides deep insights into every merge request.
- AI Reviewer Suggester: A smart recommendation system that suggests the best possible reviewers for a merge request.
  - Multi-Step AI Reasoning: Uses a chain of LLM prompts to first analyze the required expertise, then analyze reviewer workload, and finally synthesize the results into a ranked list of suggestions.
  - Holistic Analysis: Considers not just technical expertise but also reviewer workload, availability, and fairness to provide practical and balanced recommendations.
- AI Risk Assessor: A comprehensive risk analysis tool that provides both a quantitative score and qualitative feedback on every code change.
  - Multi-Vector Analysis: Decomposes "risk" into three key areas (Code Patterns, Security, and Complexity) and uses a dedicated LLM prompt to analyze each one.
  - Proactive Security Scanning: The security analysis prompt specifically instructs the LLM to look for common vulnerabilities like SQL Injection, XSS, and sensitive data exposure, acting as an automated security audit.
  - Tunable Weighted Scoring: Combines the scores from the three vectors into a single, weighted risk score, allowing the model to be tuned to organizational priorities.
- AI Diff Summarizer: Automatically generates clear, concise summaries of merge requests.
  - Intelligent Caching: Features a smart caching mechanism that uses the commit SHA as part of the cache key, ensuring summaries are only regenerated when the code actually changes, saving time and cost.
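Two of the mechanisms above lend themselves to a short sketch: the weighted combination of the three risk vectors, and a cache key that includes the commit SHA. The weights, the 0 to 100 score range, the function names, and the key layout are all assumptions chosen for illustration; the project may implement them differently.

```python
import hashlib
from typing import Dict, Optional

# Assumed weights; in practice these would be tuned to organizational priorities.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "code_patterns": 0.3,
    "security": 0.5,
    "complexity": 0.2,
}


def weighted_risk(scores: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Combine per-vector scores (assumed 0-100) into one weighted average."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def summary_cache_key(project_id: int, mr_id: int, commit_sha: str) -> str:
    """Build a cache key that changes whenever the head commit changes."""
    raw = f"{project_id}:{mr_id}:{commit_sha}"
    return "mr-summary:" + hashlib.sha256(raw.encode("utf-8")).hexdigest()


score = weighted_risk({"code_patterns": 40.0, "security": 80.0, "complexity": 20.0})
key_before = summary_cache_key(7, 123, "abc123")
key_after = summary_cache_key(7, 123, "def456")  # new commit, so a new key
```

Because the SHA is part of the key, a new commit simply misses the cache; no explicit invalidation step is needed in this sketch.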
MergeMind implements a modern, cloud-native architecture with event-driven data processing and AI-powered analytics:
graph TB
subgraph "Data Sources"
GL[GitLab API<br/>Projects, MRs, Users]
FV[Fivetran Connector<br/>Custom GitLab Connector]
end
subgraph "Event-Driven Pipeline"
CF[Cloud Function<br/>dbt-trigger-function]
dbt[dbt Models<br/>Transformations]
end
subgraph "Data Warehouse"
BQ_RAW[BigQuery Raw<br/>mergemind_raw]
BQ_MODELED[BigQuery Modeled<br/>mergemind]
end
subgraph "AI Services Layer"
VAI[Vertex AI<br/>Gemini 2.5 Flash Lite]
RS[Reviewer Service<br/>AI Suggestions]
RISK[Risk Service<br/>AI Risk Assessment]
SUM[Summary Service<br/>AI Diff Summarization]
INSIGHTS[AI Insights Service<br/>Comprehensive Analysis]
end
subgraph "API Layer"
API[FastAPI Backend<br/>REST API]
MR_ROUTER[MR Router<br/>Individual MR Operations]
MRS_ROUTER[MRS Router<br/>MR Listings and Blockers]
AI_ROUTER[AI Router<br/>AI Insights and Recommendations]
HEALTH[Health Router<br/>Monitoring and Metrics]
end
subgraph "Frontend Layer"
UI[React Frontend<br/>Modern Dashboard]
DASH[AIDashboardCard<br/>Main Dashboard]
INSIGHTS_UI[AIInsightsCard<br/>AI Analysis Display]
RECS[AIRecommendationsCard<br/>Recommendations]
BLOCKERS[BlockersCard<br/>Top Blockers]
end
subgraph "Infrastructure"
GCP[Google Cloud Platform]
RUN_API[Cloud Run API<br/>Backend Service]
RUN_UI[Cloud Run UI<br/>Frontend Service]
LB[Load Balancer<br/>Traffic Distribution]
SECRETS[Secret Manager<br/>Credentials Storage]
end
%% Data Flow
GL -->|API Calls| FV
FV -->|Sync Data| BQ_RAW
FV -->|Trigger| CF
CF -->|Run Transformations| dbt
dbt -->|Modeled Data| BQ_MODELED
%% API Data Flow
BQ_MODELED -->|Query Data| API
API -->|AI Requests| VAI
API -->|Service Calls| RS
API -->|Service Calls| RISK
API -->|Service Calls| SUM
API -->|Service Calls| INSIGHTS
%% Frontend Flow
API -->|REST API| UI
UI -->|Components| DASH
UI -->|Components| INSIGHTS_UI
UI -->|Components| RECS
UI -->|Components| BLOCKERS
%% Infrastructure
API -->|Deploy| RUN_API
UI -->|Deploy| RUN_UI
RUN_API -->|Traffic| LB
RUN_UI -->|Traffic| LB
LB -->|Serve| GCP
%% Security
API -->|Credentials| SECRETS
CF -->|Credentials| SECRETS
- Fivetran Custom Connector: Production-grade GitLab API connector with incremental sync
- Dynamic Project Discovery: Automatically discovers and syncs projects based on patterns
- Event-Driven Triggers: Cloud Function integration for real-time dbt execution
- BigQuery Data Warehouse: Scalable data storage with partitioning and clustering
- dbt Transformations: Automated data modeling and business logic
- Event-Driven Pipeline: Eliminates batch processing delays
- Vertex AI Integration: Google's Gemini 2.5 Flash Lite for advanced reasoning
- Multi-Step AI Reasoning: Complex analysis chains for reviewer suggestions
- Intelligent Caching: Commit-based cache invalidation for optimal performance
- FastAPI Backend: High-performance Python API with comprehensive endpoints
- React Frontend: Modern, responsive dashboard with real-time updates
- Cloud Run Deployment: Serverless, auto-scaling infrastructure
- Prometheus Metrics: Comprehensive application and business metrics
- Grafana Dashboards: Real-time visualization and alerting
- Custom Exporters: BigQuery, GitLab, and Vertex AI monitoring
- Quick Start
- Architecture
- Installation
- Configuration
- API Documentation
- Deployment
- Monitoring
- Security
- Contributing
- License
- Python 3.11+
- Node.js 18+
- Google Cloud Platform account
- GitLab instance (self-hosted or GitLab.com)
- Fivetran account (for data ingestion)
- Clone the repository
    git clone https://github.com/mergemind/mergemind.git
    cd mergemind
- Set up environment
    # Copy environment template
    cp .env.example .env
    # Edit configuration
    nano .env
- Install dependencies
    # Install API dependencies
    cd app/backend/fastapi_app
    pip install -r requirements.txt
    # Install UI dependencies
    cd ../../frontend/web
    npm install
- Start services
    # Start API (terminal 1)
    cd app/backend/fastapi_app
    uvicorn main:app --reload --port 8080
    # Start UI (terminal 2)
    cd app/frontend/web
    npm run dev
- Access the application
  - API: http://localhost:8080
  - UI: http://localhost:5173
  - API Docs: http://localhost:8080/docs
For comprehensive architecture documentation, see:
- Architecture Diagram - Complete system architecture
- Data Flow Diagram - Event-driven pipeline flow
- Deployment Architecture - Production deployment structure
# Clone repository
git clone https://github.com/prabhakaran-jm/mergemind.git
cd mergemind
# Copy environment file
cp .env.example .env
# Edit configuration
nano .env
# Start services
docker-compose up -d
# Check status
docker-compose ps
cd app/backend/fastapi_app
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run tests
python run_tests.py
# Start development server
uvicorn main:app --reload --port 8080
cd app/frontend/web
# Install dependencies
npm install
# Start development server
npm run dev
# Build for production
npm run build
cd warehouse/bigquery/dbt
# Install dbt
pip install dbt-bigquery
# Install dbt packages
dbt deps
# Run models
dbt run
# Run tests
dbt test
The MergeMind platform features a fully automated event-driven data pipeline that processes GitLab data in real time:
- GitLab Events → New merge requests, updates, or changes
- Fivetran Sync → Custom connector syncs data to BigQuery
- Cloud Function Trigger → Fivetran calls Cloud Function on sync completion
- dbt Transformations → Automated data modeling and transformations
- BigQuery Updates → Transformed data available for API consumption
- Location: ingestion/fivetran_connector/
- Features: Custom GitLab API connector with dbt trigger integration
- Configuration: Environment variables for GitLab and Cloud Function URLs
- Sync Frequency: Configurable (default: 1 hour)
- Location: deploy/terraform/cloud_function/
- Purpose: Triggers dbt runs when new data arrives
- Runtime: Python 3.11 with dbt-core and dbt-bigquery
- Timeout: 5 minutes (configurable)
- Authentication: Bearer token for security
- Location: warehouse/bigquery/dbt/models/
- Transformations: Raw data → Clean, modeled datasets
- Output: mergemind dataset with business-ready views
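The bearer-token authentication listed for the Cloud Function above could be checked roughly like this. It is a sketch under assumptions: the function name `is_authorized`, the header parsing, and reading the expected token from the `DBT_TRIGGER_AUTH_TOKEN` environment variable are illustrative, and the deployed function may differ.

```python
import hmac
import os
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str],
                  expected_token: Optional[str] = None) -> bool:
    """Return True when the Authorization header carries the expected bearer token."""
    if expected_token is None:
        expected_token = os.environ.get("DBT_TRIGGER_AUTH_TOKEN", "")
    if not expected_token:
        return False  # refuse every request when no token is configured
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(presented.strip().encode("utf-8"),
                               expected_token.encode("utf-8"))


ok = is_authorized({"Authorization": "Bearer s3cret"}, "s3cret")
rejected = is_authorized({"Authorization": "Bearer wrong"}, "s3cret")
```

Failing closed when no token is configured is a deliberate choice here: a misconfigured deployment then rejects triggers instead of accepting unauthenticated ones.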
# Deploy infrastructure
cd deploy/terraform
terraform init
terraform plan
terraform apply
# Deploy Fivetran connector
cd ingestion/fivetran_connector
# Configure fivetran_config.json with your settings
# Deploy to Fivetran platform
# Test the pipeline
# Create a merge request in GitLab
# Monitor Fivetran sync logs
# Verify dbt transformations in BigQuery
- Fivetran Logs: Monitor sync status and dbt trigger calls
- Cloud Function Logs: Check dbt execution and errors
- BigQuery: Verify data transformations and model updates
- API Endpoints: Test data availability and freshness
# GCP Configuration
GCP_PROJECT_ID=your-project-id
BQ_DATASET_RAW=mergemind_raw
BQ_DATASET_MODELED=mergemind
VERTEX_LOCATION=us-central1
VERTEX_AI_MODEL=gemini-2.5-flash-lite
# GitLab Configuration
GITLAB_BASE_URL=https://your-gitlab.com
GITLAB_TOKEN=glpat-your-token
# Event-Driven Pipeline Configuration
DBT_TRIGGER_URL=https://dbt-trigger-function-xxx-uc.a.run.app
DBT_TRIGGER_AUTH_TOKEN=your-secure-token-here
# API Configuration
API_BASE_URL=https://api.mergemind.com
LOG_LEVEL=INFO
ENVIRONMENT=production
# Security
SECRET_KEY=your-secret-key
ALLOWED_HOSTS=api.mergemind.com,mergemind.com
-- Create datasets
CREATE SCHEMA `mergemind_raw`;
CREATE SCHEMA `mergemind`;
-- Create tables with schemas
CREATE TABLE `mergemind_raw.merge_requests` (
mr_id INT64,
project_id INT64,
title STRING,
description STRING,
author_id INT64,
state STRING,
created_at TIMESTAMP,
updated_at TIMESTAMP,
additions INT64,
deletions INT64,
web_url STRING
);
CREATE TABLE `mergemind_raw.mr_notes` (
id INT64,
mr_id INT64,
author_id INT64,
note_type STRING,
body STRING,
created_at TIMESTAMP
);
CREATE TABLE `mergemind_raw.users` (
user_id INT64,
username STRING,
name STRING,
email STRING,
state STRING,
created_at TIMESTAMP
);
CREATE TABLE `mergemind_raw.projects` (
project_id INT64,
name STRING,
description STRING,
visibility STRING,
created_at TIMESTAMP
);
CREATE TABLE `mergemind_raw.pipelines` (
pipeline_id INT64,
project_id INT64,
status STRING,
ref STRING,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
cd warehouse/bigquery/dbt
# Install packages
dbt deps
# Run models
dbt run
# Test models
dbt test
# Generate documentation
dbt docs generate
dbt docs serve
- Development: http://localhost:8080/api/v1
- Production: https://api.mergemind.com/api/v1
No authentication is currently required for the MVP. Future versions will support API keys and OAuth.
- GET /healthz - Basic health check
- GET /ready - Readiness check with dependency validation
- GET /health/detailed - Comprehensive health check with metrics
- GET /mrs - List merge requests with risk analysis
- GET /blockers/top - Get top blocking merge requests
- GET /mr/{id}/context - Get comprehensive MR context
- POST /mr/{id}/summary - Generate AI summary
- GET /mr/{id}/reviewers - Get suggested reviewers
- GET /mr/{id}/risk - Get risk analysis
- GET /mr/{id}/stats - Get MR statistics
- GET /metrics - Get application metrics
- GET /metrics/slo - Get SLO status and violations
- POST /metrics/reset - Reset metrics (admin only)
# List open merge requests
curl "https://api.mergemind.com/api/v1/mrs?state=open&limit=20"
# Get MR context
curl "https://api.mergemind.com/api/v1/mr/123/context"
# Generate AI summary
curl -X POST "https://api.mergemind.com/api/v1/mr/123/summary"
# Get reviewer suggestions
curl "https://api.mergemind.com/api/v1/mr/123/reviewers"
# Get risk analysis
curl "https://api.mergemind.com/api/v1/mr/123/risk"
For complete API documentation, see API Reference.
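For callers that prefer Python over curl, the same requests can be sketched with the standard library. The helper names `build_url` and `call_api` are hypothetical, and the sketch assumes the API replies with JSON; the paths and query parameters mirror the curl examples above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://api.mergemind.com/api/v1"  # production base URL listed above


def build_url(path: str, params: Optional[Dict[str, Any]] = None,
              base: str = BASE_URL) -> str:
    """Join the base URL, a path such as '/mrs', and optional query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def call_api(path: str, params: Optional[Dict[str, Any]] = None,
             method: str = "GET") -> Any:
    """Send a request and decode the JSON response body (assumes JSON replies)."""
    request = urllib.request.Request(build_url(path, params), method=method)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


list_url = build_url("/mrs", {"state": "open", "limit": 20})

# Network calls, shown for illustration only:
# open_mrs = call_api("/mrs", {"state": "open", "limit": 20})
# summary = call_api("/mr/123/summary", method="POST")
```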
# Build and push Docker images
docker build -t gcr.io/your-project/mergemind-api:latest app/backend/
docker push gcr.io/your-project/mergemind-api:latest
docker build -t gcr.io/your-project/mergemind-ui:latest app/frontend/
docker push gcr.io/your-project/mergemind-ui:latest
# Deploy to Cloud Run
gcloud run deploy mergemind-api \
--image gcr.io/your-project/mergemind-api:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--port 8080 \
--memory 2Gi \
--cpu 2 \
--max-instances 10
gcloud run deploy mergemind-ui \
--image gcr.io/your-project/mergemind-ui:latest \
--platform managed \
--region us-central1 \
--allow-unauthenticated \
--port 3000 \
--memory 512Mi \
--cpu 1 \
--max-instances 5
# Create cluster
gcloud container clusters create mergemind-cluster \
--zone us-central1-a \
--num-nodes 3 \
--machine-type e2-standard-4
# Deploy with Helm
helm install mergemind ./helm/mergemind \
--set gcp.projectId=your-project-id \
--set gitlab.baseUrl=https://your-gitlab.com
For detailed deployment instructions, see Production Deployment Guide.
# Basic health check
curl "https://api.mergemind.com/api/v1/healthz"
# Detailed health check
curl "https://api.mergemind.com/api/v1/health/detailed"
# SLO status
curl "https://api.mergemind.com/api/v1/metrics/slo"
The application exposes Prometheus-compatible metrics:
- Request count and duration
- Error rates and types
- Business metrics (MR analysis, AI summaries)
- External service health
Configure alerts for:
- High error rates (>1%)
- High latency (P95 > 2s)
- Service downtime
- BigQuery quota exceeded
- AI service failures
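The two numeric thresholds above (error rate over 1%, P95 latency over 2 seconds) can be illustrated with a small check. In a real deployment these would more likely be expressed as Prometheus alerting rules; this Python sketch only shows the threshold logic, and the function and alert names are hypothetical.

```python
from typing import List, Sequence

ERROR_RATE_THRESHOLD = 0.01   # alert above 1% errors
P95_LATENCY_THRESHOLD = 2.0   # alert above 2 seconds


def p95(latencies: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not latencies:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def evaluate_alerts(total: int, errors: int, latencies: Sequence[float]) -> List[str]:
    """Return the names of the alerts that should fire for this window."""
    alerts: List[str] = []
    if total > 0 and errors / total > ERROR_RATE_THRESHOLD:
        alerts.append("high_error_rate")
    if latencies and p95(latencies) > P95_LATENCY_THRESHOLD:
        alerts.append("high_latency")
    return alerts


healthy = evaluate_alerts(1000, 5, [0.1] * 100)
degraded = evaluate_alerts(1000, 20, [0.1] * 90 + [3.0] * 10)
```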
For comprehensive monitoring setup, see the monitoring folder documentation.
- Input validation and sanitization
- Rate limiting and DDoS protection
- HTTPS enforcement
- Security headers
- Data encryption at rest and in transit
- Access logging and audit trails
- Dynamic configuration - No hardcoded secrets or project IDs
- Environment-based secrets management - All sensitive data in environment variables
- Secure file handling - Proper .gitignore and .dockerignore configuration
- GDPR compliance for data protection
- SOC 2 Type II compliance
- Security incident response plan
- Regular security audits
For detailed security information, see the monitoring and infrastructure documentation.
# API tests
cd app/backend/fastapi_app
python run_tests.py
# Run specific test types
python run_tests.py --type unit
python run_tests.py --type integration
# Run with coverage
python run_tests.py --coverage
# UI tests
cd app/frontend/web
npm test
npm run test:coverage
- Unit Tests: Service layer components
- Integration Tests: End-to-end workflows
- API Tests: Endpoint functionality
- Performance Tests: Load and stress testing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
- Python: Black, isort, flake8
- JavaScript: Prettier, ESLint
- TypeScript: Strict mode enabled
Solo Developer Project
- Built from scratch for the Fivetran Challenge
- Full-stack development: Frontend, Backend, AI Services, Infrastructure
- Technologies: Python, React, Google Cloud, Vertex AI, BigQuery, Fivetran
- Architecture: Event-driven pipeline with AI-powered analytics
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@mergemind.com
- ✅ Custom Fivetran connector for GitLab
- ✅ Event-driven data pipeline with Cloud Functions
- ✅ AI-powered merge request analysis
- ✅ Risk assessment and reviewer suggestions
- ✅ Automated diff summarization
- ✅ Modern React dashboard
- ✅ Production-ready infrastructure
- ✅ Comprehensive monitoring and alerting
- Authentication and authorization
- Webhook notifications for real-time updates
- Bulk operations for MR management
- Advanced filtering and search capabilities
- Multi-environment support (dev/staging/prod)
- Real-time collaboration features
- Advanced analytics and reporting
- Custom risk rules and thresholds
- Team performance metrics
- Integration with additional Git providers
- Multi-repository and multi-organization support
- Advanced AI models (GPT-4, Claude)
- Enterprise SSO and RBAC
- Self-hosted deployment options
- Advanced workflow automation
- FastAPI for the API framework
- React for the frontend framework
- BigQuery for data warehousing
- Vertex AI for AI services
- Fivetran for data ingestion
MergeMind - Making merge request analysis intelligent and efficient.