This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is a DevOps learning repository documenting hands-on implementations of various DevOps concepts and technologies. The main project is a polyglot microservices architecture demonstrating containerization, service-to-service communication, and production monitoring patterns.
The microservices system (`1-microservices_test/`) consists of:

- **Application Layer** - 4 microservices written in different languages:
  - Time Service (Go) - Fast compiled service at `time-service:5001`
  - System Info Service (Python) - System metrics using psutil at `system-info-service:5002`
  - Weather Service (Node.js) - Async external API integration at `weather-service:5003`
  - Dashboard Service (Python) - Aggregates all services with a web UI at `dashboard-service:5000`
- **Observability Layer** - Production monitoring stack:
  - Prometheus - Metrics collection, scraping every 15s at port `9090`
  - Grafana - Visualization dashboards at port `3000` (admin/admin)
All services communicate via REST APIs and Docker networking. Services discover each other by container name (e.g., `http://time-service:5001`).
- Polyglot architecture: Each service uses the best language for its task (Go for performance, Node.js for async I/O, Python for system libraries and web UI)
- Service independence: Each service has its own Dockerfile, dependencies, and can be scaled independently
- Graceful degradation: Dashboard continues working even when backend services fail
- Performance optimizations: Parallel API calls using ThreadPoolExecutor, intelligent caching (10min for weather data), reduced timeouts
- Real-time updates: JavaScript polling for live clock updates in the browser
- Prometheus instrumentation: All services expose `/metrics` endpoints with custom metrics
```bash
# Navigate to microservices project
cd 1-microservices_test

# Build and start all services (including Prometheus/Grafana)
docker-compose up --build

# Start in detached mode
docker-compose up -d

# Rebuild single service
docker-compose build <service-name>
docker-compose up -d <service-name>

# Stop all services
docker-compose down

# View logs
docker-compose logs -f
docker-compose logs <service-name>

# Access dashboards
# Main dashboard: http://localhost:5000
# Grafana: http://localhost:3000 (admin/admin)
# Prometheus: http://localhost:9090
```
```bash
# Test individual services
curl http://localhost:5001/api/time
curl http://localhost:5002/api/sysinfo
curl http://localhost:5003/api/weather
curl http://localhost:5000/api/aggregate

# View Prometheus metrics
curl http://localhost:5001/metrics
curl http://localhost:5002/metrics
curl http://localhost:5003/metrics
curl http://localhost:5000/metrics

# Generate traffic for monitoring (requires bash)
./scripts/generate-traffic.sh -m balanced -d 60 -r 10  # Balanced load
./scripts/generate-traffic.sh -m spike -r 50           # Traffic spike
./scripts/generate-traffic.sh -m stress -d 300         # Stress test

# Check Prometheus alerts
curl http://localhost:9090/api/v1/alerts | jq

# Scale a service
docker-compose up --scale time-service=3

# Test performance
time curl -s http://localhost:5000/ > /dev/null
time curl -s http://localhost:5000/api/aggregate > /dev/null
```

Each service follows a consistent pattern:
- `/api/<resource>` - Main API endpoint (JSON response)
- `/health` - Health check endpoint
- `/metrics` - Prometheus metrics endpoint
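A minimal sketch of that three-endpoint pattern in Flask (a hypothetical "example" service; the real services additionally use the Prometheus client libraries, while this sketch hand-rolls the exposition format for brevity):

```python
from flask import Flask, jsonify

app = Flask(__name__)
request_count = 0  # stand-in for a real Prometheus counter

@app.route("/api/example")
def api_example():
    # Main API endpoint: JSON response, bump the request counter
    global request_count
    request_count += 1
    return jsonify({"service": "example", "value": 42})

@app.route("/health")
def health():
    # Health check endpoint polled by the dashboard / orchestrator
    return jsonify({"status": "healthy"})

@app.route("/metrics")
def metrics():
    # Prometheus text exposition, hand-rolled here for brevity; the real
    # services generate this with prometheus-client / prom-client.
    body = (
        "# TYPE example_http_requests_total counter\n"
        f"example_http_requests_total {request_count}\n"
    )
    return body, 200, {"Content-Type": "text/plain; version=0.0.4"}
```

Run it with `app.run(port=5001)` or `flask run`; Prometheus then scrapes `/metrics` on the same port as the API.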
All services are instrumented with:

- Request counters: `<service>_http_requests_total` with labels for endpoint, method, status
- Latency histograms: `<service>_http_request_duration_seconds` for p50, p95, p99 percentiles
- Custom metrics:
  - Weather service: `weather_service_cache_hits_total`, `weather_service_cache_misses_total`
  - Dashboard: `dashboard_service_upstream_request_duration_seconds` for tracking backend service latency
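As an example of querying those histograms, a p95 latency panel in Grafana could use a PromQL expression like the following (the metric name follows the convention above; adjust to the actual series exposed):

```promql
histogram_quantile(
  0.95,
  sum by (le) (rate(dashboard_service_http_request_duration_seconds_bucket[5m]))
)
```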
```
Browser → Dashboard Service (port 5000)
    ↓ (parallel calls using ThreadPoolExecutor)
    ├→ Time Service (port 5001)
    ├→ System Info Service (port 5002)
    └→ Weather Service (port 5003) → External API (wttr.in)

All services → Prometheus (port 9090) [scrapes /metrics every 15s]
                    ↓
               Grafana (port 3000) [queries Prometheus for visualization]
```
Services resolve each other using Docker Compose service names:

- `http://time-service:5001` (not `http://localhost:5001` from inside containers)
- `http://system-info-service:5002`
- `http://weather-service:5003`
- Browser requests go to `http://localhost:5000` (the dashboard acts as a proxy)
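In code, that distinction usually reduces to a small URL map keyed by Compose service name. A sketch (the `TIME_SERVICE_URL` override is hypothetical, not a variable the project necessarily defines):

```python
import os

# Inside a container, Docker's embedded DNS resolves these service names;
# from the host you would use http://localhost:<published-port> instead.
SERVICE_URLS = {
    "time": os.environ.get("TIME_SERVICE_URL", "http://time-service:5001"),
    "sysinfo": "http://system-info-service:5002",
    "weather": "http://weather-service:5003",
}

def endpoint(service: str, path: str) -> str:
    """Build a full backend URL, e.g. endpoint('time', '/api/time')."""
    return SERVICE_URLS[service] + path
```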
The Go time service uses multi-stage Docker builds to minimize image size:
- Build stage: Uses `golang:1.21-alpine` with the full toolchain
- Runtime stage: Uses `alpine:latest` with only the compiled binary
- Result: Much smaller final image (~15MB vs ~300MB)
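A sketch of what such a multi-stage Dockerfile looks like (paths and flags are illustrative; see `time-service/Dockerfile` for the actual build):

```dockerfile
# Build stage: full Go toolchain
FROM golang:1.21-alpine AS build
WORKDIR /src
COPY go.mod ./
RUN go mod download
COPY . .
# Static binary so it runs on plain Alpine without libc concerns
RUN CGO_ENABLED=0 go build -o /time-service .

# Runtime stage: only the compiled binary
FROM alpine:latest
COPY --from=build /time-service /time-service
EXPOSE 5001
ENTRYPOINT ["/time-service"]
```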
The project works on Windows, Linux, and macOS:
- Docker Compose uses environment variable fallbacks: `${HOSTNAME:-${COMPUTERNAME:-localhost}}`
- Weather service uses the `certifi` package for consistent SSL certificate handling across platforms
- Traffic generation script requires bash (use Git Bash on Windows or WSL)
The dashboard service includes critical optimizations:
- Parallel API calls: Uses `ThreadPoolExecutor` to call all backend services simultaneously
- Reduced timeouts: Fail-fast approach (3s for most services, 2s for proxy)
- Weather caching: 10-minute cache to reduce external API calls
- Real-time updates: JavaScript polls `/api/time-proxy` every second for the live clock
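The parallel fan-out with graceful degradation can be sketched as follows (a simplified stand-in for the dashboard's aggregation logic; in the real service each fetcher would be a `requests.get` against a backend URL):

```python
from concurrent.futures import ThreadPoolExecutor

def aggregate(fetchers: dict, timeout: float = 3.0) -> dict:
    """Call every fetcher concurrently; a failing backend degrades to a
    placeholder instead of taking down the whole aggregate response."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(fetchers)) as pool:
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:  # timeout or backend error
                results[name] = {"error": "unavailable", "detail": str(exc)}
    return results
```

Because every future is awaited with its own timeout, one slow or dead service only blanks its own panel on the dashboard.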
- Scrape interval: 15s (configured in `monitoring/prometheus.yml`)
- Evaluation interval: 10s for alert rules
- Alert rules: 11 total alerts covering performance, traffic, errors, resources, and availability
- Grafana dashboard: Pre-provisioned "Microservices Overview Dashboard" with 11 panels
- Dashboard refresh: Auto-refresh every 5 seconds
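The core of `monitoring/prometheus.yml` corresponding to those settings looks roughly like this (job names and rule-file path are illustrative; consult the actual file):

```yaml
global:
  scrape_interval: 15s      # how often /metrics endpoints are scraped
  evaluation_interval: 10s  # how often alert rules are evaluated

rule_files:
  - alert-rules.yml

scrape_configs:
  - job_name: time-service
    static_configs:
      - targets: ["time-service:5001"]
```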
- Linux: `sudo systemctl start docker`
- Windows: Ensure Docker Desktop is running (check the system tray icon)
Modify port mappings in docker-compose.yml if ports 5000-5003, 3000, or 9090 are in use.
```bash
# Check logs
docker-compose logs <service-name>

# Rebuild from scratch
docker-compose down
docker-compose build --no-cache <service-name>
docker-compose up

# Check Prometheus targets
# Visit http://localhost:9090/targets
# All targets should show "UP" status

# Verify metrics endpoints are accessible
curl http://localhost:5001/metrics
```

- Verify Prometheus is running: `docker ps | grep prometheus`
- Check the datasource configuration in Grafana (should be auto-provisioned)
- Ensure time range in dashboard covers period when services were running
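Target status can also be checked programmatically via Prometheus's HTTP API (`GET http://localhost:9090/api/v1/targets`). A sketch of summarizing that response, assuming the documented payload shape (fetch it with `requests` or `curl` in practice):

```python
def down_targets(payload: dict) -> list:
    """Return the scrape URLs of any active targets not reporting 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [t["scrapeUrl"] for t in active if t.get("health") != "up"]
```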
```
devops-progress/
├── 1-microservices_test/          # Main microservices project
│   ├── time-service/              # Go service
│   │   ├── main.go                # HTTP server with Prometheus metrics
│   │   ├── go.mod                 # Go dependencies
│   │   └── Dockerfile             # Multi-stage build
│   ├── system-info-service/       # Python service
│   │   ├── app.py                 # Flask app with psutil
│   │   ├── requirements.txt       # Python deps + prometheus-client
│   │   └── Dockerfile
│   ├── weather-service/           # Node.js service
│   │   ├── server.js              # Express + axios + caching
│   │   ├── package.json           # npm deps + prom-client
│   │   └── Dockerfile
│   ├── dashboard-service/         # Python aggregator
│   │   ├── app.py                 # Flask with ThreadPoolExecutor
│   │   ├── requirements.txt
│   │   └── Dockerfile
│   ├── monitoring/                # Monitoring stack configuration
│   │   ├── prometheus.yml         # Scrape configs + alert rules
│   │   ├── alert-rules.yml        # 11 alert definitions
│   │   └── grafana/provisioning/  # Auto-configured datasources + dashboards
│   ├── scripts/
│   │   └── generate-traffic.sh    # Traffic generation tool (5 modes)
│   ├── docker-compose.yml         # Orchestrates all 6 containers
│   ├── README.md                  # Detailed project documentation
│   └── MONITORING.md              # Monitoring setup guide
└── README.md                      # Repository overview
```
When making changes to a service:

- Edit the service code
- Rebuild only that service: `docker-compose build <service-name>`
- Restart the service: `docker-compose up -d <service-name>`
- Check logs: `docker-compose logs -f <service-name>`
- Test the endpoint: `curl http://localhost:<port>/api/<endpoint>`
- Verify metrics are updating: `curl http://localhost:<port>/metrics`
- Check the Grafana dashboard for real-time impact
According to the project roadmap, upcoming topics include:
- Service discovery (Consul, Eureka)
- API gateways (Kong, Nginx)
- Kubernetes deployment
- Logging aggregation (ELK stack, Loki)
- Message queues (RabbitMQ, Kafka)
- Distributed tracing (Jaeger, Zipkin)
- Service mesh (Istio, Linkerd)