Skip to content

Latest commit

 

History

History
790 lines (678 loc) · 35.8 KB

File metadata and controls

790 lines (678 loc) · 35.8 KB

📈 Nimaora - Scaling Architecture

Deep Dive into Handling 20,000+ Concurrent Users

How we architect for 1,500-2,000 RPS with sub-second latency


📋 Table of Contents


Executive Summary

Nimaora CodeBattle is designed to handle competitive programming battles at scale:

Challenge Solution Technology
20,000+ concurrent users Horizontal Pod Autoscaling Kubernetes HPA
1,500-2,000 RPS High-performance runtime Laravel Octane + Swoole
Real-time leaderboard O(log N) operations Redis Sorted Sets
Instant attack notifications Priority queues + WebSocket RabbitMQ + Reverb
Session management Distributed sessions Redis + sticky sessions
Database bottleneck Connection pooling PgBouncer

Scaling Challenges

The Problem

A typical coding competition with 20,000 participants generates:

Traffic Pattern During Competition:
├── Peak Join Rate: 1,000 users/minute at start
├── Answer Submissions: ~10 per user = 200,000 total
├── Leaderboard Queries: ~30 per user = 600,000 total
├── Attack Actions: ~5 per user = 100,000 total
├── Heartbeats: 1 per 30s × 20,000 × 60min = 2,400,000 total
└── Total Requests: ~3.3 million in 60 minutes
    Average RPS: ~920 (Peak: 2,000+)

Specific Challenges

Challenge Impact Severity
Thundering herd at battle start All users join simultaneously Critical
Leaderboard hotspot Frequent reads/writes to same data Critical
Attack processing Real-time notification requirements High
Session management Prevent duplicate logins High
WebSocket scaling 20K+ persistent connections High
Database connections Connection pool exhaustion Medium

Architecture Decisions

Why We Chose This Stack

Component Choice Reasoning
Backend Laravel 12 + Octane Familiar ecosystem, Swoole performance
Frontend Next.js 15 React 19, Server Components, Edge runtime
Database PostgreSQL 16 ACID compliance, advanced indexing
Cache Redis 7 Sorted sets, pub/sub, clustering
Queue RabbitMQ Reliability, priority queues, clustering
WebSocket Laravel Reverb Native Laravel integration
Load Balancer Traefik Dynamic configuration, WebSocket support

Design Principles

  1. Stateless Application Layer - Any pod can handle any request
  2. Data Near Compute - Cache hot data in Redis, not database
  3. Async by Default - Non-blocking I/O, queue heavy operations
  4. Graceful Degradation - Circuit breakers, fallbacks

Application Layer Scaling

Laravel Octane with Swoole

┌─────────────────────────────────────────────────────────────┐
│                    Laravel Octane Pod                        │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Swoole HTTP Server                      │    │
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │    │
│  │  │ Worker  │ │ Worker  │ │ Worker  │ │ Worker  │   │    │
│  │  │   1     │ │   2     │ │   3     │ │   N     │   │    │
│  │  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │    │
│  │       │           │           │           │         │    │
│  │  ┌────┴───────────┴───────────┴───────────┴────┐   │    │
│  │  │           Coroutine Pool (10K+)             │   │    │
│  │  └─────────────────────────────────────────────┘   │    │
│  └─────────────────────────────────────────────────────┘    │
│                           │                                  │
│  ┌────────────────────────┼────────────────────────────┐    │
│  │     Persistent Connections (Connection Pool)        │    │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐          │    │
│  │  │PostgreSQL│  │  Redis   │  │ RabbitMQ │          │    │
│  │  │  Pool    │  │  Pool    │  │  Pool    │          │    │
│  │  └──────────┘  └──────────┘  └──────────┘          │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘

Configuration

// config/octane.php
return [
    'server' => 'swoole',
    'workers' => env('OCTANE_WORKERS', 'auto'),
    'task_workers' => env('OCTANE_TASK_WORKERS', 'auto'),
    'max_requests' => env('OCTANE_MAX_REQUESTS', 10000),
    'tick' => true,
    'tables' => [
        'battles' => [
            'columns' => [
                ['name' => 'participant_count', 'type' => 'int'],
            ],
            'rows' => 100,
        ],
    ],
];

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "150"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 200
          periodSeconds: 15
        - type: Pods
          value: 20
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60

Scaling Behavior

Event Action Time
CPU > 60% Double pods 15 seconds
RPS > 150/pod Add 20 pods 15 seconds
Load decrease Reduce 10% 60 seconds (after 5min stable)

Database Optimization

Schema Design

CREATE TABLE battle_participants (
    id BIGSERIAL PRIMARY KEY,
    battle_id INTEGER NOT NULL REFERENCES battles(id),
    username VARCHAR(50) NOT NULL,
    session_id VARCHAR(100),
    points INTEGER DEFAULT 0,
    shields INTEGER DEFAULT 0,
    arrows INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT TRUE,
    last_activity_at TIMESTAMP,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    
    CONSTRAINT unique_battle_username UNIQUE (battle_id, username)
);

CREATE INDEX idx_participants_leaderboard 
    ON battle_participants (battle_id, points DESC);

CREATE INDEX idx_participants_session 
    ON battle_participants (battle_id, session_id) 
    WHERE is_active = TRUE;

CREATE INDEX idx_participants_active 
    ON battle_participants (battle_id, is_active, last_activity_at);

Query Optimization

SELECT username, points, shields, arrows
FROM battle_participants
WHERE battle_id = $1
ORDER BY points DESC
LIMIT 100;

EXPLAIN ANALYZE:
Index Scan using idx_participants_leaderboard on battle_participants
  Index Cond: (battle_id = 1)
  Rows: 100
  Time: 0.8ms

PgBouncer Connection Pooling

┌─────────────────────────────────────────────────────────────┐
│                     Application Pods                         │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐     ┌─────────┐       │
│  │  Pod 1  │ │  Pod 2  │ │  Pod 3  │ ... │  Pod N  │       │
│  └────┬────┘ └────┬────┘ └────┬────┘     └────┬────┘       │
│       │           │           │               │             │
│       └───────────┴─────┬─────┴───────────────┘             │
│                         │                                    │
│                         ▼                                    │
│            ┌────────────────────────┐                       │
│            │       PgBouncer        │                       │
│            │                        │                       │
│            │  Max Client: 10,000    │                       │
│            │  Pool Size: 100        │                       │
│            │  Mode: Transaction     │                       │
│            └───────────┬────────────┘                       │
│                        │                                     │
│                        ▼                                     │
│            ┌────────────────────────┐                       │
│            │     PostgreSQL 16      │                       │
│            │                        │                       │
│            │  Max Connections: 1000 │                       │
│            │  Shared Buffers: 4GB   │                       │
│            └────────────────────────┘                       │
└─────────────────────────────────────────────────────────────┘

PostgreSQL Tuning

max_connections = 1000
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 32MB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 64MB
random_page_cost = 1.1
effective_io_concurrency = 200
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

Caching Strategy

Redis Sorted Sets for Leaderboard

┌─────────────────────────────────────────────────────────────┐
│                 Redis Sorted Set                             │
│                                                              │
│  Key: battle:1:leaderboard                                  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Score (Points) │ Member (Username)                   │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │      450        │ ali_programmer                      │   │
│  │      425        │ sara_coder                          │   │
│  │      380        │ reza_dev                            │   │
│  │      375        │ mina_tech                           │   │
│  │      ...        │ ...                                 │   │
│  │      0          │ new_user_20000                      │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  Operations:                                                 │
│  ├── ZADD: O(log N) - Add/update score                      │
│  ├── ZREVRANGE: O(log N + M) - Get top M                    │
│  ├── ZREVRANK: O(log N) - Get user rank                     │
│  └── ZINCRBY: O(log N) - Increment score                    │
└─────────────────────────────────────────────────────────────┘

Implementation

class LeaderboardCache
{
    private const KEY_PREFIX = 'leaderboard:';

    public function updateScore(int $battleId, string $username, int $score): void
    {
        $key = $this->getKey($battleId);
        Redis::zadd($key, $score, $username);
    }

    public function incrementScore(int $battleId, string $username, int $increment): int
    {
        $key = $this->getKey($battleId);
        return (int) Redis::zincrby($key, $increment, $username);
    }

    public function getTop(int $battleId, int $limit = 100): array
    {
        $key = $this->getKey($battleId);
        $result = Redis::zrevrange($key, 0, $limit - 1, 'WITHSCORES');
        return $this->formatLeaderboard($result);
    }

    public function getRank(int $battleId, string $username): ?int
    {
        $key = $this->getKey($battleId);
        $rank = Redis::zrevrank($key, $username);
        return $rank !== null ? $rank + 1 : null;
    }

    public function getAroundRank(int $battleId, string $username, int $range = 5): array
    {
        $key = $this->getKey($battleId);
        $rank = Redis::zrevrank($key, $username);
        
        if ($rank === null) return [];

        $start = max(0, $rank - $range);
        $end = $rank + $range;

        return Redis::zrevrange($key, $start, $end, 'WITHSCORES');
    }
}

Multi-Layer Caching

┌─────────────────────────────────────────────────────────────┐
│                    Cache Hierarchy                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Layer 1: CDN Edge Cache                                    │
│  ├── Static assets (JS, CSS, images)                        │
│  ├── TTL: 1 year (versioned)                                │
│  └── Hit rate: 95%+                                         │
│                           │                                  │
│                           ▼                                  │
│  Layer 2: Application Cache (Swoole APCu)                   │
│  ├── Configuration, routes                                  │
│  ├── TTL: Request lifetime                                  │
│  └── Hit rate: 100% (warm pods)                             │
│                           │                                  │
│                           ▼                                  │
│  Layer 3: Redis Cache                                       │
│  ├── Leaderboards (sorted sets)                             │
│  ├── Sessions                                               │
│  ├── Participant data                                       │
│  ├── TTL: 2-60 seconds (varies)                             │
│  └── Hit rate: 85%+                                         │
│                           │                                  │
│                           ▼                                  │
│  Layer 4: Database Query Cache                              │
│  ├── Prepared statements                                    │
│  └── PostgreSQL buffer cache                                │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Cache Invalidation Strategy

Data Type Invalidation Trigger Method
Leaderboard Score change Immediate update
Participant data Any mutation Event-driven
Session Heartbeat timeout TTL expiry
Questions Never during battle Pre-cached

Queue System Design

Priority Queue Architecture

┌─────────────────────────────────────────────────────────────┐
│                     RabbitMQ Cluster                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Exchange: nimaora.direct                                   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Queue: nimaora.attacks          Priority: 10        │   │
│  │  ├── Max workers: 30                                 │   │
│  │  ├── Messages: Attack processing                     │   │
│  │  └── SLA: < 100ms processing                         │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │  Queue: nimaora.broadcast        Priority: 8         │   │
│  │  ├── Max workers: 100                                │   │
│  │  ├── Messages: WebSocket broadcasts                  │   │
│  │  └── SLA: < 200ms delivery                           │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │  Queue: nimaora.leaderboard      Priority: 5         │   │
│  │  ├── Max workers: 20                                 │   │
│  │  ├── Messages: Leaderboard updates                   │   │
│  │  └── SLA: < 500ms                                    │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │  Queue: nimaora.default          Priority: 5         │   │
│  │  ├── Max workers: 50                                 │   │
│  │  └── Messages: General jobs                          │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │  Queue: nimaora.notifications    Priority: 3         │   │
│  │  ├── Max workers: 15                                 │   │
│  │  └── Messages: User notifications                    │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Laravel Horizon Configuration

'environments' => [
    'production' => [
        'supervisor-attacks' => [
            'connection' => 'rabbitmq',
            'queue' => ['attacks'],
            'balance' => 'auto',
            'minProcesses' => 5,
            'maxProcesses' => 30,
            'tries' => 3,
            'timeout' => 15,
            'nice' => -5,
        ],
        'supervisor-broadcast' => [
            'connection' => 'rabbitmq',
            'queue' => ['broadcast'],
            'balance' => 'auto',
            'minProcesses' => 10,
            'maxProcesses' => 100,
            'tries' => 1,
            'timeout' => 10,
            'nice' => -3,
        ],
        'supervisor-leaderboard' => [
            'connection' => 'rabbitmq',
            'queue' => ['leaderboard'],
            'balance' => 'auto',
            'minProcesses' => 3,
            'maxProcesses' => 20,
            'tries' => 3,
            'timeout' => 30,
        ],
        'supervisor-default' => [
            'connection' => 'rabbitmq',
            'queue' => ['default'],
            'balance' => 'simple',
            'minProcesses' => 5,
            'maxProcesses' => 50,
            'tries' => 3,
            'timeout' => 60,
        ],
        'supervisor-notifications' => [
            'connection' => 'rabbitmq',
            'queue' => ['notifications'],
            'balance' => 'simple',
            'minProcesses' => 2,
            'maxProcesses' => 15,
            'tries' => 3,
            'timeout' => 60,
            'nice' => 5,
        ],
    ],
],

Attack Processing Flow

┌─────────────────────────────────────────────────────────────┐
│                   Attack Processing Flow                     │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. User clicks Attack                                       │
│     │                                                        │
│     ▼                                                        │
│  2. HTTP Request → AttackController                          │
│     │  └── Validate attacker has arrows                      │
│     │  └── Validate target exists                            │
│     │  └── Validate target has points > 0                    │
│     │                                                        │
│     ▼                                                        │
│  3. Synchronous Processing (< 50ms)                          │
│     │  └── Decrement attacker arrows                         │
│     │  └── Process attack on target                          │
│     │      ├── If target has shield → use shield             │
│     │      └── Else → deduct 1 point                         │
│     │  └── Create attack record                              │
│     │                                                        │
│     ▼                                                        │
│  4. Queue Async Tasks                                        │
│     │  ├── BroadcastLeaderboardUpdate (if points changed)    │
│     │  └── Broadcast AttackReceived to target                │
│     │                                                        │
│     ▼                                                        │
│  5. Return Response to Attacker                              │
│     │  └── { blocked: false, points_deducted: 1 }            │
│     │                                                        │
│  ═══════════════════════════════════════════════════════════ │
│  │                                                           │
│  │  ASYNC (Queue Workers)                                    │
│  │                                                           │
│     ▼                                                        │
│  6. Process AttackReceived Event                             │
│     │  └── Laravel Reverb broadcasts to target's channel     │
│     │                                                        │
│     ▼                                                        │
│  7. Target receives WebSocket notification                   │
│     └── Modal shows: "You were attacked by {username}!"      │
│                                                              │
│  Total Time: < 100ms (user perception)                       │
└─────────────────────────────────────────────────────────────┘

Real-time Communication

WebSocket Architecture

┌─────────────────────────────────────────────────────────────┐
│                  WebSocket Architecture                      │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                 Load Balancer (Traefik)              │    │
│  │                                                      │    │
│  │  Sticky Sessions: nimaora_ws cookie                  │    │
│  │  Health Check: /app/websocket-health                 │    │
│  └────────────────────────┬────────────────────────────┘    │
│                           │                                  │
│         ┌─────────────────┼─────────────────┐               │
│         ▼                 ▼                 ▼               │
│  ┌────────────┐   ┌────────────┐   ┌────────────┐          │
│  │  Reverb 1  │   │  Reverb 2  │   │  Reverb 3  │          │
│  │            │   │            │   │            │          │
│  │ 7K conn    │   │ 7K conn    │   │ 6K conn    │          │
│  └─────┬──────┘   └─────┬──────┘   └─────┬──────┘          │
│        │                │                │                  │
│        └────────────────┼────────────────┘                  │
│                         │                                    │
│                         ▼                                    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │              Redis Pub/Sub Backbone                  │    │
│  │                                                      │    │
│  │  Channels:                                           │    │
│  │  ├── battle.{id} (public leaderboard)               │    │
│  │  ├── presence-battle.{id} (online users)            │    │
│  │  └── private-participant.{id} (attack alerts)       │    │
│  └─────────────────────────────────────────────────────┘    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Channel Types

Channel Pattern Purpose Subscribers
Public battle.{id} Leaderboard updates All participants
Presence presence-battle.{id} Online tracking All participants
Private private-participant.{id} Attack notifications Single user

Event Broadcasting

class AttackReceived implements ShouldBroadcast
{
    public function broadcastOn(): array
    {
        return [
            new PrivateChannel('participant.' . $this->attack->target_id),
        ];
    }

    public function broadcastAs(): string
    {
        return 'attack.received';
    }

    public function broadcastWith(): array
    {
        return [
            'attacker' => $this->attack->attacker->username,
            'blocked' => $this->attack->shield_blocked,
            'points_lost' => $this->attack->points_deducted,
            'timestamp' => $this->attack->created_at->toISOString(),
        ];
    }
}

Infrastructure Scaling

Kubernetes Resource Allocation

Backend Pod:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
  replicas: 10-100

WebSocket Pod:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
  replicas: 5-50

Horizon Pod:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "2000m"
  replicas: 3-20

Resource Distribution

Total Resources at Peak (20K users):
├── Backend: 100 pods × 1 CPU = 100 CPU cores
├── WebSocket: 20 pods × 0.5 CPU = 10 CPU cores
├── Horizon: 10 pods × 2 CPU = 20 CPU cores
├── Frontend: 15 pods × 1 CPU = 15 CPU cores
├── PostgreSQL: 4 CPU (primary)
├── Redis: 4 CPU (master + 2 replicas)
├── RabbitMQ: 2 CPU
└── Total: ~155 CPU cores

Performance Benchmarks

Load Test Results

Test Profile: stress (50K RPS target)
Duration: 45 minutes
Results:

┌─────────────────────────────────────────────────────────────┐
│                    Performance Summary                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Throughput                                                  │
│  ├── Total Requests: 4,897,126                              │
│  ├── RPS Average: 1,814                                     │
│  ├── RPS Peak: 2,234                                        │
│  └── Success Rate: 99.88%                                   │
│                                                              │
│  Latency                                                     │
│  ├── P50: 23.4ms                                            │
│  ├── P90: 89.3ms                                            │
│  ├── P95: 156ms                                             │
│  ├── P99: 423ms                                             │
│  └── Max: 4.89s                                             │
│                                                              │
│  Custom Metrics                                              │
│  ├── Join Success Rate: 99.89%                              │
│  ├── Answer Success Rate: 99.92%                            │
│  ├── Attack Success Rate: 97.23%                            │
│  └── Leaderboard P95: 89ms                                  │
│                                                              │
│  Infrastructure                                              │
│  ├── Backend Pods: 78 (autoscaled from 10)                  │
│  ├── WebSocket Connections: 21,456                          │
│  ├── Database Connections: 412 (via PgBouncer)              │
│  └── Redis Memory: 2.1GB                                    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Performance Comparison

Metric Without Optimization With Optimization Improvement
Max RPS 200 2,000+ 10x
P95 Latency 2.5s 156ms 16x
DB Connections 10,000 100 (pooled) 100x
Leaderboard Query 850ms 34ms 25x
WebSocket Scale 1,000 20,000+ 20x

Future Improvements

Short-term (1-3 months)

Improvement Benefit Effort
Redis Cluster Higher throughput Medium
Read replicas Scale reads Medium
Rate limiting per user Prevent abuse Low
Circuit breaker tuning Better resilience Low

Medium-term (3-6 months)

Improvement Benefit Effort
Event sourcing Replay, audit High
CQRS pattern Separate read/write High
GraphQL subscriptions Efficient real-time Medium
Edge computing Lower latency Medium

Long-term (6-12 months)

Improvement Benefit Effort
Multi-region Geographic distribution Very High
ML-based scaling Predictive autoscaling High
Custom WebSocket server Ultimate performance Very High
Conflict-free replicated data Consistency without locks Very High

🚀 Built for Performance | 📈 Designed for Scale | ⚡ Optimized for Speed