📈 Nimaora - Scaling Architecture
Deep Dive into Handling 20,000+ Concurrent Users
How we architect for 1,500-2,000 RPS with sub-second latency
Nimaora CodeBattle is designed to handle competitive programming battles at scale:
| Challenge | Solution | Technology |
|---|---|---|
| 20,000+ concurrent users | Horizontal Pod Autoscaling | Kubernetes HPA |
| 1,500-2,000 RPS | High-performance runtime | Laravel Octane + Swoole |
| Real-time leaderboard | O(log N) operations | Redis Sorted Sets |
| Instant attack notifications | Priority queues + WebSocket | RabbitMQ + Reverb |
| Session management | Distributed sessions | Redis + sticky sessions |
| Database bottleneck | Connection pooling | PgBouncer |
A typical coding competition with 20,000 participants generates:
Traffic Pattern During Competition:
├── Peak Join Rate: 1,000 users/minute at start
├── Answer Submissions: ~10 per user = 200,000 total
├── Leaderboard Queries: ~30 per user = 600,000 total
├── Attack Actions: ~5 per user = 100,000 total
├── Heartbeats: 1 per 30s × 20,000 × 60min = 2,400,000 total
└── Total Requests: ~3.3 million in 60 minutes
Average RPS: ~920 (Peak: 2,000+)
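The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses Python purely as a calculator; the per-user counts come from the traffic pattern listed above, and the dictionary keys are illustrative names rather than identifiers from the codebase.

```python
# Back-of-the-envelope traffic model for one 60-minute battle.
USERS = 20_000
DURATION_S = 60 * 60

per_user = {
    "answer_submissions": 10,
    "leaderboard_queries": 30,
    "attack_actions": 5,
    "heartbeats": DURATION_S // 30,  # one heartbeat every 30 seconds
}

totals = {name: count * USERS for name, count in per_user.items()}
total_requests = sum(totals.values())
average_rps = total_requests / DURATION_S

for name, total in totals.items():
    print(f"{name}: {total:,}")
print(f"total: {total_requests:,}, average RPS: {average_rps:.0f}")
```

Heartbeats account for roughly 73% of the total, so the heartbeat interval is the single largest lever on average load.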
| Challenge | Impact | Severity |
|---|---|---|
| Thundering herd at battle start | All users join simultaneously | Critical |
| Leaderboard hotspot | Frequent reads/writes to same data | Critical |
| Attack processing | Real-time notification requirements | High |
| Session management | Prevent duplicate logins | High |
| WebSocket scaling | 20K+ persistent connections | High |
| Database connections | Connection pool exhaustion | Medium |

| Component | Choice | Reasoning |
|---|---|---|
| Backend | Laravel 12 + Octane | Familiar ecosystem, Swoole performance |
| Frontend | Next.js 15 | React 19, Server Components, Edge runtime |
| Database | PostgreSQL 16 | ACID compliance, advanced indexing |
| Cache | Redis 7 | Sorted sets, pub/sub, clustering |
| Queue | RabbitMQ | Reliability, priority queues, clustering |
| WebSocket | Laravel Reverb | Native Laravel integration |
| Load Balancer | Traefik | Dynamic configuration, WebSocket support |
Stateless Application Layer - Any pod can handle any request
Data Near Compute - Cache hot data in Redis instead of reading it from the database
Async by Default - Non-blocking I/O, queue heavy operations
Graceful Degradation - Circuit breakers, fallbacks
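Graceful degradation is easiest to see in code. The following is a minimal circuit-breaker sketch in Python, not the project's implementation: the class name, the thresholds, and the injected clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then serves the
    fallback until `reset_after` seconds have passed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Half-open: let one trial call through; one more failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # skip the failing dependency entirely
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A leaderboard read could, for example, pass a Redis lookup as `primary` and a slightly stale in-memory copy as `fallback`, so a Redis outage degrades freshness rather than availability.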
Application Layer Scaling
Laravel Octane with Swoole
┌─────────────────────────────────────────────────────────────┐
│                     Laravel Octane Pod                      │
├─────────────────────────────────────────────────────────────┤
│   ┌─────────────────────────────────────────────────────┐   │
│   │                 Swoole HTTP Server                  │   │
│   │   ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │   │
│   │   │ Worker  │ │ Worker  │ │ Worker  │ │ Worker  │   │   │
│   │   │    1    │ │    2    │ │    3    │ │    N    │   │   │
│   │   └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │   │
│   │        │           │           │           │        │   │
│   │   ┌────┴───────────┴───────────┴───────────┴────┐   │   │
│   │   │            Coroutine Pool (10K+)            │   │   │
│   │   └─────────────────────────────────────────────┘   │   │
│   └─────────────────────────────────────────────────────┘   │
│                              │                              │
│   ┌──────────────────────────┼──────────────────────────┐   │
│   │      Persistent Connections (Connection Pool)       │   │
│   │     ┌──────────┐   ┌──────────┐   ┌──────────┐      │   │
│   │     │PostgreSQL│   │  Redis   │   │ RabbitMQ │      │   │
│   │     │   Pool   │   │   Pool   │   │   Pool   │      │   │
│   │     └──────────┘   └──────────┘   └──────────┘      │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
<?php
// config/octane.php

return [
    'server' => 'swoole',
    'workers' => env('OCTANE_WORKERS', 'auto'),
    'task_workers' => env('OCTANE_TASK_WORKERS', 'auto'),
    // Recycle each worker after this many requests to contain memory leaks.
    'max_requests' => env('OCTANE_MAX_REQUESTS', 10000),
    'tick' => true,

    // Swoole tables: shared memory visible to every worker in the pod.
    // Octane expects 'name:rows' => [column => type].
    'tables' => [
        'battles:100' => [
            'participant_count' => 'int',
        ],
    ],
];
Horizontal Pod Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "150"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 200
          periodSeconds: 15
        - type: Pods
          value: 20
          periodSeconds: 15
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
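| Event | Action | Time |
|---|---|---|
| CPU > 60% | Add up to 200% of current pods (up to 3x) | Every 15 seconds |
| RPS > 150/pod | Add up to 20 pods | Every 15 seconds |
| Load decrease | Remove up to 10% of pods | Every 60 seconds (after 5 min stable) |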
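To make the scale-up behavior concrete, here is a small Python model of how the HPA above would pick a replica count. It is a simplified sketch: it uses the standard HPA formula `ceil(current * metric / target)`, ignores the memory metric and the controller's tolerance band, and does not model scale-down at all. The function names are illustrative.

```python
import math


def desired_replicas(current: int, metric_value: float, target: float) -> int:
    """Core HPA formula: ceil(current * metric / target)."""
    return math.ceil(current * metric_value / target)


def scale_up_limit(current: int, percent: int = 200, pods: int = 20) -> int:
    """Most replicas allowed after one 15 s period with selectPolicy: Max."""
    by_percent = current + math.ceil(current * percent / 100)
    by_pods = current + pods
    return max(by_percent, by_pods)


def next_replicas_scale_up(current: int, cpu_util: float, rps_per_pod: float,
                           min_replicas: int = 10, max_replicas: int = 100) -> int:
    # With several metrics, the HPA follows whichever asks for the most pods.
    wanted = max(
        desired_replicas(current, cpu_util, 60),      # CPU target: 60%
        desired_replicas(current, rps_per_pod, 150),  # RPS target: 150 per pod
    )
    wanted = max(wanted, current)                     # scale-down not modelled
    wanted = min(wanted, scale_up_limit(current))     # rate limit per period
    return max(min_replicas, min(wanted, max_replicas))


print(next_replicas_scale_up(10, cpu_util=90, rps_per_pod=600))
```

Starting from 10 pods at 600 RPS per pod, the metrics ask for 40 pods, but the policies cap the first step at 30; the next 15-second period can then go further, up to `maxReplicas`.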
CREATE TABLE battle_participants (
    id               BIGSERIAL PRIMARY KEY,
    battle_id        INTEGER NOT NULL REFERENCES battles(id),
    username         VARCHAR(50) NOT NULL,
    session_id       VARCHAR(100),
    points           INTEGER DEFAULT 0,
    shields          INTEGER DEFAULT 0,
    arrows           INTEGER DEFAULT 0,
    is_active        BOOLEAN DEFAULT TRUE,
    last_activity_at TIMESTAMP,
    created_at       TIMESTAMP,
    updated_at       TIMESTAMP,
    CONSTRAINT unique_battle_username UNIQUE (battle_id, username)
);

-- Serves the top-N leaderboard query below.
CREATE INDEX idx_participants_leaderboard
    ON battle_participants (battle_id, points DESC);

-- Partial index: session lookups only care about active participants.
CREATE INDEX idx_participants_session
    ON battle_participants (battle_id, session_id)
    WHERE is_active = TRUE;

-- Supports heartbeat and inactivity sweeps.
CREATE INDEX idx_participants_active
    ON battle_participants (battle_id, is_active, last_activity_at);

SELECT username, points, shields, arrows
FROM battle_participants
WHERE battle_id = $1
ORDER BY points DESC
LIMIT 100;

EXPLAIN ANALYZE:
Index Scan using idx_participants_leaderboard on battle_participants
  Index Cond: (battle_id = 1)
  Rows: 100
  Time: 0.8ms
PgBouncer Connection Pooling
┌─────────────────────────────────────────────────────────────┐
│                      Application Pods                       │
│     ┌─────────┐ ┌─────────┐ ┌─────────┐     ┌─────────┐     │
│     │  Pod 1  │ │  Pod 2  │ │  Pod 3  │ ... │  Pod N  │     │
│     └────┬────┘ └────┬────┘ └────┬────┘     └────┬────┘     │
│          │           │           │               │          │
│          └───────────┴─────┬─────┴───────────────┘          │
│                            │                                │
│                            ▼                                │
│                ┌────────────────────────┐                   │
│                │       PgBouncer        │                   │
│                │                        │                   │
│                │  Max Client: 10,000    │                   │
│                │  Pool Size: 100        │                   │
│                │  Mode: Transaction     │                   │
│                └───────────┬────────────┘                   │
│                            │                                │
│                            ▼                                │
│                ┌────────────────────────┐                   │
│                │     PostgreSQL 16      │                   │
│                │                        │                   │
│                │  Max Connections: 1000 │                   │
│                │  Shared Buffers: 4GB   │                   │
│                └────────────────────────┘                   │
└─────────────────────────────────────────────────────────────┘
max_connections = 1000
shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 32MB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 64MB
random_page_cost = 1.1
effective_io_concurrency = 200
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
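A quick calculation shows why the pool matters for memory as well as for connection counts. The Python sketch below is a rough estimate only: it assumes one `work_mem`-sized sort or hash per active backend (a single query can use several) and a single PgBouncer database/user pool. The constant and function names are illustrative.

```python
# Figures from the PgBouncer diagram and PostgreSQL settings above.
PGBOUNCER_MAX_CLIENTS = 10_000
PGBOUNCER_POOL_SIZE = 100
PG_MAX_CONNECTIONS = 1_000
WORK_MEM_MB = 32

# Transaction pooling: many client connections share few server connections.
multiplexing_ratio = PGBOUNCER_MAX_CLIENTS / PGBOUNCER_POOL_SIZE


def work_mem_budget_mb(active_backends: int, work_mem_mb: int = WORK_MEM_MB) -> int:
    """Rough memory if every backend runs one work_mem-sized operation."""
    return active_backends * work_mem_mb


pooled_mb = work_mem_budget_mb(PGBOUNCER_POOL_SIZE)
unpooled_mb = work_mem_budget_mb(PG_MAX_CONNECTIONS)

print(f"{multiplexing_ratio:.0f} clients per server connection")
print(f"work_mem budget: {pooled_mb} MB pooled vs {unpooled_mb} MB at max_connections")
```

Under these assumptions, 100 pooled backends stay around 3.2 GB of sort memory, whereas 1,000 direct connections could ask for about 32 GB.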
Redis Sorted Sets for Leaderboard
┌─────────────────────────────────────────────────────────────┐
│                      Redis Sorted Set                       │
│                                                             │
│  Key: battle:1:leaderboard                                  │
│                                                             │
│  ┌──────────────────┬────────────────────────────────────┐  │
│  │ Score (Points)   │ Member (Username)                  │  │
│  ├──────────────────┼────────────────────────────────────┤  │
│  │ 450              │ ali_programmer                     │  │
│  │ 425              │ sara_coder                         │  │
│  │ 380              │ reza_dev                           │  │
│  │ 375              │ mina_tech                          │  │
│  │ ...              │ ...                                │  │
│  │ 0                │ new_user_20000                     │  │
│  └──────────────────┴────────────────────────────────────┘  │
│                                                             │
│  Operations:                                                │
│  ├── ZADD: O(log N) - Add/update score                      │
│  ├── ZREVRANGE: O(log N + M) - Get top M                    │
│  ├── ZREVRANK: O(log N) - Get user rank                     │
│  └── ZINCRBY: O(log N) - Increment score                    │
└─────────────────────────────────────────────────────────────┘
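<?php

use Illuminate\Support\Facades\Redis;

class LeaderboardCache
{
    private const KEY_PREFIX = 'leaderboard:';

    // ZADD: set a participant's absolute score.
    public function updateScore(int $battleId, string $username, int $score): void
    {
        Redis::zadd($this->getKey($battleId), $score, $username);
    }

    // ZINCRBY: add points atomically and return the new score.
    public function incrementScore(int $battleId, string $username, int $increment): int
    {
        return (int) Redis::zincrby($this->getKey($battleId), $increment, $username);
    }

    // ZREVRANGE: top $limit participants, highest score first.
    public function getTop(int $battleId, int $limit = 100): array
    {
        $result = Redis::zrevrange($this->getKey($battleId), 0, $limit - 1, 'WITHSCORES');

        return $this->formatLeaderboard($result);
    }

    // ZREVRANK is zero-based; return a one-based rank, or null if the user is absent.
    public function getRank(int $battleId, string $username): ?int
    {
        $rank = Redis::zrevrank($this->getKey($battleId), $username);

        // A missing member comes back as null (Predis) or false (phpredis).
        return is_int($rank) ? $rank + 1 : null;
    }

    // Return up to $range participants above and below the given user.
    public function getAroundRank(int $battleId, string $username, int $range = 5): array
    {
        $key = $this->getKey($battleId);
        $rank = Redis::zrevrank($key, $username);
        if (!is_int($rank)) {
            return [];
        }

        return Redis::zrevrange($key, max(0, $rank - $range), $rank + $range, 'WITHSCORES');
    }

    // Assumed helper (not shown in the original): build the sorted-set key.
    private function getKey(int $battleId): string
    {
        return self::KEY_PREFIX . $battleId;
    }

    // Assumed helper (not shown in the original): expects a member => score map.
    private function formatLeaderboard(array $result): array
    {
        $rows = [];
        foreach ($result as $username => $score) {
            $rows[] = ['rank' => count($rows) + 1, 'username' => (string) $username, 'points' => (int) $score];
        }

        return $rows;
    }
}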
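The rank and window arithmetic in the class above is easy to get wrong by one, so it helps to exercise it without a Redis server. The Python class below is a toy stand-in, not a Redis client: it mirrors the ordering and indexing semantics of the sorted-set commands but sorts a plain dict, so it does not have their O(log N) cost. All names are illustrative.

```python
class InMemoryLeaderboard:
    """Toy stand-in for a Redis sorted set (dict + sort, so not O(log N))."""

    def __init__(self) -> None:
        self.scores: dict[str, int] = {}

    def increment(self, username: str, delta: int) -> int:  # like ZINCRBY
        self.scores[username] = self.scores.get(username, 0) + delta
        return self.scores[username]

    def _ordered(self) -> list:
        # ZREVRANGE order: highest score first; ties are broken by member
        # name in reverse lexicographic order.
        return sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)

    def top(self, limit: int = 100) -> list:  # like ZREVRANGE 0 limit-1
        return self._ordered()[:limit]

    def rank(self, username: str):  # like ZREVRANK, shifted to one-based
        names = [name for name, _ in self._ordered()]
        return names.index(username) + 1 if username in names else None

    def around(self, username: str, span: int = 5) -> list:
        rank = self.rank(username)
        if rank is None:
            return []
        zero_based = rank - 1
        start = max(0, zero_based - span)
        # Redis ranges include the end index, hence the +1 on the slice.
        return self._ordered()[start:zero_based + span + 1]


board = InMemoryLeaderboard()
for name, points in [("ali_programmer", 450), ("sara_coder", 425),
                     ("reza_dev", 380), ("mina_tech", 375)]:
    board.increment(name, points)
print(board.rank("reza_dev"), board.top(2))
```

Clamping the start index with `max(0, ...)` matters for users near the top: without it, a negative start would be interpreted by Redis as an offset from the end of the set.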
┌─────────────────────────────────────────────────────────────┐
│                       Cache Hierarchy                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Layer 1: CDN Edge Cache                                    │
│  ├── Static assets (JS, CSS, images)                        │
│  ├── TTL: 1 year (versioned)                                │
│  └── Hit rate: 95%+                                         │
│          │                                                  │
│          ▼                                                  │
│  Layer 2: Application Cache (Swoole APCu)                   │
│  ├── Configuration, routes                                  │
│  ├── TTL: Request lifetime                                  │
│  └── Hit rate: 100% (warm pods)                             │
│          │                                                  │
│          ▼                                                  │
│  Layer 3: Redis Cache                                       │
│  ├── Leaderboards (sorted sets)                             │
│  ├── Sessions                                               │
│  ├── Participant data                                       │
│  ├── TTL: 2-60 seconds (varies)                             │
│  └── Hit rate: 85%+                                         │
│          │                                                  │
│          ▼                                                  │
│  Layer 4: Database Query Cache                              │
│  ├── Prepared statements                                    │
│  └── PostgreSQL buffer cache                                │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Cache Invalidation Strategy
| Data Type | Invalidation Trigger | Method |
|---|---|---|
| Leaderboard | Score change | Immediate update |
| Participant data | Any mutation | Event-driven |
| Session | Heartbeat timeout | TTL expiry |
| Questions | Never during battle | Pre-cached |
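Three of the strategies in the table (TTL expiry, event-driven invalidation, and pre-cached entries that never expire) can be shown with one small cache class. This is a minimal Python sketch under assumed names and TTL values; the real system keeps this data in Redis rather than in process memory.

```python
import time
from typing import Any, Callable, Optional


class TtlCache:
    """Tiny cache showing TTL expiry and explicit (event-driven) invalidation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.entries: dict = {}

    def put(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # ttl=None means "never expires", e.g. questions pre-cached for a battle.
        expires_at = float("inf") if ttl is None else self.clock() + ttl
        self.entries[key] = (expires_at, value)

    def get(self, key: str) -> Any:
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # TTL expiry, e.g. a session after missed heartbeats
            return None
        return value

    def invalidate(self, key: str) -> None:
        self.entries.pop(key, None)  # event-driven, e.g. after a participant mutation


now = [0.0]  # fake clock so the example is deterministic
cache = TtlCache(clock=lambda: now[0])
cache.put("session:ali", "active", ttl=60)          # assumed 60 s session TTL
cache.put("questions:1", ["q1", "q2"])              # pre-cached, no TTL
cache.put("participant:ali", {"points": 5}, ttl=30)
cache.invalidate("participant:ali")                 # a mutation event arrived
now[0] = 61.0                                       # heartbeat window has passed
print(cache.get("session:ali"), cache.get("questions:1"))
```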
Priority Queue Architecture
┌─────────────────────────────────────────────────────────────┐
│                      RabbitMQ Cluster                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Exchange: nimaora.direct                                   │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Queue: nimaora.attacks                   Priority: 10 │  │
│  │ ├── Max workers: 30                                   │  │
│  │ ├── Messages: Attack processing                       │  │
│  │ └── SLA: < 100ms processing                           │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Queue: nimaora.broadcast                 Priority: 8  │  │
│  │ ├── Max workers: 100                                  │  │
│  │ ├── Messages: WebSocket broadcasts                    │  │
│  │ └── SLA: < 200ms delivery                             │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Queue: nimaora.leaderboard               Priority: 5  │  │
│  │ ├── Max workers: 20                                   │  │
│  │ ├── Messages: Leaderboard updates                     │  │
│  │ └── SLA: < 500ms                                      │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Queue: nimaora.default                   Priority: 5  │  │
│  │ ├── Max workers: 50                                   │  │
│  │ └── Messages: General jobs                            │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Queue: nimaora.notifications             Priority: 3  │  │
│  │ ├── Max workers: 15                                   │  │
│  │ └── Messages: User notifications                      │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
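The effect of the priorities above can be illustrated with a single in-process priority queue. This Python sketch is a simplification: the real setup uses separate RabbitMQ queues with their own worker pools, whereas here one heap stands in for all of them. The `PriorityDispatcher` class is hypothetical.

```python
import heapq
import itertools

# Priorities as listed in the diagram; a higher number is served first.
QUEUE_PRIORITY = {
    "attacks": 10,
    "broadcast": 8,
    "leaderboard": 5,
    "default": 5,
    "notifications": 3,
}


class PriorityDispatcher:
    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # keeps FIFO order within one priority

    def publish(self, queue: str, job: str) -> None:
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-QUEUE_PRIORITY[queue], next(self._seq), queue, job))

    def consume(self) -> tuple:
        _, _, queue, job = heapq.heappop(self._heap)
        return queue, job


dispatcher = PriorityDispatcher()
dispatcher.publish("notifications", "email")
dispatcher.publish("default", "cleanup")
dispatcher.publish("attacks", "a1")
dispatcher.publish("leaderboard", "refresh")
dispatcher.publish("attacks", "a2")
order = [dispatcher.consume() for _ in range(5)]
print(order)
```

Attack jobs jump ahead of everything published earlier, while the sequence counter keeps jobs of equal priority in arrival order.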
Laravel Horizon Configuration
'environments' => [
    'production' => [
        // Highest priority: attack processing.
        'supervisor-attacks' => [
            'connection' => 'rabbitmq',
            'queue' => ['attacks'],
            'balance' => 'auto',
            'minProcesses' => 5,
            'maxProcesses' => 30,
            'tries' => 3,
            'timeout' => 15,
            'nice' => -5,
        ],
        // WebSocket broadcasts: one attempt only.
        'supervisor-broadcast' => [
            'connection' => 'rabbitmq',
            'queue' => ['broadcast'],
            'balance' => 'auto',
            'minProcesses' => 10,
            'maxProcesses' => 100,
            'tries' => 1,
            'timeout' => 10,
            'nice' => -3,
        ],
        'supervisor-leaderboard' => [
            'connection' => 'rabbitmq',
            'queue' => ['leaderboard'],
            'balance' => 'auto',
            'minProcesses' => 3,
            'maxProcesses' => 20,
            'tries' => 3,
            'timeout' => 30,
        ],
        'supervisor-default' => [
            'connection' => 'rabbitmq',
            'queue' => ['default'],
            'balance' => 'simple',
            'minProcesses' => 5,
            'maxProcesses' => 50,
            'tries' => 3,
            'timeout' => 60,
        ],
        // Lowest priority: user notifications.
        'supervisor-notifications' => [
            'connection' => 'rabbitmq',
            'queue' => ['notifications'],
            'balance' => 'simple',
            'minProcesses' => 2,
            'maxProcesses' => 15,
            'tries' => 3,
            'timeout' => 60,
            'nice' => 5,
        ],
    ],
],
┌─────────────────────────────────────────────────────────────┐
│                   Attack Processing Flow                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. User clicks Attack                                      │
│     │                                                       │
│     ▼                                                       │
│  2. HTTP Request → AttackController                         │
│     │   ├── Validate attacker has arrows                    │
│     │   ├── Validate target exists                          │
│     │   └── Validate target has points > 0                  │
│     │                                                       │
│     ▼                                                       │
│  3. Synchronous Processing (< 50ms)                         │
│     │   ├── Decrement attacker arrows                       │
│     │   ├── Process attack on target                        │
│     │   │   ├── If target has shield → use shield           │
│     │   │   └── Else → deduct 1 point                       │
│     │   └── Create attack record                            │
│     │                                                       │
│     ▼                                                       │
│  4. Queue Async Tasks                                       │
│     │   ├── BroadcastLeaderboardUpdate (if points changed)  │
│     │   └── Broadcast AttackReceived to target              │
│     │                                                       │
│     ▼                                                       │
│  5. Return Response to Attacker                             │
│     │   └── { blocked: false, points_deducted: 1 }          │
│     │                                                       │
│ ═══════════════════════════════════════════════════════════ │
│     │                                                       │
│     │   ASYNC (Queue Workers)                               │
│     │                                                       │
│     ▼                                                       │
│  6. Process AttackReceived Event                            │
│     │   └── Laravel Reverb broadcasts to target's channel   │
│     │                                                       │
│     ▼                                                       │
│  7. Target receives WebSocket notification                  │
│         └── Modal shows: "You were attacked by {username}!" │
│                                                             │
│  Total Time: < 100ms (user perception)                      │
└─────────────────────────────────────────────────────────────┘
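Steps 2 to 5 of the flow above reduce to a small amount of state manipulation. The Python sketch below models them with plain objects; it is an illustration under assumptions, not the controller's code. The `Participant` dataclass and `resolve_attack` function are hypothetical, persistence and locking are omitted, and the async tasks are represented only by their names.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    username: str
    points: int = 0
    shields: int = 0
    arrows: int = 0


def resolve_attack(attacker: Participant, target: Participant):
    """Validate, mutate state, list the async work, and build the response."""
    if attacker.arrows <= 0:
        raise ValueError("attacker has no arrows")
    if target.points <= 0:
        raise ValueError("target has no points to lose")

    attacker.arrows -= 1
    blocked = target.shields > 0
    if blocked:
        target.shields -= 1   # the shield absorbs the hit
    else:
        target.points -= 1    # an unshielded hit costs one point

    async_tasks = ["AttackReceived"]
    if not blocked:
        async_tasks.append("BroadcastLeaderboardUpdate")  # only when points changed

    response = {"blocked": blocked, "points_deducted": 0 if blocked else 1}
    return response, async_tasks


ali = Participant("ali_programmer", arrows=2)
sara = Participant("sara_coder", points=3, shields=1)
first = resolve_attack(ali, sara)    # absorbed by the shield
second = resolve_attack(ali, sara)   # costs one point
print(first, second)
```

Keeping the state change synchronous and pushing only the notifications to the queue is what lets the attacker get a response without waiting for the WebSocket delivery.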
┌─────────────────────────────────────────────────────────────┐
│                   WebSocket Architecture                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────────────────────────────────────┐   │
│   │               Load Balancer (Traefik)               │   │
│   │                                                     │   │
│   │  Sticky Sessions: nimaora_ws cookie                 │   │
│   │  Health Check: /app/websocket-health                │   │
│   └──────────────────────────┬──────────────────────────┘   │
│                              │                              │
│            ┌─────────────────┼─────────────────┐            │
│            ▼                 ▼                 ▼            │
│     ┌────────────┐    ┌────────────┐    ┌────────────┐      │
│     │  Reverb 1  │    │  Reverb 2  │    │  Reverb 3  │      │
│     │            │    │            │    │            │      │
│     │  7K conn   │    │  7K conn   │    │  6K conn   │      │
│     └──────┬─────┘    └──────┬─────┘    └──────┬─────┘      │
│            │                 │                 │            │
│            └─────────────────┼─────────────────┘            │
│                              │                              │
│                              ▼                              │
│   ┌─────────────────────────────────────────────────────┐   │
│   │               Redis Pub/Sub Backbone                │   │
│   │                                                     │   │
│   │  Channels:                                          │   │
│   │  ├── battle.{id}               (public leaderboard) │   │
│   │  ├── presence-battle.{id}      (online users)       │   │
│   │  └── private-participant.{id}  (attack alerts)      │   │
│   └─────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
| Channel | Pattern | Purpose | Subscribers |
|---|---|---|---|
| Public | battle.{id} | Leaderboard updates | All participants |
| Presence | presence-battle.{id} | Online tracking | All participants |
| Private | private-participant.{id} | Attack notifications | Single user |
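<?php

use App\Models\Attack; // assumed model; the original does not show where $attack comes from
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcast;

class AttackReceived implements ShouldBroadcast
{
    public function __construct(public Attack $attack)
    {
    }

    // Deliver only to the target's private channel (private-participant.{id}).
    public function broadcastOn(): array
    {
        return [
            new PrivateChannel('participant.' . $this->attack->target_id),
        ];
    }

    // Event name the client listens for.
    public function broadcastAs(): string
    {
        return 'attack.received';
    }

    // Payload shown in the target's notification modal.
    public function broadcastWith(): array
    {
        return [
            'attacker' => $this->attack->attacker->username,
            'blocked' => $this->attack->shield_blocked,
            'points_lost' => $this->attack->points_deducted,
            'timestamp' => $this->attack->created_at->toISOString(),
        ];
    }
}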
Kubernetes Resource Allocation
Backend Pod:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
  replicas: 10-100

WebSocket Pod:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
  replicas: 5-50

Horizon Pod:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "2Gi"
    cpu: "2000m"
  replicas: 3-20
Total Resources at Peak (20K users):
├── Backend: 100 pods × 1 CPU = 100 CPU cores
├── WebSocket: 20 pods × 0.5 CPU = 10 CPU cores
├── Horizon: 10 pods × 2 CPU = 20 CPU cores
├── Frontend: 15 pods × 1 CPU = 15 CPU cores
├── PostgreSQL: 4 CPU (primary)
├── Redis: 4 CPU (master + 2 replicas)
├── RabbitMQ: 2 CPU
└── Total: ~155 CPU cores
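The peak total can be checked by summing the list above. The short Python sketch below does only that; the dictionary names are illustrative, and the per-pod figures for backend, WebSocket, and Horizon are CPU limits rather than requests, so the scheduler would reserve less than this.

```python
# (pods, CPU per pod) taken from the peak-resource list above.
workloads = {
    "backend": (100, 1.0),
    "websocket": (20, 0.5),
    "horizon": (10, 2.0),
    "frontend": (15, 1.0),
}
fixed_cpu = {"postgresql": 4, "redis": 4, "rabbitmq": 2}

total_cpu = sum(pods * cpu for pods, cpu in workloads.values()) + sum(fixed_cpu.values())
print(f"peak CPU budget: {total_cpu:.0f} cores")
```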
Test Profile: stress (50K RPS target)
Duration: 45 minutes
Results:
┌─────────────────────────────────────────────────────────────┐
│                     Performance Summary                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Throughput                                                 │
│  ├── Total Requests:         4,897,126                      │
│  ├── RPS Average:            1,814                          │
│  ├── RPS Peak:               2,234                          │
│  └── Success Rate:           99.88%                         │
│                                                             │
│  Latency                                                    │
│  ├── P50:                    23.4ms                         │
│  ├── P90:                    89.3ms                         │
│  ├── P95:                    156ms                          │
│  ├── P99:                    423ms                          │
│  └── Max:                    4.89s                          │
│                                                             │
│  Custom Metrics                                             │
│  ├── Join Success Rate:      99.89%                         │
│  ├── Answer Success Rate:    99.92%                         │
│  ├── Attack Success Rate:    97.23%                         │
│  └── Leaderboard P95:        89ms                           │
│                                                             │
│  Infrastructure                                             │
│  ├── Backend Pods:           78 (autoscaled from 10)        │
│  ├── WebSocket Connections:  21,456                         │
│  ├── Database Connections:   412 (via PgBouncer)            │
│  └── Redis Memory:           2.1GB                          │
│                                                             │
└─────────────────────────────────────────────────────────────┘
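The headline numbers in the summary are internally consistent, which a short calculation confirms. The Python sketch below derives the average RPS from the total request count and the 45-minute duration, and estimates the number of failed requests implied by the success rate; the variable names are illustrative.

```python
total_requests = 4_897_126
duration_s = 45 * 60
success_rate = 0.9988

average_rps = total_requests / duration_s
failed_requests = round(total_requests * (1 - success_rate))

print(f"average RPS: {average_rps:.0f}")
print(f"failed requests: ~{failed_requests:,}")
```

The derived average matches the reported 1,814 RPS, and a 99.88% success rate over this volume still means several thousand failed requests, which is worth tracking per endpoint.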
| Metric | Without Optimization | With Optimization | Improvement |
|---|---|---|---|
| Max RPS | 200 | 2,000+ | 10x |
| P95 Latency | 2.5s | 156ms | 16x |
| DB Connections | 10,000 | 100 (pooled) | 100x |
| Leaderboard Query | 850ms | 34ms | 25x |
| WebSocket Scale | 1,000 | 20,000+ | 20x |

| Improvement | Benefit | Effort |
|---|---|---|
| Redis Cluster | Higher throughput | Medium |
| Read replicas | Scale reads | Medium |
| Rate limiting per user | Prevent abuse | Low |
| Circuit breaker tuning | Better resilience | Low |

| Improvement | Benefit | Effort |
|---|---|---|
| Event sourcing | Replay, audit | High |
| CQRS pattern | Separate read/write | High |
| GraphQL subscriptions | Efficient real-time | Medium |
| Edge computing | Lower latency | Medium |

| Improvement | Benefit | Effort |
|---|---|---|
| Multi-region | Geographic distribution | Very High |
| ML-based scaling | Predictive autoscaling | High |
| Custom WebSocket server | Ultimate performance | Very High |
| Conflict-free replicated data | Consistency without locks | Very High |
🚀 Built for Performance | 📈 Designed for Scale | ⚡ Optimized for Speed