Description
Environment
- EdgeVPN version: v0.31.1
- OS: Ubuntu (Oracle Cloud ARM instance)
- RAM: 24GB total
- Runtime: 7+ hours before OOM kill
Problem Description
EdgeVPN consumed progressively increasing memory over time, eventually reaching 14GB RAM and triggering the Linux OOM (Out of Memory) killer, which crashed the entire system and brought down all services.
Timeline
- 01:05 AM: EdgeVPN service started after system reboot
- 01:05 AM - 08:30 AM: Memory consumption grew from ~1.4GB to 14GB
- 12:59 AM (previous day): OOM killer terminated the EdgeVPN process, which was consuming 14GB
Evidence
OOM Killer Logs
[Fri Nov 14 00:59:15 2025] Out of memory: Killed process 3348128 (edgevpn)
total-vm:15707664kB, anon-rss:14202144kB, file-rss:0kB, shmem-rss:0kB, UID:0
pgtables:27984kB oom_score_adj:0
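To put the anon-rss figure in more familiar units, here is a minimal Go sketch that extracts it from the kernel log line and converts kB to GiB. The helper names (`anonRSSKiB`, `kiBToGiB`) are made up for this illustration and are not part of EdgeVPN.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// anonRSSPattern matches the "anon-rss:<n>kB" field of an OOM killer log line.
var anonRSSPattern = regexp.MustCompile(`anon-rss:(\d+)kB`)

// anonRSSKiB extracts the anon-rss value (in kB, as printed by the kernel).
// Hypothetical helper for this report only.
func anonRSSKiB(line string) (int64, error) {
	m := anonRSSPattern.FindStringSubmatch(line)
	if m == nil {
		return 0, fmt.Errorf("no anon-rss field in %q", line)
	}
	return strconv.ParseInt(m[1], 10, 64)
}

// kiBToGiB converts a kB (KiB) count to GiB.
func kiBToGiB(kib int64) float64 {
	return float64(kib) / (1024 * 1024)
}

func main() {
	line := "total-vm:15707664kB, anon-rss:14202144kB, file-rss:0kB, shmem-rss:0kB, UID:0"
	kib, err := anonRSSKiB(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("anon-rss: %d kB = %.2f GiB\n", kib, kiBToGiB(kib))
}
```

For the log line above this works out to roughly 13.5 GiB of anonymous memory held by a single process.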
Process Stats at Runtime
USER PID %CPU %MEM VSZ RSS COMMAND
root 3345319 56.8 6.0 3200528 1494936 edgevpn --address 10.1.0.10/24
- VSZ: 3.2GB virtual memory
- RSS: 1.5GB resident (was 14GB before kill)
- CPU: Sustained 56%+ CPU usage
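For anyone who wants to track RSS growth over time rather than take a single `ps` snapshot, here is a small Go sketch that reads `VmRSS` from `/proc/<pid>/status` on Linux. The function name `parseVmRSS` is an assumption for illustration, and the program falls back to inspecting itself when no PID is given.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVmRSS extracts the VmRSS value (in kB) from the text of /proc/<pid>/status.
// Hypothetical helper, not part of EdgeVPN.
func parseVmRSS(status string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected form: "VmRSS:   1494936 kB"
		if len(fields) >= 2 && fields[0] == "VmRSS:" {
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	return 0, errors.New("VmRSS line not found")
}

func main() {
	pid := "self" // inspect this process unless a PID is passed
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	kb, err := parseVmRSS(string(data))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("pid %s RSS: %d kB (%.2f GiB)\n", pid, kb, float64(kb)/(1024*1024))
}
```

Running this once a minute against the edgevpn PID and logging the result would give a growth curve to attach to this issue.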
Network Activity
EdgeVPN established 100+ peer connections to random nodes worldwide:
- 50+ TCP connections (Germany, Denmark, Brazil, Romania, AWS, etc.)
- 100+ UDP sockets across all network interfaces
- 4 mDNS sockets
- Continuous DHT peer discovery
Service Configuration
[Service]
Type=simple
ExecStart=/usr/local/bin/edgevpn --address 10.1.0.10/24
# NO memory limits set
# NO connection limits
EdgeVPN Logs
{"message":"connmanager disabled"}
{"message":"go-libp2p resource manager protection disabled"}
{"message":"Node ID: 12D3KooW9v9JTD9Jf1dGe32EaFCDHAZPtT5X91GroJfvtMavPG2G"}
{"message":"Bootstrapping DHT"}
Root Cause Analysis
Based on the evidence and architecture documentation, the memory leak appears to be caused by:
1. Unbounded In-Memory Blockchain
- EdgeVPN maintains an in-memory blockchain for metadata (VPN IPs, DNS records, services)
- The blockchain grows indefinitely without pruning or garbage collection
- Every peer update adds a new block that persists in memory
2. Unlimited Peer Connections
- With "connmanager disabled" and "resource manager protection disabled", there are NO limits on:
- Number of peer connections (observed 50+ TCP + 100+ UDP)
- Memory per connection
- Total memory consumption
- Each libp2p connection maintains state, buffers, and metadata in memory
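To illustrate what a hard cap on connections means in practice, here is a stdlib-only Go sketch of an admission counter with separate inbound and outbound limits. `ConnLimiter` is hypothetical and the limit values are example numbers; go-libp2p ships its own connection manager and resource manager, and this sketch does not use or replace them.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Example limits, chosen for illustration only.
const (
	DefaultMaxPeers    = 50
	DefaultMaxInbound  = 25
	DefaultMaxOutbound = 25
)

// Direction distinguishes inbound from outbound connections.
type Direction int

const (
	Inbound Direction = iota
	Outbound
)

// ErrLimitReached is returned when a new connection would exceed a limit.
var ErrLimitReached = errors.New("connection limit reached")

// ConnLimiter is a hypothetical admission counter, safe for concurrent use.
type ConnLimiter struct {
	mu       sync.Mutex
	inbound  int
	outbound int
}

// Acquire reserves a slot for a connection or returns ErrLimitReached.
func (l *ConnLimiter) Acquire(d Direction) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inbound+l.outbound >= DefaultMaxPeers {
		return ErrLimitReached
	}
	switch d {
	case Inbound:
		if l.inbound >= DefaultMaxInbound {
			return ErrLimitReached
		}
		l.inbound++
	case Outbound:
		if l.outbound >= DefaultMaxOutbound {
			return ErrLimitReached
		}
		l.outbound++
	}
	return nil
}

// Release frees a previously acquired slot.
func (l *ConnLimiter) Release(d Direction) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if d == Inbound && l.inbound > 0 {
		l.inbound--
	} else if d == Outbound && l.outbound > 0 {
		l.outbound--
	}
}

func main() {
	var limiter ConnLimiter
	accepted := 0
	for i := 0; i < 30; i++ {
		if limiter.Acquire(Inbound) == nil {
			accepted++
		}
	}
	fmt.Printf("accepted %d of 30 inbound connections\n", accepted)
}
```

With any cap of this kind in place, the 100+ connections observed above would have been refused at admission instead of accumulating state.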
3. Gossip Protocol Overhead
- Every blockchain message is broadcast to ALL peers
- Each peer update triggers gossip propagation
- Message buffers accumulate in memory without bounds
- No rate limiting or backpressure mechanism
4. Disabled Resource Protection
From logs: go-libp2p resource manager protection disabled
This was likely disabled to avoid crashes (see libp2p/go-libp2p#1319), but it removes ALL memory safety mechanisms.
Proposed Solutions
- Implement blockchain pruning
  - Keep only last N blocks (e.g., 1000)
  - Prune entries older than X hours
  - Add configurable retention policy
- Set default connection limits
      const (
          DefaultMaxPeers    = 50
          DefaultMaxInbound  = 25
          DefaultMaxOutbound = 25
      )
- Add memory-based backpressure
  - Monitor memory usage
  - Drop oldest blockchain entries when threshold reached
  - Rate limit gossip message processing
- Re-enable resource manager with safe defaults
  - Or implement custom memory tracking
  - Gracefully handle memory pressure
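The memory monitoring and "drop oldest entries" ideas above could be prototyped with the Go runtime alone. The sketch below sets a soft limit with `debug.SetMemoryLimit` (Go 1.19+) and trims a ledger when heap usage passes a fraction of that limit. The 1 GiB limit, the 80% threshold, and the helper names are all assumptions for illustration. Note that `HeapAlloc` covers only the Go heap, so process RSS will be somewhat higher.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapInUse returns the bytes of allocated heap objects reported by the runtime.
func heapInUse() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

// underPressure reports whether heapBytes exceeds fraction of limitBytes.
func underPressure(heapBytes, limitBytes uint64, fraction float64) bool {
	return float64(heapBytes) > fraction*float64(limitBytes)
}

// trimOldest drops entries from the front (oldest) until at most keep remain.
// Hypothetical helper; entries stand in for ledger blocks.
func trimOldest(entries []string, keep int) []string {
	if keep < 0 {
		keep = 0
	}
	if len(entries) <= keep {
		return entries
	}
	trimmed := make([]string, keep)
	copy(trimmed, entries[len(entries)-keep:])
	return trimmed
}

func main() {
	const limit = 1 << 30 // 1 GiB, an arbitrary example value

	// Soft limit: the GC works harder as total runtime memory nears this
	// value. It does not hard-cap the process.
	debug.SetMemoryLimit(limit)

	ledger := []string{"e1", "e2", "e3", "e4", "e5"}
	if underPressure(heapInUse(), limit, 0.8) {
		// Shed the older half of the ledger when close to the limit.
		ledger = trimOldest(ledger, len(ledger)/2)
	}
	fmt.Println("entries retained:", len(ledger))
}
```

A soft limit like this complements a hard cgroup limit: the runtime tries to stay under it, and the kernel only steps in if that fails.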
Long-term (Architecture)
- Persistent blockchain with pruning
  - Move blockchain to disk (SQLite/BadgerDB)
  - Keep only recent entries in memory
  - Implement LRU cache for hot data
- Connection quality scoring
  - Prefer high-quality, low-latency peers
  - Drop poor-performing connections
  - Implement peer reputation system
Workarounds (For Users)
Until fixed, users should:
- Set systemd memory limits
  sudo systemctl edit edgevpn.service
  Add:
  [Service]
  MemoryMax=1G
  Restart=on-failure
- Monitor and restart periodically
  # Cron job to restart daily
  0 3 * * * systemctl restart edgevpn
- Use for short-lived connections only
  - Not suitable for 24/7 daemons
  - Restart after long sessions
Additional Context
This aligns with the project's own warnings:
"The underlying network is chatty. It uses a Gossip protocol for synchronizing the routing table and p2p. Every blockchain message is broadcasted to all peers... Might be not suited for low latency workload."
However, the memory implications of this design should be more prominently documented, especially since EdgeVPN is positioned as suitable for "edge and low-end devices."
System Info
EdgeVPN Version: v0.31.1
Go Version: 1.23.x
libp2p Version: v0.35.4
Platform: linux/arm64
Total RAM: 24GB
Swap: None configured (exacerbated the issue)
Related Issues
- go-libp2p-resource-manager panics on reserving memory libp2p/go-libp2p#1319 (libp2p resource manager panics)
- "system: cannot reserve inbound connection: resource limit exceeded" #46 (connection resource limits)