fix(pool): add connect timeout to prevent 20-second stall on Deluge reconnect #34
Merged
…CP dial

`pool.getConn()` called `conn.Connect()` synchronously, blocking the single pool worker goroutine for the full OS TCP connect timeout (~20 s on Windows) whenever a connection attempt hung. This stalled all concurrent HTTP requests.

Root cause: raising the `go` directive in `go.mod` from 1.17 to 1.25.0 activates Go 1.21+ DNS behaviour, in which `net.Dialer` tries IPv6 (`::1`) before IPv4 for `localhost`. If IPv6 packets are dropped rather than refused, the SYN times out silently before the dialer falls back to IPv4, blocking the pool worker for the entire duration.

Fix: run `Connect()` in a goroutine and `select` on its result against a configurable `ConnectTimeout` (default 10 s, overridable via `--connect-timeout` / `POOL_CONNECT_TIMEOUT`). Orphaned `Connect()` goroutines are cleaned up asynchronously after a timeout.
Problem
The torrent list was taking 20+ seconds to load. Root cause: the pool worker goroutine was blocked inside `conn.Connect()`.

`pool.getConn()` called `conn.Connect()` synchronously. The pool has a single worker goroutine; while it is inside `getConn()`, the entire `select` loop in `worker()` is frozen. All concurrent HTTP requests waiting for a pool connection are stuck as well, because they cannot send on `pool.get` until the pool worker is ready to receive.

`conn.Connect()` in go-libdeluge uses `new(net.Dialer)` with no dial timeout. When the TCP connection hangs (no RST, just silence), Go waits for the OS SYN timeout, which is about 20 seconds on Windows.

Why it regressed now
The April 26 commit changed `go.mod` from `go 1.17` to `go 1.24`, and the telemetry commit then raised it to `go 1.25.0`. From Go 1.21 onward, the `go` directive activates the pure Go DNS resolver, which returns both `::1` and `127.0.0.1` for `localhost`. If the Deluge daemon listens only on IPv4 and the system drops (rather than refuses) IPv6 packets, the SYN to `[::1]:PORT` silently times out after ~20 s before Go falls back to IPv4.

With `IdleConnectionTime = 30s`, connections expire after 30 seconds of inactivity. Each reconnect then triggers the 20-second hang, blocking all concurrent requests.

Fix
Run `conn.Connect()` in a goroutine and `select` on the result against a configurable timeout (default 10 s, overridable via `--connect-timeout` / `POOL_CONNECT_TIMEOUT`). Orphaned `Connect()` goroutines are cleaned up asynchronously after a timeout.

The pool worker is still serialised (one connection is established at a time), but it can no longer be stalled for longer than `ConnectTimeout`.