fix: autorelay reservation for private nodes#584
fix: autorelay reservation for private nodes#584fanny7d wants to merge 1 commit intokubeedge:mainfrom
Conversation
|
Welcome @fanny7d! It looks like this is your first PR to kubeedge/edgemesh 🎉 |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: fanny7d The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a critical bug that prevented private nodes from successfully reserving autorelay slots within the libp2p network. The fix involves refining the AutoRelay configuration parameters, specifically ensuring that the minimum number of candidates for relay discovery is always met and optimizing relay selection for single-relay environments. Additionally, it enhances the logic for determining network reachability in complex setups, leading to more robust and reliable peer connectivity. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request addresses a critical bug in autorelay reservation for private nodes by ensuring minCandidates is at least 1. The changes are logical and well-supported by new helper functions and unit tests. My review includes a fix for a subtle bug in the new normalizeAutoRelayConfig function, suggestions to improve code clarity by simplifying variable handling, and a recommendation to enhance test coverage for an identified edge case. I also noted some comments that should be translated to English for consistency.
pkg/tunnel/module.go
Outdated
| // normalizeAutoRelayConfig 计算归一化后的 autorelay 参数。 | ||
| // | ||
| // 修复 WithMinCandidates(0) 导致 peerSource 永远不被触发的 bug(issue #583): | ||
| // - minCandidates 至少为 1,确保 relay_finder 能启动 peerSource 并进入 Reserve() 路径 | ||
| // - maxCandidates 不小于 relayNums,保证候选池容量够用 | ||
| // - 单 relay 场景返回 numRelays=1,避免默认值 desiredRelays=2 导致一直等待第二个 relay | ||
| func normalizeAutoRelayConfig(cfgMaxCandidates, relayNums int) (minCandidates, maxCandidates, numRelays int) { | ||
| maxCandidates = cfgMaxCandidates | ||
| if maxCandidates < relayNums { | ||
| maxCandidates = relayNums | ||
| } | ||
| // minCandidates 必须 >= 1,否则 relay_finder.findNodes() 中的条件 | ||
| // `numCandidates < minCandidates` 永远为假,peerSource 不会被调用。 | ||
| minCandidates = 1 | ||
| if maxCandidates > 0 && minCandidates > maxCandidates { | ||
| minCandidates = maxCandidates | ||
| } | ||
| // 单 relay 场景显式设置 desiredRelays=1, | ||
| // 避免默认值 desiredRelays=2 导致一直等待第二个 relay。 | ||
| if relayNums == 1 { | ||
| numRelays = 1 | ||
| } | ||
| return | ||
| } | ||
|
|
||
| // buildAutoRelayOpts 根据归一化参数构建 autorelay 选项列表。 | ||
| func buildAutoRelayOpts(minCandidates, maxCandidates, numRelays int) []autorelay.Option { | ||
| opts := []autorelay.Option{ | ||
| autorelay.WithMinCandidates(minCandidates), | ||
| autorelay.WithMaxCandidates(maxCandidates), | ||
| autorelay.WithBackoff(30 * time.Second), | ||
| } | ||
| if numRelays > 0 { | ||
| opts = append(opts, autorelay.WithNumRelays(numRelays)) | ||
| } | ||
| return opts | ||
| } |
There was a problem hiding this comment.
The logic in normalizeAutoRelayConfig for setting minCandidates has a potential bug. If cfgMaxCandidates and relayNums are both 0, maxCandidates will be 0, but minCandidates will be 1, resulting in an invalid configuration where minCandidates > maxCandidates. The logic should ensure minCandidates is 0 if maxCandidates is 0.
Additionally, the function comments for normalizeAutoRelayConfig and buildAutoRelayOpts are in Chinese and should be translated to English for consistency with the rest of the codebase.
// normalizeAutoRelayConfig calculates the normalized autorelay parameters.
//
// It fixes a bug (issue #583) where WithMinCandidates(0) prevents peerSource from ever being triggered:
// - minCandidates is set to at least 1 to ensure the relay_finder starts peerSource and enters the Reserve() path.
// - maxCandidates is ensured to be no less than relayNums to provide sufficient capacity for the candidate pool.
// - For single-relay scenarios, numRelays is set to 1 to avoid the default desiredRelays=2, which would cause indefinite waiting for a second relay.
func normalizeAutoRelayConfig(cfgMaxCandidates, relayNums int) (minCandidates, maxCandidates, numRelays int) {
maxCandidates = cfgMaxCandidates
if maxCandidates < relayNums {
maxCandidates = relayNums
}
// minCandidates must be >= 1 to trigger peerSource, unless there are no candidates possible.
// The condition `numCandidates < minCandidates` in relay_finder.findNodes() will never be true
// if minCandidates is 0, thus peerSource will not be called.
if maxCandidates > 0 {
minCandidates = 1
} else {
minCandidates = 0
}
// For single-relay scenarios, explicitly set desiredRelays=1.
// This avoids the default (desiredRelays=2) which would cause waiting for a second relay that doesn't exist.
if relayNums == 1 {
numRelays = 1
}
return
}
// buildAutoRelayOpts builds the list of autorelay options from the normalized parameters.
func buildAutoRelayOpts(minCandidates, maxCandidates, numRelays int) []autorelay.Option {
opts := []autorelay.Option{
autorelay.WithMinCandidates(minCandidates),
autorelay.WithMaxCandidates(maxCandidates),
autorelay.WithBackoff(30 * time.Second),
}
if numRelays > 0 {
opts = append(opts, autorelay.WithNumRelays(numRelays))
}
return opts
}| minCandidates, maxCandidates, numRelays := normalizeAutoRelayConfig(c.MaxCandidates, relayNums) | ||
| if maxCandidates != c.MaxCandidates { | ||
| klog.Infof("MaxCandidates adjusted from %d to %d (relayNums=%d)", | ||
| c.MaxCandidates, maxCandidates, relayNums) | ||
| c.MaxCandidates = maxCandidates | ||
| } |
There was a problem hiding this comment.
The current implementation for handling maxCandidates is a bit convoluted. It calculates a new maxCandidates, logs an update, modifies the input configuration c.MaxCandidates, and then uses this modified value in buildAutoRelayOpts.
For better clarity and to avoid side effects on the input configuration, it's preferable to use the local maxCandidates variable directly.
I suggest removing the update to c.MaxCandidates here, and then passing the local maxCandidates variable to buildAutoRelayOpts on line 174 like so: buildAutoRelayOpts(minCandidates, maxCandidates, numRelays)...,.
minCandidates, maxCandidates, numRelays := normalizeAutoRelayConfig(c.MaxCandidates, relayNums)
if maxCandidates != c.MaxCandidates {
klog.Infof("MaxCandidates adjusted from %d to %d (relayNums=%d)",
c.MaxCandidates, maxCandidates, relayNums)
}| wantMaxCandidates: 4, | ||
| wantNumRelays: 0, | ||
| }, | ||
| } |
There was a problem hiding this comment.
The test suite for TestNormalizeAutoRelayConfig is missing a key edge case: when both cfgMaxCandidates and relayNums are 0. This case would have exposed the bug in the original implementation of normalizeAutoRelayConfig. Adding this test case will help prevent regressions.
Also, some comments in this file are in Chinese (e.g., lines 84 and 92). It would be great to translate them to English for consistency.
},
{
// cfgMaxCandidates=0, no relay nodes
// minCandidates should be 0 if maxCandidates is 0
name: "zero maxCandidates with no relays",
cfgMaxCandidates: 0,
relayNums: 0,
wantMinCandidates: 0,
wantMaxCandidates: 0,
wantNumRelays: 0,
},0f506af to
ec22ef5
Compare
ec22ef5 to
823c850
Compare
6316ef9 to
b040d3e
Compare
In go-libp2p autorelay, peerSource is only invoked when numCandidates < minCandidates. Using WithMinCandidates(0) means the condition is never true, so peerSource is never called and no relay reservation ever happens, resulting in NO_RESERVATION (204). Fix by normalizing autorelay config: set minCandidates>=1, adjust maxCandidates to cover all relay nodes, and set numRelays=1 for single-relay setups to avoid waiting for a second relay. Signed-off-by: fanny7d <fanchao1510@outlook.com>
b040d3e to
421e17d
Compare
What this PR does / why we need it
Fix autorelay reservation failure for private nodes (closes #583).
Root cause:
pkg/tunnel/module.gousedWithMinCandidates(0)whenconfiguring libp2p AutoRelay. In go-libp2p autorelay,
peerSourceisonly invoked when
numCandidates < minCandidates. WithminCandidates=0the condition is never true, so
peerSourceis never called and no relayreservation is ever attempted → continuous
NO_RESERVATION (204).Changes
pkg/tunnel/module.go: AddnormalizeAutoRelayConfig()to setminCandidates ≥ 1so peerSource is always triggered. For single-relaysetups, set
numRelays=1to avoid waiting for a second relay(default
desiredRelays=2). ExtractShouldForceReachabilityPrivate()to correctly handle ACK+NLB fronted-relay scenarios.
pkg/tunnel/util.go: AddShouldForceReachabilityPrivate()andbuildAutoRelayOpts()helpers.module_test.goandutil_test.go.Which issue(s) this PR fixes
Fixes #583
Special notes for reviewers
Verified with EdgeMesh v1.17.0 + go-libp2p v0.28.2: relay side shows
reserving relay slot for ...andNO_RESERVATIONno longer appears.