fix(meshcore): retry initial auto-connect attempt with backoff #3919
Yeraze wants to merge 4 commits
MeshCoreManager.connect() previously made exactly one attempt and gave up on failure, leaving the source stuck disconnected until a manual Connect click, even with "Automatically connect on startup" enabled. Unlike Meshtastic TCP sources, whose transport retries forever with backoff by default, MeshCore's retry machinery covered only a drop after an already-successful connect, and only when the opt-in heartbeat feature was configured.

connect() now arms shouldReconnect and schedules a backoff retry (scheduleNextReconnect/attemptReconnect) when the first attempt fails, and resets shouldReconnect to false on a successful connect so the heartbeat feature's own gating for post-connect drops is unaffected.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gr4SAmd6b69UzqEnamyc4c
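The behavior described in this commit message can be sketched roughly as follows. This is a simplified model, not the project's code: `InitialRetrySketch`, the synchronous `open()` callback, the injected `schedule` function, and both delay constants are assumptions made so the example runs on its own. Only the names `shouldReconnect`, `reconnectAttempts`, `connect`, and `scheduleNextReconnect` come from the PR text.

```typescript
// Hypothetical model of "retry the initial connect with backoff".
// Assumed for illustration: sync open(), injected timer, delay constants.
type Schedule = (fn: () => void, delayMs: number) => void;

const BASE_DELAY_MS = 1000; // assumed, not taken from the PR
const MAX_DELAY_MS = 60000; // assumed, not taken from the PR

class InitialRetrySketch {
  shouldReconnect = false;
  reconnectAttempts = 0;
  connected = false;
  private pending = false;
  private readonly open: () => void;
  private readonly schedule: Schedule;

  constructor(open: () => void, schedule: Schedule) {
    this.open = open; // throws while the device is not ready
    this.schedule = schedule;
  }

  connect(): boolean {
    try {
      this.open();
      this.connected = true;
      // Disarm on success so post-connect drops stay gated by the
      // opt-in heartbeat feature.
      this.shouldReconnect = false;
      return true;
    } catch {
      this.connected = false;
      // Arm before scheduling: this sketch's scheduler ignores calls
      // made while disarmed.
      this.shouldReconnect = true;
      this.scheduleNextReconnect();
      return false;
    }
  }

  private scheduleNextReconnect(): void {
    if (!this.shouldReconnect || this.pending) return;
    this.pending = true;
    this.reconnectAttempts += 1;
    const delay = Math.min(
      BASE_DELAY_MS * 2 ** (this.reconnectAttempts - 1),
      MAX_DELAY_MS,
    );
    this.schedule(() => {
      this.pending = false;
      this.connect();
    }, delay);
  }
}

// Usage: the device fails twice at startup, then becomes ready.
// A fake scheduler records delays and runs callbacks immediately.
const delays: number[] = [];
const queue: Array<() => void> = [];
let failuresLeft = 2;
const source = new InitialRetrySketch(
  () => {
    if (failuresLeft-- > 0) throw new Error("device not ready");
  },
  (fn, ms) => {
    delays.push(ms);
    queue.push(fn);
  },
);
source.connect();
while (queue.length > 0) queue.shift()!();
```

Injecting the timer keeps the sketch deterministic; a real manager would presumably use `setTimeout` and keep the handle so the retry can be cancelled on disconnect.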
Claude finished @Yeraze's task

PR Review

Summary

This is a well-targeted, minimal fix. The implementation is clean, the reasoning is sound, and the test coverage is thorough. A few observations follow.

Code Quality

The catch block change is straightforward and correctly ordered:

The order matters: if steps 1 and 2 were swapped,

Resetting

One subtle behavioral note worth considering:
```typescript
const maxAttempts = this.config?.reconnectMaxAttempts ?? 0;
if (maxAttempts > 0 && this.reconnectAttempts >= maxAttempts) {
  this.connectionState = 'failed';
  this.emit('reconnect_giveup', ...);
  return;
}
```

The default is

Test Coverage

The three test cases in
The update to

Minor test note: In the third test (

Potential Concerns

No concern with infinite retry for initially-failing sources: Already the behavior for post-connect drops with heartbeat configured, and for Meshtastic TCP. Consistent.

Verdict

Approve. The fix is minimal, correctly targeted, well-commented, and regression-tested. No blocking issues found.
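To make the `reconnectMaxAttempts` guard quoted in this review easier to reason about, here is a minimal sketch that isolates its condition as a pure function. The `ReconnectConfig` interface and the `shouldGiveUp` helper are hypothetical names introduced for illustration; only the two-line condition itself is taken from the quoted snippet.

```typescript
// Hypothetical config shape; only reconnectMaxAttempts appears in the review.
interface ReconnectConfig {
  reconnectMaxAttempts?: number;
}

// Mirrors the quoted condition: a missing or zero limit never triggers
// give-up, so retries continue indefinitely in that case.
function shouldGiveUp(
  config: ReconnectConfig | undefined,
  reconnectAttempts: number,
): boolean {
  const maxAttempts = config?.reconnectMaxAttempts ?? 0;
  return maxAttempts > 0 && reconnectAttempts >= maxAttempts;
}

const unlimitedByDefault = shouldGiveUp(undefined, 1000);
const unlimitedWhenZero = shouldGiveUp({ reconnectMaxAttempts: 0 }, 1000);
const belowLimit = shouldGiveUp({ reconnectMaxAttempts: 3 }, 2);
const atLimit = shouldGiveUp({ reconnectMaxAttempts: 3 }, 3);
```

Under this reading, only an explicitly configured positive limit ever moves the source to the `'failed'` state.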
Nit from PR review on #3919: attemptReconnect() resolves to undefined, so voiding the awaited result was a no-op that read as if it meant something.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gr4SAmd6b69UzqEnamyc4c
Claude finished @Yeraze's task

PR Review

Summary

This is a well-targeted fix for a real UX problem. The core change is minimal and the reasoning is clearly explained in the PR description and inline comments. However, there is one behavioral bug introduced by the fix that needs attention before merge.

Bug: Double-
Flagged by automated PR review: connect()'s catch block schedules a retry, and attemptReconnect()'s own `!ok` branch then scheduled a second one on top of it every time the retry loop drove a failing connect(). That doubled reconnectAttempts (and the backoff growth it drives), left the first timer orphaned, and emitted 'reconnecting' twice per failure.

scheduleNextReconnect() now no-ops if a reconnect is already pending, making it safe to call from multiple points in the same failure path.

Added a regression test using the real (unstubbed) scheduler to catch this. Verified that it fails without the guard (reconnectAttempts == 2) and passes with it.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gr4SAmd6b69UzqEnamyc4c
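The double-scheduling failure and the guard that fixes it can be reproduced in a small model. This sketch is an assumption-laden stand-in rather than the real manager: `SchedulerSketch`, the `guardEnabled` switch, the `TimerHandle` type, and the `reconnectingEvents` counter (standing in for the emitted 'reconnecting' event) are all hypothetical, and `connect()` is hard-wired to fail.

```typescript
// Hypothetical timer handle; a real implementation would likely hold
// the value returned by setTimeout.
type TimerHandle = { fire: () => void };

class SchedulerSketch {
  reconnectAttempts = 0;
  reconnectingEvents = 0; // stands in for emit('reconnecting')
  reconnectTimer: TimerHandle | null = null;
  private readonly guardEnabled: boolean;

  constructor(guardEnabled: boolean) {
    this.guardEnabled = guardEnabled;
  }

  scheduleNextReconnect(): void {
    // The fix: no-op while a reconnect is already pending.
    if (this.guardEnabled && this.reconnectTimer) return;
    this.reconnectAttempts += 1;
    this.reconnectingEvents += 1;
    // Without the guard, an existing handle is overwritten and orphaned.
    this.reconnectTimer = { fire: () => this.attemptReconnect() };
  }

  connect(): boolean {
    // Always fails in this sketch; the catch-block path schedules a retry.
    this.scheduleNextReconnect();
    return false;
  }

  attemptReconnect(): void {
    this.reconnectTimer = null; // assumed: the pending timer has just fired
    const ok = this.connect();
    // Second scheduling call in the same failure path.
    if (!ok) this.scheduleNextReconnect();
  }
}

// One failing retry, driven as the timer callback would drive it.
const unguarded = new SchedulerSketch(false);
unguarded.attemptReconnect();

const guarded = new SchedulerSketch(true);
guarded.attemptReconnect();
```

In this model the unguarded instance ends with `reconnectAttempts` equal to 2 after a single failure, matching the value the commit message reports for the failing regression test, while the guarded instance ends with 1.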
Good catch, that was a real bug. Fixed in 77adfb5:

Added a regression test using the real (unstubbed) scheduler, confirmed it fails without the guard (

-- Authored by Roger 🤓 Generated by Claude Code
Claude finished @Yeraze's task

PR Review

Summary

This is a clean, well-targeted fix. The previous automated reviews already caught and resolved the main behavioral regression (double-
Nit from PR review on #3919: the two success-path tests reached connect()'s real distanceDeleteScheduler.start(), which hits databaseService.settings.getSettingForSource() unmocked. Stub the scheduler explicitly instead of relying on it failing gracefully.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Gr4SAmd6b69UzqEnamyc4c
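The idea of stubbing the scheduler explicitly can be illustrated without any test framework. Everything below is a hypothetical stand-in: the `distanceDeleteScheduler` object here only simulates a dependency whose `start()` would touch an unmocked database, and `connectSuccessPath` and `withStubbedStart` are invented helper names. The project's tests presumably use their test runner's own stubbing facility instead.

```typescript
// Hypothetical stand-in for a scheduler whose start() reaches the database.
const distanceDeleteScheduler = {
  start(): void {
    throw new Error("unmocked database access");
  },
};

// Hypothetical success path that starts the scheduler after connecting.
function connectSuccessPath(): boolean {
  distanceDeleteScheduler.start();
  return true;
}

// Hand-rolled stub: replace start() with a call recorder, run the body,
// and always restore the original so other tests are unaffected.
function withStubbedStart<T>(body: (calls: { count: number }) => T): T {
  const original = distanceDeleteScheduler.start;
  const calls = { count: 0 };
  distanceDeleteScheduler.start = () => {
    calls.count += 1;
  };
  try {
    return body(calls);
  } finally {
    distanceDeleteScheduler.start = original;
  }
}

const result = withStubbedStart((calls) => ({
  ok: connectSuccessPath(),
  started: calls.count,
}));
```

Stubbing explicitly also lets the test assert that `start()` was called exactly once, which a gracefully failing real scheduler would not reveal.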
Claude finished @Yeraze's task

PR Review

Summary

This is a clean, well-scoped fix. The previous automated reviews caught and resolved the most significant behavioral issue (double-scheduling via the

Code Quality

The ordering of operations is correct:

The inline comment on the catch block (lines 919–926) is unusually thorough and directly useful for future maintainers. The reasoning for why
```typescript
if (this.reconnectTimer) return;
```

This is the right fix for the double-scheduling bug. It correctly no-ops before the counter increment at line 5001, so

Correct and well-commented. Without this, sources without

In

The guard handles this cleanly. However, the

Test Coverage
Four tests covering the important paths:

The +4 lines stub

Minor: test 2 doesn't stub

Line 68 stubs

Design Observations (non-blocking)

Infinite retry with no log escalation. With

Verdict

Approve. The implementation is minimal, correctly ordered, and addresses a real UX regression (#3918). The double-scheduling fix (77adfb5) and its regression test (test 4) are the key quality additions. No blocking issues.
Summary
Fixes #3918.
`MeshCoreManager.connect()` previously made exactly one attempt and gave up permanently on failure, leaving the source stuck disconnected until a manual "Connect" click, even with "Automatically connect on startup" enabled. This differs from Meshtastic TCP sources, whose transport retries forever with exponential backoff by default. MeshCore's existing reconnect machinery (`scheduleNextReconnect`/`attemptReconnect`) only ever engaged for a drop after an already-successful connect, and only when the opt-in heartbeat feature (`heartbeatIntervalSeconds`) was configured, never for the initial connect attempt itself.

`connect()` now arms `shouldReconnect` and calls `scheduleNextReconnect()` when the very first attempt fails, so a transient startup race (device/native backend not yet ready) recovers on its own via the existing exponential-backoff timer instead of requiring manual intervention.

On a successful connect, `shouldReconnect` is explicitly reset to `false` so the heartbeat feature's own opt-in gating for post-connect drops is unaffected. This change does not silently enable heartbeat-style auto-reconnect for sources that never configured it.

Changes
- `src/server/meshcoreManager.ts`: retry-on-initial-failure in `connect()`'s catch block; explicit `shouldReconnect` reset on success.
- `src/server/meshcoreManager.connect-error.test.ts`: stub `scheduleNextReconnect` in the existing catch-handler tests so they don't leak a real retry timer.
- `src/server/meshcoreManager.initialConnectRetry.test.ts` (new): regression tests covering the retry-on-failure path, the `shouldReconnect` reset on success, and an end-to-end retry-then-succeed scenario via `attemptReconnect()`.

Test plan
- `meshcoreManager.connect-error.test.ts`, `meshcoreManager.initialConnectRetry.test.ts`, `meshcoreManager.dropDetection.test.ts`: all pass
- `protobufs` git submodule in the sandbox (unrelated to this change); re-ran after `git submodule update --init` and all 9 pass (48/48 tests)
- `eslint` on changed files: no new warnings/errors (21 pre-existing issues at unrelated lines, confirmed present before this change via `git stash`)

🤖 Generated with Claude Code
https://claude.ai/code/session_01Gr4SAmd6b69UzqEnamyc4c