Skip to content

No graceful shutdown; background goroutines have no stop path #51

@kgNatx

Description

@kgNatx

Severity: P3 | Subsystem: routing | Category: design

Location: /home/kyleg/containers/atxfpv.org/skwad/main.go:43-56,242-245

What

main() blocks on ListenAndServe with no SIGTERM handling or srv.Shutdown. On docker compose down/up, in-flight requests including long-polls are killed. The hourly cleanup ticker (43-54) and StartFeedbackCleanup (56) have no cancellation. defer database.Close() at line 40 never runs.

Suggested fix

signal.NotifyContext on SIGTERM, call srv.Shutdown, cancel goroutines, close DB.

Verification

Traced the full path in /home/kyleg/containers/atxfpv.org/skwad/main.go. The code facts are all accurate: main() blocks on http.ListenAndServe (line 243) with no signal.Notify/NotifyContext and no srv.Shutdown; the hourly cleanup goroutine (lines 43-54) is a bare for range ticker.C with no cancellation; api.StartFeedbackCleanup (line 56, defined api/feedback.go:265-271) is likewise an uncancellable for range ticker.C; and defer database.Close() (line 40) will not run on a SIGTERM kill since deferred funcs don't execute on signal-driven process termination. So the core finding — no graceful shutdown, no goroutine stop path — is real.

However, the impact is overstated. The finding claims "in-flight requests including long-polls are killed," but the cited HandlePoll (api/handlers.go:1109-1120) is NOT a long-poll — it does a single GetSession read and returns the version immediately. The client polls on an interval; each request is sub-millisecond. So a docker compose down at worst resets a few short-lived requests, not blocked long-polls. For SQLite, skipping database.Close() on abrupt shutdown is low-risk (WAL/journal recovery handles it; no data loss). The goroutines die with the process anyway, which is the intended end state — the cancel-path absence only matters for graceful drain, the same single concern as the missing srv.Shutdown. This is a static-file + short-JSON-API service on SQLite behind Traefik; losing a couple in-flight short requests on deploy is a minor robustness nit, not a P2-level reliability/data issue. Downgrading to P3.

Filed from the 2026-05-30 multi-agent codebase review.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low priorityenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions