fix(openshell-network-supervisor): gate proxy accept on symlink resolution readiness#1968

Open
Cali0707 wants to merge 1 commit into NVIDIA:main from Cali0707:fix/symlink-reload-generation-race

@Cali0707

Contributor

Summary

The forward proxy accept loop can race with the OPA engine's symlink-resolution reload during sandbox startup. If a request is in flight when the reload bumps the policy generation counter, the generation guard rejects it with a spurious 403. The race occurs only at startup: the initial engine is built with PID=0 (no symlink resolution), then a background task re-resolves with the real PID once /proc/<pid>/root is accessible, producing two generation bumps for the same policy.
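
To make the failure mode concrete, here is a minimal sketch of how a generation guard of this kind could produce the spurious 403. It is an illustration under assumptions, not the supervisor's actual code: `PolicyGeneration` and `guard_status` are hypothetical names, and the real guard may compare generations at a different point in the request path.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the policy engine's generation counter.
pub struct PolicyGeneration(AtomicU64);

impl PolicyGeneration {
    pub fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    /// Called by a reload; returns the new generation.
    pub fn bump(&self) -> u64 {
        self.0.fetch_add(1, Ordering::SeqCst) + 1
    }

    pub fn current(&self) -> u64 {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical guard: a request remembers the generation it was accepted
/// under and is rejected with 403 if the counter has moved since then,
/// even when the reloaded policy is the same one.
pub fn guard_status(accepted_under: u64, generation: &PolicyGeneration) -> u16 {
    if generation.current() == accepted_under {
        200
    } else {
        403
    }
}
```

In this model, a request accepted before the startup reload sees the counter change underneath it and gets a 403, while a request accepted after the reload passes.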

This gates proxy accept behind a watch channel that fires once symlink resolution completes.

Related Issue

Should resolve the e2e issues seen in #1891

Changes

  • Add a tokio::sync::watch channel that signals when symlink resolution finishes (success, failure, or skip)
  • Gate the proxy accept loop on the readiness signal with a 15s fallback
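
The gating described in the two changes above can be sketched as follows. This is a std-only, blocking analogue that swaps `tokio::sync::watch` for a `Mutex` plus `Condvar` so it runs without external crates; the `Readiness` type and its method names are hypothetical and do not come from the PR.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Hypothetical one-shot readiness flag: a blocking analogue of a
/// watch channel carrying a bool.
#[derive(Clone)]
pub struct Readiness {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl Readiness {
    pub fn new() -> Self {
        Self {
            inner: Arc::new((Mutex::new(false), Condvar::new())),
        }
    }

    /// Called once symlink resolution ends, whether it succeeded,
    /// failed, or was skipped.
    pub fn signal(&self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    }

    /// Blocks until the flag is set or `fallback` elapses.
    /// Returns true if the signal arrived, false if the fallback fired.
    pub fn wait(&self, fallback: Duration) -> bool {
        let (lock, cvar) = &*self.inner;
        let guard = lock.lock().unwrap();
        let (guard, _timeout) = cvar
            .wait_timeout_while(guard, fallback, |ready| !*ready)
            .unwrap();
        *guard
    }
}
```

Under these assumptions, the accept loop would call `gate.wait(Duration::from_secs(15))` once before its first accept and proceed in either case, so a resolution task that never reports cannot block the proxy indefinitely. Signalling on failure and skip as well as success means the fallback is only reached when the task stalls.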

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

…ution readiness

Signed-off-by: Calum Murray <cmurray@redhat.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
