Skip to content

fix(supervisor): address audit findings (switch-none rollback, dedup, cleanup)#386

Merged
BANANASJIM merged 1 commit into
mainfrom
fix/audit-supervisor
Jun 7, 2026
Merged

fix(supervisor): address audit findings (switch-none rollback, dedup, cleanup)#386
BANANASJIM merged 1 commit into
mainfrom
fix/audit-supervisor

Conversation

@BANANASJIM

@BANANASJIM BANANASJIM commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Addresses 4 of 6 audit findings in src/supervisor.zig (2 skipped as out-of-scope refactors, see below).

Changes

  • [MEDIUM] handleSwitchNone partial-apply — global switch-to-none was non-transactional: an early return on a mid-loop restart failure left already-processed devices with cleared mappers and no thread, recoverable only by daemon restart. Now mirrors handleSwitch: build a SwitchTx per target, commit each via new commitSwitchToNone, and rollbackCommittedSwitches restores every already-committed device on any restart failure. Old switch_mapping ownership moves into the tx and is freed by cleanupSwitchTxs (same as the named-mapping path).
  • [LOW] finalizeRebind errdefer — replaced manual catch-and-free blocks for dev_copy/phys_copy with errdefer, and the rollback errdefer with a plain devname_map.remove (the errdefers own the frees, no double-free). Matches attachWithInstanceResult style; no behavior change.
  • [LOW] nowNs dedup — delegates to event_loop.monotonicNs() instead of re-implementing clock_gettime(CLOCK_MONOTONIC). Test override path unchanged. No circular import (monotonicNs is a free function; event_loop does not import supervisor).
  • [LOW] PR-ε.1 comment — stripped the PR tracking label from the wedge-instrumentation comment in handleStatus, keeping the explanation.

Skipped (out-of-scope refactors)

  • [LOW] lookupChordMappingName caching — would require a cache built at applyUserConfigRuntime + free/rebuild on SIGHUP, adding cache-invalidation surface on a low-frequency user-triggered path. Deferred.
  • [LOW] handleStatus/handleList fixed buffer truncation — graceful truncation (not memory-unsafe) on an admin/debug path; converting both response paths to growable buffers + new ownership is beyond this audit's scope. Deferred.

Test plan

  • Two new regression tests: global SWITCH none rolls back all devices on failure (injects commit failure at idx 1, asserts device 0 mapper restored not null + device 1 restored) and global SWITCH none clears all mappers on success.
  • Falsifiability proven: disabling the rollback calls → rollback test fails (EXIT=1); removing the mapper-clear → success test crashes (EXIT=1).
  • ./scripts/padctl-docker test → EXIT=0 (full suite, testing.allocator leak-checked).
  • zig fmt --check src/supervisor.zig (zig 0.15.2) → clean.

refs: codebase audit

Summary by CodeRabbit

  • Bug Fixes
    • Improved reliability of device switching operations with automatic error recovery and rollback capabilities
    • Enhanced device state management to prevent incomplete state transitions when operations fail
    • Refined timing mechanism for deadline calculations in the event loop

@qodo-code-review

Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@BANANASJIM, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 17 minutes and 30 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e8ba321f-a090-40e9-b8f1-c2778f3b1113

📥 Commits

Reviewing files that changed from the base of the PR and between f2a7613 and 4f1487e.

📒 Files selected for processing (2)
  • .gitignore
  • src/supervisor.zig
📝 Walkthrough

Walkthrough

src/supervisor.zig is refactored to delegate monotonic time queries to event_loop.monotonicNs(), simplify finalizeRebind() error handling with errdefer cleanup, and make global "SWITCH none" transactional via new commitSwitchToNone() helper and rewritten handleSwitchNone() with rollback semantics. Two regression tests validate the transactional behavior.

Changes

Monotonic time sourcing, error handling refactor, and transactional SWITCH none

Layer / File(s) Summary
Monotonic time sourcing via event_loop
src/supervisor.zig
Supervisor.nowNs() delegates to event_loop.monotonicNs() instead of calling clock_gettime directly; module import is added to support the change.
finalizeRebind error handling refactor
src/supervisor.zig
Error handling is simplified to use errdefer cleanup for duplicated slices and devname_map rollback on failure, replacing prior catch-based cleanup logic.
Transactional SWITCH none implementation
src/supervisor.zig
Global SWITCH none becomes transactional: new commitSwitchToNone() helper performs per-device commit bookkeeping with test-only failure injection; handleSwitchNone() collects affected targets first, commits each, and rolls back all on any failure.
SWITCH none regression tests
src/supervisor.zig
Two regression tests validate transactional SWITCH none: one ensures failure rollback restores all devices' prior mappers; one ensures success clears all mappers.
Diagnostic comment wording adjustment
src/supervisor.zig
Minor wording-only comment changes in handleStatus around write_in_flight_ms diagnostic rationale.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • BANANASJIM/padctl#6: Both PRs modify src/supervisor.zig to make global SWITCH transactional with per-device commit and rollback behavior, including reload/mapping edge cases and associated tests.
  • BANANASJIM/padctl#319: Both PRs refactor finalizeRebind() in src/supervisor.zig for error handling and transactional failure preservation in hotplug resume paths.
  • BANANASJIM/padctl#344: Both PRs modify handleStatus() in src/supervisor.zig around the same diagnostic code location.

Poem

🐰 A rabbit hops through time so true,
With event_loop and error views,
Transactions bloom with rollback grace,
All devices safe in one embrace! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the primary change: implementing transactional rollback for 'switch-none', deduplication of monotonic time calls, and cleanup improvements—all matching the core audit findings addressed in this PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/audit-supervisor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@BANANASJIM BANANASJIM force-pushed the fix/audit-supervisor branch from f2a7613 to 02a378a Compare June 7, 2026 05:09
… cleanup)

- handleSwitchNone: make global switch-to-none transactional like handleSwitch.
  Previously an early return on mid-loop restart failure left already-processed
  devices with cleared mappers and no thread (inconsistent, daemon-restart only
  to recover). Now build a SwitchTx per target, commit each, and roll back every
  already-committed device on any restart failure. Adds commitSwitchToNone and
  two regression tests (rollback-on-failure + clear-on-success).
- finalizeRebind: replace manual catch-and-free blocks with errdefer for
  dev_copy/phys_copy and a plain devname_map.remove on the rollback path,
  matching attachWithInstanceResult's style; no behavior change.
- nowNs: delegate to event_loop.monotonicNs instead of re-implementing
  clock_gettime(CLOCK_MONOTONIC); test override path unchanged.
- handleStatus: drop the PR-ε.1 tracking label from the wedge-instrumentation
  comment, keeping the explanatory text.

refs: codebase audit
@BANANASJIM BANANASJIM force-pushed the fix/audit-supervisor branch from 02a378a to 4f1487e Compare June 7, 2026 05:16
@BANANASJIM BANANASJIM merged commit e4742a8 into main Jun 7, 2026
36 checks passed
@BANANASJIM BANANASJIM deleted the fix/audit-supervisor branch June 7, 2026 05:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant