-
Notifications
You must be signed in to change notification settings - Fork 167
keepalive: Add inspect command to disable NVMe keepalives (#2789) #2800
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
keepalive: Add inspect command to disable NVMe keepalives (#2789) #2800
Conversation
…2789) Add `vm/nvme_keepalive_mode` inspect node shows the current status, but can be updated through `ohcldiag-dev`: ``` C:\openvmm>ohcldiag-dev.exe uhdiag.sock i vm/nvme_keepalive_mode "enabled" C:\openvmm>ohcldiag-dev.exe uhdiag.sock i vm/nvme_keepalive_mode -u disabled "disabled" C:\openvmm>ohcldiag-dev.exe uhdiag.sock i vm/nvme_keepalive_mode "disabled" C:\openvmm>ohcldiag-dev.exe uhdiag.sock i vm/nvme_keepalive_mode -u enabled "enabled" C:\openvmm>ohcldiag-dev.exe uhdiag.sock i vm/nvme_keepalive_mode "enabled" C:\openvmm>ohcldiag-dev.exe uhdiag.sock i vm/nvme_keepalive_mode -u blah Error: update error: Invalid keepalive config: blah ``` Add a test for this in `openhcl_servicing.rs` that verifies that the keepalive can be disabled, and servicing completes successfully. (cherry picked from commit 458c128)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds an OpenHCL inspect node to view/update NVMe keepalive mode at runtime (primarily for diagnostics/testing), and extends servicing tests to validate disabling keepalive via inspect.
Changes:
- Add mutable inspect handling for
vm/nvme_keepalive_mode, including update logging. - Extend
KeepAliveConfigparsing to accept"enabled"/"disabled"and add a stringifier helper. - Add a servicing test that disables NVMe keepalive via inspect and verifies IDENTIFY is observed after servicing restore.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| vmm_tests/vmm_tests/tests/tests/multiarch/openhcl_servicing.rs | Adds a new servicing test that toggles NVMe keepalive via inspect and validates expected NVMe behavior post-restore. |
| openhcl/underhill_core/src/options.rs | Updates KeepAliveConfig to support inspect-friendly strings and provides as_str() helper. |
| openhcl/underhill_core/src/dispatch/mod.rs | Exposes vm/nvme_keepalive_mode as a mutable inspect field with logging on updates. |
| /// Returns the string representation matching the inspect rename attributes. | ||
| pub fn as_str(&self) -> &'static str { | ||
| match self { | ||
| KeepAliveConfig::EnabledHostAndPrivatePoolPresent => "enabled", | ||
| KeepAliveConfig::DisabledHostAndPrivatePoolPresent => "nohost,privatepool", | ||
| KeepAliveConfig::Disabled => "disabled", | ||
| } | ||
| } |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc comment for KeepAliveConfig::as_str says it matches “inspect rename attributes”, but KeepAliveConfig doesn’t have any #[inspect(rename = ...)] attributes on its variants. This is misleading for future readers—either add the rename attributes to the enum variants (so Inspect outputs the same strings) or reword the comment to reflect that this is a custom string mapping used by the inspect handler.
| // Enable keepalive, then disable it later via inspect | ||
| flags.enable_nvme_keepalive = true; | ||
| // We need to disabled MANA KA since if either of the KA flasgs in on, DMA manager will save its state | ||
| // which includes NVMe regions and restore verification will fail ("unrestored allocations found"), | ||
| // since NVMe KA is off and we don't save anything). |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling/grammar in this comment makes the rationale hard to follow (e.g., “We need to disabled”, “flasgs in on”, and an extra closing parenthesis). Please fix the wording so it clearly explains why MANA keepalive must be disabled for this test.
| // Enable keepalive, then disable it later via inspect | |
| flags.enable_nvme_keepalive = true; | |
| // We need to disabled MANA KA since if either of the KA flasgs in on, DMA manager will save its state | |
| // which includes NVMe regions and restore verification will fail ("unrestored allocations found"), | |
| // since NVMe KA is off and we don't save anything). | |
| // Enable NVMe keepalive, then disable it later via inspect. | |
| flags.enable_nvme_keepalive = true; | |
| // Disable MANA keepalive. If either keepalive flag is enabled, the DMA manager will save its state, | |
| // which includes NVMe regions, and restore verification will fail with "unrestored allocations found" | |
| // because NVMe keepalive is off and we do not save any NVMe state in this test. |
| "nvme_keepalive_mode", | ||
| |value| -> anyhow::Result<&'static str> { | ||
| if let Some(new_value) = value { | ||
| let old_value = self.nvme_keep_alive.as_str(); | ||
| self.nvme_keep_alive = new_value.parse()?; | ||
| tracing::info!( | ||
| CVM_ALLOWED, | ||
| old_value, | ||
| new_value = self.nvme_keep_alive.as_str(), | ||
| "nvme_keepalive_mode updated via inspect" | ||
| ); | ||
| } | ||
| Ok(self.nvme_keep_alive.as_str()) | ||
| }, | ||
| ); |
Copilot
AI
Feb 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvme_keepalive_mode can be set to "enabled" via inspect, but this only updates self.nvme_keep_alive and does not update NVMe save/restore support in the NVMe manager (which is computed at startup). This can lead to an inconsistent state where DMA manager state is saved/restored because nvme_keep_alive.is_enabled() is true, but NVMe state is not saved (e.g., NvmeManager::save returns None when save_restore_supported is false), potentially causing servicing restore failures. Consider rejecting/returning an error when switching to "enabled" unless NVMe keepalive save/restore is actually supported for the current VM (e.g., NVMe VFIO is active and private pool is available), or gate the DMA-manager save decision on the same capability check used by NvmeManager::save.
Clean cherry pick of PR #2789
Add
vm/nvme_keepalive_modeinspect node shows the current status, but can be updated throughohcldiag-dev:Add a test for this in
openhcl_servicing.rsthat verifies that the keepalive can be disabled, and servicing completes successfully.