Add no-disable-device-node-modification-hook nvcdi feature flag by jkjk-ant · Pull Request #1833 · NVIDIA/nvidia-container-toolkit

jkjk-ant · 2026-05-18T11:59:43Z

The disable-device-node-modification CDI hook bind-mounts a tmpfs file over /proc/driver/nvidia/params inside the container. With procMount: Unmasked (Kubernetes 1.34+), that overmount makes the kernel's mount_too_revealing() check reject any subsequent fresh procfs mount from a less-privileged namespace — for example a nested user namespace created by bubblewrap. Workloads that sandbox themselves inside a GPU container can no longer mount procfs:

bwrap: Can't mount proc on /newroot/proc: Operation not permitted

The hook can be skipped in static cdi mode via nvidia-ctk cdi generate --disable-hooks, but jit-cdi mode (the default since 1.18.0) has no way to suppress an individual hook.

Setting NVreg_ModifyDeviceFiles=0 on the host (which the hook short-circuits on) is not viable on systems with NVSwitch: fabricmanager fails to initialize with NV_ERR_INVALID_STATE when the parameter is set, even with device nodes pre-created via udev or mknod. (Tested on H100 SXM with driver 570.195.03: the /dev/nvidia* and /dev/nvidia-nvswitch* device files were all present and correct, but nv-fabricmanager still returned "request to acquire required privileges to access NVSwitch devices failed".)

This adds an nvcdi feature flag, following the no-additional-gids-for-device-nodes naming pattern, that suppresses the hook:

[nvidia-container-runtime.modes.jit-cdi]
nvcdi-feature-flags = ["no-disable-device-node-modification-hook"]

The hook's purpose is to prevent in-container nvidia-smi/libnvidia-ml from creating extra /dev/nvidiaN device nodes (#927). That prevention is already enforced by cgroup device controls in container runtimes, so disabling the hook does not affect device isolation in those environments. Because the hook does provide defense-in-depth where cgroup enforcement is absent or misconfigured, the flag is opt-in and off by default.
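For illustration, the effect of the overmount on what in-container tools read can be sketched as below. The PR text does not spell out what the overmounted file contains; the sketch assumes it is a copy of the params content with ModifyDeviceFiles forced to 0. The function names and the sample content are hypothetical, not the hook's actual implementation.

```python
def parse_params(text: str) -> dict[str, str]:
    """Parse 'Key: value' lines, the layout assumed for /proc/driver/nvidia/params."""
    params = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params


def with_device_file_modification_disabled(text: str) -> str:
    """Return a copy of the params text with ModifyDeviceFiles set to 0."""
    out = []
    for line in text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "ModifyDeviceFiles":
            line = "ModifyDeviceFiles: 0"
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative sample only; a real params file has more entries.
SAMPLE_PARAMS = "ModifyDeviceFiles: 1\nDeviceFileUID: 0\nDeviceFileGID: 0\n"

if __name__ == "__main__":
    masked = with_device_file_modification_disabled(SAMPLE_PARAMS)
    print(parse_params(masked)["ModifyDeviceFiles"])
```

With the flag from this PR enabled, no such replacement file is mounted, so the container reads the host's real value.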

Verification

On a Kubernetes 1.34 node with the patched runtime and the flag enabled, a pod with procMount: Unmasked behaves as follows:

- /proc/self/mountinfo shows no submount under /proc/driver/nvidia (the hook never ran)
- /proc/driver/nvidia/params reads the host's real value (ModifyDeviceFiles: 1)
- bwrap --unshare-all --bind / / --proc /proc --dev /dev /bin/true succeeds
- nvidia-smi -L returns the allocated GPU
- fabricmanager and persistenced are unaffected

Without the flag, the same pod sees the hook's overmount and bwrap fails with Operation not permitted.
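The mountinfo check above can be scripted. The following is a rough sketch: it only lists mount points below a directory and does not reproduce the kernel's mount_too_revealing() logic. The function names and both sample inputs are made up for illustration.

```python
def mount_points(mountinfo_text: str) -> list[str]:
    """Extract the mount point (fifth field) from each mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def submounts_under(mountinfo_text: str, directory: str) -> list[str]:
    """List mount points strictly below `directory`."""
    prefix = directory.rstrip("/") + "/"
    return [p for p in mount_points(mountinfo_text) if p.startswith(prefix)]


# Hypothetical mountinfo excerpts: one with an overmount on params, one without.
WITH_HOOK = (
    "100 90 0:50 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw\n"
    "101 100 0:60 /params /proc/driver/nvidia/params ro,nosuid,nodev - tmpfs tmpfs rw\n"
)
WITHOUT_HOOK = "100 90 0:50 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw\n"

if __name__ == "__main__":
    print(submounts_under(WITH_HOOK, "/proc/driver/nvidia"))
    print(submounts_under(WITHOUT_HOOK, "/proc/driver/nvidia"))
```

Inside a running container, the input would be the contents of /proc/self/mountinfo; an empty result for /proc/driver/nvidia corresponds to the first verification item.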

(Open to a different name if no-disable- reads too awkwardly — the inner hook name's disable- prefix makes most options collide.)

copy-pr-bot · 2026-05-18T11:59:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

The disable-device-node-modification CDI hook bind-mounts a tmpfs file over /proc/driver/nvidia/params inside the container. With procMount: Unmasked (Kubernetes 1.34+), that overmount makes the kernel's mount_too_revealing() check reject any subsequent fresh procfs mount from a less-privileged namespace -- for example a nested user namespace created by bubblewrap. Workloads that sandbox themselves inside a GPU container can no longer mount procfs.

The hook can be skipped in static cdi mode via nvidia-ctk cdi generate --disable-hooks, but jit-cdi mode (the default since 1.18.0) has no way to suppress an individual hook.

Setting NVreg_ModifyDeviceFiles=0 on the host (which the hook short-circuits on) is not viable on systems with NVSwitch: fabricmanager fails to initialize with NV_ERR_INVALID_STATE when the parameter is set, even with device nodes pre-created via udev or mknod.

This adds a nvcdi feature flag, following the no-additional-gids-for-device-nodes naming pattern, that suppresses the hook in jit-cdi mode:

[nvidia-container-runtime.modes.jit-cdi]
nvcdi-feature-flags = ["no-disable-device-node-modification-hook"]

The hook's purpose is to prevent in-container nvidia-smi/libnvidia-ml from creating extra /dev/nvidiaN device nodes. That prevention is already enforced by cgroup device controls in container runtimes, so disabling the hook does not affect device isolation. The flag is opt-in and off by default.

Signed-off-by: Jack Kleeman <jkjk@anthropic.com>

jkjk-ant force-pushed the feat/disable-device-node-modification-hook-flag branch from 36a659a to 1e7875f Compare May 18, 2026 12:03

jkjk-ant force-pushed the feat/disable-device-node-modification-hook-flag branch from 1e7875f to 1d543ab Compare May 18, 2026 12:10

jkjk-ant changed the title ~~Add disable-device-node-modification-hook nvcdi feature flag~~ Add no-disable-device-node-modification-hook nvcdi feature flag May 18, 2026

jkjk-ant marked this pull request as ready for review May 18, 2026 12:32

jkjk-ant closed this May 18, 2026

jkjk-ant wants to merge 1 commit into NVIDIA:main from jkjk-ant:feat/disable-device-node-modification-hook-flag
