
issue with secure-boot in at least Codespaces #145

@cgwalters

Description


I haven't debugged this at all, but while trying things out today, bcvk failed in a Codespaces instance with Secure Boot enabled and hit a KVM error.

UEFI Secure Boot Fails in Nested Virtualization (Azure/Codespaces)

Problem Description

VMs created with bcvk libvirt run using the default --firmware=uefi-secure fail to boot in nested virtualization environments (GitHub Codespaces, Azure VMs) with the error:

KVM: entry failed, hardware error 0xffffffff
... SMM=1 ...

The VM starts but immediately enters a paused state with "internal-error" status.

Root Cause

UEFI Secure Boot requires SMM (System Management Mode) support: OVMF's Secure Boot builds use SMM to protect the UEFI variable store from the guest OS. When KVM runs inside another hypervisor (nested virtualization), SMM emulation appears to be incomplete and unreliable, particularly on Azure's Microsoft hypervisor. The OVMF firmware tries to enter SMM during Secure Boot initialization, which triggers a KVM hardware error that cannot be recovered from.

Environment

  • Platform: GitHub Codespaces (Azure nested virtualization)
  • Host Hypervisor: Microsoft (detected via systemd-detect-virt)
  • Nested KVM: Enabled (/sys/module/kvm_amd/parameters/nested = 1)
  • QEMU: 10.1.2
  • libvirt: 11.8.0
  • Kernel: 6.8.0-1030-azure

Symptoms

  1. VM domain shows state: paused (internal-error)
  2. QEMU log shows:
    KVM: entry failed, hardware error 0xffffffff
    EAX=00000000 EBX=b7e03d78 ECX=000000b2 EDX=000000b2
    ...
    EIP=00008000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=1 HLT=0
    
  3. VM XML shows:
    <feature enabled='yes' name='secure-boot'/>
    <smm state='on'/>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
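To spot the symptom-3 configuration programmatically, a check can look for both SMM and a Secure Boot loader in the domain XML (for example the output of `virsh dumpxml test`). The function name below is hypothetical and not part of bcvk; it is a minimal sketch that uses plain substring matching rather than a real XML parser.

```rust
/// Returns true when a libvirt domain XML requests SMM *and* a Secure Boot
/// pflash loader, the combination present in the failing VMs.
/// Hypothetical helper; substring matching only, not a real XML parser.
fn uses_secure_boot_with_smm(domain_xml: &str) -> bool {
    // Match the opening tag only, so both `<smm state='on'/>` and a
    // `<smm state='on'>...</smm>` element (e.g. with a <tseg> child) count.
    let smm_on = domain_xml.contains("<smm state='on'")
        || domain_xml.contains("<smm state=\"on\"");

    // The loader element carries secure='yes' when Secure Boot firmware is used.
    let secure_loader = domain_xml.lines().any(|line| {
        line.contains("<loader")
            && (line.contains("secure='yes'") || line.contains("secure=\"yes\""))
    });

    smm_on && secure_loader
}
```

Requiring both conditions keeps the check narrow: a secure loader without SMM, or SMM without a secure loader, is not the configuration reported here.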

Reproduction

# This will fail in nested virtualization:
bcvk libvirt run --name test quay.io/centos-bootc/centos-bootc:stream10

# Check VM state:
virsh domstate test
# Output: paused

# Check QEMU monitor:
virsh qemu-monitor-command test --hmp info status
# Output: VM status: paused (internal-error)

# Check logs:
grep "KVM.*error" ~/.cache/libvirt/qemu/log/test.log
# Output: KVM: entry failed, hardware error 0xffffffff
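The monitor check can also be scripted, for example from a test harness. This is a minimal sketch assuming `virsh` is on `PATH` and connects to the same libvirt instance as the commands shown; both function names are hypothetical, and the status string matched is the one shown in the output comment.

```rust
use std::io;
use std::process::Command;

/// True if HMP `info status` output reports the paused/internal-error state.
fn is_internal_error_pause(hmp_status: &str) -> bool {
    hmp_status
        .lines()
        .any(|line| line.trim() == "VM status: paused (internal-error)")
}

/// Runs `virsh qemu-monitor-command <domain> --hmp "info status"` and
/// returns its stdout, or an error carrying virsh's stderr on failure.
fn query_hmp_status(domain: &str) -> io::Result<String> {
    let out = Command::new("virsh")
        .args(["qemu-monitor-command", domain, "--hmp", "info status"])
        .output()?;
    if !out.status.success() {
        let stderr = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, stderr));
    }
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}
```

Keeping the parsing separate from the `virsh` call lets the matching logic be tested without a running VM.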

Solutions

Option 1: UEFI without Secure Boot (Recommended)

Use --firmware=uefi-insecure to get UEFI boot without SMM requirements:

bcvk libvirt run --name test --firmware=uefi-insecure \
  quay.io/centos-bootc/centos-bootc:stream10

Result: ✅ Works perfectly. UEFI boot, no Secure Boot, no SMM.

Option 2: Legacy BIOS

Use --firmware=bios for maximum compatibility:

bcvk libvirt run --name test --firmware=bios \
  quay.io/centos-bootc/centos-bootc:stream10

Result: ✅ Works perfectly. Legacy BIOS boot, no UEFI.

Attempted Workarounds (All Failed)

The following were tested but did not resolve the SMM issue:

  1. CPU mode changes: host-passthrough, host-model, custom CPU models all fail
  2. Migratable flag: Setting migratable=off makes no difference
  3. KVM parameters: No kernel module parameter was found that works around the failure
  4. QEMU machine options: Changing them does not help; the SMM failure occurs in KVM, not in QEMU

Detection

This problematic scenario can be detected heuristically before attempting to start a VM:

Detection Logic

use std::fs;

/// Heuristic: true when this host is itself a VM and the local KVM
/// module reports nested virtualization as enabled.
fn is_nested_virtualization_risky() -> bool {
    // Running inside a VM? Guests see the "hypervisor" CPU flag in /proc/cpuinfo.
    let in_vm = fs::read_to_string("/proc/cpuinfo")
        .map(|s| s.split_whitespace().any(|flag| flag == "hypervisor"))
        .unwrap_or(false);

    // Nested KVM enabled? kvm_amd reports "1", kvm_intel reports "Y".
    let nested_kvm = fs::read_to_string("/sys/module/kvm_amd/parameters/nested")
        .or_else(|_| fs::read_to_string("/sys/module/kvm_intel/parameters/nested"))
        .map(|s| matches!(s.trim(), "1" | "Y"))
        .unwrap_or(false);

    in_vm && nested_kvm
}

Detection Signals

Check                 Command                                       Expected in nested virt
Hypervisor present    grep hypervisor /proc/cpuinfo                 Match found
Hypervisor vendor     lscpu | grep "Hypervisor vendor"              Shows "Microsoft" (Azure)
systemd-detect-virt   systemd-detect-virt                           Returns something other than "none"
DMI vendor            cat /sys/devices/virtual/dmi/id/sys_vendor    "Microsoft Corporation" (Azure)
Nested KVM            cat /sys/module/kvm_*/parameters/nested       "1" (kvm_amd) or "Y" (kvm_intel)
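Several of these signals can be collected in one place. The `VirtSignals` struct below is hypothetical, not existing bcvk code. One deliberate choice in this sketch: it runs `systemd-detect-virt --vm`, which skips container detection, so a container running inside a VM (the usual dev-container setup) still reports the VM. The DMI vendor is recorded for the warning message but is not used on its own as a VM signal, since vendors such as Microsoft also ship physical hardware.

```rust
use std::fs;
use std::process::Command;

/// Hypothetical bundle of the detection signals from the table.
#[derive(Debug, Default, PartialEq)]
struct VirtSignals {
    /// `hypervisor` flag present in /proc/cpuinfo.
    cpuinfo_hypervisor: bool,
    /// `systemd-detect-virt --vm` output, e.g. "microsoft"; None for "none".
    detect_virt: Option<String>,
    /// DMI system vendor, e.g. "Microsoft Corporation".
    dmi_sys_vendor: Option<String>,
}

/// Normalizes `systemd-detect-virt` stdout: "none" or empty means not a VM.
fn parse_detect_virt(stdout: &str) -> Option<String> {
    match stdout.trim() {
        "" | "none" => None,
        v => Some(v.to_string()),
    }
}

impl VirtSignals {
    /// Reads each signal, treating any missing file or tool as "no signal".
    fn collect() -> Self {
        let cpuinfo_hypervisor = fs::read_to_string("/proc/cpuinfo")
            .map(|s| s.split_whitespace().any(|tok| tok == "hypervisor"))
            .unwrap_or(false);
        // `--vm` ignores container virtualization and reports only the VM layer.
        let detect_virt = Command::new("systemd-detect-virt")
            .arg("--vm")
            .output()
            .ok()
            .and_then(|out| parse_detect_virt(&String::from_utf8_lossy(&out.stdout)));
        let dmi_sys_vendor = fs::read_to_string("/sys/devices/virtual/dmi/id/sys_vendor")
            .ok()
            .map(|s| s.trim().to_string())
            .filter(|s| !s.is_empty());
        VirtSignals { cpuinfo_hypervisor, detect_virt, dmi_sys_vendor }
    }

    /// Either VM signal counts; DMI vendor alone is not treated as proof.
    fn in_vm(&self) -> bool {
        self.cpuinfo_hypervisor || self.detect_virt.is_some()
    }
}
```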

Recommended Warning

When --firmware=uefi-secure is used and nested virtualization is detected:

WARNING: Nested virtualization detected (Hypervisor: Microsoft)

  UEFI Secure Boot (--firmware=uefi-secure) requires SMM, which often
  fails in nested environments.

  Recommended alternatives:
    --firmware=uefi-insecure  (UEFI without Secure Boot)
    --firmware=bios           (Legacy BIOS, most compatible)

  Continue anyway? [y/N]
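A sketch of how the proposed prompt could behave, assuming the warning goes to stderr and the answer is read from stdin. The function names are hypothetical. The parsing follows the usual `[y/N]` convention: only an explicit "y" or "yes" continues.

```rust
use std::io::{self, Write};

/// Builds the proposed warning text; `vendor` is e.g. "Microsoft".
fn nested_virt_warning(vendor: &str) -> String {
    [
        format!("WARNING: Nested virtualization detected (Hypervisor: {vendor})"),
        String::new(),
        "  UEFI Secure Boot (--firmware=uefi-secure) requires SMM, which often".to_string(),
        "  fails in nested environments.".to_string(),
        String::new(),
        "  Recommended alternatives:".to_string(),
        "    --firmware=uefi-insecure  (UEFI without Secure Boot)".to_string(),
        "    --firmware=bios           (Legacy BIOS, most compatible)".to_string(),
    ]
    .join("\n")
}

/// `[y/N]` semantics: only "y"/"yes" (any case) continues; empty input means no.
fn parse_confirmation(answer: &str) -> bool {
    matches!(answer.trim().to_ascii_lowercase().as_str(), "y" | "yes")
}

/// Prints the warning and prompt to stderr, then reads one line from stdin.
fn confirm_continue(vendor: &str) -> io::Result<bool> {
    let mut err = io::stderr();
    writeln!(err, "{}\n", nested_virt_warning(vendor))?;
    write!(err, "  Continue anyway? [y/N] ")?;
    err.flush()?;
    let mut answer = String::new();
    io::stdin().read_line(&mut answer)?;
    Ok(parse_confirmation(&answer))
}
```

In non-interactive runs such as CI, stdin at EOF leaves the answer empty, so this sketch declines rather than blocking on a default of "yes".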

Technical Background

Why SMM Fails in Nested Virtualization

  1. SMM Architecture: SMM is a special x86 CPU mode for firmware operations
  2. Nested KVM Limitation: KVM's nested virtualization doesn't fully emulate SMM
  3. OVMF Requirement: OVMF Secure Boot firmware requires functional SMM
  4. Azure Specific: Azure's hypervisor is Microsoft Hyper-V, whose nested virtualization support is primarily aimed at running Hyper-V, not KVM, inside a VM

Error Details

The error occurs when:

  1. libvirt starts QEMU with smm=on and Secure Boot OVMF firmware
  2. OVMF initializes and attempts to enter SMM
  3. Nested KVM cannot handle the SMM state transition
  4. CPU registers show SMM=1 flag set at time of error
  5. VM pauses with "internal-error" and cannot recover
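The failure signature in steps 1 to 4 can be matched in the QEMU log after the fact: a "KVM: entry failed" line followed by a register dump in which SMM=1 is set. This is a minimal sketch with hypothetical function names, tested against the log excerpt from the Symptoms section.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True if a QEMU log shows "KVM: entry failed" followed by a
/// register dump line with the SMM=1 flag set.
fn log_shows_smm_entry_failure(log: &str) -> bool {
    let mut saw_entry_failure = false;
    for line in log.lines() {
        if line.contains("KVM: entry failed") {
            saw_entry_failure = true;
        } else if saw_entry_failure && line.split_whitespace().any(|tok| tok == "SMM=1") {
            return true;
        }
    }
    false
}

/// Reads a log such as ~/.cache/libvirt/qemu/log/<name>.log and checks it.
fn check_log_file(path: &Path) -> io::Result<bool> {
    Ok(log_shows_smm_entry_failure(&fs::read_to_string(path)?))
}
```

Matching `SMM=1` as a whole token avoids false hits on other register fields, and requiring the entry-failure line first ties the flag to this specific error.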

Research References

Related Issues

  • Likely affects other nested KVM environments too, not just Azure
  • Proxmox nested virtualization shows similar issues
  • Bare-metal cloud instances (e.g. AWS bare metal) may work better, since KVM runs directly on the hardware there
  • The issue appears to have persisted across multiple kernel versions

Testing Notes

Tested configurations:

  • --firmware=uefi-insecure: Works, VM boots successfully
  • --firmware=bios: Works, VM boots successfully
  • --firmware=uefi-secure + host-passthrough: Fails with SMM error
  • --firmware=uefi-secure + host-model: Fails with SMM error
  • --firmware=uefi-secure + migratable=off: Fails with SMM error

Recommendations

For bcvk/bootc Development

  1. Add a detection heuristic to warn users in nested virtualization scenarios
  2. Change the default to --firmware=uefi-insecure when nested virtualization is detected
  3. Document the limitation in the README/docs
  4. CI/CD: Use --firmware=uefi-insecure in GitHub Actions/Codespaces
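Recommendation 2 could be implemented as a defaulting rule that still honors an explicit user choice. The `Firmware` enum and `effective_firmware` function here are hypothetical and do not reflect bcvk's actual types; the flag strings are the ones used in this issue.

```rust
/// Hypothetical mirror of the --firmware values discussed in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Firmware {
    UefiSecure,
    UefiInsecure,
    Bios,
}

impl Firmware {
    /// The value passed to --firmware.
    fn as_flag(self) -> &'static str {
        match self {
            Firmware::UefiSecure => "uefi-secure",
            Firmware::UefiInsecure => "uefi-insecure",
            Firmware::Bios => "bios",
        }
    }
}

/// An explicit --firmware always wins; otherwise default to uefi-secure,
/// falling back to uefi-insecure when the nested-virt heuristic fires.
fn effective_firmware(requested: Option<Firmware>, nested_virt: bool) -> Firmware {
    match requested {
        Some(fw) => fw,
        None if nested_virt => Firmware::UefiInsecure,
        None => Firmware::UefiSecure,
    }
}
```

Honoring an explicit `--firmware=uefi-secure` keeps Secure Boot available on nested hosts where it happens to work; the warning would then cover that case.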

For Users

  1. In nested environments: Always use --firmware=uefi-insecure or --firmware=bios
  2. For Secure Boot: Test on bare metal or non-nested KVM hosts
  3. For development/testing: --firmware=bios is fastest and most compatible

Investigation Date

2025-11-10
