feat(krun): raise IRQ cap, fix GIC nr_irqs, and make split irqchip opt-in#46

Merged
appcypher merged 5 commits into krun from appcypher/raise-irq-limits
Apr 18, 2026

@appcypher appcypher commented Apr 13, 2026

Summary

Raise the per-VM IRQ ceiling so callers can attach many virtio-mmio devices (e.g. lots of virtio-fs tags or block-backed OCI rootfs layers), while preserving the existing in-kernel-IOAPIC behavior as the default.

  • x86_64 IRQ cap (opt-in). Keep IRQ_MAX = 15 for the default in-kernel IOAPIC path (KVM hardcodes KVM_IOAPIC_NUM_PINS = 24, giving 11 usable virtio IRQs). Add IRQ_MAX_SPLIT = 223 used only when the caller selects the userspace split irqchip, which emulates a 256-pin IOAPIC.
  • MachineBuilder::split_irqchip(bool). Callers that need >11 IRQs opt in via the builder; otherwise libkrun behaves exactly as before. Mirrors the existing krun_split_irqchip C API.
  • Fix KVM_CAP_SPLIT_IRQCHIP args[0]. Was hardcoded to 24 while IOAPIC_NUM_PINS was 256; now derived from IOAPIC_NUM_PINS so the reserved userspace-IOAPIC GSI range matches the emulated device. Functionally harmless before (libkrun installs MSI routes regardless), but the old value was a lie to KVM.
  • aarch64 IRQ_MAX bumped 159 → 223. No mode switch needed on aarch64.
  • GIC nr_irqs correctness fix. In kvmgicv2 / kvmgicv3, nr_irqs is now rounded up to a multiple of 32 and accounts for the 32 private interrupts (SGIs + PPIs); the previous IRQ_MAX - IRQ_BASE + 1 under-allocated SPIs, so any device assigned an IRQ above 127 ended up with an invalid interrupt.
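
To make the GIC sizing concrete, here is a minimal sketch of the old and new nr_irqs arithmetic. It is not the code from kvmgicv2 / kvmgicv3. The function names are hypothetical, and the aarch64 IRQ_BASE of 32 is an assumption inferred from the numbers above (159 - 32 + 1 = 128, which matches the "above 127" failure).

```rust
/// Private interrupts per vCPU: 16 SGIs + 16 PPIs.
const GIC_PRIVATE_IRQS: u32 = 32;

/// Old formula from the PR description: counts only the SPI range,
/// although KVM treats nr_irqs as the total including private interrupts.
fn old_nr_irqs(irq_base: u32, irq_max: u32) -> u32 {
    irq_max - irq_base + 1
}

/// Sketch of the corrected sizing (hypothetical helper): add the 32
/// private interrupts to the SPI count, then round up to a multiple of 32.
fn gic_nr_irqs(irq_base: u32, irq_max: u32) -> u32 {
    let spis = irq_max - irq_base + 1;
    (spis + GIC_PRIVATE_IRQS).div_ceil(32) * 32
}

fn main() {
    // Assumption: IRQ_BASE = 32 on aarch64 (the first SPI).
    let irq_base = 32;
    // Old sizing with IRQ_MAX = 159: only interrupt IDs 0..=127 are valid.
    println!("old, IRQ_MAX=159: {}", old_nr_irqs(irq_base, 159));
    // New sizing with IRQ_MAX = 223: IDs 0..=223 are all covered.
    println!("new, IRQ_MAX=223: {}", gic_nr_irqs(irq_base, 223));
}
```

Under these assumptions the old formula yields 128 total interrupts for IRQ_MAX = 159, while the corrected sizing yields 224 for IRQ_MAX = 223.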

Why opt-in and not always-on

Forcing the userspace split irqchip globally changed runtime behavior for every x86_64 caller, even those fine with the 11-IRQ budget. Keeping it opt-in means zero behavioral change for existing users; callers who legitimately need the higher cap (e.g. microsandbox with many virtio-fs mounts) flip one flag and get 219 usable IRQs.
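
The 11 vs. 219 IRQ budgets can be reproduced with a small sketch. IRQ_MAX and IRQ_MAX_SPLIT are the constants named above. IRQ_BASE = 5 is an assumption inferred from those budgets (15 - 5 + 1 = 11), and usable_irqs is a hypothetical helper, not a libkrun function.

```rust
// Constants named in the PR description (x86_64).
const IRQ_MAX: u32 = 15;
const IRQ_MAX_SPLIT: u32 = 223;
// Assumption: not stated in the PR, inferred from the 11-IRQ budget.
const IRQ_BASE: u32 = 5;

/// Hypothetical helper: number of IRQs available to virtio-mmio devices
/// for the selected irqchip mode.
fn usable_irqs(split_irqchip: bool) -> u32 {
    let irq_max = if split_irqchip { IRQ_MAX_SPLIT } else { IRQ_MAX };
    irq_max - IRQ_BASE + 1
}

fn main() {
    println!("in-kernel IOAPIC: {} usable IRQs", usable_irqs(false));
    println!("split irqchip:    {} usable IRQs", usable_irqs(true));
}
```

With the assumed base, the default path keeps its 11 IRQs and the opt-in path gets 219, matching the figures quoted in this description.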

Test plan

  • cargo build -p msb_krun (x86_64 linux, macOS host)
  • cargo fmt --check, cargo clippy -p msb_krun_vmm -p msb_krun
  • End-to-end: microsandbox patched against this branch with .split_irqchip(true), just build && just install, msb run alpine — boots cleanly
  • Exercise with ≥12 virtio-fs / block devices on x86_64 once a CI worker is available

Raise IRQ_MAX from 15 to 223 on x86_64 and from 159 to 223 on aarch64
to support 136+ virtio-MMIO devices needed for block-backed EROFS OCI
rootfs (one block device per OCI layer).

- Raise IOAPIC_NUM_PINS from 24 to 256 in the userspace split irqchip
  to match the new IRQ range on x86_64
- Always use split irqchip on x86_64 since KVM's in-kernel IOAPIC is
  hardcoded to 24 pins (KVM_IOAPIC_NUM_PINS) and cannot be changed
- Fix GIC nr_irqs calculation in kvmgicv2 and kvmgicv3: KVM interprets
  nr_irqs as total interrupts including 32 private ones (SGIs + PPIs),
  so the old `IRQ_MAX - IRQ_BASE + 1` formula under-allocated SPIs
@hsiangkao

  • Raise IRQ_MAX from 15 to 223 on x86_64 and from 159 to 223 on aarch64 to support 136+ virtio-MMIO devices needed for block-backed EROFS OCI rootfs (one block device per OCI layer)

You could also use new libkrun VMDK support to:

- Fix rustfmt: collapse method chain in builder.rs to single line
- Fix clippy: use .div_ceil(32) instead of manual ((x + 31) / 32)
  in kvmgicv2 and kvmgicv3
- Fix test: increase cmdline buffer from 4096 to 16384 in
  test_register_too_many_devices since 219 devices (~10KB) now
  overflow the old 4KB limit
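
As a rough illustration of why the buffer had to grow, the sketch below estimates the kernel command-line space taken by virtio-mmio device entries. It assumes the standard `virtio_mmio.device=<size>@<baseaddr>:<irq>` parameter format, an MMIO base of 0xd0000000, a 4K stride, and a first IRQ of 5. These values and the helper name are assumptions; the exact strings libkrun emits may be longer, so the estimate need not match the ~10KB figure.

```rust
/// Hypothetical estimate of the cmdline bytes used by `devices`
/// virtio-mmio entries, one entry per device.
fn cmdline_len(devices: u32, mmio_base: u64, mmio_len: u64, irq_base: u32) -> usize {
    (0..devices)
        .map(|i| {
            // Each device gets its own MMIO window and its own IRQ.
            let addr = mmio_base + u64::from(i) * mmio_len;
            format!(" virtio_mmio.device=4K@{:#x}:{}", addr, irq_base + i).len()
        })
        .sum()
}

fn main() {
    // Assumed layout values, for illustration only.
    let (base, stride, first_irq) = (0xd000_0000u64, 0x1000u64, 5u32);
    for devices in [11u32, 219] {
        let len = cmdline_len(devices, base, stride, first_irq);
        println!("{devices} devices -> {len} bytes (fits in 4096: {})", len <= 4096);
    }
}
```

Even this conservative estimate exceeds 4096 bytes at 219 devices while staying well under 16384.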
No longer needed since split irqchip is now always used.

appcypher commented Apr 15, 2026

@hsiangkao Reading your comment again. Thank you so much for these suggestions.

It makes the changes in this PR unnecessary!

I knew erofs had a merge feature but wasn't quite sure how to make it work. This is amazing! Never knew VMDK had something like that. I will try it and maybe I will close this PR.

The previous revision forced userspace split irqchip unconditionally on
x86_64 to reach the raised IRQ cap, which changed runtime behavior for
every caller — even those fine with the in-kernel IOAPIC's 11-IRQ budget.
Restore the mode as opt-in while keeping the raised ceiling available to
callers who need it.

- src/arch/src/x86_64/layout.rs: keep IRQ_MAX at 15 for the in-kernel
  IOAPIC path; add IRQ_MAX_SPLIT = 223 for the userspace split irqchip.
- src/vmm/src/builder.rs: restore the if/else on vm_resources.split_irqchip
  choosing IoApic vs KvmIoapic and the matching attach_legacy_devices arg.
  Size the MMIODeviceManager IRQ pool to IRQ_MAX_SPLIT only when split
  irqchip is selected.
- src/krun/src/api/builders.rs + builder.rs: add
  MachineBuilder::split_irqchip(bool) and thread it through to
  vmr.split_irqchip.
- src/devices/src/legacy/ioapic.rs: fix KVM_CAP_SPLIT_IRQCHIP args[0] to
  match the emulated IOAPIC's pin count (256) instead of the previous
  hardcoded 24. The old value was inconsistent with IOAPIC_NUM_PINS and
  only worked because libkrun installs MSI routes rather than pin routes.

aarch64 GIC nr_irqs fix and the aarch64 IRQ_MAX bump from the prior
commits remain in place — those stand on their own.
@appcypher appcypher changed the title feat(krun): raise IRQ limits and fix GIC nr_irqs for 136+ virtio devices feat(krun): raise IRQ cap, fix GIC nr_irqs, and make split irqchip opt-in Apr 18, 2026
Both lints are pre-existing on main and only surface because the CI
runner uses the latest stable toolchain, which is ahead of the local
dev toolchain.

- src/devices/src/virtio/snd/worker.rs: drop a useless .into_iter() on
  the argument to Vec::extend (clippy::useless_conversion in 1.95+).
- src/cpuid/src/common.rs: reorder the std::arch::* imports to put
  CpuidResult last, matching rustfmt 1.95+'s case-insensitive sort.
@appcypher appcypher merged commit bd5f184 into krun Apr 18, 2026
8 checks passed
@appcypher appcypher deleted the appcypher/raise-irq-limits branch April 18, 2026 21:55

appcypher commented Apr 18, 2026

For context, our filesystem implementation now uses erofs meta and vmdk stitching, but these changes were still needed because a user was hitting the IRQ limit on Linux x86-64.


hsiangkao commented Apr 20, 2026

> @hsiangkao Reading your comment again. Thank you so much for these suggestions.
>
> It makes the changes in this PR unnecessary!
>
> I knew erofs had a merge feature but wasn't quite sure how to make it work. This is amazing! Never knew VMDK had something like that. I will try it and maybe I will close this PR.

Yes, but it also depends on when you generate the fsmerge metadata (presumably the OCI layers are already applied to EROFS in parallel, so fsmerge metadata generation could be a potential bottleneck). If generating the fsmerge metadata still takes some time on the critical path and you don't have a way to redistribute it, you could also use GPT partitions and mount EROFS in the guest instead (each erofs partition mount costs ~us, so 100 layers only take ms). Anyway, you could benchmark all the alternatives and find which one is best for your scenarios.
