Instance fails to boot after changing instance type (Xen ↔ Nitro) when hibernation is enabled #55

@831kirimi

Issue Description

When hibernation is enabled on an EC2 instance, amazon-ec2-hibinit-agent sets the kernel command-line parameters resume and resume_offset. The resume parameter, however, is set to a device name (e.g., resume=/dev/nvme0n1p1 or resume=/dev/xvda1). Changing the instance type between hypervisor types (Xen ↔ Nitro) changes the device naming convention, so the stored resume device name becomes stale and the instance fails to boot, with instance status check failures.

Root Cause

The issue occurs in the patch_grub_config function where the swap device is identified by device name:

https://github.com/aws/amazon-ec2-hibinit-agent/blob/master/agent/hibinit-agent#L131

grub_update_kernel = "grubby --update-kernel=ALL --args='no_console_suspend=1 " + \
                         "resume_offset={offset} resume={swap_device}'"
grub_update_kernel = grub_update_kernel.format(offset=offset, swap_device=swap_device)
check_call(grub_update_kernel, shell=True)

https://github.com/aws/amazon-ec2-hibinit-agent/blob/master/agent/hibinit-agent#L189

if config.grub_update:
    dev_str = find_device_for_file(SWAP_FILE)
    patch_grub_config(dev_str, offset)

https://github.com/aws/amazon-ec2-hibinit-agent/blob/master/agent/hibinit-agent#L210

def find_device_for_file(filename):
    # Find the mount point for the swap file ('df -P /swap')
    df_out = check_output(['df', '-P', filename]).decode(sys.getfilesystemencoding())
    dev_str = df_out.split("\n")[1].split()[0]
    return dev_str
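
To make the failure mode concrete, here is a minimal sketch of the same parsing step applied to two sample `df -P` outputs. The helper name `device_from_df_output` and the block counts in the sample text are illustrative assumptions, not captured from a real instance; only the two device names come from this report. The point is that the identical root volume yields a different string on each hypervisor, and that string is what ends up in resume=.

```python
def device_from_df_output(df_out):
    """Return the Filesystem column of the first data row of `df -P` output.

    Same parsing as find_device_for_file, but on a string so it can be
    exercised without running df.
    """
    return df_out.split("\n")[1].split()[0]


# Illustrative output only: the numbers are made up, the device names are
# the ones mentioned in this issue.
XEN_DF = (
    "Filesystem     1024-blocks    Used Available Capacity Mounted on\n"
    "/dev/xvda1        31444972 5000000  26444972      16% /\n"
)
NITRO_DF = (
    "Filesystem     1024-blocks    Used Available Capacity Mounted on\n"
    "/dev/nvme0n1p1    31444972 5000000  26444972      16% /\n"
)

xen_device = device_from_df_output(XEN_DF)
nitro_device = device_from_df_output(NITRO_DF)

# The value written to the kernel command line depends on the hypervisor
# the agent happened to run on, not on the volume itself.
print(xen_device, nitro_device, xen_device == nitro_device)
```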

Reproduction Steps

  1. Launch hibernation-enabled instance with Xen-based instance type:
aws ec2 run-instances \
    --image-id ami-07faa35bbd2230d90 \
    --instance-type t2.micro \
    --key-name test \
    --security-group-ids sg-1234567890abcdef0 \
    --hibernation-options Configured=true \
    --block-device-mappings '[
        {
            "DeviceName": "/dev/xvda",
            "Ebs": {
                "VolumeSize": 30,
                "VolumeType": "gp3",
                "Encrypted": true
            }
        }
    ]'
  2. Stop the instance:
aws ec2 stop-instances --instance-ids i-1234567890abcdef0
  3. Change to Nitro-based instance type:
aws ec2 modify-instance-attribute --instance-id i-1234567890abcdef0 --instance-type Value=t3.micro
  4. Start the instance:
aws ec2 start-instances --instance-ids i-1234567890abcdef0
  5. Check instance status:
aws ec2 describe-instance-status --instance-ids i-1234567890abcdef0

Expected Result

Instance should boot successfully after instance type change.

Actual Result

Instance status check fails with "reachability: failed" status:

{
    "Details": [
        {
            "ImpairedSince": "2025-09-03T02:17:00+00:00",
            "Name": "reachability",
            "Status": "failed"
        }
    ],
    "Status": "impaired"
}

Proposed Solution

Use a filesystem UUID or label instead of a device name for the resume parameter, so that the value stays valid across hypervisor types.
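
One possible shape for this change is sketched below. It is not the agent's actual code: `resume_spec` and `grubby_command` are hypothetical helpers, and the sketch assumes that `blkid -s UUID -o value <device>` prints the filesystem UUID and that the AMI's initramfs can resolve resume=UUID=... at boot, which would need to be verified per distribution. If no UUID can be read, it falls back to the device name, i.e. the current behavior.

```python
from subprocess import CalledProcessError, check_output


def resume_spec(device, run=check_output):
    """Return 'UUID=<uuid>' for the device, or the device path as a fallback.

    `run` is injectable so the lookup can be tested without calling blkid.
    """
    try:
        out = run(["blkid", "-s", "UUID", "-o", "value", device])
    except CalledProcessError:
        # blkid exits non-zero when it cannot report the requested tag.
        return device
    uuid = out.decode().strip()
    return "UUID=" + uuid if uuid else device


def grubby_command(offset, resume):
    """Build the grubby argument list (no shell, so no quoting is needed)."""
    args = "no_console_suspend=1 resume_offset={} resume={}".format(offset, resume)
    return ["grubby", "--update-kernel=ALL", "--args=" + args]
```

In patch_grub_config this would roughly amount to passing `resume_spec(swap_device)` where `swap_device` is used today; resume_offset is left as the agent already computes it.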
