Need to understand recovery when a device goes away and comes back #1448

@tasleson

While testing a pool with 3 devices, I created a filesystem, mounted it, and started I/O. I then took the devices offline with echo offline > /sys/block/sdk/device/state. After I brought the devices back, the device mapper tables were still unusable. If I unmount the filesystem and then try to mount it again, I get:

# mount /stratis/yank/some_fs /mnt/fubar/
mount: /mnt/fubar: can't read superblock on /dev/mapper/stratis-1-3b8e5c85a0d84b04ab5e826689e7d020-thin-fs-077152e10d9b4d2eae0f1c151e6a6651.

The Stratis daemon shows:

ERROR libstratis::engine::strat_engine::thinpool::thinpool: Thinpool status is fail -> Failed

The kernel logs show:

[10350.529741] device-mapper: thin: 253:3: metadata operation 'dm_pool_commit_metadata' failed: error = -5
[10350.529742] device-mapper: thin: 253:3: aborting current metadata transaction
[10350.531325] sd 7:0:3:0: rejecting I/O to offline device
[10350.536549] sd 7:0:2:0: rejecting I/O to offline device
[10350.545825] sd 7:0:3:0: rejecting I/O to offline device
[10350.546515] device-mapper: thin: 253:3: failed to abort metadata transaction
[10350.546518] device-mapper: thin: 253:3: switching pool to failure mode
[10350.549157] device-mapper: thin metadata: couldn't read superblock
[10350.549158] device-mapper: thin: 253:3: failed to set 'needs_check' flag in metadata
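The failure mode reported above can also be checked from userspace. The following is a minimal sketch, assuming the behavior described in the kernel thin-provisioning documentation, where a failed pool reports only the string Fail in its dmsetup status line. The helper name pool_is_failed is hypothetical and not part of stratisd or dmsetup.

```shell
#!/bin/sh
# pool_is_failed: hypothetical helper (not part of stratisd or dmsetup).
# Reads `dmsetup status <name>` output on stdin and succeeds (exit 0) if a
# thin-pool target reports "Fail", which is what the kernel prints once the
# pool has switched to failure mode.
pool_is_failed() {
    # Expected fields: <start> <length> <target type> <status...>
    awk '$3 == "thin-pool" && $4 == "Fail" { found = 1 } END { exit !found }'
}
```

Usage would look like `dmsetup status <pool-dm-name> | pool_is_failed && echo "pool is in failure mode"`, where the pool's device-mapper name is passed explicitly so that the output carries no name prefix.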

The only way to get things back is to stop stratisd, remove the dm tables, and restart the service.
Is there something better we can do here to return to a working state without requiring user intervention?
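For reference, the manual workaround described above can be sketched as a small script. This is only an illustration under several assumptions: stratisd is managed by systemd (hence systemctl), the filesystem is already unmounted, and the caller supplies the device-mapper names in an order in which they can be removed (top of the stack first). The helpers run and recover_stratis are hypothetical names. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the manual workaround: stop stratisd, remove the dm tables,
# start stratisd again. Commands are only printed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# recover_stratis: hypothetical helper. Arguments are the Stratis dm device
# names to remove, in the order given (assumed: upper layers first).
recover_stratis() {
    run systemctl stop stratisd
    for name in "$@"; do
        run dmsetup remove "$name"
    done
    run systemctl start stratisd
}
```

The device names can be taken from `dmsetup ls`; for example, the filesystem device in this report is stratis-1-3b8e5c85a0d84b04ab5e826689e7d020-thin-fs-077152e10d9b4d2eae0f1c151e6a6651. Keeping dry-run as the default is deliberate, since removing the wrong table on a live system is destructive.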
