zpool replace yields "disk busy"

All your general support questions for OpenZFS on OS X.

Re: zpool replace yields "disk busy"

Postby ilovezfs » Sat Oct 24, 2015 2:10 am

Well, after trying everything under the sun that I could come up with, I could not reproduce the issue. However, one thing is worth noting. When I deliberately created an artificial EBUSY situation, the output looked like this:

Code: Select all
# zpool replace demo 1809447841029417384 disk7
cannot label 'disk7': cannot label '/dev/disk7': unable to open device: 16
# zpool replace demo 1809447841029417384 disk7s1
cannot open '/dev/disk7s1': Resource busy
cannot replace 1809447841029417384 with disk7s1: disk7s1 is busy


What's interesting is that neither of you got the message "cannot open '/dev/disk7s1': Resource busy," nor the message "cannot label '/dev/disk7': unable to open device: 16," so your devices are busy in some other way. The messages are the same with or without -f.
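As an aside, the trailing "16" in the labeling error is a raw errno value, and errno 16 is EBUSY on both OS X and Linux. A quick sanity check from the shell (using python3 purely as an errno lookup table):

```shell
# errno 16 is EBUSY -- the kernel returns it when something else
# already has the device open (e.g. exclusively).
python3 -c 'import errno, os; print(errno.errorcode[16], "-", os.strerror(16))'
```

That prints the EBUSY name along with the platform's wording of the message ("Resource busy" on OS X, "Device or resource busy" on Linux).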
ilovezfs
 
Posts: 232
Joined: Thu Mar 06, 2014 7:58 am

Re: zpool replace yields "disk busy"

Postby jdwhite » Sat Oct 24, 2015 5:45 pm

ilovezfs wrote:First, can we see the output of

Code: Select all
sudo zdb -l /dev/disk#s1



Code: Select all
# zdb -l /dev/disk5s1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 5000
    name: 'Qx2'
    state: 0
    txg: 3359756
    pool_guid: 12762124584885818258
    errata: 0
    hostid: 1107358191
    hostname: 'localhost'
    top_guid: 11117946658200455167
    guid: 11299653104960740877
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 11117946658200455167
        nparity: 1
        metaslab_array: 34
        metaslab_shift: 36
        ashift: 12
        asize: 8001538752512
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4238121247080782406
            path: '/private/var/run/disk/by-id/media-B13011C3-A1D8-8F48-822E-C3FAA873BB60'
            whole_disk: 1
            DTL: 63
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 11299653104960740877
            path: '/private/var/run/disk/by-id/media-072C0FB2-4A83-324A-B6C1-D3FDB6884F92'
            whole_disk: 1
            DTL: 62
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 3553707349010248735
            path: '/private/var/run/disk/by-id/media-4BF35929-F5EF-CB4B-99B3-58F02CA3D05A'
            whole_disk: 1
            DTL: 61
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 3202437423136890323
            path: '/private/var/run/disk/by-id/media-2613DEE8-737E-5845-B6C7-B2398DAF2BE9'
            whole_disk: 1
            DTL: 60
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 5000
    name: 'Qx2'
    state: 0
    txg: 3359756
    pool_guid: 12762124584885818258
    errata: 0
    hostid: 1107358191
    hostname: 'localhost'
    top_guid: 11117946658200455167
    guid: 11299653104960740877
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 11117946658200455167
        nparity: 1
        metaslab_array: 34
        metaslab_shift: 36
        ashift: 12
        asize: 8001538752512
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 4238121247080782406
            path: '/private/var/run/disk/by-id/media-B13011C3-A1D8-8F48-822E-C3FAA873BB60'
            whole_disk: 1
            DTL: 63
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 11299653104960740877
            path: '/private/var/run/disk/by-id/media-072C0FB2-4A83-324A-B6C1-D3FDB6884F92'
            whole_disk: 1
            DTL: 62
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 3553707349010248735
            path: '/private/var/run/disk/by-id/media-4BF35929-F5EF-CB4B-99B3-58F02CA3D05A'
            whole_disk: 1
            DTL: 61
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 3202437423136890323
            path: '/private/var/run/disk/by-id/media-2613DEE8-737E-5845-B6C7-B2398DAF2BE9'
            whole_disk: 1
            DTL: 60
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data


ilovezfs wrote:It's possible you need to do a labelclear (and also possibly a gpt destroy), but one step at a time.


I tried a labelclear alone, then a zpool replace.
Then I tried a labelclear plus a gpt destroy, then a zpool replace.

Still got device busy on both.

Both times went down like this:

Code: Select all
# diskutil list
...
/dev/disk5 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                                                   *2.0 TB     disk5
# zpool replace Qx2 11299653104960740877 disk5
invalid vdev specification
use '-f' to override the following errors:
/dev/disk5 does not contain an EFI label but it may contain partition
information in the MBR.
# zpool replace -f Qx2 11299653104960740877 disk5
cannot replace 11299653104960740877 with disk5: disk5 is busy


At which point disk5 has a zfs label again.

Code: Select all
/dev/disk5 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *2.0 TB     disk5
   1:                        ZFS                         2.0 TB     disk5s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk5s9
jdwhite
 
Posts: 11
Joined: Sat May 10, 2014 6:04 pm

Re: zpool replace yields "disk busy"

Postby ilovezfs » Sat Oct 24, 2015 11:23 pm

It's interesting that there's a defunct LABEL 2 on there, without LABEL 0 or LABEL 1. I wonder if that's the root of the problem.

Did you labelclear the partition (diskXsY) or the whole device (diskX)? The command takes the partition, not the whole device. You can of course do both, but it's the partition that matters. (Yes, that is confusing, and yes, there is already an upstream issue about it: https://github.com/zfsonlinux/zfs/issues/3156).
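To make that concrete, here is the shape of the two invocations (device names are this thread's examples; labelclear is destructive, so triple-check the disk number first):

```shell
# Whole device: NOT where the labels live, per the upstream issue
sudo zpool labelclear -f /dev/disk5

# Partition device: this is the one that actually clears the ZFS labels
sudo zpool labelclear -f /dev/disk5s1
```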

Note that since the partition table offsets are all the same, that LABEL 2 is probably still there and readable via zdb -l /dev/diskXsY. You may want to verify that before redoing the labelclear.

Another way to test whether that's the issue would be to do the replace with a sparsebundle and see if it succeeds. (Assuming it did, we'd cancel that resilvering and then pick up where we left off with this troubleshooting.)
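A sketch of that test, assuming hdiutil's SPARSEBUNDLE type with -layout NONE for an unpartitioned image (the bundle only allocates space as it's written, so a 2 TB image is cheap to create):

```shell
# Create a sparse image with the same nominal size as the failed disk
hdiutil create -size 2t -type SPARSEBUNDLE -layout NONE /tmp/spare

# Attach it without mounting; note the /dev/diskN it reports
hdiutil attach -nomount /tmp/spare.sparsebundle

# Then try the replace against that device, e.g.:
# sudo zpool replace Qx2 11299653104960740877 diskN
```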

Yet another way would be to zero out the entire device first. But that's cheating.
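In practice you rarely need the whole disk: the GPT structures and ZFS labels live in the first and last few MB, so zeroing just those regions is usually enough. A sketch, demonstrated on a file-backed stand-in since this is destructive; on the real device you would write to /dev/rdiskX instead (and lose everything on it):

```shell
# Stand-in for the disk: 16 MB of random data in a file
dd if=/dev/urandom of=/tmp/fakedisk.img bs=1048576 count=16 2>/dev/null

# Zero the first 1 MB (primary GPT, plus the region holding ZFS labels 0 and 1)
dd if=/dev/zero of=/tmp/fakedisk.img bs=1048576 count=1 conv=notrunc 2>/dev/null

# Zero the last 1 MB (backup GPT, plus the region holding ZFS labels 2 and 3)
dd if=/dev/zero of=/tmp/fakedisk.img bs=1048576 count=1 seek=15 conv=notrunc 2>/dev/null
```

conv=notrunc keeps dd from truncating the target, which matters for the file-backed demo; on a raw device it's harmless.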
ilovezfs
 
Posts: 232
Joined: Thu Mar 06, 2014 7:58 am

Re: zpool replace yields "disk busy"

Postby ilovezfs » Sat Oct 24, 2015 11:52 pm

Indeed, it looks like nearly the exact same thing has happened before and the solution was, as expected, zeroing out the relevant stuff: https://github.com/zfsonlinux/zfs/issues/440

Note that his error message in #440 is the same as yours, and not the "normal" busy messages I posted above:

cannot replace 10094699726010659595 with sde: sde is busy
ilovezfs
 
Posts: 232
Joined: Thu Mar 06, 2014 7:58 am

Re: zpool replace yields "disk busy"

Postby jdwhite » Sun Oct 25, 2015 9:00 am

ilovezfs wrote:It's interesting that there's a defunct LABEL 2 on there, without LABEL 0 or LABEL 1. I wonder if that's the root of the problem.

Did you labelclear the partition (diskXsY) or the whole device (diskX)? The command takes the partition, not the whole device. You can of course do both, but it's the partition that matters. (Yes, that is confusing, and yes, there is already an upstream issue about it: https://github.com/zfsonlinux/zfs/issues/3156).


Yep, this was precisely my issue. I did not specify a partition when doing labelclear. The other mistake here was believing that the output from diskutil list was meaningful in this instance. I thought since partitions s2 and s9 were gone that labelclear had done its job.

In reality it didn't clear the ZFS label, and it left the disk in a state with no partitions to query with 'zdb' et al. Once I specified diskXs1, everything worked. Currently resilvering.
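For anyone landing here later, the sequence that worked (with this thread's pool and device names; substitute your own) was:

```shell
# Clear the stale labels on the *partition*, not the whole device
sudo zpool labelclear -f /dev/disk5s1

# Confirm all four labels now fail to unpack before retrying
sudo zdb -l /dev/disk5s1

# The replace then goes through and resilvering starts
sudo zpool replace Qx2 11299653104960740877 disk5
```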

Thanks for your assistance.
jdwhite
 
Posts: 11
Joined: Sat May 10, 2014 6:04 pm
