zpool replace failing with 2-mirror pool: disk5 is busy

All your general support questions for OpenZFS on OS X.


Postby tangent » Mon Mar 14, 2016 1:18 am

I have the following status:

Code:
$ zpool status
  pool: Midden
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
   Sufficient replicas exist for the pool to continue functioning in a
   degraded state.
action: Online the device using 'zpool online' or replace the device with
   'zpool replace'.
  scan: resilvered 284K in 0h0m with 0 errors on Mon Mar 14 02:01:48 2016
config:

   NAME                                            STATE     READ WRITE CKSUM
   Midden                                          DEGRADED     0     0     0
     mirror-0                                      DEGRADED     0     0     0
       disk6                                       ONLINE       0     0     0
       media-2AAD0BEC-7AB3-A744-80CA-CA21F00A59A4  OFFLINE      0     0     0
     mirror-1                                      ONLINE       0     0     0
       media-A833F256-1C6B-2F4D-AFBD-03DB0B3D2ADA  ONLINE       0     0     0
       media-6B747E0C-CD67-224E-8837-5CDBAF94FCA5  ONLINE       0     0     0


The offlined disk (...9A4) used to show a "4" in the CKSUM column until I offlined it and cleared it.

My intention was to resilver it with the contents of disk6. (The problem disk is currently known as disk5.)

Now I can't get ZFS to accept the drive back into the pool:

Code:
$ sudo zpool replace Midden media-2AAD0BEC-7AB3-A744-80CA-CA21F00A59A4 disk5
cannot replace media-2AAD0BEC-7AB3-A744-80CA-CA21F00A59A4 with disk5: no such device in pool
$ sudo zpool attach Midden disk6 disk5
cannot attach disk5 to disk6: disk5 is busy


I don't have another disk this large to replace it with at the moment, and I'm not convinced the problem is with the disk anyway. The earlier symptom was that the disk enclosure's lights started behaving erratically, so I moved both disks to separate USB enclosures, and I'm trying to rebuild the pool in that temporary state while I diagnose the enclosure.

The physical layout is two LaCie 2big Thunderbolt enclosures, daisy-chained. These are not RAID enclosures; the OS sees each one as hosting two separate disks. I started with one 2x4TB mirror, then later added a 2x6TB mirror. It is the latter that is having trouble.

EDIT: I've already tried exporting and reimporting the pool. I've also rebooted several times. I even tried to work on it in single-user mode, but I couldn't get the driver to load; the error was something about "Trust cache is disabled." and invalid driver signatures.

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby lundman » Mon Mar 14, 2016 5:03 pm

The disk renumbering is a concern on OSX, which is why we recommend people import pools using the command "zpool import -d /var/run/by-id/ $POOLNAME". Once you have done that, you should try to attach the disk that way. You can also use "zpool status -g" to see devices by GUID, and attach using GUIDs.
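A minimal sketch of that workflow, written as a dry-run helper so nothing touches the pool until you set DRY_RUN=0. The function name reattach_by_id and the DRY_RUN switch are hypothetical; the zpool subcommands are the ones named in this thread. The links InvariantDisks creates live under /var/run/disk/by-id, which is the path the commands later in this thread use.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper for the by-id workflow described above.
# Assumes InvariantDisks maintains /var/run/disk/by-id. With DRY_RUN=1 (the
# default) each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = pool name, $2 = GUID of the healthy mirror member (from `zpool status -g`),
# $3 = stable by-id path of the disk to attach.
reattach_by_id() {
  local pool="$1" healthy_guid="$2" new_dev="$3"
  run sudo zpool export "$pool"
  run sudo zpool import -d /var/run/disk/by-id "$pool"
  run zpool status -g "$pool"
  run sudo zpool attach "$pool" "$healthy_guid" "$new_dev"
}
```

For example, `reattach_by_id Midden 1098692862752021739 /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1` prints the four commands in order, which lets you check the GUID and path before running anything for real.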

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby tangent » Tue Mar 15, 2016 6:02 am

lundman wrote: The disk renumbering is a concern on OSX, which is why we recommend people import pools using the command "zpool import -d /var/run/by-id/ $POOLNAME".


Only the "loose" disk was numbered that way. The rest of the pool was imported with the by-id scheme.

Nevertheless, I exported and reimported it under the by-id scheme, then re-onlined the disk by GUID, but I still get errors when trying to reattach the disk to the pool:

Code:
$ zpool status
  pool: Midden
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 284K in 0h0m with 0 errors on Mon Mar 14 02:01:48 2016
config:

   NAME                                            STATE     READ WRITE CKSUM
   Midden                                          DEGRADED     0     0     0
     mirror-0                                      DEGRADED     0     0     0
       media-67775132-8937-914D-8D25-31512A24D764  ONLINE       0     0     0
       media-2AAD0BEC-7AB3-A744-80CA-CA21F00A59A4  UNAVAIL      0     0     0  cannot open
     mirror-1                                      ONLINE       0     0     0
       media-A833F256-1C6B-2F4D-AFBD-03DB0B3D2ADA  ONLINE       0     0     0
       media-6B747E0C-CD67-224E-8837-5CDBAF94FCA5  ONLINE       0     0     0

errors: No known data errors
$ sudo zpool replace Midden 15140371887263661994 /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1
cannot replace 15140371887263661994 with /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1: /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1 is busy
$ zpool status -g
  pool: Midden
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 284K in 0h0m with 0 errors on Mon Mar 14 02:01:48 2016
config:

   NAME                      STATE     READ WRITE CKSUM
   Midden                    DEGRADED     0     0     0
     18314333138307113294    DEGRADED     0     0     0
       1098692862752021739   ONLINE       0     0     0
       15140371887263661994  UNAVAIL      0     0     0  cannot open
     12185770036713117011    ONLINE       0     0     0
       2346621762543341536   ONLINE       0     0     0
       11271653767743057240  ONLINE       0     0     0

errors: No known data errors
$ sudo zpool attach Midden 1098692862752021739 /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1
cannot attach /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1 to 1098692862752021739: /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1 is busy
$ sudo zpool attach Midden 18314333138307113294 /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1
cannot attach /var/run/disk/by-id/media-9B6C3D21-6447-5548-9527-D7C179337FF1 to 18314333138307113294: can only attach to mirrors and top-level disks


The GUID change is because I repartitioned the booted-out disk, thinking maybe it was refusing to use a disk from a "different" ZFS pool. (I've seen this on hardware RAIDs: until you zero the first meg or so of a disk that was once part of one RAID set, you can't use it on a different card.)
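The hardware-RAID comparison only partly carries over to ZFS. Each vdev holds four copies of a 256 KiB label, two at the start of the partition and two in its last 512 KiB, so clearing only the first megabyte would leave the trailing pair in place. A minimal sketch that computes where the labels sit, assuming that standard on-disk layout (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Print the byte offsets of the four ZFS vdev labels (L0..L3) for a partition of
# PSIZE bytes. Assumes the standard on-disk layout: 256 KiB per label, L0/L1 at
# the front, L2/L3 in the last 512 KiB, with PSIZE first rounded down to a whole
# number of labels.
zfs_label_offsets() {
  local psize="$1"
  local label_size=$((256 * 1024))
  local aligned=$(( psize / label_size * label_size ))
  echo 0                                # L0
  echo "$label_size"                    # L1
  echo $(( aligned - 2 * label_size ))  # L2
  echo $(( aligned - label_size ))      # L3
}
```

For a partition of exactly 1 GiB (1073741824 bytes) it prints 0, 262144, 1073217536 and 1073479680. The partition size in bytes can be read from `diskutil info disk5s1`.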

I looked up the new media ID after restarting InvariantDisks and finding its link to /dev/disk5s1. I believe I'm still right to be talking about disk5, based on the disk list in Disk Utility: the other three disks in the pool are disk3, disk4, and disk6. Process of elimination.
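Picking GUIDs out of the `zpool status -g` listing by eye is easy to get wrong. A small filter, hypothetical and keyed to the layout shown in the output above, prints the GUID of any device in a given state, for example the UNAVAIL member you would pass to `zpool replace`:

```shell
#!/usr/bin/env bash
# Print the GUID of every vdev that `zpool status -g` reports in a given state.
# Reads the status text on stdin and assumes the layout shown in this thread:
# GUID in the first column, STATE in the second. Top-level mirror GUIDs are
# printed too if they share the state, so leaf-only states such as UNAVAIL or
# OFFLINE are the most useful filters.
guids_in_state() {
  local state="$1"
  awk -v want="$state" '$1 ~ /^[0-9]+$/ && $2 == want { print $1 }'
}
```

Usage: `zpool status -g Midden | guids_in_state UNAVAIL`. Against the output above, that would print 15140371887263661994.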

disk5 got a proper O3X disk partitioning scheme earlier. Before reporting that the disk is "busy", O3X repartitioned it. During some of these commands, I get the "Initialize Disk" popup, but I always tell it to ignore the "new" disk.

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby lundman » Tue Mar 15, 2016 4:23 pm

That it is busy is interesting. I'm guessing it isn't mounted (you probably checked for that), but is CoreStorage holding the device busy maybe?
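A quick way to rule out the mount case is to look for any slice of the disk in `mount` output. A minimal sketch, assuming the usual OS X `mount` line format (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) if any slice of a whole disk such as disk5 appears in
# `mount` output read from stdin. Anchored so disk5 does not match disk50.
# Assumes the usual OS X mount line format: "/dev/diskNsM on /mount/point (...)".
disk_is_mounted() {
  local disk="$1"
  grep -Eq "^/dev/${disk}(s[0-9]+)? on "
}
```

Usage: `mount | disk_is_mounted disk5 && echo "disk5 is mounted"`. The anchored pattern matters because a plain `grep disk5` would also match disk50 or disk51s1.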

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby tangent » Tue Mar 15, 2016 6:16 pm

lundman wrote: I'm guessing it isn't mounted (you probably checked for that)


Yes, I checked.

I've currently got that disk in a separate 1-disk enclosure, and I can see it disappear and reappear in Disk Utility when I toggle the power on the enclosure.

The enclosure supports USB 2, FireWire 400, FireWire 800, and eSATA. I've tried it on both direct USB and eSATA via a Thunderbolt adapter.

Disk Utility is able to partition and format the drive.

lundman wrote: is CoreStorage holding the device busy maybe?


Is `diskutil coreStorage list` the best way to answer that question? If so, then no; disk5 doesn't show up in its output.

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby lundman » Wed Mar 16, 2016 5:25 pm

If disk5 is your empty disk, can you create a temporary pool on it, e.g. "zpool create TEST disk5"? The fact that it thinks the disk is busy is what is stopping all the other commands.
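That test can be scripted so the scratch pool is always cleaned up afterwards. A dry-run sketch; TEST is just a throwaway pool name, disk5 is assumed to be the empty disk, and the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Dry-run sketch: prove the disk is usable by ZFS at all by building and then
# destroying a throwaway pool on it. Set DRY_RUN=0 only once you are sure of
# the device, since `zpool create` will claim it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# $1 = whole-disk identifier (e.g. disk5), $2 = scratch pool name (default TEST)
probe_disk_with_scratch_pool() {
  local disk="$1" scratch="${2:-TEST}"
  run sudo zpool create "$scratch" "$disk" || return
  run zpool status "$scratch"
  run sudo zpool destroy "$scratch"
}
```

If `zpool create` also reports the disk as busy, that points at the device itself rather than at the Midden pool's configuration.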

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby tangent » Wed Mar 16, 2016 7:05 pm

I'm in the middle of zeroing the whole disk. That does two things for me:

1. There's no possible way that software could have the disk as busy if it doesn't know what the disk is. If it still says "busy" after this, it's a hardware problem.

2. Before all of this happened, I saw 4 checksum errors in zpool status for this disk, so this will do a "poor man's SpinRite" on the disk, forcing the drive to remap any bad sectors it hits during the write. If I can get this disk back into the pool and the checksum errors reappear, I'll know it's dying.

It's a 6 TiB disk, so it'll be about another 13 hours before it's fully flattened. I'll let you know what happens after that.
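The whole-disk wipe described above can be sketched as a dry run, along with a rough time estimate. The helpers are hypothetical, the disk identifier is assumed to be disk5, and with DRY_RUN=0 this destroys everything on it:

```shell
#!/usr/bin/env bash
# Dry-run sketch of zeroing a whole disk on OS X. DESTROYS ALL DATA when
# DRY_RUN=0. The raw node /dev/rdiskN is used because it is usually much
# faster than /dev/diskN; `bs=1m` is BSD dd syntax as shipped with OS X.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

zero_whole_disk() {
  local disk="$1"
  run diskutil unmountDisk "$disk"
  run sudo dd if=/dev/zero of="/dev/r${disk}" bs=1m
}

# Rough estimate: hours to write SIZE_BYTES at a sustained RATE_MB_S
# (1 MB = 10^6 bytes), printed with one decimal place.
zero_eta_hours() {
  awk -v bytes="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", bytes / (rate * 1000000) / 3600 }'
}
```

For example, `zero_eta_hours 6000000000000 100` prints 16.7, i.e. roughly 17 hours for a 6 TB disk at an assumed sustained 100 MB/s; the real rate varies across the platter and with the enclosure's interface.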

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby Brendon » Wed Mar 16, 2016 8:30 pm

Does the disk show any SMART status issues?


Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby tangent » Thu Mar 17, 2016 9:13 am

Brendon wrote: Does the disk show any SMART status issues?

The disk is currently connected via USB, and Mac OS X doesn't support SMART over USB.

This whole saga started with the LaCie 2big Thunderbolt enclosure acting screwy, so I'm reluctant to put this disk back in that enclosure just to run a SMART test on it. Also, the SMART tests were being aborted "by the drive" according to smartctl when I did have them in that enclosure. I'm not sure if that's a Seagate thing, a Thunderbolt thing, or a LaCie thing.

Bottom line, I don't expect to be able to run a SMART test on this disk any time soon, if ever.

HOWEVER: Zeroing the disk fixed the "disk busy" problem! Huzzah! The disk is back in the pool and being resilvered now.

In retrospect, I think I could have avoided some of this by using zpool detach or zpool split to remove the drive from the pool instead of zpool offline. That would let me use the ZFS tools to remove any labels/uberblocks/etc from the disk so that when it's reattached ZFS doesn't say "Hey, that's mine!" Zeroing the disk was a heavy-handed way to achieve the same effect.
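The detach route can be sketched as two steps. This is a dry-run sketch under the assumption stated above, that a detached disk comes back without the "that's mine" conflict; the helper names are hypothetical and the GUIDs are placeholders you would take from `zpool status -g`:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch of detach-then-attach as an alternative to
# `zpool offline`. Whether detach leaves the disk immediately re-attachable is
# the hypothesis from the post above, not something verified here.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Step 1: remove the suspect disk from its mirror entirely (not just offline it).
detach_suspect() {
  local pool="$1" suspect="$2"
  run sudo zpool detach "$pool" "$suspect"
}

# Step 2, after diagnosis: mirror the surviving disk onto the returned disk.
reattach_to_survivor() {
  local pool="$1" survivor="$2" new_dev="$3"
  run sudo zpool attach "$pool" "$survivor" "$new_dev"
  run zpool status "$pool"
}
```

Like offlining, detaching leaves the survivor without redundancy until the later attach finishes resilvering.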

Re: zpool replace failing with 2-mirror pool: disk5 is busy

Postby lundman » Fri Mar 18, 2016 12:39 am

There is also the zpool labelclear command, although I am unsure just how "good" that command is at the moment.

Glad you could work it out though.
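The labelclear suggestion can be tried before resorting to a full wipe. A dry-run sketch, assuming the ZFS data lives on the disk's first slice (for example /dev/disk5s1, which the by-id link pointed to in this thread); the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run sketch: clear stale ZFS labels with `zpool labelclear`
# instead of zeroing the whole disk, then attach. Given the caveat about how
# reliable labelclear is on O3X, treat this as something to try first.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# $1 = pool, $2 = GUID of the surviving mirror member, $3 = ZFS data slice
labelclear_then_attach() {
  local pool="$1" survivor="$2" dev="$3"
  # -f: treat a device from an exported or foreign pool as inactive so its
  # label can be cleared.
  run sudo zpool labelclear -f "$dev"
  run sudo zpool attach "$pool" "$survivor" "$dev"
}
```

If the attach still reports the device as busy, zeroing the whole disk is the heavier fallback.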

