Replacing drive, keeps switching to 'removed'

Post by spud603 » Thu Sep 28, 2017 8:05 am

Hi,

I am trying to replace a drive in a raidz pool. The pool used to have three 1TB drives, and one went bad. I am trying to replace it with a 2TB drive that I have (I know, this won't give me any more space and is a waste of that extra TB) :)

Things seemed ok at first. The `zpool replace` command works fine and resilvering begins:
  pool: franklin
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
   continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 28 10:29:37 2017
    22.4M scanned out of 2.34T at 104K/s, (scan is slow, no estimated time)
    7.29M resilvered, 0.00% done
config:

   NAME                                              STATE     READ WRITE CKSUM
   franklin                                          DEGRADED     0     0     0
     raidz1-0                                        DEGRADED     0     0     0
       media-E97A7C83-B4CA-4DAA-BF5A-8F890ED5B6C2    ONLINE       0     0     0
       media-9241F0DE-1023-438C-81A7-31D78390484A    ONLINE       0     0     0
       replacing-2                                   DEGRADED     0     0    18
         10979512111072834080                        UNAVAIL      0     0     0  was /private/var/run/disk/by-id/media-FFE58D75-EBC6-4B1A-ACB3-701BD9B85FB2
         media-5250DAB9-9838-6345-98EE-539D7FD591F7  ONLINE       0     0     0  (resilvering)

But a few minutes into resilvering, the replacement drive state switches from "ONLINE" to "REMOVED":
  pool: franklin
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
   continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 28 10:29:37 2017
    33.7M scanned out of 2.34T at 67.4K/s, (scan is slow, no estimated time)
    7.29M resilvered, 0.00% done
config:

   NAME                                              STATE     READ WRITE CKSUM
   franklin                                          DEGRADED     0     0     0
     raidz1-0                                        DEGRADED     0     0     0
       media-E97A7C83-B4CA-4DAA-BF5A-8F890ED5B6C2    ONLINE       0     0     0
       media-9241F0DE-1023-438C-81A7-31D78390484A    ONLINE       0     0     0
       replacing-2                                   UNAVAIL      0     2    18  insufficient replicas
         10979512111072834080                        UNAVAIL      0     0     0  was /private/var/run/disk/by-id/media-FFE58D75-EBC6-4B1A-ACB3-701BD9B85FB2
         media-5250DAB9-9838-6345-98EE-539D7FD591F7  REMOVED      0     0     0  (resilvering)
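
For anyone who wants to pull the unhealthy rows out of output like this without eyeballing it, here is a minimal sketch in Python. It is not part of ZFS: `non_online_vdevs` is a hypothetical helper name, and it assumes the whitespace-separated NAME/STATE table layout shown in this post.

```python
# Rows trimmed from the second status output in this post (notes after the counters dropped).
SAMPLE = """
config:

   NAME                                              STATE     READ WRITE CKSUM
   franklin                                          DEGRADED     0     0     0
     raidz1-0                                        DEGRADED     0     0     0
       media-E97A7C83-B4CA-4DAA-BF5A-8F890ED5B6C2    ONLINE       0     0     0
       media-9241F0DE-1023-438C-81A7-31D78390484A    ONLINE       0     0     0
       replacing-2                                   UNAVAIL      0     2    18
         10979512111072834080                        UNAVAIL      0     0     0
         media-5250DAB9-9838-6345-98EE-539D7FD591F7  REMOVED      0     0     0
"""


def non_online_vdevs(status_text):
    """Return (name, state) pairs from the config table whose state is not ONLINE."""
    found = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_table:
            # The table begins on the line after the NAME/STATE header.
            in_table = fields[:2] == ["NAME", "STATE"]
            continue
        if len(fields) < 2:
            break  # a blank line ends the table
        name, state = fields[0], fields[1]
        if state != "ONLINE":
            found.append((name, state))
    return found


if __name__ == "__main__":
    for name, state in non_online_vdevs(SAMPLE):
        print(f"{state:10} {name}")
```

Feeding it the saved text of `zpool status` should list the pool, the raidz vdev, the `replacing-2` vdev, and both of its children, which makes the ONLINE to REMOVED flip easy to spot between two captures.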

What's more, the drive stops showing up in `diskutil list` and is no longer present in `/dev/`. It's like the resilvering process obliterates the drive from the OS's awareness.

The pool keeps scanning, but the resilver stalls where it is (0.00% done). If I reboot the computer, the whole process starts over again with the same outcome. `zpool scrub` and `zpool clear` do what they're supposed to but don't fix the issue.

Anyone have thoughts? Is it a bad drive? An incompatible drive? (I had to set the ashift value of the new drive to be compatible with the pool...). A problem with the OS? Where can I go to diagnose what's going on?

Any help would be much appreciated!

Post by spud603 » Thu Sep 28, 2017 8:09 am

I found some potentially relevant lines in my system log. It looks like the resilver starts, then there are a couple of kernel errors ("rxFrame - mbuf allocation failed"?), and then the system goes about removing all the filesystem references to the disk. I'm way out of my depth on interpreting those errors though :)

9/28/17 10:29:37.486 AM zed[923]: eid=62 class=resilver.start pool=franklin
9/28/17 10:30:02.000 AM kernel[0]: 0xffffff82dcc8a700, 0x017105cc  Intel82574L::rxFrame - mbuf allocation failed
9/28/17 10:35:11.000 AM kernel[0]: Failed to issue COM RESET successfully after 3 attempts. Failing...
9/28/17 10:35:11.496 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-path/PCI0@0-SATA@1F,2-PRT4@4-PMP@0-@0:0
9/28/17 10:35:11.496 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-serial/ST2000DM006-2DM164-Z560JFV5
9/28/17 10:35:11.496 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-path/PCI0@0-SATA@1F,2-PRT4@4-PMP@0-@0:9
9/28/17 10:35:11.497 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-id/media-197CE511-0255-914C-8EC3-2E2F6AC895EE
9/28/17 10:35:11.497 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-serial/ST2000DM006-2DM164-Z560JFV5:9
9/28/17 10:35:11.498 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-path/PCI0@0-SATA@1F,2-PRT4@4-PMP@0-@0:1
9/28/17 10:35:11.498 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-id/media-5250DAB9-9838-6345-98EE-539D7FD591F7
9/28/17 10:35:11.499 AM InvariantDisk[107]: Removing symlink: /var/run/disk/by-serial/ST2000DM006-2DM164-Z560JFV5:1
9/28/17 10:35:11.503 AM zed[943]: eid=64 class=removed
9/28/17 10:35:11.524 AM zed[945]: eid=63 class=removed
9/28/17 10:35:11.538 AM zed[947]: eid=65 class=delay pool=franklin
9/28/17 10:35:11.551 AM zed[949]: eid=66 class=delay pool=franklin
9/28/17 10:35:11.570 AM zed[951]: eid=67 class=removed
9/28/17 10:35:11.583 AM zed[953]: eid=68 class=vdev.no_replicas pool=franklin
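
To put a number on the timing in the log above, here is a minimal sketch in Python that measures the gap between two events. The function names are hypothetical, and it assumes every line starts with a timestamp in the same "M/D/YY H:MM:SS.mmm AM" form as these entries.

```python
from datetime import datetime

# Timestamp layout assumed from the log lines in this post.
LOG_TIME_FORMAT = "%m/%d/%y %I:%M:%S.%f %p"


def parse_log_time(line):
    """Parse the leading date, clock time, and AM/PM marker of a log line."""
    date, clock, meridiem = line.split()[:3]
    return datetime.strptime(f"{date} {clock} {meridiem}", LOG_TIME_FORMAT)


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first line containing start_marker to the
    first later line containing end_marker, or None if either is missing."""
    start = None
    for line in lines:
        if start is None:
            if start_marker in line:
                start = parse_log_time(line)
        elif end_marker in line:
            return (parse_log_time(line) - start).total_seconds()
    return None


if __name__ == "__main__":
    log = [
        "9/28/17 10:29:37.486 AM zed[923]: eid=62 class=resilver.start pool=franklin",
        "9/28/17 10:35:11.000 AM kernel[0]: Failed to issue COM RESET successfully after 3 attempts. Failing...",
        "9/28/17 10:35:11.503 AM zed[943]: eid=64 class=removed",
    ]
    print(seconds_between(log, "resilver.start", "COM RESET"))
```

On these entries the COM RESET failure comes roughly five and a half minutes after the resilver starts, and the symlink removals and `removed` events follow within the same second.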

Post by lundman » Thu Sep 28, 2017 4:11 pm

If the disk is gone from `diskutil list`, then the OS doesn't see it any more, and ZFS only uses the disks that the OS tells us it has. So you are having a larger problem. Do the usual checking of cables (try a different one?), but the drive is probably bad.

mbufs are used for networking, so my initial thought is that it's not related, but the timing sure is interesting.
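
A quick way to confirm from a script that the OS itself has dropped the device is to test the by-id symlink directly. This is only a sketch in Python: `device_visibility` is a hypothetical helper, and it checks the filesystem view only, not anything inside ZFS.

```python
import os


def device_visibility(path):
    """Classify a device path as 'present', 'dangling', or 'missing'."""
    if os.path.exists(path):   # follows symlinks, so the target is reachable
        return "present"
    if os.path.lexists(path):  # the symlink is still there but its target is gone
        return "dangling"
    return "missing"


if __name__ == "__main__":
    # Path taken from the status output earlier in this thread.
    path = "/var/run/disk/by-id/media-5250DAB9-9838-6345-98EE-539D7FD591F7"
    print(path, "->", device_visibility(path))
```

If this reports anything other than "present" while the pool still lists the device, the problem is below ZFS, in the drive, bay, or controller.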

Post by Jimserac » Thu Sep 28, 2017 5:42 pm

I don't know if this has anything to do with it, but I've got a mirrored Toshiba 2 terabyte disk (two mirrored 1 TB partitions) encrypted under CoreStorage.

It works fine, but intermittently, after it sits for a while, I get a message in the upper right corner of my screen as though I had prematurely unplugged the USB cable. I haven't, of course. The ZFS icon is still there but greyed out, and if I do a `zpool status`, it shows both mirrored disks removed.

I have to reboot and re-import it.

No errors show after this when I do a zpool status.

Am not sure exactly what triggers this.

Post by spud603 » Fri Sep 29, 2017 6:11 am

Thanks for the ideas.

This drive is in one of the internal bays of an old Mac Pro tower, so there aren't really cables to check. It's possible that something went wrong with that bay specifically, and that this is also what caused the funny behavior in the drive I replaced. I'll experiment with the old drive and the new one connected over USB to see if they work ok.

But I think you may be right that it's a deeper hardware issue :(