Drive Keeps removing itself from my pool

Postby GroovinBuddha » Thu Nov 05, 2020 10:17 pm

Hello all,

I'm having a bit of a conundrum that I'm hoping has an easy answer for someone. Every once in a while I get this message when I check zpool status:

```
(base) Ed-Bowles-MacPro:~ edbowles$ zpool status
  pool: Data
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 3.19M in 0 days 00:00:02 with 0 errors on Sat Oct 31 13:30:46 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        Data                                            DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            media-E649B820-1CCD-134B-99BE-3A07727ECC3C  ONLINE       0     0     0
            media-4559FE62-EE03-9D45-B15D-374922DD5913  ONLINE       0     0     0
            media-5BA3DE3F-9709-B44B-9B70-0176689F140F  REMOVED      0     0     0
            media-05811ABD-631F-2241-8606-42A8F9961D27  ONLINE       0     0     0

errors: No known data errors
```

I usually just online the drive and all is good. Occasionally it throws up one error, which I clear (not sure if that is what I should be doing).

The problem usually persists for a few days, then it stops removing itself for weeks.
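For reference, what I run is something along these lines (using my pool name and the device name from the status output above):

```
# bring the REMOVED device back online in the pool
$ sudo zpool online Data media-5BA3DE3F-9709-B44B-9B70-0176689F140F

# clear the error counters afterwards
$ sudo zpool clear Data
```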

I'm guessing that these are the beginning signs of a failing drive and I should probably be replacing it...
However, I don't know which physical drive is the culprit.
The name media-5BA3DE3F-9709-B44B-9B70-0176689F140F doesn't seem to match any readout in Disk Utility, and it doesn't seem to match any of the markings on any of my drives.

Does anyone know how I can figure out which drive is the issue? Please and thank you.

Re: Drive Keeps removing itself from my pool

Postby FadingIntoBlue » Fri Nov 06, 2020 3:05 pm

I didn't write the text below (I think Lundman did), and it's somewhere on the site, but quickly:

Changing the device names on an existing pool can be done by simply exporting the pool and re-importing it with the -d option to specify which new names should be used.

To use the names in /var/run/disk/by-id:

```
$ sudo zpool export tank
$ sudo zpool import -d /var/run/disk/by-id tank
```

To use the names in /var/run/disk/by-serial:

```
$ sudo zpool export tank
$ sudo zpool import -d /var/run/disk/by-serial tank
```

To use the names in /var/run/disk/by-path:

```
$ sudo zpool export tank
$ sudo zpool import -d /var/run/disk/by-path tank
```

To use the less safe (because they vary) BSD disk names in /dev:

```
$ sudo zpool export tank
$ sudo zpool import -d /dev tank
```

Even if you are using invariant paths (by-id, by-serial, or by-path), you can reveal the "normal" BSD disk names at any time:

```
$ zpool status -L tank
```

Combine that with diskutil list and you can tell exactly which disk is which.
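For instance, a quick sketch of that lookup (the disk and pool names here are just placeholders):

```
$ zpool status -L tank     # vdevs show as diskXsY instead of media-UUIDs
$ diskutil list            # maps each diskX to its size and partition layout
$ diskutil info disk5s1    # per-device details, including "Disk / Partition UUID"
```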

Re: Drive Keeps removing itself from my pool

Postby GroovinBuddha » Fri Nov 06, 2020 7:15 pm

Thanks for the try, but I don't think that helps, given my situation and/or skill level.

In chatting with a co-worker today I have gathered a bit more info that hopefully will shed some light on the problem.

When I pull up a System Report, I can see for my setup that (in order):

disk4 is in Bay 1
disk2 is in Bay 2
disk6 is in Bay 3
disk5 is in Bay 4

These are the disks used for my ZFS setup, and they all give me a readout to the effect of:

```
WDC WD4003FZEX-00Z4SA0:

  Capacity: 4 TB (4,000,787,030,016 bytes)
  Model: WDC WD4003FZEX-00Z4SA0
  Revision: 01.01A01
  Serial Number: WD-WCC5D2UUA0P2
  Native Command Queuing: Yes
  Queue Depth: 32
  Removable Media: No
  Detachable Drive: No
  BSD Name: disk5
  Rotational Rate: 7200
  Medium Type: Rotational
  Bay Name: Bay 4
  Partition Map Type: GPT (GUID Partition Table)
  S.M.A.R.T. status: Verified
  Volumes:
    Data:
      Capacity: 4 TB (4,000,776,716,288 bytes)
      File System: ZFS
      BSD Name: disk5s1
      Content: ZFS
    disk5s9:
      Capacity: 8.4 MB (8,388,608 bytes)
      BSD Name: disk5s9
      Content: 6A945A3B-1DD2-11B2-99A6-080020736631
```

When I go to another drive on my system (my boot drive), I get:

```
KINGSTON SHSS37A960G:

  Capacity: 960.2 GB (960,197,124,096 bytes)
  Model: KINGSTON SHSS37A960G
  Revision: SAFM00.Y
  Serial Number: 50026B7268021959
  Native Command Queuing: Yes
  Queue Depth: 32
  Removable Media: No
  Detachable Drive: No
  BSD Name: disk0
  Medium Type: Solid State
  TRIM Support: Yes
  Partition Map Type: GPT (GUID Partition Table)
  S.M.A.R.T. status: Verified
  Volumes:
    EFI:
      Capacity: 209.7 MB (209,715,200 bytes)
      File System: MS-DOS FAT32
      BSD Name: disk0s1
      Content: EFI
      Volume UUID: 0E239BC6-F960-3107-89CF-1C97F78BB46B
    disk0s2:
      Capacity: 959.99 GB (959,987,367,936 bytes)
      BSD Name: disk0s2
      Content: Apple_APFS
```

I can't help but notice the striking similarity between the formatting of my boot drive's UUID and the drive identifier used by ZFS when I check its status.

I don't think it's a coincidence.

It seems that System Report is not able to show a UUID for the ZFS-formatted volumes, while ZFS is using only a UUID as its identifier.

I'm looking for a way to correlate the two.
I could guess that it is probably the drive in Bay 3, as that makes logical sense, but I'm hoping for a learning opportunity instead of just shutting down my system and systematically pulling drives one by one until I get it right.

Thanks again

Re: Drive Keeps removing itself from my pool

Postby jawbroken » Fri Nov 06, 2020 11:04 pm

Running

```
zpool status -L
```

will give you the diskXX number of the failing drive, which you can then look up in System Report, as you've shown, to get the serial number of the drive. The serial is unique to the drive and will be printed on the label on the top of the drive.
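For example, something like this should pull the matching entries out of the same data System Report reads (the grep patterns assume the Serial ATA section of a Mac Pro, as in your listing):

```
$ zpool status -L Data
# note which diskX is REMOVED, then:
$ system_profiler SPSerialATADataType | grep -e "BSD Name" -e "Serial Number" -e "Bay Name"
```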

Re: Drive Keeps removing itself from my pool

Postby GroovinBuddha » Sat Nov 07, 2020 6:50 pm

jawbroken,

Sweet, thanks for the help, that is exactly the information I was looking for. :D

If you don't mind, can I pick your brain on three follow-up questions?

1. Is the diskXX the BSD number/identifier? (not that it matters much, just for my own knowledge)

2. What does the -L mean?

3. Was I correct that zpool status is spitting out the UUID of the drive?

Thanks again

Now I just have to wait for the drive to remove itself and track it down.
GroovinBuddha
 
Posts: 14
Joined: Thu Jan 16, 2020 8:09 am

Re: Drive Keeps removing itself from my pool

Postby jawbroken » Sun Nov 08, 2020 2:54 am

1. Is the diskXX the BSD number/identifier? (not that it matters much, just for my own knowledge)

Yes, I believe people generally refer to them as BSD disk names or similar.

2. What does the -L mean?

I'm not sure what the L is supposed to stand for. Perhaps "label"?

3. Was I correct that zpool status is spitting out the UUID of the drive?

I'm not an expert in how the invariant disk code works. It's some sort of unique identifier that I believe is supposed to work around the issue of disks appearing with different numbers after restarting; I'm not sure it has any particular meaning beyond that. You might notice that a disk that was disk2 at one point in time is now disk4, for example, but the ID will be stable. You can get the ZFS commands to report the drive serial number instead of the ID, which is probably more useful because you can actually read it on the physical drive, by following the instructions in the post above, i.e.
```
$ sudo zpool export tank
$ sudo zpool import -d /var/run/disk/by-serial tank
```

At least for me, however, this doesn't survive a reboot, so I generally don't bother.
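On question 3: as far as I can tell, the media-XXXX name matches the GPT partition UUID, which diskutil can also show you, so you could compare the two directly with something like this (the slice name is just a placeholder):

```
# print the partition UUID of one slice and compare it against the
# media-XXXX name that zpool status reports
$ diskutil info disk5s1 | grep "Partition UUID"
```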

Re: Drive Keeps removing itself from my pool

Postby GroovinBuddha » Sun Nov 08, 2020 9:31 pm

Thanks for the info, it was a big help.

I guess I still have a bunch of research to do, but this gets me past my immediate issue.

Cheers!

