I am testing v1.9.2 on macOS 10.13.6.
I have a 24-disk pool (3 vdevs x 8-disk raidz2). I'm testing with very old enterprise hard drives that are error-prone (good for testing, but not much else).
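(For context, the pool was created along these lines. This is not my exact command, just the shape of it; the disk identifiers below are illustrative.)
Code:
# 3 raidz2 vdevs of 8 disks each; device names are placeholders
sudo zpool create RAID_Z2 \
  raidz2 disk11 disk12 disk13 disk14 disk15 disk16 disk17 disk18 \
  raidz2 disk19 disk20 disk21 disk22 disk23 disk24 disk25 disk26 \
  raidz2 disk27 disk28 disk29 disk30 disk31 disk32 disk33 disk34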
After a few days of testing, one of the drives was showing a lot of checksum errors, and another showed status "REMOVED" (presumably it had gone offline at some point).
I completely blasted the partition table on the REMOVED drive by zeroing out the first and last 100 sectors of the disk.
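(For the record, the zeroing was done with dd, roughly like the sketch below. This is from memory and assumes 512-byte sectors with /dev/rdisk18 as the target; the sector count shown is an example value for a 3.0 TB drive.)
Code:
# unmount first so dd can write to the raw device
sudo diskutil unmountDisk disk18
# zero the first 100 sectors (covers the primary GPT header + table)
sudo dd if=/dev/zero of=/dev/rdisk18 bs=512 count=100
# zero the last 100 sectors (covers the backup GPT at the end of the disk);
# example sector count -- read yours from 'diskutil info disk18'
TOTAL_SECTORS=5860533168
sudo dd if=/dev/zero of=/dev/rdisk18 bs=512 count=100 seek=$((TOTAL_SECTORS - 100))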
Rebooted, and confirmed the pool showed that drive as UNAVAIL under the name 15144733226863810510, with the device appearing as /dev/disk18 in diskutil.
Then I did:
Code:
zpool replace -f RAID_Z2 15144733226863810510 disk18
(disk18 was the same drive whose partition table I had wiped, not a new drive.)
The resilvering completed and showed success.
After it was done, that drive in the pool didn't have a label like all the others; it showed up as just "disk18". And when I ran "diskutil list", I noticed it doesn't have a GPT partition table like all the others, which were auto-partitioned during creation of the original pool.
Example of a few drives from diskutil list:
Code:
/dev/disk17 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *3.0 TB     disk17
   1:                        ZFS RAID_Z2                 3.0 TB     disk17s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk17s9

/dev/disk18 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                    RAID_Z2                        *3.0 TB     disk18

/dev/disk19 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *3.0 TB     disk19
   1:                        ZFS RAID_Z2                 3.0 TB     disk19s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk19s9
Then I rebooted, and now I have what you see below. (Ignore the second degraded drive under raidz2-2 for now; that's the drive that was throwing a lot of checksum errors, and it's probably the same kind of failure as the one under raidz2-1.)
Code:
user$ zpool status -L
  pool: RAID_Z2
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 176G in 0 days 00:33:21 with 0 errors on Sat Oct 5 15:39:30 2019
config:

        NAME                     STATE     READ WRITE CKSUM
        RAID_Z2                  DEGRADED     0     0     0
          raidz2-0               ONLINE       0     0     0
            disk20               ONLINE       0     0     0
            disk11               ONLINE       0     0     0
            disk23               ONLINE       0     0     0
            disk15               ONLINE       0     0     0
            disk17               ONLINE       0     0     0
            disk12               ONLINE       0     0     0
            disk21               ONLINE       0     0     0
            disk24               ONLINE       0     0     0
          raidz2-1               ONLINE       0     0     0
            disk25               ONLINE       0     0     0
            disk19               ONLINE       0     0     0
            disk13               ONLINE       0     0     0
            disk14               ONLINE       0     0     0
            disk28               ONLINE       0     0     0
            disk29               ONLINE       0     0     0
            3647308650291240010  UNAVAIL      0     0     0  was /dev/disk18
            disk16               ONLINE       0     0     0
          raidz2-2               DEGRADED     0     0     0
            2259187317473571956  UNAVAIL      0     0     0  was /private/var/run/disk/by-id/media-E0D2565B-EC26-BA41-98BF-DA581CAF0427
            disk31               ONLINE       0     0     0
            disk33               ONLINE       0     0     0
            disk30               ONLINE       0     0     0
            disk32               ONLINE       0     0     0
            disk22               ONLINE       0     0     0
            disk27               ONLINE       0     0     0
            disk34               ONLINE       0     0     0
Code:
user$ zpool status
  pool: RAID_Z2
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 176G in 0 days 00:33:21 with 0 errors on Sat Oct 5 15:39:30 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        RAID_Z2                                         DEGRADED     0     0     0
          raidz2-0                                      ONLINE       0     0     0
            media-DED9DEDD-2894-2742-9267-7B583D028270  ONLINE       0     0     0
            media-F2AB2FE8-9F95-994A-937C-BAB57A873378  ONLINE       0     0     0
            media-43DC90AC-68BD-0145-BA9D-030915A96DF3  ONLINE       0     0     0
            media-69E09660-AC23-3D46-8D42-D9E6C95A536C  ONLINE       0     0     0
            media-97BAAC73-DB89-EA4B-BD2E-DAF2EEA807A5  ONLINE       0     0     0
            media-53C208FD-2439-2449-90F6-8FC77A18C4A9  ONLINE       0     0     0
            media-69CE8F2F-03FF-714B-86B0-00CEC2C5FF40  ONLINE       0     0     0
            media-8AD8C11B-2F72-2442-90AF-6078A9458873  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            media-7581185B-9619-A84B-9ED9-0CD7A28FCA41  ONLINE       0     0     0
            media-A756BB36-9D6A-644D-B470-D421C1272DFB  ONLINE       0     0     0
            media-EFE0AA5F-07A5-A641-B518-A6E9DEF8C1FF  ONLINE       0     0     0
            media-5B634E04-4DBD-E346-9682-7CAFAD06BE10  ONLINE       0     0     0
            media-F9F3843F-67C3-8D41-BBE1-FFC646B9B5F8  ONLINE       0     0     0
            media-6C25B663-898E-DA48-A45D-933C8F29AC0A  ONLINE       0     0     0
            3647308650291240010                         UNAVAIL      0     0     0  was /dev/disk18
            media-1EDF7283-A8DC-4140-A441-0573D9992E50  ONLINE       0     0     0
          raidz2-2                                      DEGRADED     0     0     0
            2259187317473571956                         UNAVAIL      0     0     0  was /private/var/run/disk/by-id/media-E0D2565B-EC26-BA41-98BF-DA581CAF0427
            media-1A9817BA-3EC8-4E40-B7E4-8AC3E7B23ABE  ONLINE       0     0     0
            media-2CE698C0-C2E2-8E49-9181-2424075602E9  ONLINE       0     0     0
            media-9E9CED32-D96E-F94F-80C3-C71FC546BAF1  ONLINE       0     0     0
            media-7B6E8022-F1DA-CA4D-A403-DF8D337465D2  ONLINE       0     0     0
            media-C2B48433-A7F6-F04E-8B3D-9E8D1C7F04F0  ONLINE       0     0     0
            media-685A1BF1-F23F-D844-8108-0676BEEF0AF7  ONLINE       0     0     0
            media-5C14B394-41F1-3D42-9409-CDB8FBCE59F2  ONLINE       0     0     0
To summarize:
- What is the correct way to replace a drive and end up with the proper GPT partition arrangement?
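My guess (untested) is that I was supposed to pre-create the GPT by hand and then replace using the s1 slice, something like the sketch below. The partition type GUID for the ZFS slice is what I believe is the standard one, and the start offset plus the handling of the little 8.4 MB s9 partition are placeholders, since I don't know what layout the auto-partitioning actually uses:
Code:
# untested sketch -- manual GPT before replacing
sudo gpt create -f disk18
# main ZFS slice; type GUID assumed, start offset a placeholder,
# no -s so it takes the largest free run (may need an explicit size
# to leave room for the small s9 partition at the end)
sudo gpt add -i 1 -b 2048 -t 6A898CC3-1DD2-11B2-99A6-080020736631 disk18
# then replace by slice rather than whole disk:
sudo zpool replace -f RAID_Z2 3647308650291240010 disk18s1
But I'd rather know the sanctioned procedure than guess at offsets.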
Also, a related question: when more than one drive is in a failed state, as seen above:
raidz2-1: 3647308650291240010 UNAVAIL 0 0 0 was /dev/disk18
raidz2-2: 2259187317473571956 UNAVAIL 0 0 0 was /private/var/run/disk/by-id/media-E0D2565B-EC26-BA41-98BF-DA581CAF0427
How do I determine which failed drive corresponds to which physical drive (/dev/disk##)?
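The only approach I've found is cross-referencing the by-id symlinks by hand, roughly as below, since the names that plain 'zpool status' prints live under /private/var/run/disk/by-id and point at the current /dev/diskN:
Code:
# each by-id entry is a symlink to the current /dev/diskN
ls -l /private/var/run/disk/by-id/
# then pull details for a given attached disk (disk26 is a placeholder)
diskutil info disk26
But once a drive goes UNAVAIL its symlink seems to disappear, so this only works for drives that are still attached; I don't see how to get from the numeric GUID of a missing drive back to a physical unit.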
Thanks!