Replacing a disk question


Postby photonclock » Sat Oct 05, 2019 3:25 pm

TLDR: When I do a replace, how do I get the new drive to auto-partition, or are there steps I need to do first to partition the disk and label the partition before using it as a replacement?


I am testing v1.9.2 with macOS 10.13.6.

I have a 24-disk pool (3 vdevs x 8 disks, raidz2). I'm testing with very old enterprise hard drives that are error-prone (good for testing but not much else).

After a few days of testing, one drive was showing a lot of checksum errors, and another showed status "REMOVED" (presumably it had gone offline at some point).

I completely blasted the partition table on the REMOVED drive by zeroing out the first and last 100 sectors on the disk.

Rebooted and confirmed that the pool showed that drive as UNAVAIL with the name 15144733226863810510, and that it appeared as /dev/disk18 in diskutil.

Then I did:
zpool replace -f RAID_Z2 15144733226863810510 disk18
(disk18 was the same drive whose partition table I had wiped, not a new drive)

The resilvering completed and showed success.

After it was done, that drive in the pool didn't have a label like all the others; it was just "disk18". When I run "diskutil list", I notice it doesn't have a GPT partition table like all the others that got auto-partitioned during creation of the original pool.

Example of a few drives from diskutil list:

Code:
/dev/disk17 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *3.0 TB     disk17
   1:                        ZFS RAID_Z2                 3.0 TB     disk17s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk17s9

/dev/disk18 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:                            RAID_Z2                *3.0 TB     disk18

/dev/disk19 (external, physical):
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *3.0 TB     disk19
   1:                        ZFS RAID_Z2                 3.0 TB     disk19s1
   2: 6A945A3B-1DD2-11B2-99A6-080020736631               8.4 MB     disk19s9
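
A rough way to spot whole disks like disk18 in a long listing is to filter the "diskutil list" text for disks whose entry 0 is not a GUID_partition_scheme. This is only a sketch: the function name find_unpartitioned is made up for illustration, it assumes the column layout shown above, and on a real machine it will also flag disks that are legitimately not GPT (disk images, synthesized containers), so treat the result as a list of candidates to check by hand.

```shell
# Hypothetical helper: reads `diskutil list` text on stdin and prints every
# /dev/diskN whose entry 0 is not a GUID_partition_scheme.
find_unpartitioned() {
    awk '
        # Remember the most recent disk header line, e.g. /dev/disk18
        /^\/dev\/disk/ { disk = $1 }
        # Entry 0 describes the whole disk; report it when it is not GPT
        $1 == "0:" && $0 !~ /GUID_partition_scheme/ { print disk }
    '
}
```

Usage: diskutil list | find_unpartitioned

Fed the three disks shown above, it prints only /dev/disk18.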


Then I rebooted, and now I have what you see below. (Ignore the second degraded drive under raidz2-2 for now. That's the drive that was throwing a lot of checksum errors, and it is probably failing in the same way as the drive under raidz2-1.)

Code:
user$ zpool status -L
  pool: RAID_Z2
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 176G in 0 days 00:33:21 with 0 errors on Sat Oct  5 15:39:30 2019
config:

   NAME                     STATE     READ WRITE CKSUM
   RAID_Z2                  DEGRADED     0     0     0
     raidz2-0               ONLINE       0     0     0
       disk20               ONLINE       0     0     0
       disk11               ONLINE       0     0     0
       disk23               ONLINE       0     0     0
       disk15               ONLINE       0     0     0
       disk17               ONLINE       0     0     0
       disk12               ONLINE       0     0     0
       disk21               ONLINE       0     0     0
       disk24               ONLINE       0     0     0
     raidz2-1               ONLINE       0     0     0
       disk25               ONLINE       0     0     0
       disk19               ONLINE       0     0     0
       disk13               ONLINE       0     0     0
       disk14               ONLINE       0     0     0
       disk28               ONLINE       0     0     0
       disk29               ONLINE       0     0     0
       3647308650291240010  UNAVAIL      0     0     0  was /dev/disk18
       disk16               ONLINE       0     0     0
     raidz2-2               DEGRADED     0     0     0
       2259187317473571956  UNAVAIL      0     0     0  was /private/var/run/disk/by-id/media-E0D2565B-EC26-BA41-98BF-DA581CAF0427
       disk31               ONLINE       0     0     0
       disk33               ONLINE       0     0     0
       disk30               ONLINE       0     0     0
       disk32               ONLINE       0     0     0
       disk22               ONLINE       0     0     0
       disk27               ONLINE       0     0     0
       disk34               ONLINE       0     0     0


Code:
user$ zpool status
  pool: RAID_Z2
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
   the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://zfsonlinux.org/msg/ZFS-8000-2Q
  scan: resilvered 176G in 0 days 00:33:21 with 0 errors on Sat Oct  5 15:39:30 2019
config:

   NAME                                            STATE     READ WRITE CKSUM
   RAID_Z2                                         DEGRADED     0     0     0
     raidz2-0                                      ONLINE       0     0     0
       media-DED9DEDD-2894-2742-9267-7B583D028270  ONLINE       0     0     0
       media-F2AB2FE8-9F95-994A-937C-BAB57A873378  ONLINE       0     0     0
       media-43DC90AC-68BD-0145-BA9D-030915A96DF3  ONLINE       0     0     0
       media-69E09660-AC23-3D46-8D42-D9E6C95A536C  ONLINE       0     0     0
       media-97BAAC73-DB89-EA4B-BD2E-DAF2EEA807A5  ONLINE       0     0     0
       media-53C208FD-2439-2449-90F6-8FC77A18C4A9  ONLINE       0     0     0
       media-69CE8F2F-03FF-714B-86B0-00CEC2C5FF40  ONLINE       0     0     0
       media-8AD8C11B-2F72-2442-90AF-6078A9458873  ONLINE       0     0     0
     raidz2-1                                      ONLINE       0     0     0
       media-7581185B-9619-A84B-9ED9-0CD7A28FCA41  ONLINE       0     0     0
       media-A756BB36-9D6A-644D-B470-D421C1272DFB  ONLINE       0     0     0
       media-EFE0AA5F-07A5-A641-B518-A6E9DEF8C1FF  ONLINE       0     0     0
       media-5B634E04-4DBD-E346-9682-7CAFAD06BE10  ONLINE       0     0     0
       media-F9F3843F-67C3-8D41-BBE1-FFC646B9B5F8  ONLINE       0     0     0
       media-6C25B663-898E-DA48-A45D-933C8F29AC0A  ONLINE       0     0     0
       3647308650291240010                         UNAVAIL      0     0     0  was /dev/disk18
       media-1EDF7283-A8DC-4140-A441-0573D9992E50  ONLINE       0     0     0
     raidz2-2                                      DEGRADED     0     0     0
       2259187317473571956                         UNAVAIL      0     0     0  was /private/var/run/disk/by-id/media-E0D2565B-EC26-BA41-98BF-DA581CAF0427
       media-1A9817BA-3EC8-4E40-B7E4-8AC3E7B23ABE  ONLINE       0     0     0
       media-2CE698C0-C2E2-8E49-9181-2424075602E9  ONLINE       0     0     0
       media-9E9CED32-D96E-F94F-80C3-C71FC546BAF1  ONLINE       0     0     0
       media-7B6E8022-F1DA-CA4D-A403-DF8D337465D2  ONLINE       0     0     0
       media-C2B48433-A7F6-F04E-8B3D-9E8D1C7F04F0  ONLINE       0     0     0
       media-685A1BF1-F23F-D844-8108-0676BEEF0AF7  ONLINE       0     0     0
       media-5C14B394-41F1-3D42-9409-CDB8FBCE59F2  ONLINE       0     0     0



To summarize:
- What is the correct way to replace a drive and end up with the proper GPT partition arrangement?

Also, related question:

When more than one drive has failed, as seen above:
raidz2-1: 3647308650291240010 UNAVAIL 0 0 0 was /dev/disk18
raidz2-2: 2259187317473571956 UNAVAIL 0 0 0 was /private/var/run/disk/by-id/media-E0D2565B-EC26-BA41-98BF-DA581CAF0427

How do I determine which failed drive corresponds to which physical drive /dev/disk##?
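
One partial answer, offered as a sketch rather than a tested tool: the "was ..." column in zpool status is the last path ZFS recorded for that vdev, so pulling the GUID and that path out of the output at least gives a starting point. The function name list_unavail is made up for illustration, and it assumes the status layout shown above. A /dev/diskNN path recorded earlier may point at a different physical drive after a reboot, so the result is a hint, not proof.

```shell
# Hypothetical helper: reads `zpool status` text on stdin and prints
# "<vdev name> <last known path>" for every vdev in state UNAVAIL.
list_unavail() {
    awk '$2 == "UNAVAIL" {
        path = "(unknown)"
        # The last known path follows the word "was" at the end of the line
        for (i = 3; i < NF; i++) if ($i == "was") path = $(i + 1)
        print $1, path
    }'
}
```

Usage: zpool status RAID_Z2 | list_unavail

For the output above this gives one line per UNAVAIL vdev: the numeric name followed by /dev/disk18 or the by-id media path.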

Thanks!
Last edited by photonclock on Mon Oct 07, 2019 3:21 pm, edited 1 time in total.
photonclock
 
Posts: 11
Joined: Sat Oct 05, 2019 2:40 pm

Re: Replacing a disk question

Postby photonclock » Sat Oct 05, 2019 3:42 pm

Oddly, after an export, reboot, and import:

The UNAVAIL under raidz2-2 resolved itself for now. That could just be because some of these disks are intermittently approaching death.

But the replaced disk under raidz2-1 now shows:
PCI0@0-IOU2@1-I2PS@0-PPB4@4-PXS4@0-@11:0 ONLINE 0 0 0

Why the different "PCI..." label/name now?


Code:
pool: RAID_Z2
 state: ONLINE
  scan: resilvered 176G in 0 days 00:33:21 with 0 errors on Sat Oct  5 15:39:30 2019
config:

   NAME                                            STATE     READ WRITE CKSUM
   RAID_Z2                                         ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       media-DED9DEDD-2894-2742-9267-7B583D028270  ONLINE       0     0     0
       media-F2AB2FE8-9F95-994A-937C-BAB57A873378  ONLINE       0     0     0
       media-43DC90AC-68BD-0145-BA9D-030915A96DF3  ONLINE       0     0     0
       media-69E09660-AC23-3D46-8D42-D9E6C95A536C  ONLINE       0     0     0
       media-97BAAC73-DB89-EA4B-BD2E-DAF2EEA807A5  ONLINE       0     0     0
       media-53C208FD-2439-2449-90F6-8FC77A18C4A9  ONLINE       0     0     0
       media-69CE8F2F-03FF-714B-86B0-00CEC2C5FF40  ONLINE       0     0     0
       media-8AD8C11B-2F72-2442-90AF-6078A9458873  ONLINE       0     0     0
     raidz2-1                                      ONLINE       0     0     0
       media-7581185B-9619-A84B-9ED9-0CD7A28FCA41  ONLINE       0     0     0
       media-A756BB36-9D6A-644D-B470-D421C1272DFB  ONLINE       0     0     0
       media-EFE0AA5F-07A5-A641-B518-A6E9DEF8C1FF  ONLINE       0     0     0
       media-5B634E04-4DBD-E346-9682-7CAFAD06BE10  ONLINE       0     0     0
       media-F9F3843F-67C3-8D41-BBE1-FFC646B9B5F8  ONLINE       0     0     0
       media-6C25B663-898E-DA48-A45D-933C8F29AC0A  ONLINE       0     0     0
       PCI0@0-IOU2@1-I2PS@0-PPB4@4-PXS4@0-@11:0    ONLINE       0     0     0
       media-1EDF7283-A8DC-4140-A441-0573D9992E50  ONLINE       0     0     0
     raidz2-2                                      ONLINE       0     0     0
       media-E0D2565B-EC26-BA41-98BF-DA581CAF0427  ONLINE       0     0     0
       media-1A9817BA-3EC8-4E40-B7E4-8AC3E7B23ABE  ONLINE       0     0     0
       media-2CE698C0-C2E2-8E49-9181-2424075602E9  ONLINE       0     0     0
       media-9E9CED32-D96E-F94F-80C3-C71FC546BAF1  ONLINE       0     0     0
       media-7B6E8022-F1DA-CA4D-A403-DF8D337465D2  ONLINE       0     0     0
       media-C2B48433-A7F6-F04E-8B3D-9E8D1C7F04F0  ONLINE       0     0     0
       media-685A1BF1-F23F-D844-8108-0676BEEF0AF7  ONLINE       0     0     0
       media-5C14B394-41F1-3D42-9409-CDB8FBCE59F2  ONLINE       0     0     0


Code:
  pool: RAID_Z2
 state: ONLINE
  scan: resilvered 64K in 0 days 00:00:01 with 0 errors on Sat Oct  5 16:34:12 2019
config:
   NAME               STATE     READ WRITE CKSUM
   RAID_Z2            ONLINE       0     0     0
     raidz2-0         ONLINE       0     0     0
       /dev/disk20s1  ONLINE       0     0     0
       /dev/disk11s1  ONLINE       0     0     0
       /dev/disk23s1  ONLINE       0     0     0
       /dev/disk15s1  ONLINE       0     0     0
       /dev/disk17s1  ONLINE       0     0     0
       /dev/disk12s1  ONLINE       0     0     0
       /dev/disk21s1  ONLINE       0     0     0
       /dev/disk24s1  ONLINE       0     0     0
     raidz2-1         ONLINE       0     0     0
       /dev/disk25s1  ONLINE       0     0     0
       /dev/disk19s1  ONLINE       0     0     0
       /dev/disk13s1  ONLINE       0     0     0
       /dev/disk14s1  ONLINE       0     0     0
       /dev/disk28s1  ONLINE       0     0     0
       /dev/disk29s1  ONLINE       0     0     0
       /dev/disk26    ONLINE       0     0     0
       /dev/disk16s1  ONLINE       0     0     0
     raidz2-2         ONLINE       0     0     0
       /dev/disk18s1  ONLINE       0     0     0
       /dev/disk31s1  ONLINE       0     0     0
       /dev/disk33s1  ONLINE       0     0     0
       /dev/disk30s1  ONLINE       0     0     0
       /dev/disk32s1  ONLINE       0     0     0
       /dev/disk22s1  ONLINE       0     0     0
       /dev/disk27s1  ONLINE       0     0     0
       /dev/disk34s1  ONLINE       0     0     0

Re: Replacing a disk question

Postby photonclock » Sat Oct 05, 2019 6:07 pm

I see now that if I add a spare to a pool, that drive gets the GPT treatment.

I guess what I'm trying to understand is, how does ZFS use a device name/path/label to find members of a pool, and how/when does it matter?

Re: Replacing a disk question

Postby lundman » Sun Oct 06, 2019 6:08 pm

When you issued zpool replace with disk18, it should have partitioned it with the EFI partitions that ZFS likes. I'm unsure why it did not.

As for the names, OSX uses /dev/diskXX as you know, but disks get renumbered quite often under OSX, which can be dangerous,
so we added a program called InvariantDisks that creates symlinks using disk name or disk serial, which remain stable no matter what
/dev/diskXX entry the disk gets.

InvariantDisks populates /var/run/disk/*, where you can choose "by-id", "by-serial", or "by-path". I forget which one we use by default, by-serial?

ZFS has a list of places it looks for pools automatically, and uses those disk names when referring to a pool. So for example, if you want
your pool to use /dev/diskXX names, you export the pool, then import it with:

zpool import -d /dev/ RAID_Z2

Now, zpool status will show just /dev/diskXX names (watch out for renumbering).

And since it uses InvariantDisks names by default when you import, it implicitly runs:

zpool import -d /var/run/disk/by-serial RAID_Z2

So you could export the pool and import it again using -d /var/run/disk/by-serial, and in theory all disks will then get
the same kind of name.
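
To see which /dev entry each stable name currently points at, the symlinks can be listed directly. This is a minimal sketch, assuming the InvariantDisks directories contain plain symlinks as described above. The function name list_invariant_links and the by-id default are illustrative choices, not something ZFS itself provides.

```shell
# Hypothetical helper: prints "<stable name> -> <current target>" for every
# symlink in an InvariantDisks directory (defaults to by-id).
list_invariant_links() {
    dir=${1:-/var/run/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an empty directory)
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}
```

Usage: list_invariant_links /var/run/disk/by-serial

Running it before and after a reboot shows whether the /dev/diskXX targets have been renumbered while the stable names stayed put.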
lundman
 
Posts: 695
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Replacing a disk question

Postby photonclock » Mon Oct 07, 2019 3:16 pm

lundman wrote: When you issued zpool replace with disk18, it should have partitioned it with the EFI partitions that ZFS likes. I'm unsure why it did not.

It is definitely not creating the EFI partitions when I do a replace with a 'new' blank disk. Given that reproducible behavior, I instead experimented with adding spares to the pool (again using blank/unpartitioned disks), and those definitely got the EFI partition treatment. Then I tested faulting a disk in the pool and replacing it with the spare, and that worked. So 'spare' behaves as expected, but 'replace' does not.

As for the names, OSX uses /dev/diskXX as you know, but disks get renumbered quite often under OSX, which can be dangerous,
so we added a program called InvariantDisks that creates symlinks using disk name or disk serial, which remain stable no matter what
/dev/diskXX entry the disk gets. <snip>
zpool import -d /var/run/disk/by-serial RAID_Z2


Great info. Thank you, I will experiment further with that.

Should the replace issue be considered a bug?

Re: Replacing a disk question

Postby lundman » Mon Oct 07, 2019 3:53 pm

If zpool replace does not partition a blank disk, then yes, that is definitely a bug. It would mean that, as
a workaround, one could run:

zpool create deleteme /dev/diskNEW
zpool destroy deleteme
zpool replace realpool /dev/deaddisk /dev/diskNEW

But that really should not be required.
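
Spelled out as a sketch, the workaround is three commands in a fixed order: create a throwaway pool on the new disk so that it gets partitioned, destroy that pool, then do the real replace. The helper below only prints the commands for review and runs nothing. Its name, print_replace_workaround, and its argument order are made up for illustration, and it passes the replacement device through unchanged rather than deciding whether a whole disk or a slice is the right thing to name.

```shell
# Hypothetical helper: prints the workaround commands, does not execute them.
# Arguments: <pool> <old device or GUID> <new device> [scratch pool name]
print_replace_workaround() {
    pool=$1
    old=$2
    new=$3
    scratch=${4:-deleteme}
    # Step 1: a throwaway pool makes zpool partition the blank disk
    printf 'zpool create %s %s\n' "$scratch" "$new"
    # Step 2: remove the throwaway pool again
    printf 'zpool destroy %s\n' "$scratch"
    # Step 3: the actual replacement in the real pool
    printf 'zpool replace %s %s %s\n' "$pool" "$old" "$new"
}
```

Usage: print_replace_workaround realpool /dev/deaddisk /dev/diskNEW

Check the printed device names against diskutil list before running any of them by hand, since zpool create on the wrong disk is destructive.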

Re: Replacing a disk question

Postby photonclock » Mon Oct 07, 2019 4:09 pm

lundman wrote: If zpool replace does not partition a blank disk, then yes, that is definitely a bug. It would mean that, as
a workaround, one could "zpool create deleteme /dev/diskNEW" "zpool destroy deleteme" "zpool replace realpool /dev/deaddisk /dev/diskNEW".
But that really should not be required.


I think I understand your example. Just for clarity though: at the end, did you mean "zpool replace realpool /dev/deaddisk /dev/diskNEWs1"?

