Whole disks vs. partitions

Posted: Mon Feb 25, 2019 7:22 am
by DanielSmedegaardBuus
Hey :)

I am wondering about this. Having used RAID arrays mostly mdadm-style for almost ten years before going ZFS, I "grew up" with the traditional UNIXy separation of layers, where the FS knew nothing about the underlying block devices, and didn't care.

Along came ZFS, and a good ten years ago I created my first zpool, adhering to the rule of giving it the entire drives rather than partitions. Made sense given the docs.

Drives did fail, as they do, and I discovered that not all drives are created equal: nominally same-sized drives are not exactly the same size, and some are smaller. Also (relevant to my current concerns about running ZFS on USB drives), some USB controllers steal some sectors and report a smaller available space to the OS. I learned a lesson: assuming that a replacement drive will be at least as large as the one it replaces is not a good assumption.

When I created my new zpool recently, I therefore partitioned the drives with gdisk and left a few hundred MB unused at the end of each, to allow for slightly smaller drives and controllers stealing space. So, it's been running on partitions, and I've had a ton of problems, most of which I don't attribute to this setup but rather to the fact that it's a USB-driven pool.

Still, seeing that my new strategy sometimes resulted in many files missing after a hard reboot, or in an external pool of two drives in a USB bay continuing to be written to minutes after I exported the pool, leaves me wondering how true the rule remains that you should assign whole drives and not partitions when you create a pool.

Especially on a Mac: I originally adopted ZFS using the FUSE variant on Linux, so my initial experience was that of somewhat of a hack, and in my mind I kinda think of O3X as a hack :D

Re: Whole disks vs. partitions

Posted: Mon Feb 25, 2019 11:54 pm
by lundman
ZFS will leave a bit untouched at the end of the disk, to avoid that "some disks are slightly smaller" problem. Now that was tweaked a little larger, some 10 years ago, around the time of 1TB disks IIRC. It should not be required to have to do that manually these days. Whole disk is always recommended - a bunch of other things kick in if it is wholedisk, like device write cache etc.

But choosing partitions should certainly not mean you have to have problems. With USB we do recommend using the InvariantDisks paths, and not /dev/disk.

Re: Whole disks vs. partitions

Posted: Tue Feb 26, 2019 11:12 pm
by DanielSmedegaardBuus
Thank you, lundman, that's good to know!

I already do use non-disk paths (actually non-sdX paths, since my main archival pool has been backed by Linux so far). I actually use GPT labels, naming each disk specifically, so that it's easy for me to know which disk failed when that happens. Not sure if that's possible when giving whole disks to the pool, but if not it's a minor inconvenience :)

I should be ready to recreate my pool from scratch sometime this weekend, when all data is moved off, and I'll have a go at giving `zpool create` the whole drives rather than partitions. It just dawned on me that even if the created partitions aren't quite as small as I'd like, I can always create a sparse file at a desired max size and use that as one of the vdev members, then immediately remove it and replace it with a full disk, so long as autoexpand isn't enabled :)
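
The sparse-file idea above can be sketched roughly like this, assuming a POSIX shell and a `dd` that accepts `seek=` together with `count=0` (GNU and BSD `dd` generally do). The helper name `make_sparse` and the path in the usage comment are placeholders, not anything from this thread:

```shell
# make_sparse PATH SIZE_BYTES
# Hypothetical helper: creates a file whose apparent size is SIZE_BYTES
# without writing data blocks (on file systems that support sparse files).
make_sparse() {
    # count=0 copies nothing; seek= moves the output offset, so dd
    # only sets the file length to SIZE_BYTES.
    dd if=/dev/zero of="$1" bs=1 count=0 seek="$2" 2>/dev/null
}

# Example with a placeholder path and a 1 GiB size:
# make_sparse /tmp/placeholder.img 1073741824
```

The resulting path would then be handed to `zpool create` as one of the members, as described above; that step is not shown or verified here.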

Thanks for the help!

Re: Whole disks vs. partitions

Posted: Wed Feb 27, 2019 2:22 am
by nodarkthings
I've been using "heavy" partitioning for years, with a mix of bootable HFS+ and 3 ZFS pools on the same drive (the sandwich way...) and I can confirm this is not the ideal way to go. ;)
Depending on the version of O3X, I had issues or not, so I do careful testing with each new version on a test macOS partition, creating a new pool and deleting it, before updating every system partition and the pools.
To be precise: I had no issue from 1.3 to 1.5.2, then it came back ok with 1.7.1 and 1.7.2. Haven't updated to 1.8.x yet.
I'm convinced now that I should dedicate the whole drive to ZFS and put my system partitions on another drive... someday. :mrgreen:

Re: Whole disks vs. partitions

Posted: Sun Mar 03, 2019 12:38 am
by DanielSmedegaardBuus
He he :D Nice!

I'll go the whole-disk route this time, and see what happens if I add GPT labels to whatever partitions ZFS creates. They're just so darn handy in pinpointing which drive failed :) At least on Linux.


Re: Whole disks vs. partitions

Posted: Sun Mar 03, 2019 1:16 am
by DanielSmedegaardBuus
I'll just add findings here as I go along, don't mind me — just thinking I'd improve the thread in case anyone googles for this in the future.

I just created a test pool, two drives in mirror, to test whether I can get Mojave running on ZFS still. (BTW, in case this is new to you too, if you ever need to reboot a headless Mac with FileVault without having to attach a keyboard etc., just ssh in, do `sudo fdesetup authrestart` and put in creds for unlocking FileVault, wait a bit, and then use screen sharing to finish logging in)

Okay, so findings so far:
1) ZFS creates partitions aligned to 1MB boundaries, as expected, and adds a small dummy (I assume) partition at the end of the drive. It's 8MB in size, so it may not suffice for your "buffering needs." In my case, I previously wanted to use the WD USB enclosures from which I long ago shucked a bunch of 2TB green drives to mirror over failing internal drives, only to find out that the controllers in those enclosures apparently claimed almost 100 MB of drive space for their own use, I guess as additional space for reallocating bad sectors (shame on you, WD). This meant I could not use them: I had given the entire disks to the pool internally, so the drives as presented by those enclosures were significantly smaller than the internal ones. Currently, I have 4TB drives in my pool, and one has failed, so I wouldn't mind using two of these enclosures to JBOD two old 2TB drives as a temporary replacement for the failed drive, but that means I'd need to make sure my pool uses about 200 MB less space on each drive when initialised. My plan here is to either set up such a JBOD when creating the pool, to ensure all drives are partitioned to fit the lowest common denominator that I currently know of, or to force the pool to the max per-device size I seek by creating a sparse file and using it as a member when creating the pool.

2) On Linux, I used to create GPTs and name my partitions, then import the pool with `-d /dev/disk/by-partlabel`. This made it fun (my pool is named "titanic" and each member is named after a crew member who died on the Titanic, just to tempt fate), and very helpful to figure out which drive is dying or dead, since my pool would display, e.g. "hugh-calderwood" as dead, and I could go find the drive with the "Hugh Calderwood" sticker on it and fix the problem. With `zpool create` and entire drives, the created partitions are always labeled "zfs." Renaming them works fine, I can do that and export, import, and reboot without problem. But, on macOS there's nowhere AFAICT that I can actually list my drives by partition labels. At `/private/var/run/disk/by-*` I have something akin to the Linux `/dev/disk/by-*`, but I'm missing the `by-partlabel` links. Quite sad, actually. I can still find whatever `media-HEXA_ID` that is listed in `zpool status` under `/private/var/run/disk/by-id`, and if I put stickers on my drives to match their ids I should be able to almost as easily pinpoint a bad drive, but I'm not so sure if I can mirror a bad drive to a fresh one and have it import as its old id... Regardless, not a biggie, just need to know :)
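
For finding 1, the arithmetic for picking a per-device size can be sketched like this, assuming a shell with 64-bit arithmetic. `target_bytes` is a hypothetical helper; the 1 MiB alignment simply mirrors the partition alignment mentioned above, and the reserve is whatever headroom you want to leave for smaller drives or greedy enclosures:

```shell
# target_bytes SMALLEST_DRIVE_BYTES RESERVE_MIB
# Hypothetical helper: subtracts a reserve from the smallest drive you
# expect to use, then rounds down to a whole number of MiB.
target_bytes() {
    mib=1048576
    usable=$(( $1 - $2 * mib ))
    echo $(( usable / mib * mib ))
}

# Example (the drive size is illustrative, not a measured value):
# target_bytes 2000000000000 200
```

The printed byte count could serve as the partition size, or as the size of a sparse placeholder file, under the assumption that every future replacement presents at least that much space.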

Okay, those are the findings so far. Gonna try to get Mojave booting from a USB-based zpool now.

Re: Whole disks vs. partitions

Posted: Sun Mar 03, 2019 9:56 am
by chrryd
The /var/run/disk/by-* symlinks are created by InvariantDisks - it probably wouldn't be that hard to add an additional driver to it to also generate by-partlabel entries.
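
Since these entries are symlinks, a small loop is enough to see which name maps to which device node. A minimal sketch, assuming a POSIX shell with `readlink`; `list_links` is a hypothetical helper name, not part of InvariantDisks, and the directory in the usage comment is the one mentioned earlier in the thread:

```shell
# list_links DIR
# Hypothetical helper: prints "name -> target" for every symlink in DIR.
list_links() {
    for link in "$1"/*; do
        # Skip anything that is not a symlink (also covers an empty DIR,
        # where the glob stays unexpanded).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

# Example: list_links /private/var/run/disk/by-id
```

Matching those names against stickers on the drives would then be a manual lookup, as described above.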

Re: Whole disks vs. partitions

Posted: Wed Mar 06, 2019 10:07 pm
by DanielSmedegaardBuus
Thanks, chrryd, I had no idea!

I've never looked for that info, TBH, as I haven't used macOS for anything but laptops and workstations before, so `diskXsY` was fine for my needs :) When I found the symlinks, I just assumed it was the system itself providing them.

Anyway, I gave up on setting up ZFS boot. Too little time, too much hassle, and I really needed that pool to get up and running again, so I just created a "normal" encrypted pool and went on my merry way with that.