
Pair of External HDs for 'cloned' backup

Posted: Tue Aug 06, 2019 9:15 am
by jreades
Hi -- I'm trying to get my head around ZFS after getting annoyed at the poor performance (and data loss) of a pair of exFAT drives and wanting something more robust, high-performance and x-platform compatible than HFS+. I think what I'm trying to achieve is quite simple, but most of the tutorials I've found assume that I want to migrate my entire system on to ZFS and I'm just not there yet.

So I have a pair of Seagate 4TB drives that I want to be able to use with either of my two Macs (both running Mojave with all updates) in what I'll call an 'incremental clone' configuration (distinguishing this from mirrored for reasons that will become clear). Let's call them Drive 1 and Drive 2.

Drive 1 is the 'master' containing well over 10 years' worth of digital photography (> 1.25TB in a mix of CaptureOne, iPhoto/Photos, and Aperture packages -- those blessed/cursed folders). Reads from/writes to any of the packages are (presumably) made directly from the OS X application since the ZFS drive will now (thanks to OpenZFS) look like any other.

Drive 2 is my off-site backup drive that should faithfully (and, ideally, quickly) clone any changes made to Drive 1 since the last time it was connected and, space permitting, keep a snapshot of what was deleted for some period of time. Drive 1 will obviously be used and updated repeatedly while Drive 2 is not connected, so this doesn't work as a 'normal' mirror.

There are several aspects of setup and configuration that confuse me:

1. It looks like I need not only to wipe the drives (obvs) but also to take some extra step to ensure that the drive doesn't get remounted by OS X as exFAT or something else and destroy my backup drives completely? But is that enough? The solutions that many of you seem to have used on the forums are much more aggressive than this and involve working around SIP, but that approach seems to be mainly for use with internal and partitioned drives, not a standalone external. So is that all still necessary?

2. It seems to me that using the built-in compression of ZFS (lz4) on both HDs will benefit me not only for reads/writes while working with Drive 1, but also when backing up from Drive 1 to Drive 2.

3. There's also a bit on the forums about the importance of block size to performance when setting up the drive but I can't see much about how to figure out from command line what my particular situation is. I can look up the model numbers but am not sure what I'm looking for...

4. I don't think that for my purposes I actually need to set up any pools/tanks, so the only other thing is to enable the snapshotting on Drive 2 (or is this a command that I need to manually/cron run periodically?).

I assume that ChronoSync (which I usually use to sync files) is then the way to incrementally update Drive 2, since it won't care what the FS is as long as it can read the source and destination drives, and it should all work faster thanks to ZFS performance and compression. However, I'd be interested to hear if there are other tricks that might speed up the incremental cloning process (rsync is an obvious option but not my first choice).

Thank you!

Re: Pair of External HDs for 'cloned' backup

Posted: Tue Aug 06, 2019 1:09 pm
by Sharko
I was the user that was having trouble with drives reverting to HFS+, and I think as soon as I gave the whole disk to ZFS without any partitions present everything went fine. I have found that Disk Utility in recent versions of OS X is not up to the task of dealing with partitions very well, and thus I have had to resort to using GParted under Linux to completely wipe partitions off of disks prior to use with ZFS. The bit about the 'annoying pop-up' is really outdated information at this point, as I have never encountered it since I began using ZFS 1.5.2.

ZFS compression in my experience usually isn't a huge gain - if you're hosting a lot of JPEGs they don't really compress much further, and the extra metadata saved by ZFS takes up space.

Your 4TB disks are almost certainly going to be 4k native sector, though they may report as 512 byte. It's better to go with ashift=12 because of the asymmetry in performance: ashift=12 on a disk that is truly 512 byte will be degraded a bit, but with ashift=9 on a disk that is truly 4k sector, performance will be abysmal. With 4TB disks I would just go with ashift=12. If you're looking at disk specs I think that the term manufacturers use is Advanced Format to indicate 4k sectors.
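
For reference, ashift is just the base-2 logarithm of the sector size, so 4096 bytes gives 12 and 512 bytes gives 9. A minimal sketch of that arithmetic follows; the function name is made up for illustration. I believe `diskutil info` prints a "Device Block Size" line on a Mac, but as noted above a 4k drive may still report 512 there, so treat the reported figure with suspicion.

```shell
# Convert a sector size in bytes to the matching ashift value.
# ashift = log2(sector size); assumes the size is a power of two.
ashift_for_sector_size() {
    size=$1
    ashift=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        ashift=$((ashift + 1))
    done
    echo "$ashift"
}

ashift_for_sector_size 4096   # prints 12
ashift_for_sector_size 512    # prints 9
```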

You mention using ChronoSync, which is perfectly fine if your source is formatted as HFS+ or APFS. However, if you don't mind a bit of overhead in your boot process (the need to import your source pool if it is ZFS) then you can use the native ZFS tools of snapshot and send/receive; these ZFS tools will be orders of magnitude quicker at accomplishing the job of keeping your backup in sync with your source. For example, I have a 400 GB home directory, and the snapshot/send/receive process to a USB3 external disk takes only five minutes tops even with me typing the commands (not scripted). This is possible because both the source and destination have a common snapshot to start with, and the send/receive process only sends the actual blocks that have changed - it doesn't have to trawl through the whole file directory to find out what changed, it already has that list ready to go.
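
As a rough sketch of that snapshot/send/receive cycle: the helper below only prints the commands so they can be read over before anything is run. The function name, the dataset names (Drive1/photos, Drive2/photos) and the date-style snapshot names are all assumptions for illustration, not anything from my own setup. On a Mac the printed commands would most likely need sudo.

```shell
# Print (rather than run) the zfs commands for one backup cycle.
# All dataset and snapshot names passed in are hypothetical examples.
#   $1 source dataset       $2 destination dataset
#   $3 new snapshot name    $4 previous common snapshot (omit on first run)
backup_commands() {
    src=$1
    dst=$2
    new=$3
    prev=${4:-}
    echo "zfs snapshot ${src}@${new}"
    if [ -z "$prev" ]; then
        # First run: no common snapshot yet, so send the full stream.
        echo "zfs send ${src}@${new} | zfs receive ${dst}"
    else
        # Later runs: send only what changed since the common snapshot.
        # receive -F rolls the destination back to that snapshot first.
        echo "zfs send -i ${src}@${prev} ${src}@${new} | zfs receive -F ${dst}"
    fi
}

backup_commands Drive1/photos Drive2/photos 2019-08-10 2019-08-06
```

The previous snapshot has to exist on both sides, which is why it should not be destroyed on the source until the next cycle has completed.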

Re: Pair of External HDs for 'cloned' backup

Posted: Tue Aug 06, 2019 3:51 pm
by lundman
Just to agree; wholedisk is what ZFS prefers. lz4 compression is a good idea, not because you will save space (you won't with jpeg) but because it will also lz4 the metadata, making IO transfers smaller and fewer. Just go with ashift=12 - there is no real reason to use 9 these days, even if your disk was 512. It will help with future device changes.

Re: Pair of External HDs for 'cloned' backup

Posted: Tue Aug 06, 2019 11:23 pm
by jreades
Thank you both! So if I've followed this correctly:

1. Whole disk is definitely the way to go, but I might need to look at blowing away the old partition table using dd or something similar to ensure that Disk Utility doesn't try to get its mitts on my data.

2. My photographs are mostly RAW+metadata for mods so there may still be some (small) gains from compression, but mainly this just seems like a GOOD IDEA.

3. I definitely like the idea of using ZFS' built-in tools for managing synchronisation of the two external HDDs as that's obviously going to be more clever/performant than ChronoSync; however, I was a little unclear as to why there would be overhead in the boot process? I don't even routinely attach Drive 1 (the 'master') except when settling in for a 'session' so I don't need it to always be available at boot... though I'm guessing perhaps the problem here is that it might be available at boot unless I deliberately delay mounting. If I've understood the terminology correctly, Drive 1 and Drive 2 would also be in separate pools with send/receive to sync them?

I think that I can get most of the way on my own, but is there a good tutorial for #3? I'm comfortable with the command line and bash scripting but definitely don't want to accidentally blow away my work while learning the ropes! :-)

Thanks again!

Re: Pair of External HDs for 'cloned' backup

Posted: Wed Aug 07, 2019 2:56 am
by nodarkthings
My two cents as a basic long term user: I confirm the whole disk option — while I still continue with my HFS/ZFS sandwich (it's not that 'dangerous' but it took me some time to figure out the best recipe... ;) )

About the possible gain, there is a little gain anyway, depending on contents:
• disk with nearly all audio (.wav): 33.6 GB ZFS vs 35.31 GB HFS
• mixed contents (but mostly .dmg, .zip): 320 GB vs 326 GB
• Time Machine: 16 GB vs nearly 32 GB!
(N.B.: this is data I wrote down when I switched to ZFS, years ago, FWIW, considering the original size of the HFS partition then the transfer on ZFS)
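
Worked out as percentages, those figures come to roughly 5%, 2% and 50% saved. A quick way to check the arithmetic in the shell (the function name is just for illustration, and the last line treats 'nearly 32' as exactly 32):

```shell
# Percentage of space saved, given the size on ZFS and the size on HFS.
saving_pct() {
    awk -v zfs="$1" -v hfs="$2" 'BEGIN { printf "%.1f\n", (1 - zfs / hfs) * 100 }'
}

saving_pct 33.6 35.31   # audio: prints 4.8
saving_pct 320 326      # mixed: prints 1.8
saving_pct 16 32        # Time Machine: prints 50.0
```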

Re: Pair of External HDs for 'cloned' backup

Posted: Wed Aug 07, 2019 11:15 am
by Sharko
Re: the overhead...

I find that the script which is supposed to auto-import my zpools doesn't work for me under Mojave. It's a timing thing; it used to work when my El Capitan boot disk was a FileVault 2 encrypted disk, because that somehow got the disks enumerated or whatever before the import script ran. Mojave doesn't support FileVault 2 on my Mac Pro, so no joy. Because the pools don't auto-import it means I can't just jump into logging into my main working account that has its home directory on a zpool. So for me this means I have a bit of a chore on boot up:

1. Start the machine
2. Log into an admin account on the APFS boot disk, since that is guaranteed to be valid (it is stored on the Apple-native APFS disk).
3. Import my zpool(s) using the Terminal and sudo. One of the zpools has the home directory for my working account, so this pool has to be imported and functional before I log into that account.
4. Log out of the admin account.
5. Log into the working account, which now works because the zpool is up and running.

If your master Drive 1 is accessed intermittently then you can just import as needed, since it sounds like you're going to retain your home directory on an Apple-native (HFS+ or APFS) volume. Your user account will need to have admin privileges to be able to run 'sudo zpool import Drive1', however. Since you're going to connect Drive2 at even less frequent intervals, yes, it will be its own pool as well.
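
A small sketch of the 'import as needed' idea: check whether the pool is already imported before calling zpool import. The two function names are hypothetical; Drive1 is the pool name used above.

```shell
# Succeeds if the named pool shows up in the list of imported pools.
pool_is_imported() {
    zpool list -H -o name 2>/dev/null | grep -qxF "$1"
}

# Import the pool only when it is not already imported.
ensure_imported() {
    if pool_is_imported "$1"; then
        echo "$1 is already imported"
    else
        sudo zpool import "$1"
    fi
}
```

Running 'ensure_imported Drive1' after plugging the disk in would then be safe to repeat.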

Re: Pair of External HDs for 'cloned' backup

Posted: Wed Aug 07, 2019 5:04 pm
by lundman
If it is a timing thing with the auto-import script, you could try just throwing a "sleep 10" or something into the start of the script? Not as a solution, but to confirm that is what is happening.
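
If the delay does turn out to be the cause, one alternative to a fixed sleep would be to poll until the disk's device node exists and only then import. This is only a sketch of that idea: the function name is made up, the device path in the usage note is an assumption, and it is not how the shipped auto-import script works.

```shell
# Wait until a path exists, checking once per second.
#   $1 path to wait for    $2 maximum number of seconds to wait
# Returns 0 as soon as the path exists, 1 if the time runs out.
wait_for_path() {
    path=$1
    remaining=$2
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

Something like 'wait_for_path /dev/disk2 30 && zpool import -a' at the start of the script would then wait up to 30 seconds for the (hypothetical) /dev/disk2 to appear.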

Re: Pair of External HDs for 'cloned' backup

Posted: Fri Aug 09, 2019 6:17 am
by jreades
Really useful information, thank you!

For the sake of others following in this path.. I'm up and running with a test disk (didn't want to experiment on the actual backups) with this:
sudo zpool create -f -o ashift=12 -O compression=lz4 -O casesensitivity=insensitive -O normalization=formD {Disk Name} /dev/{disk id}

I accidentally unmounted this through the Finder and then unplugged (as I would for a 'normal' disk) and managed to cause myself some serious grief (all operations would hang and clear/destroy didn't do anything). Rebooting with the disk connected cleared this up with (apparently) zero data loss. But it was a useful reminder to:
sudo zpool export {Disk Name}

to unmount, and:
sudo zpool import {Disk Name}

to remount. Hopefully I won't forget to do this too many times... I did find a useful looking tutorial here that would seem to cover some ways to make mounting/ejecting pools a little simpler and also promises to help me manage synching snapshots between pools... that's my next target.

Re: Pair of External HDs for 'cloned' backup

Posted: Sat Aug 10, 2019 11:49 am
by nodarkthings
@jreades: Thanks for the link, it gave me the idea to set mountpoints anywhere in MacOS — I never thought you could do that so widely...
For a first trial, I've created a dataset containing ~/Documents/Microsoft User Data folder with a mountpoint set at its former place (I'm still using Office 2011).
Actually many people were upset because you can no longer move Office 2016 Outlook data using a symlink, but this might be a great way of doing it! ;)