OpenZFS on OS X

by **Haravikk** » Tue Mar 17, 2015 5:22 am

Hi there,

Although I'm familiar with the principles behind ZFS and RAID systems in general (having used various hardware and software RAID systems in the past), I've never actually used ZFS for anything until now, so I'd appreciate some advice on my intended setup.

Basically what I have are two SATA to eSATA driverless JBOD adaptors*, each with a set of four disks (two 3tb, two 1.5tb). My plan is to combine these using ZFS such that I will have two RAID-Zs, which I would then mirror, hopefully giving me a single ~6tb volume which can cope with at least one failed drive or controller, and potentially more than one disk (depending upon where the failures actually occur).

Anyway, I'm wondering what exact steps would be involved in this? Will I need to combine each pair of smaller disks first, then create the two RAID-Zs, then create a mirror and finally the volume itself? Could someone give me some example commands for this? I'm also interested in any caveats; how difficult will it be to add capacity, bearing in mind that each controller can handle five disks and I already have four on each?

To try and clarify, my proposed setup would look something like this:

Code:
    ZFS Mirror (~6.0tb)
    |_ RAID-Z (Controller 1)
    |  |_ 3.0tb Disk
    |  |_ 3.0tb Disk
    |  |_ ZFS Stripe or Concatenated Set
    |     |_ 1.5tb Disk
    |     |_ 1.5tb Disk
    |_ RAID-Z (Controller 2)
       |_ 3.0tb Disk
       |_ 3.0tb Disk
       |_ ZFS Stripe or Concatenated Set
          |_ 1.5tb Disk
          |_ 1.5tb Disk

Or put another way; each controller will have two 1.5tb disks combined into a single 3.0tb disk, added to the two other 3.0tb disks to give a total of three, which will be combined as a RAID-Z of ~6.0tb. The two resulting RAID-Z's will then be mirrored for a single ~6.0tb ZFS volume. If there are some other possible setups I might consider please let me know, though this is all I can think of, except for a RAID-10 style setup (3x striped arrays of 6.0tb each) but I don't think it offers any real advantage, while two RAID-Zs might potentially have more redundancy depending upon what fails.

I realise that such a mix of capabilities isn't ideal, though the smaller disks are actually much faster than the larger ones, so I'm hoping that when combined they should be a good match. Of course running the whole setup through only two connections isn't ideal either, but as I say, I'm not too bothered about performance, especially since it will basically be functioning as a NAS for periodic backup (probably nightly or weekly) anyway, so speeds in excess of 20MB/sec should be more than enough.

*For those that are interested, the controllers are Lycom UB-208RMs, and actually they do do hardware RAID but it leaves a lot to be desired including (but not limited to) actually rebuilding an array with a failed disk more than 10% of the time (if I didn't have other backups I'd have lost all of my data several times over by now). They are however perfectly capable JBOD adaptors, I just wouldn't recommend trusting their RAID capabilities.

by **realfolkblues12** » Fri Mar 20, 2015 12:32 am

I did some research as this was an interesting configuration idea. I did not find any way to create sub-vdevs like you described. From what I see, ZFS will automatically use dynamic striping across vdevs.

If you want to protect against both drive failure and enclosure failure, you could just make 4 mirrors (each across the two drive enclosures). You would get more space and faster performance with better redundancy.

Code:
                               ZPool
                                 |
    --------------------------------------------------------------
        vdev            vdev            vdev            vdev
         /               /                \               \
    1.5tb(mirror)   1.5tb(mirror)     3TB(mirror)     3TB(mirror)

You would get 9TB total usable space. This config would tolerate up to 4 dead drives at once without issue, as long as no two of the dead drives are from the same mirror.
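The 9TB figure can be checked with a little arithmetic. What follows is only a sketch: it assumes nominal sizes in GB and that each mirror contributes the size of its smaller member, and `mirror_pool_gb` is a made-up helper name, not a ZFS tool.

```shell
#!/bin/sh
# Usable capacity of a pool built from two-way mirrors.
# Arguments are disk sizes in GB, taken two at a time as mirror pairs.
# Each mirror contributes the size of its smaller disk; the pool stripes
# across the mirrors, so the total is the sum of those contributions.
# (Hypothetical helper for illustration; sizes are nominal, not formatted.)
mirror_pool_gb() {
    total=0
    while [ "$#" -ge 2 ]; do
        if [ "$1" -le "$2" ]; then small=$1; else small=$2; fi
        total=$((total + small))
        shift 2
    done
    echo "$total"
}

# Four mirrors: two pairs of 1.5TB disks and two pairs of 3TB disks.
mirror_pool_gb 1500 1500 1500 1500 3000 3000 3000 3000
```

For comparison, the originally proposed layout (a RAID-Z of three 3TB members per controller, then mirrored) would have topped out at roughly 6TB.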

Now if you don't care about the drive enclosure protection you could get even more space.

by **Haravikk** » Fri Mar 20, 2015 1:02 am

You're right, that does look to be a neater solution!

Just to clarify, the command would look something like:

Code:
    zpool create mypool mirror disk1 disk3 mirror disk2 disk4 mirror…

Is that right? Of course ignoring the fact that the disk numbers may be all over the place.

I do have one other question, actually: I've noticed in some articles that the device names are something like "c0d0", making clear which controller they belong to. Is this something I can do on OS X? I might just go the old-school route of slapping physical labels onto the covers for each drive bay, but that kind of device naming seemed handy.

by **realfolkblues12** » Fri Mar 20, 2015 1:17 am

You could physically remove all but one drive in each enclosure and create the pool with one mirror vdev. Then add the next set of drives using the zpool add command; see http://docs.oracle.com/cd/E19253-01/819 ... index.html

OS X uses disk0, disk1, etc., not c0d0.

There might be other ways to do it but I can't think of any off the top of my head.

by **realfolkblues12** » Fri Mar 20, 2015 1:19 am

Oh, and you can use the diskutil command in Terminal to get the disk numbers.

by **realfolkblues12** » Fri Mar 20, 2015 1:26 am

And one last thing: make sure to include these flags: sudo zpool create -f -o ashift=12 -O casesensitivity=insensitive -O normalization=formD

OS X gets finicky without the case option in my experience, and ashift is used for 4k sectors.

by **realfolkblues12** » Fri Mar 20, 2015 1:50 am

Sorry for the multiple posts, I'm tired. You could use /var/run/disk/by-id to add each drive instead of diskN, but you would need to write down the serial number of each drive to know you were adding them in the correct order. Also, to answer your question: yes, you have the command right, just add the flags I posted above.
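To make the "one disk from each enclosure per mirror" pairing concrete, here is a rough sketch that only prints a zpool create command rather than running it. The helper name `build_mirror_cmd` and the disk names are placeholders, the two lists are assumed to be single-space separated and ordered so that same-size disks line up, and the flags are the ones suggested above minus -f.

```shell
#!/bin/sh
# Print (do not run) a zpool create command that pairs the Nth disk of
# enclosure A with the Nth disk of enclosure B, so every mirror spans
# both boxes. Hypothetical helper; disk names are placeholders and
# should come from `diskutil list` or /var/run/disk/by-id.
build_mirror_cmd() {
    pool=$1 enc_a=$2 enc_b=$3
    cmd="zpool create -o ashift=12 -O casesensitivity=insensitive -O normalization=formD $pool"
    for a in $enc_a; do
        if [ -z "$enc_b" ]; then
            echo "enclosure B has fewer disks than enclosure A" >&2
            return 1
        fi
        b=${enc_b%% *}                  # first remaining disk of enclosure B
        case $enc_b in
            *" "*) enc_b=${enc_b#* } ;; # drop it from the list
            *)     enc_b= ;;
        esac
        cmd="$cmd mirror $a $b"
    done
    if [ -n "$enc_b" ]; then
        echo "enclosure B has more disks than enclosure A" >&2
        return 1
    fi
    echo "$cmd"
}

# Example with made-up identifiers: disk1-4 in one box, disk5-8 in the other.
build_mirror_cmd mypool "disk1 disk2 disk3 disk4" "disk5 disk6 disk7 disk8"
```

Printing the command first gives a chance to check the pairing against the physical drive bays before anything is written to the disks.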

by **rattlehead** » Wed Apr 15, 2015 8:15 am

One additional thought or two:
1) The flags I use are:
- atime=off (do not write last access times; I always do this, not only for ZFS, because it reduces write accesses, but it may not be to your taste)
- casesensitivity=insensitive (the default OS X filesystem behavior; I once changed that, long before ZFS, which caused a lot of trouble, so instead I tell the shell's autocompletion to ignore case, which works much better)
- normalization=formD (normalization of UTF-8 filenames; Unicode allows different encodings for the same characters/symbols, hence normalization must be applied before comparison. I did some research a while ago, and formD seemed to fit my needs best)
- dedup=off (this is the default, but I experimented a little with deduplication and turned it on for that; performance REALLY goes down the drain. You should NOT use this feature unless you are sure it pays off, as in the examples given in the documentation. I set it off explicitly so it does not matter if the defaults change at some time)
- com.apple.ignoreowner=on (for external drives, ownerships annoy me, but again your taste might be different here)
- ashift=12 (meaning a physical block size of 2 to the power of 12 = 4096 = 4k, for newer disks, or nowadays for any disk that is not from ancient Greece, even if it reports 512B blocks for Windows XP compatibility)
So the command would be:

Code:
    zpool create -O atime=off -O casesensitivity=insensitive -O normalization=formD -O dedup=off -O com.apple.ignoreowner=on -o ashift=12 mypool mirror disk1 disk2 mirror disk3 disk4 mirror disk5 disk6 mirror disk7 disk8
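Since ashift is just the base-2 logarithm of the sector size, the value can be derived rather than memorised. A small sketch, with `ashift_for` as an invented helper name that assumes its argument is a power of two:

```shell
#!/bin/sh
# ashift is the base-2 logarithm of the sector size the pool should assume.
# Print the ashift for a sector size given in bytes (must be a power of two).
# Hypothetical helper for illustration only.
ashift_for() {
    size=$1
    shift_val=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_val=$((shift_val + 1))
    done
    echo "$shift_val"
}

ashift_for 4096   # 4k-sector disks
ashift_for 512    # legacy 512-byte sectors
```

As far as I know ashift is fixed per vdev when it is created, which is why it is passed at pool creation time here.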

2) The first thing I do after creating a new pool is export and re-import it. During the export, all the metadata are stored to the disks and by re-importing, I convince myself that I actually can import the pool before putting any data in there. If you just unmount and disconnect the drives, you cannot import it. Although I remember a fancy way to specify the configuration on command line and mount the filesystem like that so you could export then. Anyway, you really should do the export first. I can't even think of a reason why the metadata is not stored automatically during creation.

Code:
    zpool export mypool
    zpool import          # This should list your pool configuration
    zpool import mypool
    zpool status mypool   # This should show everything as ONLINE and "No known data errors"

3) You can then import the pool without specifying all the disks, those are part of the metadata that is stored to every disk. You can even put the disks to another location, e.g. another slot in your JBOD case. It does not matter, because the metadata tells the zpool the configuration and the role of that particular disk in that configuration. You pretty much want the mirrors in different JBOD cases, though, so you can deal with the failure of one whole box without interruption.

4) So what I do to find the names of the disks to put in the create command is diskutil list, usually with detached drives and then again with attached drives. The difference tells me the identifiers of the disks. After that I don't have to bother with those ids anymore...
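The before/after comparison of diskutil list can be scripted. This is only a sketch: it assumes each whole-disk section of the listing starts with a line beginning /dev/diskN, and `new_disks` plus the file names are invented for the example.

```shell
#!/bin/sh
# Compare two saved `diskutil list` outputs and print the whole-disk
# identifiers (/dev/diskN) that appear only in the second one.
# Hypothetical helper; it assumes each disk section starts with a line
# beginning "/dev/diskN", and that the two files have different names.
new_disks() {
    awk '
        # Skip partition rows and headings; keep only whole-disk header lines.
        !/^\/dev\/disk[0-9]+/ { next }
        # Remember every disk seen in the first (before) listing.
        FILENAME == ARGV[1] { before[$1] = 1; next }
        # In the second (after) listing, report disks not seen before.
        !($1 in before) { print $1 }
    ' "$1" "$2"
}

# Intended use on OS X (not run here):
#   diskutil list > before.txt    # enclosures detached
#   diskutil list > after.txt     # enclosures attached
#   new_disks before.txt after.txt
```

The printed identifiers are only needed for the create command; after that the pool metadata takes over.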

Hope this helps

by **Haravikk** » Wed Apr 15, 2015 9:28 am

Okay, so I haven't gotten around to setting this up quite yet, but I wanted to thank everyone for their feedback and help so far!

rattlehead wrote: dedup=off (this is default, but I experimented a little with deduplication and set it on for that, performance REALLY goes down the drain, you should NOT use this feature unless you are sure, it pays off, like in the examples given in the documentation, so I explicitly set it off, so it does not matter if the defaults change at some time)

Thanks for the detailed breakdown! Regarding de-duplication: I'm setting up this array as a kind of "archival" backup which I'll only back up to maybe once a week, in addition to a more frequent backup on another drive, so performance isn't really a big concern as long as it's not so slow that these backups take forever. However, if de-duplication could save a good amount of space, then it will help maximise the capacity, so may be worth it? Is this a feature that needs to be enabled at creation, or can I enable it later? Even better, if it can be enabled later, is there any way to determine how much space it would save (e.g. by generating a de-duplication table to discover how many duplicate blocks currently exist)?

I'll be enabling compression to make the most of the space anyway, I know that LZ4 is currently recommended, but do you think I'd be better with gzip for greater compression vs speed?

rattlehead wrote: You can then import the pool without specifying all the disks, those are part of the metadata that is stored to every disk. You can even put the disks to another location, e.g. another slot in your JBOD case. It does not matter, because the metadata tells the zpool the configuration and the role of that particular disk in that configuration. You pretty much want the mirrors in different JBOD cases, though, so you can deal with the failure of one whole box without interruption.

That's great to hear, this was actually why I was asking about device identifiers (I wasn't very clear about it); I'm fine with looking up the device names so long as it's only when I really need them (i.e. creating the array, or replacing a disk etc.), but as long as import is mostly automatic that should be ideal!

rattlehead wrote: Hope, this helps

It helps a great deal, thanks!

Actually, I have come up with one other question though. Since I'm only going to be backing up periodically to this array, I'm currently trying to decide whether to leave it running 24/7 or to instead only switch it on when a new weekly backup is needed, and was thinking it would be handy if I could automate the importing of the array via script. Basically I was thinking of a script that would trigger when the physical devices are detected, and would run some zfs command to look for any newly available pools, check whether they are ready to be imported (i.e. all drives are present) and then do it only if the pool isn't degraded (otherwise I'd have it trigger an error somewhere). It wouldn't be hard to then extend it to automatically run the backup and then export the pool once that's finished, at which point it would be safe to physically turn off.

Anyway, since I don't currently have any pools set up, I don't really know what kind of output I can get that would be useful. For example, you mention that zpool import should list my pool configuration, but would this be easy to parse via script in a useful way to detect a pool that's ready to import?

Oh, one other question actually; is it worth setting up any kind of caching? I have a fast main system disk (a 1tb SSD), but if ZFS' RAM caching takes advantage of virtual memory then I don't suppose there'd be any advantage to specifying the same disk as an explicit cache (since it'd be the same speed), but if ZFS doesn't take much advantage of virtual memory then it may be worth it? If it would be of benefit, does anyone have recommendations on what kind of setup? As I say, it's really just for backup, so I don't expect I'll need a read cache, but if caching would benefit de-duplication or writing then it may be worth thinking about?

by **rattlehead** » Wed Apr 15, 2015 10:59 am

1) de-duplication
- I should have been much clearer about the performance: I meant the performance of the computer that runs the ZFS. In my case it's an MBPr 2012 with 16GB RAM. For the 6TB of data, the recommended minimum RAM was like 24GB or so, counting only the memory dedicated to ZFS, I do not have that much, so performance would be worse, I knew that. But the host system became practically useless, GUI feedback for moving the mouse took seconds, etc. I would nominate it as the most useless feature ever, because you practically need HUGE amounts of RAM (it was like xGB + 4GB per 1TB in the pool or so).
- you can enable / disable anytime you like, it only affects future write processes, though, so you would have to internally copy your data (moving within the same logical drive does not suffice, because it does not actually write the data, only updates the location information)
- yes, there is a command that shows you how much is saved at the moment. I'd have to google it myself, though...
- no, I don't know of any possibility to predict that value.
- compression for me also made things worse. I read somewhere something like "it's computationally cheap, works on-the-fly w/o noticeable delay, and can save some space most of the time; even if it enlarges a file from time to time, the compression of others more than compensates for this. There is no reason not to activate this feature". So I did. And copied all the data. And it needed more space than on the uncompressed volume, by a wide margin. I think I needed 1TB more or so. So the guy was wrong: this depends on the type of data you have, and there is not always enough well-compressing data to compensate.
- I would probably create an additional volume like so:

Code:
    zfs create -o compression=on mypool/compressed

and then put the data that should be compressed there. Or the other way around, create a mypool/uncompressed volume and put MPEG, DIVX, MP3, and other compressed video/audio files there.
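To put rough numbers on the de-duplication RAM rule of thumb quoted above ("xGB + 4GB per 1TB"), here is a back-of-the-envelope sketch. The base amount x is not given in the thread, so it is a parameter and set to 0 purely for illustration; `dedup_ram_gb` is an invented name and the 4GB-per-TB default is the poster's recollection, not an official figure.

```shell
#!/bin/sh
# Rough RAM estimate for de-duplication using the rule of thumb from the
# thread: base + per-TB amount * pool size. All numbers are assumptions:
# the base ("x") is unknown, and 4GB per TB is only a recollection.
dedup_ram_gb() {
    base_gb=$1
    pool_tb=$2
    per_tb_gb=${3:-4}
    echo $((base_gb + per_tb_gb * pool_tb))
}

dedup_ram_gb 0 6   # the 6TB case mentioned above
dedup_ram_gb 0 9   # the 9TB four-mirror pool discussed earlier
```

On the two open questions, and hedged because I have not checked them against the O3X build in use: the current saving should be visible as the pool's dedupratio property (zpool get dedupratio mypool), and zdb -S mypool is documented as simulating de-duplication on existing data without enabling it.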

2) Device identifiers
- as soon as you have created a pool the zpool status (for imported pools) and zpool import (for unimported pools) commands show you the detailed configuration including the current disk identifiers

3) You're welcome

4) Process/workflow
- I would not let it run 24/7, because the energy consumption is significant (for private households).
- For me, the zpool import command either states no pools available or prints the configuration. This output contains a line " state: ONLINE" if and only if all the devices are present and no defect is known.
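The "state: ONLINE" check can be scripted along these lines. This is a sketch under assumptions: it expects the "pool:" and "state:" lines that zpool import prints, `pool_state` and `pool_ready` are invented helper names, and `run_backup` in the usage comment is a placeholder for whatever backup command is used.

```shell
#!/bin/sh
# Read `zpool import` output on stdin and print the state reported for
# the named pool (e.g. ONLINE or DEGRADED). Prints nothing if the pool
# is not listed. Hypothetical helper for illustration.
pool_state() {
    awk -v want="$1" '
        $1 == "pool:"  { current = $2 }
        $1 == "state:" && current == want { print $2; exit }
    '
}

# Succeed only if the named pool is listed and reported as ONLINE,
# i.e. all devices are present and no defect is known.
pool_ready() {
    [ "$(pool_state "$1")" = "ONLINE" ]
}

# Intended use (not run here; run_backup is a placeholder):
#   if zpool import | pool_ready mypool; then
#       zpool import mypool && run_backup && zpool export mypool
#   else
#       echo "mypool is missing or not ONLINE, skipping backup" >&2
#   fi
```

Keeping the parsing in a function that reads stdin means it can be tried against saved output before being pointed at real disks.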

5) Caching
- I haven't set up caching at all. I mean, my RAID 6x2TB is connected via Thunderbolt and the proprietary RAID config was twice as fast as my internal SSD, so setting up an SSD as cache would actually slow things down... My impression is that the ZIL and caches are for big enterprise-like environments only. Maybe someone more experienced should tell us about this.

6) sample output
You see, I have two pools, one with two WD external drives connected via USB2 (only temporary solution), and the Pegasus RAID with 6 drives connected via Thunderbolt, one of which failed and is currently being replaced.

Code:
    rattlehead@/~$ zpool status
      pool: WD_ZFS
     state: ONLINE
      scan: scrub repaired 0 in 5h28m with 0 errors on Sat Apr 11 21:42:40 2015
    config:

        NAME        STATE     READ WRITE CKSUM
        WD_ZFS      ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            disk10  ONLINE       0     0     0
            disk11  ONLINE       0     0     0

    errors: No known data errors

      pool: zfsraid
     state: DEGRADED
    status: One or more devices is currently being resilvered. The pool will
            continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Tue Apr 14 17:04:57 2015
            3.05T scanned out of 9.61T at 41.1M/s, 46h28m to go
            520G resilvered, 31.73% done
    config:

        NAME                        STATE     READ WRITE CKSUM
        zfsraid                     DEGRADED     0     0     0
          raidz2-0                  DEGRADED     0     0     0
            disk7                   ONLINE       0     0     0
            replacing-1             DEGRADED     0     0     0
              12442721858756403365  UNAVAIL      0     0     0  was /dev/disk4s1/old
              disk4                 ONLINE       0     0     0  (resilvering)
            disk5                   ONLINE       0     0     0
            disk6                   ONLINE       0     0     0
            disk3                   ONLINE       0     0     0
            disk8                   ONLINE       0     0     0

    errors: No known data errors

A Mirror of RAID-Z's?