Stability...

Post by moriz » Thu Apr 14, 2016 1:09 am

Hi,

First, can I just say that OpenZFS on OS X is absolutely awesome. Speaking as an admin for a small department, being able to know what condition my backups are in, to pool storage incrementally as budgets allow, and to send and delete snapshots so efficiently has been wonderful. On top of it all, it is so easy to set up on consumer-level hardware. Thank you, and I'll look for where I can donate some money.

Now the question about stability, and what I might be doing wrong.

One setup I have is a Mac mini 2012 with 16GB RAM and two WD Thunderbolt Duo enclosures attached, containing 4TB WD Red drives. The 8TB (usable) pool is two vdevs, each a mirrored pair of 4TB drives. The Mac's internal SSD is also used as a cache disk for the pool. Each day, a Python script runs rsync between about 80 Macs and the pool to back them up, including xattrs. Each Mac is backed up to a filesystem in the pool, which is then snapshotted after a successful rsync (ignoring minor errors such as "access denied" on files blocked by anti-virus on the Macs).
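
For readers who want to picture the rsync-then-snapshot step described above, here is a minimal sketch. It is not the actual script: the function names, the host, dataset and path arguments, the snapshot naming scheme, and the choice of rsync exit codes to tolerate (23 and 24, the "partial transfer" codes) are all assumptions made for illustration. It also assumes an rsync 3.x binary that accepts `--xattrs`.

```python
import subprocess
from datetime import datetime

# Assumption: which rsync exit codes count as "minor".
# 0 = success, 23 = partial transfer due to error (e.g. access denied),
# 24 = partial transfer because source files vanished.
TOLERATED_RSYNC_CODES = {0, 23, 24}


def rsync_command(host, source_path, dest_dir):
    """Build an rsync command that copies source_path from host into dest_dir."""
    # -a keeps permissions and times; --xattrs keeps extended attributes.
    return ["rsync", "-a", "--xattrs", f"{host}:{source_path}", dest_dir]


def snapshot_command(dataset, when):
    """Build a 'zfs snapshot' command named after the given timestamp."""
    # Hypothetical naming scheme: dataset@YYYY-MM-DD_HHMM
    return ["zfs", "snapshot", f"{dataset}@{when.strftime('%Y-%m-%d_%H%M')}"]


def backup_host(host, source_path, dataset, dest_dir,
                run=subprocess.call, now=datetime.now):
    """Run rsync for one Mac, then snapshot its filesystem if rsync was good enough.

    Returns True only if rsync finished with a tolerated exit code and the
    snapshot command itself succeeded.
    """
    code = run(rsync_command(host, source_path, dest_dir))
    if code not in TOLERATED_RSYNC_CODES:
        return False  # real failure: leave the filesystem unsnapshotted
    return run(snapshot_command(dataset, now())) == 0
```

The `run` and `now` parameters exist only so the logic can be exercised without touching a real pool, for example by passing a function that records the commands instead of executing them.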

So this has been running for a couple of years, and the pool has been able to do some awesome things, like trim 24000 snapshots down to 4000 in the space of 10 hours, as well as the usual stuff of sending snapshots to other locations. It also coped with one of the Thunderbolt Duo enclosures developing some sort of IO error ("buffer underrun"), for which I ended up swapping drives and finally swapping the enclosure, and the pool survived fine.

I started the pool on whatever version was current 2 or 2.5 years ago, and it is now on 1.4.5 and Yosemite. My clunky Python scripts use multiprocess so they can run a couple of rsyncs while also watching the network for laptops appearing, etc. Apart from those, the only issue I've had, which is minor given all the benefits, is stability: the Mini freezes up from time to time.

I can't find any pattern to this, but then over a couple of years, versions have changed, etc. The most recent crash reports sometimes mention ZFS and other times mention other processes, so I don't know what to say. I just tried Micromat's Atomic memory tester, FWIW, but it didn't notice anything. Likewise, verifying the Mac's own boot disk with Disk Utility turned up nothing. The Mini can freeze or crash within a couple of days, or stay up for two weeks. The pool scrubs OK and reports no errors. Next I'll try starting with a clean install of OS X on a new drive.

I have messed around with some tunables, but the crashes were happening before I did this. Originally I wondered if the freezes were to do with ZFS's RAM usage growing past the ARC limit. I've since doubled the RAM to 16GB and assigned the SSD as a cache. My vague impression is that things are more stable with the SSD cache and with prefetch disabled. It may be that, now on 1.4.5, the tunables I've chosen are contributing to problems, but I don't know.

From /etc/zsysctl.conf:

# Try to stop all RAM being consumed...
# even try setting it quite low to see if it has any effect.
kstat.zfs.darwin.tunable.zfs_arc_max=4294967296
kstat.zfs.darwin.tunable.zfs_arc_min=134217728

# Maybe prefetch is not great with lots of rsyncs, try disabling it.
kstat.zfs.darwin.tunable.zfs_prefetch_disable=1
kstat.zfs.darwin.tunable.arc_lotsfree_percent=30

# We processed a large drain list once, when the feature was added to
# OpenZFS. That took a few days and several restarts, so turn this off
# for now: when the mini crashes and we restart, we don't want much of
# a delay importing the pool.
# But be sure to turn it back on sometime.
kstat.zfs.darwin.tunable.skip_unlinked_drain=1
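
The ARC values above are raw byte counts, which are easy to misread. As a quick sanity check, the following sketch parses `key=value` lines in the style of that file and prints the sizes in binary units. The helper names and the embedded sample text are illustrative assumptions; nothing here reads or writes the real /etc/zsysctl.conf.

```python
SAMPLE = """\
# Try to stop all RAM being consumed...
kstat.zfs.darwin.tunable.zfs_arc_max=4294967296
kstat.zfs.darwin.tunable.zfs_arc_min=134217728
kstat.zfs.darwin.tunable.zfs_prefetch_disable=1
"""


def parse_tunables(text):
    """Return {name: int value} for every 'key=value' line, skipping comments."""
    tunables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        tunables[key.strip()] = int(value)
    return tunables


def human_size(n):
    """Format a byte count using binary units (1 GiB = 2**30 bytes)."""
    for unit, size in (("GiB", 2**30), ("MiB", 2**20), ("KiB", 2**10)):
        if n >= size:
            return f"{n / size:g} {unit}"
    return f"{n} B"


if __name__ == "__main__":
    for name, value in parse_tunables(SAMPLE).items():
        if name.endswith(("arc_max", "arc_min")):
            print(name, "=", human_size(value))
```

So the settings shown ask for an ARC between 128 MiB and 4 GiB.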


At the moment, it has been up 3 days, has run about 300 rsyncs, has been running a scrub since last night, and is showing 13.7GB kernel wired.

So my question is simply: is there anything I can do to improve stability? Is it likely that the consumer-level hardware I'm using is just flaky?

I also have a small pool of 3 x 2TB drives internal in a Mac Pro 2009, and it has never crashed on account of, well, anything. But it sees only light usage compared with the 6 million files that rsync has to compare each day in total on the Mac mini.

Thanks
moriz
 

Re: Stability...

Post by Brendon » Thu Apr 14, 2016 2:34 am

I'd suggest coming to the IRC channel, crash logs in hand. We'd be happy to take a look. The software has become increasingly stable in recent times, and I suspect we have fixed a lot of the low-hanging issues. I never see the test suite panic any more, whereas it used to be rather easy to break. So my point is that we might need to see your data in order to understand what is happening.

Cheers
Brendon

