ZFS worth using with non-ECC RAM? Can ARC be disabled?

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by raattgift » Thu Mar 14, 2013 6:39 am

The UBC caches everything it can, and will almost certainly cache more in the absence of a ZFS subsystem, since the latter robs memory from it.

Because the UBC is *unified*, any clean pages not locked into physical memory will be cached. That includes demand-paged things (text pages and the like), mmap(2)ed and read(2) data, and dirty pages that have since been committed to secondary storage (i.e., old write(2) data, metadata updates (atime updates, notably), and so forth). Active pages are those that have been recently accessed, as well as those brought in by warmd, kern.preheat, and so forth. ARC memory is most likely to be accounted for as active, although some may be treated as inactive if all your pools go idle for a substantial period of time.

The count of inactive pages in top(1) and in the Activity Monitor application is *roughly* the count of UBC pages that have not been recently accessed. "Free" will shrink to kern.vm_page_free_min over time (i.e., to nearly zero), given sufficient I/O creating new dirty pages or accessing previously uncached pages.
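You can watch those counters from a shell, for what it's worth; vm_stat(1) ships with Mac OS X, and the sysctl name below is the one mentioned above:

vm_stat 5                      # free/active/inactive/wired page counts, sampled every 5 seconds
sysctl kern.vm_page_free_min   # the floor that "free" shrinks toward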

zstat gives more precision about the ARC than comparable tools do about the UBC. Usually the most dynamic and interesting line items in it will be:

WIRED 88 MiB 1848 MiB/1901 1937 MiB 11.82%

and

200 4096 92061 110519 202580 786920 arc_buf_hdr_t
104 4096 57569 8779 66348 315096 arc_buf_t

The former line shows how much physical memory is being used by the ZFS subsystem; arc_buf_t entries are references to ARC records (each record typically about 128 KiB), and arc_buf_hdr_t entries are references to L2ARC records; each reference eats about 256 bytes of physical memory.
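As a back-of-the-envelope check (the ~256 bytes per reference is approximate, and 200,000 is just a hypothetical count of the same order as the zstat sample above):

# roughly 200,000 L2ARC references at ~256 bytes each:
echo $(( 200000 * 256 / 1024 / 1024 ))   # prints 48, i.e. about 48 MiB of headers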

You will note that zfs memory will typically be small compared to the total Active+Inactive+Wired memory, especially on a large-physical-memory system.

Bit errors in *any* of that memory may result in reading bad information from the cache; that memory includes anonymous pages (used mainly for application data structures), pages of machine code, backing store for the display system, kernel data structures and code, and so forth.

If RAM errors are random, on a system with lots of memory and doing lots of I/O, they will likely hit clean UBC pages, since those typically occupy the largest fraction of physical memory. Those pages may be read again at some point in the future, or may be discarded, depending on system activity.

On a system with so little memory that caches are squeezed, bit errors are more likely to hit in-use pages that likely contain application code (including library code shared among many applications), or data structures, or (if there is lots of writing activity, including atime updates), dirty pages waiting to be committed to storage.

ARC/UBC interaction is tricky to gauge with the available instrumentation (although it's amenable to dtrace-ing). Applications will generally use whatever is in the UBC first, and will do so without notifying the ARC, so the very hottest read-only pages will sit in the UBC and may well be evicted from the ARC. On the other hand, pages that are frequently dirtied will be kept alive in the ARC, since writes are pushed from the UBC into the ARC. The busiest blocks in the ARC on the ZEVO port are likely to be metadata from things like atime updates or directory activity, rather than truly hot file-backed mmap pages. That's OK: analogous structures are hot in kernel caches for HFS+ and other virtual filesystems, so ZFS is no more exposed to RAM errors than they are.
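If you do want to dtrace it, a one-liner like the following is a starting point; it assumes the zfs kext exposes the stock ZFS symbol names (such as arc_read) to the fbt provider, which may not hold on every port:

sudo dtrace -n 'fbt::arc_read:entry { @[execname] = count(); }'
# Ctrl-C prints, per process name, how many read requests went through the ARC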

The worst thing one could say about ZFS on a non-ECC-RAM system is that there is an expectation that data on ZFS is "safe" from most errors, so uncaught data integrity problems are more surprising. Unfortunately, nothing is "safe" in the presence of actual RAM errors, and there is essentially nothing one can do to reliably avoid being the victim of a random RAM error. Mitigations exist, but a general policy of minimizing RAM occupancy and avoiding caching, or alternatively of doing computationally and/or memory-access-intensive on-line integrity checking, is, imho, a poor use of system resources (and on a system with actual RAM errors, each of these tactics may *worsen* data corruption). The only real solution is strongly error-checked and -corrected RAM plus an operating system built to report and recover from detected RAM errors.

(Of course even then you are subject to errors in the processors and their caches, software and firmware bugs, ...)

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by raattgift » Thu Mar 14, 2013 6:47 am

A much shorter version: zfs really helps with data availability in the face of fairly common problems like marginal USB3 or firewire cables, external disk power supplies, hubs/port multipliers, and so forth. *Availability* is pretty narrow though; you still need backups, and in the face of sufficiently flaky hardware you will need to restore from those. ZFS (when used correctly) mainly lets you know you should throw a drive or cable or power supply away and replace it with a new or known-working one, and buys you time to do that without having to take your system or your data offline. I.e., it keeps your data available in the face of common problems.
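A minimal sketch of that workflow, with a hypothetical pool name ("tank") and hypothetical device names:

zpool status -x                          # either "all pools are healthy" or details of the sick one
zpool scrub tank                         # re-verify every checksum in the pool, in the background
zpool replace tank /dev/disk3 /dev/disk4 # swap the suspect device for a known-good one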

(It also comes with handy things like snapshots, clones, send|receive, dataset attributes, and so forth, which can be very useful for managing that data in ways that are difficult to impossible with the tools for JHFS+, GPT partitioning, CoreStorage, and AppleRaid.)
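For example (pool and dataset names are hypothetical):

zfs snapshot tank/home@before-upgrade            # cheap point-in-time copy
zfs clone tank/home@before-upgrade tank/scratch  # writable copy for experiments
zfs send tank/home@before-upgrade | zfs receive backup/home   # replicate it elsewhere
zfs set atime=off tank/home                      # one of many per-dataset attributes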

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by royfactorial » Thu Mar 14, 2013 7:53 pm

Thanks graham for folding my question into a larger document, and for all your efforts here on the zevo forum. And once again thank you raattgift for providing more in-depth explanations of what's going on under the hood.

I agree that there's only so much one can do to mitigate corruption, especially on a non-ECC system. For my own system, I'm trying to balance ease-of-use/compatibility (which HFS+ offers) with data integrity. Initially I was leaning toward ZFS because I value integrity more, but this ARC/RAM matter makes me wonder if ZFS really provides a significant integrity advantage vs. HFS+ with regular checksum scrubs (using something like Carbon Copy Cloner or IntegrityChecker). My wired+inactive memory after 22 days of uptime is only about 3GB, with 19GB free. This would lead me to believe that ZFS would indeed be caching more data in-memory than HFS+ (I should set up a test disk though to confirm this first-hand). For the time being, I may just stick with HFS+ until I can migrate to an ECC system down the line.

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by raattgift » Fri Mar 15, 2013 6:36 am

tl;dr: it is simply not correct that JHFS+ storage on a non-ECC Mac system is safer, in any practical sense, than ZFS storage on the same system.

"makes me wonder if ZFS really provides a significant integrity advantage vs. HFS+ with regular checksum scrubs (using something like Carbon Copy Cloner or IntegrityChecker)"

ZFS's checksumming is live and also covers metadata, which you cannot get with JHFS+. With *replication*, ZFS checksumming also allows automatic live repair and replacement of failed-checksum/IO-errored data, such that no client application (including the kernel) will get corrupted data; this cannot be done with JHFS+.
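By "with replication" I mean something like the following sketch (device and dataset names are hypothetical):

zpool create tank mirror /dev/disk2 /dev/disk3   # every block stored on both devices
zfs set copies=2 tank/precious                   # or: keep two copies of each block, even on one device
zpool scrub tank                                 # walk the pool verifying checksums; bad copies
                                                 # are rewritten from the surviving good ones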

ZFS is simply vastly superior at dealing with corruption between the system and its secondary storage. Such corruption is fairly commonplace and often goes unnoticed, leading to propagating errors (e.g. errors that find their way into your backups).

You are clearly letting perfection be the enemy of the merely vastly superior on the question of data availability and integrity.

On the matter of ease of administration, that's a personal matter. There is a learning curve. Nobody can tell you how fast or far you should climb it.

There is a thread here about compatibility problems. There aren't many. FWIW, I have two systems on which my $HOME (including ~/Library) is in one or more ZFS datasets, as is pretty much all non-system data that would ordinarily be moved off to secondary storage anyway (media files, archived stuff, irregularly-run games and their data). This was prompted by fsck_hfs producing a consistent filesystem in which the metadata looked accurate but several files had *silently* been filled with NULLs (it wasn't even clear from /var/log/fsck_hfs.log whether fsck_hfs or some unclean shutdown at some point caused the corruption). Since moving the data over, errors caught by ZFS have revealed a marginal FireWire 800 cable (fresh out of the box), a marginal external drive power supply (ditto), and an internal buffering problem in an 8-port USB3 hub, all without data loss and with only minor unavailability (the data in question stayed mounted and usable, or even in use).

Apart from having to arrange a different backup system (zfs snapshot, then zfs send | [ssh |] zfs receive), it just dropped right into place. Changing quota sizes and otherwise "resizing" storage space, making clones for testing or for building on datasets I want to keep as pure as the archived-elsewhere copy, altering mountpoints, not mounting volumes when devices are attached, and so forth, are all much, much easier and quicker with the ZEVO port of ZFS than with the native Mac OS X tools, and carry much less data unavailability and risk.
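A sketch of such a session, with hypothetical pool, dataset, and host names:

zfs snapshot -r tank/home@2013-03-15         # recursive snapshot
zfs send tank/home@2013-03-15 | ssh backuphost zfs receive -d backup   # push it offsite
zfs set quota=200G tank/home                 # "resize" without repartitioning anything
zfs set mountpoint=/Users/me/Media tank/media
zfs set canmount=noauto tank/archive         # don't mount automatically on attach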

"For the time being, I may just stick with HFS+ until I can migrate to an ECC system down the line"

You could alternatively get an ECC system, put one of the various open-source branches of Solaris or FreeBSD on it, and have your Mac clients use CIFS, NFS, or even AFP to mount datasets from it; none of that, however, would prevent your Mac from corrupting data in its own memory. You could even use iSCSI (or FC or the like), with the same exposure on the Mac side. Any processing of the data you do on the ECC post-Solaris/FreeBSD system itself will be protected by ECC, and of course the ARC will live on that system, in ECC-protected RAM. (The UBC on your Mac will still be caching away, however.)
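On the server side, sharing a dataset is a one-liner on the Solaris-derived and FreeBSD ports (dataset name hypothetical; sharesmb support varies by platform):

zfs set sharenfs=on tank/export    # publish the dataset over NFS
zfs set sharesmb=on tank/export    # publish over CIFS/SMB, where the port supports it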

The normal administration of Zevo's port of ZFS, the FreeBSD port, and the post-Solaris variants is practically identical. zfs send/receive is compatible (watch out for filename character-set issues, though a similar warning applies to CIFS, NFS, and AFP volumes hosted on a non-Mac). You can even zpool export / zpool import between systems with some care, if you are inclined to migrate physical disks (or the result of a zpool split) from one system to another.
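The migration itself is short (pool names hypothetical):

zpool split tank tank2    # detach one side of each mirror into a new, importable pool
zpool export tank2        # quiesce it for transport
# ...physically move the disks to the other system...
zpool import tank2        # bring it up under the other OS's port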

Zevo's port is a nearly ideal way for someone familiar with Mac OS X administration (comfortable with diskutil(1), at least) to get used to ZFS administration. It's also a highly stable and reliable subsystem (much more so than most FUSE-based filesystems; there are some atrocious systems for using NTFS, for example).

However, I think you are frightening yourself with "what ifs", perhaps because you are thinking it's an all-or-nothing thing.
It's not. You can even start with small file vdevs and trivial (and disposable) datasets, or a couple of small FW drives (make sure you back up what you put in them; new ZFS admins often find they have to destroy and rebuild pools, especially home users who get comfortable enough to want to expand their initial storage). That's roughly the equivalent of "just stick with HFS+" for most of your data.
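For example, a throwaway practice pool built from file vdevs; mkfile(8) ships with Mac OS X, and the paths and pool name here are hypothetical:

mkfile 128m /tmp/vdev0 /tmp/vdev1                 # two 128 MB backing files
zpool create sandbox mirror /tmp/vdev0 /tmp/vdev1 # a mirrored pool you can afford to lose
zfs create sandbox/test                           # practice datasets, snapshots, etc.
zpool destroy sandbox                             # throw it all away when done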

However, if you've convinced yourself it's just too scary compared to JHFS+, well, I think you've said what you need to on the topic.

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by royfactorial » Fri Mar 15, 2013 10:29 am

Again, I appreciate the time and effort you've put in to address these questions and put them in the proper context.

I'm not averse to using the command line and learning how to get this set up... I just want to make sure that putting in the effort will yield tangible benefits, and I think you've made the case for it. The ability of ZFS to verify metadata is important and something I had not considered. From what I've been reading, the probability of on-disk errors is much greater than unintended bit-flips in-memory, so in an imperfect world, it only makes sense to target the source of corruption that's more likely to occur.

Thanks again for your help: As you suggest, I will start out with some test drives to get familiar with the admin tools and see what happens.

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by grahamperrin » Sat Mar 16, 2013 10:27 am

royfactorial wrote:… From what I've been reading, the probability of on-disk errors is much greater than unintended bit-flips in-memory …


That's my sense of things, although to be honest, I have rarely thought about problems with memory hardware.

Part of this is not specific to types of memory, and deserves a separate topic.


Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by shuman » Mon Mar 18, 2013 8:23 am

Would it be possible to somehow use a ramdisk strategy and have the ARC "tunneled" through a ZFS ramdisk to simulate ECC? Did that make any sense? ;)
- Mac Mini (Late 2012), 10.8.5, 16GB memory, pool - 2 Mirrored 3TB USB 3.0 External Drives

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by raattgift » Mon Mar 18, 2013 11:07 am

Also, if you really want to kill performance by minimizing caching done by the ZFS subsystem, then for every dataset:

zfs set secondarycache=none pool/dataset
zfs set primarycache=none pool/dataset
zfs set sync=always pool/dataset

This is a realllllllllllllllllllly bad idea but won't break anything other than performance.

However, you will still benefit from UBC's caching of previously read blocks.
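To see what you have set, or to back it out later:

zfs get primarycache,secondarycache,sync pool/dataset   # show current values and their source
zfs inherit primarycache pool/dataset                   # revert each property to its default
zfs inherit secondarycache pool/dataset
zfs inherit sync pool/dataset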

Re: ZFS worth using with non-ECC RAM? Can ARC be disabled?

Post by raattgift » Mon Mar 18, 2013 11:42 am

"Did that make any sense"

No. Sorry.

In general it is not possible to gain much more than a fraction of the benefit of hardware ECC (which is implemented in the RAM modules, the memory controller, and the operating system), and for that fraction you wind up killing performance by doing a large number of extra memory reads (while being careful about CPU cache pollution) and by keeping significant state mapping physical and virtual memory to checksums.

If you want Mac OS X with ECC RAM, your only present non-hackintosh options are Mac Pros and Xserves, both of which are getting fairly old, and are a bit bulky and heavy for a laptop bag. As fileservers, they would detect and possibly correct memory corruption on themselves; however, if the clients they serve aren't using ECC RAM, those clients may send bad requests, which might still commit bad data to a pool as if it were good data.

While occasional RAM errors do in fact happen, they are pretty rare. In normal environments, it's considerably more likely that you will end up with a wholly-bad DIMM, which will rapidly crash the system and cause it to fail early in the subsequent boot process (if not at POST). An important difference between RAM and disks or other secondary storage is that RAM is meant to be ephemeral, cleared out at system shutdown and startup; that is why you get an empty UBC and ARC after a reboot (and why L2ARC data is disregarded in most ZFS ports, although illumos is working on persistent L2ARC, and that *may* be useful). A corruption which takes down the system before it propagates to stable storage is a corruption that "vanishes" at reboot, almost always.

By contrast, external disks especially are subject to fairly frequent accidental disconnection (you pull the wrong plug, a cat or kid chews on a cable causing intermittent shorting, or an AC-to-DC power supply starts becoming unreliable). With vdev replication, ZFS can ride this out without the data even becoming unavailable or unreliable, which is very different from a system crash or applications stalling or otherwise failing. Additionally, disks are more likely to return bad data (or fail to write good data) intermittently or from particular sets of physical blocks; decent non-ECC RAM very rarely does this, and practically all ECC RAM will merely report such errors loudly rather than let you carry on with a system that experiences them. Likewise, when ZFS complains about disk errors, you want to replace the failing disk asap rather than limp along with a degraded pool. Sure, you might want to use flaky hardware for longer, but "report and threaten to crash real soon now" is a better outcome than simply crashing (or unmounting) without explanation.

Finally, not all corruptions of data in RAM are external; no ECC technology can protect you from software or chipset bugs that write data into the wrong places, or read data from the wrong places, just as ZFS alone cannot protect you from a buggy application or from you destroying the wrong dataset or snapshot by accident at the command line.
