Contemplating ZFS


Re: Contemplating ZFS

Post by LaMosca » Thu Jan 17, 2013 5:13 pm

Thanks. I've been reading through the zfs man page (converted to PDF to make it more convenient) and I'm starting to get it, lol

There appear to be quite a few options that I'll need to read more into and hopefully understand. Much learning to do.

I recall seeing this in the zpool man page:

ZFS supports a rich set of mechanisms for handling device failure and data corruption. All metadata and data is checksummed, and ZFS automatically repairs bad data from a good copy when corruption is detected.
In order to take advantage of these features, a pool must make use of some form of redundancy, using either mirrored or raidz groups. While ZFS supports running in a non-redundant configuration, where each root vdev is simply a disk, this is strongly discouraged. A single case of bit corruption can render some or all of your data unavailable.

So I assume with a pool of a single disk, if a file became corrupted, ZFS would know it was corrupted but wouldn't be able to revert to an uncorrupted version; it would just alert you in some fashion?

I have a few disks that I was planning on not using in my 10 disk RAID10 set and I'm wondering now what would be the best way to utilize them.

One 2TB Western Digital Black
One 3TB Hitachi Deskstar

Hmmm, now that I think about it, I have another Hitachi 2TB Deskstar in an external single disk enclosure that I could swap with the 3TB and then would be able to mirror the two 2TB drives.

I obviously have more planning to do!

Re: Contemplating ZFS

Post by grahamperrin » Fri Jan 18, 2013 5:58 pm

LaMosca wrote:… with a pool of a single disk, if a file became corrupted, ZFS would know it was corrupted but wouldn't be able to revert to an uncorrupted version; it would just alert you in some fashion …


Exactly, when the file system's properties are left at their defaults.

With ZEVO Community Edition 1.1.1, the Checkups tab of the ZEVO pane in System Preferences offers an option: Validate Data.

Here's a pool where I sometimes found repairs without errors, then repairs alongside two errors (permanent errors that could not be repaired), and most recently no repairs, though the two errors remain:

[attachment: screenshot 2013-01-18 at 22.51.49.png]
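For anyone who prefers the command line, the same information should be visible from Terminal; the pool name below is just a placeholder:

sudo zpool scrub mypool   # start a checkup/scrub by hand
zpool status -v mypool    # reports data repaired by the scrub and lists
                          # any files with permanent (unrepairable) errors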


For a single disk pool it's possible to set the following property, at the file system level:

copies=2

Then if you're lucky, one of the two copies of the data can be used for repair. When your luck runs out, you may find permanent errors. (In the screenshot above, the pool includes a file system with copies=2 … it's a disk that I keep for demonstration/test purposes only.)
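As a sketch (the dataset name is just a placeholder), setting and checking the property looks like this; note that it only affects data written after the change:

zfs set copies=2 tank/demo   # store two copies of each block from now on
zfs get copies tank/demo     # confirm the current value and where it comes from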

As you have multiple disks, don't bother with copies=2 or copies=3; it makes more sense to create a pool with multiple disks.

Re: Contemplating ZFS

Post by LaMosca » Fri Jan 18, 2013 6:39 pm

Ah, interesting. Yes, given your additional information, I'm planning to mirror a couple of sets of disks outside of my larger RAID10 set for better redundancy of those pools. So from what you stated above, it appears that with redundant disks there should be no need for setting "copies=2" on a pool/filesystem?

Seeing that the checkups run via the ZEVO control panel require manually clicking the "Start" button, is there a way to run them automatically (or do they run regardless)? Do the checkup runs get logged somewhere?

Is there a best practice regarding pools and filesystems? As I understand it from what you said before, upon initial pool creation one filesystem is created and mounted. If you create other filesystems from the pool, they are mounted underneath that mount point. Is it best to create additional filesystems, or only when you need filesystem settings that weren't specified for the pool itself?

Re: Contemplating ZFS

Post by ghaskins » Sat Jan 19, 2013 12:31 am

LaMosca wrote:Ah, interesting. Yes, given your additional information, I'm planning to mirror a couple of sets of disks outside of my larger RAID10 set for better redundancy of those pools. So from what you stated above, it appears that with redundant disks there should be no need for setting "copies=2" on a pool/filesystem?


Correct

LaMosca wrote:Seeing that the checkups run via the ZEVO control panel require manually clicking the "Start" button, is there a way to run them automatically (or do they run regardless)?


At the very least, you can create a launchd job (loaded with launchctl) to call "zpool scrub" on whatever schedule you want.
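A minimal sketch of such a job, assuming a pool named "tank" and zpool installed at /usr/sbin/zpool (adjust both to suit your setup); this schedules a scrub every Sunday at 03:00:

sudo tee /Library/LaunchDaemons/local.zfs.scrub.plist >/dev/null <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key><string>local.zfs.scrub</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/sbin/zpool</string>
    <string>scrub</string>
    <string>tank</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Weekday</key><integer>0</integer>
    <key>Hour</key><integer>3</integer>
    <key>Minute</key><integer>0</integer>
  </dict>
</dict>
</plist>
EOF
sudo launchctl load /Library/LaunchDaemons/local.zfs.scrub.plist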

LaMosca wrote:Is there a best practice regarding pools and filesystems? As I understand it from what you said before, upon initial pool creation one filesystem is created and mounted.


By default, yes, but note that you don't have to leave it mounted (see zfs properties like "canmount").

LaMosca wrote:If you create other filesystems from the pool, they are mounted underneath that mount point.


By default, yes, but note you can choose not to mount it at all, or to mount it at an alternate location. You might ask why one would want to create a filesystem and not mount it, and the answer has to do with inheritance. An unmounted ZFS filesystem can act as a configuration container for subordinate filesystems. For instance, I employ the following hierarchy on my ZFS backup machine: tank/backups/macpro/[users, lightroom]. However, the only filesystems that hold data are "users" and "lightroom". "backups" is the container for all backups and has compression=on set. "macpro" inherits compression=on, and has "zfs allow snapshot" set for the account "mpbackup" as well as a quota. Finally, each rsync job from the macpro gets its own filesystem (users, lightroom, etc.), which inherits both compression=on and the ACLs, so I can properly scope snapshots. See the comments below for more details.
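As a rough sketch of that hierarchy (the quota value and the choice of canmount=off for the container are my additions; the dataset and account names follow the description above):

zfs create -o canmount=off -o compression=on tank/backups   # unmounted configuration container
zfs create -o quota=2T tank/backups/macpro                  # quota value is only an example
zfs allow mpbackup snapshot tank/backups/macpro             # delegate snapshot rights to the backup account
zfs create tank/backups/macpro/users                        # inherits compression=on from tank/backups
zfs create tank/backups/macpro/lightroom
zfs get -r compression tank/backups                         # shows the inherited value on every child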

Likewise, when I was using Zevo I had it set up as one global zpool "tank", unmounted, and tank/Users was mounted to /Users. "tank" in this case really just represented the raw pooled capacity within the Mac, from which /Users was carved.

LaMosca wrote:Is it best to create additional filesystems, or only when you need filesystem settings that weren't specified for the pool itself?


Whats "best" is really going to depend on your needs and what you want to do. If you don't know what you would do with multiple filesystems, the short answer is you probably don't (yet) need more than the default one that gets created with the pool. However, as you start to get used to ZFS, you will start to see things where you can exploit its architecture to take things in directions you may have not considered before.

Good luck!
-Greg

Recommended pool creation options

Post by grahamperrin » Sat Jan 19, 2013 8:14 am

A one line summary … the two properties in that one line cannot be changed after a pool is created, so it's important to get them right the first time.

The lowercase o applies the property to the pool at time of creation of the pool. ashift=12 assumes that at some point in the future, an Advanced Format drive may be added to the pool.

The uppercase O applies the property to the file system at time of creation of the pool.
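For illustration only (the disk names are placeholders, and the -O property shown here is just an example rather than the one from the summary above):

# -o sets a pool property, -O sets a property on the pool's top-level
# file system, both at creation time
zpool create -o ashift=12 -O compression=on tank mirror disk1 disk2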

Re: Contemplating ZFS

Post by LaMosca » Tue Jan 22, 2013 10:47 am

ghaskins wrote:BTW: You may want to look into using a SAS HBA instead of an eSATA setup. The SAS HBA/enclosures will generally offer you much more non-blocking bandwidth, and SAS enclosures are backwards compatible with SATA drives. Personally, I am using an LSI 9207-8e together with a Sans Digital TR8X+ 8-bay enclosure. Very happy so far.

What version of Mac OS X are you using with this setup? What are you using for drivers? I see Astek provides drivers, but I didn't see the 9207-8e on their list of supported LSI cards. They also appear to sell their own rebranded LSI cards.

EDIT: Looking at the drivers in their store, they do list OS X 10.8 support as well as the 9207-8e card. So I'm interested to hear what you're using :)

Re: Contemplating ZFS

Post by ghaskins » Tue Jan 22, 2013 5:18 pm

LaMosca wrote:What version of Mac OS X are you using with this setup? What are you using for drivers? I see Astek provides drivers, but I didn't see the 9207-8e on their list of supported LSI cards. They also appear to sell their own rebranded LSI cards.

EDIT: Looking at the drivers in their store, they do list OS X 10.8 support as well as the 9207-8e card. So I'm interested to hear what you're using :)


I am using the Astek driver on 10.8, though I also used it on 10.7 for a while. Both worked fine. Operationally it's very stable and I can push unbelievable bandwidth through this thing. In the interest of full disclosure, I'll point out two problems I have had:

1) SAS devices do not seem to get their TCQ depth set very high (if at all), resulting in lower read/write IO performance. In addition, the driver will _not_ manage write-cache-enable (WCE). For the latter, I booted a Linux livecd and set WCE=1 persistently with sdparm. For the former, there wasn't much I could do, and I ended up returning the drives for their SATA equivalents in the end. FWIW, the setup seems to support NCQ/WCE properly on SATA devices.
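For reference, the sdparm invocation is roughly as follows (the device name is a placeholder; check the current value first):

sdparm --get=WCE /dev/sdX          # read the current write-cache-enable bit
sdparm --set=WCE --save /dev/sdX   # enable it; --save writes the saved mode page so it persists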

2) I seem to have a problem with the SAS array going out to lunch if the computer goes to sleep. At first I thought it might have been Zevo, but I've seen the problem at least once since moving back to JHFS+, so I suspect it may be the Astek driver. This is on a desktop machine which typically doesn't sleep anyway, so the workaround was just to permanently disable sleep in System Preferences. Not a huge deal (for me), but something to consider if power savings matter (or if you are using it on a laptop, perhaps with a Thunderbolt enclosure).

Kind Regards,
-Greg

Re: Contemplating ZFS

Post by LaMosca » Tue Jan 22, 2013 7:26 pm

I'm having issues with the NewerTech MaxPower RAID 6G cards I got to connect to my storage enclosure (with eSATA Silicon Image 3726 port multipliers). During transfer, the driver will reset, rescan the bus and then continue with the transfer. Not the speediest that way, lol

I am contemplating moving to SAS and am waiting on a quote to replace the eSATA PM cards in the enclosure with miniSAS-to-SATA ones.

While researching the issue, I found others reporting similar issues with the card (NewerTech appears to repackage the HighPoint 622 card and provide Mac OS X support). Some recommended using a Silicon Image 3132-based controller, and I *just* happened to have one of these already. Although it's SATA2, it does appear to resolve the issue with driver resets and has the benefit of not needing to set up single-disk JBODs on the card just to see the drives from the OS. So the cheap solution is to send back the NewerTech cards and get another Sil3132. I'm still interested to see what switching to SAS would cost, however. I've seen the LSI 9207-8e for $340 or so (add in $50 for the Astek driver plus some more for cables). Speaking of cables, what are you using for miniSAS cables in your setup?

The other bit I'm thinking through is my use of NAS. I currently have a ReadyNAS NV+ (old, slow) which I use as the primary store for documents and pictures. With ZFS on a RAID10 setup, I'm thinking I should use that as the primary source for everything important. Then I started wondering: should I really be using a NAS at all as a backup of the ZFS storage? Wouldn't that be like using an old clunker as the second car to <insert favorite very reliable car make here>? Would the files on the NAS be corruption-free when you really needed to rely on them?

I've started thinking of a second computer, also running ZFS, which I could back up the main ZFS RAID10 to instead. I'll have one set of 5 disks free in the storage enclosure, so I'm thinking a new Mac Mini with a Thunderbolt solution of some sort to connect to those 5 disks. Depending on what I end up doing with my gig-e switch, I'll have 1-2U to work with in my rack. I could get a Sonnet Thunderbolt ExpressCard adapter and a Sonnet Tempo Pro eSATA ExpressCard/34, which supports port multipliers; that would be around $280 plus a Thunderbolt cable. If I went miniSAS, I could get one of the Thunderbolt PCIe chassis and install the LSI card (Astek appears to support this config); that would be, at the cheapest, around $750.

The NAS I was looking at was around $1300-1500 (depending on whether I get redundant power supplies) before buying disks. The Mac Mini solution would range from roughly $920 to $1350 (plus any upgrades to the Mini, perhaps some additional cables, etc.). For my situation, there are two added "bonuses" in using another Mac instead of a NAS: I can use it to interface with my current or future bike trainer, and it should be able to step in and connect to the ZFS RAID10 directly if the main system went down (there will likely be a few unique things on the main server storage that I wouldn't back up, e.g. EyeTV recordings prior to being processed and added to iTunes).

Thanks for the additional info on your issues. I've set my server never to spin down drives or go to sleep. It's set to power down if the UPS it's attached to gets low on battery, so it should shut down before power to the external storage enclosure is lost. I'd be going with SATA disks, so the SAS issues also shouldn't affect me (if I go to SAS).

Re: Contemplating ZFS

Post by ghaskins » Tue Jan 22, 2013 9:53 pm

LaMosca wrote: I've seen the LSI 9207-8e for $340 or so (add in $50 for the Astek driver plus some more for cables).


I think I paid $325 and $35 back in Nov/Dec, so keep an eye out for sales.

LaMosca wrote:Speaking of cables, what are you using for miniSAS cables in your setup?


One advantage of the Sans Digital TR8X+ case was that it came with a pair of SFF-8088 cables, which saved quite a bit of cash.

LaMosca wrote:I've started thinking of a second computer, also running ZFS, which I could back up the main ZFS RAID10 to instead


That's exactly what I have done. I put together a Xeon E3 system with ECC RAM and 4x3TB SATA drives, running OmniOS. I use rsync once per hour to sync the JHFS+ SAS array to the ZFS box, and it snapshots the ZFS filesystems when it's done. I then use a cron job to manage snapshot retention. This box is slated to be dropped off at a friend's house about 20 miles from mine later this week.
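A loose sketch of that hourly cycle (host, path, dataset and snapshot names are all placeholders):

rsync -a --delete /Volumes/SAS-Array/Users/ backupbox:/tank/backups/macpro/users/
ssh backupbox zfs snapshot tank/backups/macpro/users@hourly-$(date +%Y%m%d%H%M)
# a separate cron job on the backup box then destroys snapshots that fall
# outside the retention window (zfs list -t snapshot ... | xargs -n1 zfs destroy)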

I wish I were using ZFS/Zevo on the SAS setup as well, but I couldn't get it to perform well enough for my particular dataset. I think even if I were running ZFS natively on the primary filesystem, I might still be inclined to use the rsync+snapshot model, because I think rsync has better recovery options over unreliable connections than zfs send.

Anyway, good luck in your setup and feel free to ask any more questions.

Kind Regards,
-Greg

Re: Contemplating ZFS

Post by mk01 » Sun Feb 03, 2013 11:07 am

@ghaskins: I would disagree with "I think rsync has better unreliable connection recovery options than zfs send". The first full send may look scary (a few-TB filesystem with only the final snapshot), but otherwise, if you think about what rsync does versus what zfs snapshot followed by send -I does, the two aren't comparable.

If you change one file on a filesystem with a few million files, many of them part of a package where a consistent state across all files is needed (dBase databases, MySQL databases, even an iTunes library, iPhoto, Aperture, Logic), rsync could be running for hours just to send that one new file. And if a transaction occurs in between (one that needs to update multiple files), you will never be able to reconstruct a consistent state.

A snapshot is a snapshot: it freezes the state and happens in a matter of seconds, and the same goes for the send.
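To make that concrete, an incremental cycle looks roughly like this (pool, dataset, host and snapshot names are placeholders):

zfs snapshot tank/data@2013-02-03-1100
zfs send -I tank/data@2013-02-03-1000 tank/data@2013-02-03-1100 | \
    ssh backupbox zfs receive -F backup/data
# only the blocks that changed between the two snapshots cross the wire,
# and an interrupted receive is simply discarded rather than leaving a
# half-updated tree behind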

Once the snapshot has been received and is visible on the other side, you can be sure you are fine. You can't say that for rsync; it's impossible. Just go through the various --delete options: delete before, after, during, delayed, and so on... it's scary. Not that it's wrong for those options to exist, but they are the result of all the dramatic risks they try to avoid.

@LaMosca: the benefits of the hierarchical design, and of the option to mount or not mount a filesystem, are well illustrated by the way Solaris handles system updates. It has the root filesystem (the system) on ZFS. When an update transaction starts, it creates a snapshot and clones it to a new filesystem (this takes a few seconds). The update session then changes the system and you can test it. If anything fails, or a serious new bug turns up, you just reboot the system from the clone, and during full operation you can still bring up a virtual copy of the (old) damaged system and look at what happened. The old-fashioned way on Mac OS X (still miles ahead of other systems) is to do a Time Machine backup, run the updates, reboot and see. If it's not good, you do a clean reinstall from the backup. Not bad, but if you have no separate system partition it will be time consuming.
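A rough illustration of that snapshot-and-clone pattern (dataset names are placeholders; Solaris automates all of this through boot environments):

zfs snapshot rpool/ROOT/system@pre-update
# ... run the update and test ...
# if the update turns out bad, the pre-update state is still there:
zfs clone rpool/ROOT/system@pre-update rpool/ROOT/system-known-good
# or roll the live filesystem straight back to the snapshot:
zfs rollback rpool/ROOT/system@pre-update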