performance degradation over time


Re: performance degradation over time

Post by jesse » Fri Apr 05, 2013 2:59 pm

yeah, i know running a hackintosh means i'm at the bottom of the queue for troubleshooting.

i'm not using compression or anything else that would impede performance.

i only have two guesses. one, i'm a unix admin and consequently have a lot of small files populating (or polluting) my home directory. here's an example:

crackmonkey:~ % du -sh .local
70M .local
crackmonkey:~ % find .local | wc -l
9560

i know small file I/O is the bane of most filesystems.
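if anyone wants the same numbers from their own home directory, counting just the tiny files is easy enough (the -8k cutoff is an arbitrary threshold i picked, nothing zfs-specific):

crackmonkey:~ % find .local -type f -size -8k | wc -l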

two, i don't religiously keep track of how much space i'm using in my pool, but it's possible i went over 80% and then back down. my lousy performance pool is at 70% now (good performance one is at 26%), but maybe you can't cross that threshold and gracefully recover. i don't seem to have trouble with ZFS on freebsd/solaris so i assume os x would be the same...
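(easiest way i know to check pool fullness is the CAP column of plain 'zpool list':

crackmonkey:~ % zpool list

in case anyone wants to watch their own.)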

performance during a scrub isn't the actual problem. the problem is my zfs home directory seems to get slower with time, to the point that it's painful to use, and creating a new pool and using rsync to make the pools have identical data brings performance back. exporting/importing the pool and/or a reboot don't change things. weirdest zfs behavior i have seen. i guess the good news is that i have a 'fix.' :)
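for the record, the 'fix' amounts to something like this -- device and pool names made up here, and the exact rsync flags depend on your rsync build's xattr/ACL support:

% sudo zpool create Users3 mirror /dev/disk4 /dev/disk5
% sudo rsync -aH /Volumes/Users/ /Volumes/Users3/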

Re: performance degradation over time

Post by raattgift » Fri Apr 05, 2013 4:59 pm

Actually, given that this is a hackintosh with assorted "alien" hardware, can you dual-boot it from, say, an illumos, openindiana, or freebsd livecd? You could scrub with one of those and compare numbers. You could also compare simple traversals of metadata pretty safely (find, ls, etc.), or even compare how (s)tar performs, just to see if there is a grossly large difference between your pools' performance that persists across different operating systems, with their different hardware drivers (and very non-IOKit approaches to them).
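Something along these lines, run on each OS against the same imported pool, would give crude but comparable numbers (pool and path names are stand-ins; mountpoints will differ off Mac OS X):

% time find /Volumes/Users -type f | wc -l
% time tar cf /dev/null /Volumes/Users/some-small-file-heavy-dir
% zpool scrub Users ; zpool status -v Users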

If you were feeling intrepid and had another Mac OS X box around, you could try mounting the hosted-on-non-Mac-OS-X volumes (via SMB, NFS, or even afp with some recourse to the packaging system) from the hackintosh described in this thread on the other Mac OS X box, and try using one as your $HOME. If that turns out to be usably responsive, a bit of work would let you try sharing the same volume from your hackintosh running Mac OS X, for comparison purposes.
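Sketched very roughly, with the hostname and dataset purely hypothetical, that's something like this on the non-Mac host:

# zfs set sharenfs=on Users/home

and on the Mac OS X box:

% sudo mount -t nfs -o resvport hackintosh:/Users/home /private/mnt

(resvport is usually needed when a Mac is the NFS client; freebsd additionally wants its NFS server enabled before sharenfs does anything.)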

There are a number of people here who run their $HOME on zfs. I do, using mirrored vdevs that reside on separate thunderbolt-connected sata-3 SSDs, though I have lived for short periods with $HOME on pools of rotating rust. I make heavy use of a couple of tens-of-gigabytes pools dominated by small-file random reads; they are 2- and 3-way mirror vdevs on FW800 and USB 3. They perform well enough that I feel no particular desire to mount a filesystem from a non-Mac server.

The only performance problems I've had that relate to Mac OS X and ZEVO involve letting L2ARCs grow so large that ARC performance tanks; that can happen on any platform, but ZEVO clamps down on the ARC pretty hard. Adjusting the primarycache/secondarycache dataset properties made the performance issues vanish immediately, and rearranging the L2ARC media to cap their total size at only a few million referenced arc_buf_hdr_t objects should let me set those properties back to "all" on the relevant datasets.
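Concretely, the adjustment was along these lines -- the dataset name is only an example:

% sudo zfs set secondarycache=metadata Users/home
% zfs get primarycache,secondarycache Users/home

secondarycache=metadata keeps file data out of the L2ARC, so the headers tracking L2ARC buffers stop crowding out the ARC itself.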

Re: performance degradation over time

Post by raattgift » Fri Apr 05, 2013 5:55 pm

Your system_profiler dump has:

Negotiated Link Speed: 1.5 Gigabit
Capacity: 250.06 GB (250,059,350,016 bytes)
Model: ST3250410AS
disk1s2:
Content: Apple_HFS
disk1s3:
Content: Apple_HFS


and

Negotiated Link Speed: 3 Gigabit
Capacity: 750.16 GB (750,156,374,016 bytes)
Model: ST3750640NS
disk2s2:
Capacity: 749.81 GB (749,812,400,128 bytes)
Content: ZFS


and

Negotiated Link Speed: 3 Gigabit
Capacity: 750.16 GB (750,156,374,016 bytes)
Model: ST3750640NS
disk0s2:
Capacity: 749.81 GB (749,812,400,128 bytes)
Content: ZFS


Your "Users" pool devices do not appear in the system_profiler dump, unless I've gone blind, and there's a second 750G one.

crackmonkey:~ % zpool iostat -v
capacity operations bandwidth
pool alloc free read write read write
------------------------------------------- ------ ------ ------ ------ ------ ------
Users 168Gi 64.1Gi 76 2 6.76Mi 12.4Ki
mirror 168Gi 64.1Gi 76 2 6.76Mi 12.4Ki
GPTE_C7015CF8-E5EA-4C9E-93C9-B4A2190F7AFE - - 47 1 4.70Mi 12.5Ki
GPTE_EED64656-120D-49EF-86E8-E0A746061DC3 - - 47 1 4.70Mi 12.5Ki
------------------------------------------- ------ ------ ------ ------ ------ ------
Users2 168Gi 528Gi 70 1 7.56Mi 8.29Ki
GPTE_7D899FED-6217-4338-9E59-8F2FDBD7CDBF 168Gi 528Gi 70 1 7.56Mi 8.29Ki
------------------------------------------- ------ ------ ------ ------ ------ ------


which fits what you wrote in your first message:

jesse wrote: one boot disk and a zpool raidz1 named Users, consisting of a pair of 250gb disks. i recently added a 750gb disk and used it to make a single-disk zpool named Users2


Is the system_profiler from the same machine? Have you been changing its hardware configuration?

Finally, note the difference in negotiated SATA link speed between the 250G disk and the two 750G ones; that, plus model-related differences in latency, could certainly lead to a marked difference in performance. Unfortunately this is nothing but a completely wild-assed guess, resting on assumptions about the differences between what you wrote in the first message and what's actually configured on your system when you hit the performance problems.
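You can recheck the link speeds quickly without wading through the whole dump:

% system_profiler SPSerialATADataType | grep -E 'Model|Negotiated Link Speed'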

Re: performance degradation over time

Post by jesse » Fri Apr 05, 2013 6:34 pm

yes, i've been screwing with hardware. there were three 250gb disks, one for the OS and two for ZFS. the boot disk will be replaced with an ssd and the other 250s have already been replaced with 750s. the old disks are still in the case, just not powered up when i ran system_profiler.

thanks for the ideas. i will make sure everything on the bus is 3gbps, and try messing with the pools from boot cds. i'll post an update early next week.

Re: performance degradation over time

Post by raattgift » Fri Apr 05, 2013 7:01 pm

+1 on the quoted bits.

Also:

grahamperrin wrote:For what it's worth: with my own setup, which is not optimised for performance, I suspect that occasional slowness – during scrub of a particular pool – is to be expected when 'going over' points in time where compression was greatest for blocks for a large compressible file (or for a succession of relatively small compressible files such as bands of a .sparsebundle).


That's a reasonable suspicion, but wrong.

The write pipeline is (for snv_149 and beyond) compression, encryption, checksumming and deduplication, in that order.

The normal read pipeline is: check if in ARC, check if in L2ARC, check if in the dedup table, retrieve properly-checksumming blocks from disk, decrypt, decompress.

ZEVO CE 1.1.1 obviously has not implemented encryption or enabled deduplication, but is extremely unlikely to have otherwise departed from these pipelines.

Scrubbing and resilvering do not use the normal read pipeline; neither can stop when the first properly checksumming block is retrieved from disk, and neither has to decrypt or decompress. Both processes are also metadata-driven. In general, they grab the current transaction group (txg) from the uberblock and stash it in the resilver or scrub name-value pair, and then descend mostly breadth-first from all the uberblocks (and the configuration metadata) to the pool root dataset and snapshot layer metadata, to all the descendant DSL metadata, then back up to the data from the now-known-to-be-correct pool root, examining data blocks in birth-time order, then through its children's data blocks, and so forth. The primary constraints are that no child is considered clean until its parents are considered clean, and that no child is considered clean unless all its copies and/or parity are also clean.

Once the whole tree of the stashed txg is clean, if the most recently committed txg has a more recent birth date, that tree will be descended through, except that subtrees below blocks with a birth date older than the clean txg aren't examined. This takes account of COW, the layout of the zfs metadata tree, and the transactional nature of zfs.

Data which is highly compressed, written in one sequential bulk pass, and then left alone is likely to be faster to scrub than other data, since the blocks will all have closely related parent metadata and very similar birth dates. That favours locality on disk.

Scrubbing is typically IOPS-limited, so locality on disk is the main determinant of speed.
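You can watch this for yourself while a scrub runs; the operations columns, not bandwidth, are the numbers to stare at:

% zpool iostat -v Users 5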

When you have an object which is regularly updated, by appending or by writes in the middle, you generate a lot of new blocks via COW. Each COW will have an updated birth date, and will be tied to a metadata tree back up to a newer txg.
Therefore, for an 8MB band file, when you rewrite a 4096-byte dmg block contained in it, you reduce locality on disk relative to the other blocks backing that band file. Time Machine does this quite a bit, particularly in the blocks holding JHFS+ metadata. So when you are scrubbing away, it's not the compression that slows you down but rather the previous rewriting, since blocks with later birth orders under a common DSL parent are likely to be scattered around quite a bit, possibly into widely separated metaslabs on the same vdev (or, optimistically, onto the pool's other storage vdevs, which can thereby produce a concurrency gain).

On the other hand, the DSL parent may have been updated for a variety of reasons (differently aged blocks held in different snapshots, and posix mtime and atime attribute updates), which means less jumping around to scrub the blocks of a file all at once, and more descending through new metadata (sub)trees. Except on very full vdevs, that recovers locality on disk extremely well. (Indeed, in the Time Machine case, if you have multiple storage vdevs, the scrub will *tend* to check the updated blocks from different Time Machine runs concurrently.)

Scrub slowdowns are usually because of bursts of POSIX-layer data changes -- updates to directories, creating new files, and rewrites of previously existing POSIX files -- such as when making and installing a project like gcc, retrieving a bazillion new Mail messages from an IMAP server, or pushing all those changes into a Time Machine Backups volume (whether SPARSEBUNDLE or SPARSE dmg).

This is usually referred to as zfs's file fragmentation problem. It might not be a problem for some workloads; it should not present a problem for your Time Machine activity, which should benefit from zfs's large record size, its sequentialization of writes, and the inherent localities of reference within the backup task itself. However, the usual fragmentation workaround is simply to copy the fragmented objects; zfs send/recv can do this because it works with DMU objects rather than blocks. For SPARSEBUNDLE and SPARSE dmgs, you can use your favourite system tool to make a new copy all at once, or you can dig into the bundle and copy suspect bands, which you can probably find using "ls -lat" over time; lower-numbered bands which accumulate recent modification times are hot and likely fragmented. (Beware that "file defragmentation by copying" using POSIX-layer tools is complicated by zfs snapshots.) Still, that seems like a lot of work to shave seconds or even minutes off occasional scrubs.
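For example, to spot hot bands (path purely illustrative):

% ls -lat /Volumes/Backups/TimeMachine.sparsebundle/bands | head

and the send/recv flavour of copying, assuming a dataset named Users/tm:

% zfs snapshot Users/tm@defrag
% zfs send Users/tm@defrag | zfs recv Users2/tm-copy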
