Unusably slow performance on pool (suddenly)

All your general support questions for OpenZFS on OS X.

Unusably slow performance on pool (suddenly)

Postby spud603 » Sat Sep 09, 2017 9:58 pm

Hi,

I have a raidz pool that has been running flawlessly for years, but has suddenly (some time in the last few days) become nearly unresponsive. Mounting the pool takes about five minutes, with another five-minute wait to have the root directory structure display at all, and similarly for each subsequent set of subdirectories. This is obviously unusable as is, and it is causing trouble with my system in general as it tries to access the mounted volumes.

Running
Code: Select all
zpool status -v franklin
yields
Code: Select all
pool: franklin
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
   still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
   the pool may no longer be accessible by software that does not support
   the features. See zpool-features(5) for details.
  scan: scrub in progress since Sun Sep 10 00:36:16 2017
    55.4M scanned out of 2.44T at 133K/s, (scan is slow, no estimated time)
    0 repaired, 0.00% done
config:

   NAME                                            STATE     READ WRITE CKSUM
   franklin                                        ONLINE       0     0     0
     raidz1-0                                      ONLINE       0     0     0
       media-E97A7C83-B4CA-4DAA-BF5A-8F890ED5B6C2  ONLINE       0     0     0
       media-9241F0DE-1023-438C-81A7-31D78390484A  ONLINE       0     0     0
       media-FFE58D75-EBC6-4B1A-ACB3-701BD9B85FB2  ONLINE       0     0     0

errors: No known data errors


(I tried running scrub, but as the status message shows, that is not going to finish any time soon)

This is all very disconcerting, because at these speeds the data on the drive (about 1.8 TB) would take several months to retrieve to another file system.

Does anyone have any ideas how to diagnose what is going wrong here? The SMART status of all three dives is fine, the pool has about 135GB of free space, and the system is not taxed at all on disk I/O or CPU. I tried upgrading to the latest version of OpenZFS for OS X (1.6.1) with no improvement. I'm really not sure where else to look.

Thanks in advance for any advice or thoughts you all have!
spud603
 
Posts: 8
Joined: Sat Sep 09, 2017 9:42 pm

Re: Unusably slow performance on pool (suddenly)

Postby lundman » Sun Sep 10, 2017 4:21 pm

ZFS can get slow if the pool is over about ~95% full, but it shouldn't be quite that bad.

If it was just the mounting, it would be worth checking for unlinked-list messages in the system log.

You could also run spindump at mounting / or waiting for a directory to show, so we can see what it is doing.
User avatar
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Unusably slow performance on pool (suddenly)

Postby spud603 » Sun Sep 10, 2017 6:06 pm

Thanks for the reply. I'll try freeing up some space though I can't imagine that is all of what is going on (hopefully I am wrong!)

Sorry for my ignorance, but how would I do a spindump?

Also, I'm not sure if it's relevant but I am getting a whole lot of messages constantly in my system log from zed similar to the following:
Code: Select all
9/10/17 9:04:45.973 PM zed[51864]: eid=25293 class=delay pool=franklin
9/10/17 9:04:45.986 PM zed[51866]: eid=25294 class=delay pool=franklin
9/10/17 9:04:45.999 PM zed[51868]: eid=25295 class=delay pool=franklin
9/10/17 9:04:46.022 PM zed[51870]: eid=25296 class=delay pool=franklin
9/10/17 9:04:46.040 PM zed[51872]: eid=25297 class=delay pool=franklin
9/10/17 9:04:46.053 PM zed[51874]: eid=25298 class=delay pool=franklin
9/10/17 9:04:46.070 PM zed[51876]: eid=25299 class=delay pool=franklin
9/10/17 9:04:46.086 PM zed[51878]: eid=25300 class=delay pool=franklin
9/10/17 9:04:46.099 PM zed[51880]: eid=25301 class=delay pool=franklin
9/10/17 9:04:46.112 PM zed[51882]: eid=25302 class=delay pool=franklin


Thanks!
spud603
 
Posts: 8
Joined: Sat Sep 09, 2017 9:42 pm

Re: Unusably slow performance on pool (suddenly)

Postby lundman » Sun Sep 10, 2017 8:23 pm

The pool agrees with you that its got delays.

"sudo spindump" makes a spindump, and it tells you the filename at the end. Runs for about 10s.
User avatar
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Unusably slow performance on pool (suddenly)

Postby spud603 » Mon Sep 11, 2017 10:45 am

Thanks.

I'm attaching a spindump from a fresh reboot of the computer. I have it set up to automatically attach the pool on boot, so this is from the period before it had successfully mounted (no sign of the drive at all under /Volumes/, either through Finder or the command line). `zpool list` was also running (and hanging) while this dump was going.

Again, thanks so much for any thoughts on what may be causing this.
Attachments
spindump.listunmounted.txt
(1.3 MiB) Downloaded 353 times
spud603
 
Posts: 8
Joined: Sat Sep 09, 2017 9:42 pm

Re: Unusably slow performance on pool (suddenly)

Postby lundman » Mon Sep 11, 2017 4:27 pm

Code: Select all
                                     *1002 spa_load + 989 (zfs) [0xffffff7f82004dea]
                                       *1002 vdev_uberblock_load + 129 (zfs) [0xffffff7f820195d5]
                                         *1002 zio_wait + 520 (zfs) [0xffffff7f8205d1f3]


It's spending all its time in there, which is odd - waiting for IO to complete. Everything looks normal in your pool? You would get that if one of the devices was acting weird, or slow.

If you run "zpool iostat -vq $POOL 1" do all devices behave similarly?
User avatar
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Unusably slow performance on pool (suddenly)

Postby spud603 » Mon Sep 11, 2017 4:44 pm

Thanks so much for looking through that huge file for me :)

That's interesting. Running the iostat command, none of the devices seems slower than the others (all are able to get a read bandwidth of about ~2MB at peak), but the third device is definitely out of sync with the other two. That is: the first two devices will tend to have large reads at the same time, followed by a large read bandwidth on the third device a second or two later.

Is there some way to take that drive offline and see if performance improves without doing a full drive swap? Would it make any sense to physically disconnect that drive and see what happens?
spud603
 
Posts: 8
Joined: Sat Sep 09, 2017 9:42 pm

Re: Unusably slow performance on pool (suddenly)

Postby lundman » Tue Sep 12, 2017 4:38 pm

You can "zpool offline pool device" to take out a device for testing, then online it again when you are done.
User avatar
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Unusably slow performance on pool (suddenly)

Postby spud603 » Wed Sep 13, 2017 6:23 am

I think that did it! Taking the third drive offline make the pool relatively quick and responsive.

I've ordered a new drive to replace it.

Strange that the drive itself wasn't reporting any errors, but I'm glad it's resolved.

Thanks again for all your help!
spud603
 
Posts: 8
Joined: Sat Sep 09, 2017 9:42 pm


Return to General Help

Who is online

Users browsing this forum: No registered users and 33 guests