Transfers to/from Zevo volume suddenly grind to a halt

Post by wonkywonky » Wed Nov 14, 2012 3:39 am

Summary
I'm having some strange issues with the box I use as a NAS and Plex Media Server.

File transfers to and from Zevo volumes suddenly stall with no warning, causing Finder to freeze and eventually forcing a hard power-down. At that point, running anything zpool-related (zpool status, zpool list, etc.) hangs with no output about 80% of the time.
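
Because a hung zpool binary also ties up whatever shell or script called it, one way to probe the pool is to give each command a deadline. The following is a minimal sketch in Python, not a ZEVO tool: the helper name `run_with_deadline` and the 10-second limit are assumptions made for illustration. It deliberately does not wait on the child after the deadline, since a process blocked inside the kernel may never exit.

```python
import subprocess
import time


def run_with_deadline(cmd, deadline_s=10.0, poll_s=0.1):
    """Run cmd; return its exit code, or None if still running at the deadline.

    Hypothetical helper for illustration. Output is discarded so that a
    full pipe can never be the reason the child appears stuck.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        rc = proc.poll()
        if rc is not None:
            return rc
        time.sleep(poll_s)
    # Best effort only: a process blocked in the kernel may ignore SIGKILL,
    # so we deliberately do not call proc.wait() here.
    proc.kill()
    return None


if __name__ == "__main__":
    for args in (["zpool", "status"], ["zpool", "list"]):
        try:
            rc = run_with_deadline(args)
        except FileNotFoundError:
            print(" ".join(args), "-> zpool not found on PATH")
            continue
        print(" ".join(args), "-> hung" if rc is None else f"-> exit {rc}")
```

A result of None means the command did not finish in time, which makes it possible to log how often the hang occurs without losing the terminal each time.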

I realize that my use of a Hackintosh introduces the potential for some weird interactions between components, but any ideas would be appreciated.

Hardware
    Hackintosh identified as a MacMini5,1: H67 mobo + i3 + 16GB RAM + boot SSD + ATTO ExpressSAS HBA PCIE x8 card
    4 x 1TB + 4 x 750GB drives off the ATTO HBA
    4 x 500GB drives off the mobo SATA ports
Software
    ML 10.8.2
    Zevo CE 1.1.1
Pool/vdev structure:
    1. Media (ashift=12)
      RAIDZ 4 x 750GB
      RAIDZ 4 x 1TB
    2. Data
      Mirror 2 x 500GB
      Mirror 2 x 500GB
Details
Pool #1 was originally a MacZFS pool (v8) with about 3TB of data, accessed over the network via AFP (using liberate apple file server) and locally by Plex Media Server. It had been running happily since about July 2012. Pool #2 was recreated with Zevo in early October. About two weeks ago, I started to see file transfers hanging midway through. This happened with both local transfers (to other ZFS or HFS volumes) and transfers over AFP.

When monitoring data rates in iStat Menus during some of the transfers, I could see all drives pushing data at typical rates until they suddenly stopped. Any subsequent attempt to access the pools resulted in beachballing and Finder freezes. The read/write/checksum errors reported by "zpool status" remained at 0/0/0 for all drives in the pool. I've previously unplugged cables to confirm that the error counts do increase, so I know the reporting works.
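
The 0/0/0 check described above can be automated by reading the READ/WRITE/CKSUM columns from saved "zpool status" text. This is a rough sketch under stated assumptions: the sample text and its device names are made up for illustration, and the parser only expects the usual NAME/STATE/READ/WRITE/CKSUM table layout, which may differ between ZFS implementations.

```python
def parse_error_counters(status_text):
    """Return {device_name: (read, write, cksum)} from `zpool status` text.

    Counters are kept as strings because some ZFS versions abbreviate
    large values (for example "1.2K").
    """
    counters = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config:
            continue
        if not fields:
            in_config = False  # a blank line ends the device table
            continue
        if len(fields) < 5:
            continue  # section labels or rows without counters
        name, _state, read, write, cksum = fields[:5]
        counters[name] = (read, write, cksum)
    return counters


def devices_with_errors(counters):
    """Return the sorted names of devices with any non-zero counter."""
    return sorted(n for n, c in counters.items() if any(v != "0" for v in c))


# Illustrative sample only; pool layout and device names are invented.
SAMPLE_STATUS = """\
  pool: Media
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\tMedia       ONLINE       0     0     0
\t  raidz1-0  ONLINE       0     0     0
\t    disk1   ONLINE       0     0     0
\t    disk2   ONLINE       0     3     0

errors: No known data errors
"""

if __name__ == "__main__":
    print(devices_with_errors(parse_error_counters(SAMPLE_STATUS)))
```

Saving the status text shortly before a transfer and comparing it with the output after a reboot would show whether any counter moved around the time of a stall.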

Things I've tried
I thought it might be down to some weird interaction between v8 (via MacZFS) and v28 (Zevo) pools, so I decided to blow the pool away a few days ago and recreate it with Zevo. (Obviously, copying all the data out to other disks took a long time because of the nature of the problem I'm having.) Unfortunately, I'm still seeing transfers stall partway through.

I've tried a fresh install of v10.8.2 and Zevo v1.1.1, but the problem remains.

I have not yet been able to pinpoint a specific set of triggers for the stalls. I've seen one happen halfway through the transfer of a single 3GB file, and I've seen a few 100+GB transfers complete successfully.

My RAM has passed a 3h memtest run. I haven't been able to reproduce the problem so far when using HFS volumes on the ATTO-connected drives. I haven't had time to examine the ATTO HBA itself for failure.

Any ideas, anyone?

thoughts

Post by grahamperrin » Wed Nov 14, 2012 5:33 am

Technical reference: sysdiagnose and related approaches to gathering information

Whether your own OSx86 environment will respond as expected to the Mac hardware-oriented key chord for sysdiagnose, I don't know.

From what's in the opening post here, I reckon that you might run sysdiagnose from the command line (without the chord) to get at least partial results.

Also I guess that during the incidents you describe:

  • runs of ZEVO-provided binaries (zpool, zfs, zdb etc.) may not respond to Control-C
  • for affected file systems, runs of commands such as lsof may be similarly unresponsive to Control-C
  • responses to Control-T will suggest no progress

… if so, then (loosely) I'd think of the issue as involving a bus, with possible knock-on effects on other buses.

I might take a closer look later. In the meantime, I wonder … is it possible that a hard drive spins down during a copy? If not a spin-down, then step back and aim for a holistic view of what might allow something on a bus to not respond (or not respond in good time).

Re: Transfers to/from Zevo volume suddenly grind to a halt

Post by dbrady » Wed Nov 14, 2012 8:14 am

The output of spindump(8) during the time of the stall would help us diagnose the issue. That command will reveal where ZFS is stuck inside the kernel (or at least what resource it is waiting for).
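
Since the capture has to happen while the stall is in progress, it helps to notice the stall promptly. A minimal sketch of one way to do that, assuming cumulative byte counts are sampled at a fixed interval (how they are collected is left open, and reading from the stalled file system may itself block); the function name `first_stall` and the three-sample threshold are hypothetical:

```python
def first_stall(samples, quiet_samples=3):
    """Return the index at which progress is first seen to have stopped.

    samples: cumulative bytes copied, one reading per fixed interval.
    A stall is reported once `quiet_samples` consecutive readings show
    no increase. Returns None if no stall is found.
    """
    quiet = 0
    for i in range(1, len(samples)):
        if samples[i] > samples[i - 1]:
            quiet = 0
        else:
            quiet += 1
            if quiet >= quiet_samples:
                return i
    return None


if __name__ == "__main__":
    readings = [0, 10, 20, 20, 20, 20, 30]
    print(first_stall(readings))
```

When the function reports an index, that is the moment to run spindump by hand (as root) while ZFS is still stuck.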

Re: Transfers to/from Zevo volume suddenly grind to a halt

Post by wonkywonky » Wed Nov 14, 2012 10:51 am

Thanks Don, I'll get on it. I guess this time I'll actually have to hope the stall happens quickly!

Will it be sufficient to run spindump once I see the stall occur, or will you need samples to cover the period before and after?

Re: Transfers to/from Zevo volume suddenly grind to a halt

Post by wonkywonky » Thu Nov 15, 2012 9:59 am

spindump...dump attached.

Edit:
This transfer died after about 70GB, I think.

Another one died at 12.59GB done out of 136GB, and I just tried another 132GB transfer that made it all the way to 96.79GB before stalling. In all cases, I was copying from a Zevo single-disk pool with fs copies=2.
Attachments
spindump.zip
(68.44 KiB) Downloaded 5 times
Last edited by wonkywonky on Thu Nov 15, 2012 9:07 pm, edited 1 time in total.

Re: Transfers to/from Zevo volume suddenly grind to a halt

Post by shuman » Thu Nov 15, 2012 4:08 pm

Would someone like to perform a demo of spindump analysis? I would be curious to know what the top 10 things to look for in the spindump file are.
- Mac Mini (Late 2012), 10.8.5, 16GB memory, pool - 2 Mirrored 3TB USB 3.0 External Drives

Spindump analysis

Post by grahamperrin » Thu Nov 15, 2012 4:51 pm

In Stack Exchange:

Spindump analysis instructions?

I added a bounty, and a mention in Ask Different Chat.

Re: Transfers to/from Zevo volume suddenly grind to a halt

Post by d.jacobs » Thu Nov 15, 2012 11:40 pm

I've noticed things like this too, but with iStat installed I can see that the pool is still active at -something-. Disk Activity shows minor read and write activity on the pool disks, a few hundred kilobytes/sec. If I let it sit a while, it eventually frees itself and continues after anywhere from a few minutes to an hour or more. I've assumed there is some sort of ZFS housekeeping going on. I'll try to catch a sysdiagnose the next time.

causes of activity

Post by grahamperrin » Fri Nov 16, 2012 1:50 am

d.jacobs wrote: … the pool is still active at -something- …

When that happens, try a simple
sudo lsof
for the affected file system(s). You might find mds used by root.

Sometimes I forget that my scrubs are automated.
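
The lsof check above can be turned into a quick summary of who is holding files open. This is a sketch, assuming lsof's default column layout (COMMAND, PID, USER first) and a saved copy of its output; the sample text, the mount point /Volumes/Media, and the file names are invented for illustration.

```python
from collections import Counter


def summarize_lsof(lsof_text):
    """Count open files per (command, user) from default `lsof` output."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "COMMAND":
            continue  # skip the header and any malformed lines
        command, _pid, user = fields[:3]
        counts[(command, user)] += 1
    return counts


# Illustrative sample only; not real output from the systems in this thread.
SAMPLE_LSOF = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
mds       45 root   12r   DIR   45,3       10    4 /Volumes/Media
mds       45 root   13r   REG   45,3     4096   77 /Volumes/Media/a.mkv
Finder   210 me     10r   DIR   45,3       10    4 /Volumes/Media
"""

if __name__ == "__main__":
    for (command, user), n in summarize_lsof(SAMPLE_LSOF).most_common():
        print(f"{command} ({user}): {n} open")
```

Passing a mount point to lsof (for example, sudo lsof /Volumes/Media, with the path being whatever the affected file system is mounted at) limits the listing to files open on that file system.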

