Deadlocks: anyone else seeing this with 1.7.2?

Postby Sharko » Fri Oct 12, 2018 10:37 am

I updated my ZFS install from 1.5.2 to 1.7.2 about six weeks back. I did not upgrade my main pool, but I did create a new pool on my mirrored backup drive that uses the native ZFS encryption. Since then, and it could be entirely coincidental, I've been getting occasional deadlocks while running Carbon Copy Cloner (still on version 4.1.23). While logged in as my admin user, I have CCC set up to first clone my HFS+ El Capitan boot partition to an encrypted HFS+ partition on the external drive, and then to clone my user directory from the ZFS dataset holding my everyday (non-admin) user account.

I back up weekly, and the problem shows up about half the time, so I've seen it happen three or four times: progress stops in CCC, then the disk activity appears to stop. If I try to switch to Activity Monitor or open the Force Quit dialog, I get a spinning beach ball of death, which quickly spreads to anything I try to do: can't activate a Terminal window, can't switch tabs or activate Activity Monitor, can't switch to Console. Curiously, I've only seen it happen during the first task, the one that doesn't even involve ZFS.

Well, dang, looks like it just happened again now as I was typing this :shock: Excuse me while I reboot.
Sharko
 
Posts: 230
Joined: Thu May 12, 2016 12:19 pm

Re: Deadlocks: anyone else seeing this with 1.7.2?

Postby Sharko » Fri Oct 12, 2018 10:56 am

OK, where was I...

The fans aren't going full tilt, and none of the CPU cores appear to be pegged. I can't swear to this, but I think I had the Memory tab open in Activity Monitor once while this was happening, and memory usage was low (i.e. normal; I have 32 GB of ECC RAM).

When I check Console, I find nothing interesting in the logs at the time of the deadlock. Maybe the file system is wedged so tight it can't write to a log? I've run Disk Utility's First Aid on the two HFS+ backup partitions, and both come back clean. I've run DriveDX to check the backup disk's SMART data, and that all comes back clean: no uncorrectable errors, no pending sector counts, no offline uncorrectable sector counts, raw read error rate is green and OK, etc.

So I'm wondering how best to debug this? I'm thinking to keep a terminal window open with top running as a start, but I'm wondering if there is somewhere I should be looking to see what ZFS is up to when this happens.
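
One way to pin down when things stop, since Console shows nothing: a minimal heartbeat sketch in Python. It appends a timestamp to a file at a fixed interval and forces each write to disk, so that after a reboot the last entry (or a large gap between entries) shows roughly when writes stopped getting through. The function names, the interval, and the log path are all placeholders made up for illustration, not anything that ships with O3X or macOS.

```python
import os
import time


def heartbeat(path, interval=5.0, count=None):
    """Append a Unix timestamp to `path` every `interval` seconds.

    Each line is flushed and fsync'ed so it reaches the disk before the
    next sleep. `count=None` means run until interrupted (Ctrl-C).
    """
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            log.write(f"{time.time():.3f}\n")
            log.flush()
            os.fsync(log.fileno())
            written += 1
            time.sleep(interval)


def read_timestamps(path):
    """Read the heartbeat file back as a list of floats."""
    with open(path) as log:
        return [float(line) for line in log if line.strip()]


def find_gaps(timestamps, expected_interval, tolerance=2.0):
    """Return (before, after) pairs where two consecutive heartbeats are
    further apart than expected_interval * tolerance."""
    gaps = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        if later - earlier > expected_interval * tolerance:
            gaps.append((earlier, later))
    return gaps
```

Usage would be something like running heartbeat("/tmp/heartbeat.log") in a spare Terminal tab before the backup starts (the path is hypothetical), then calling find_gaps(read_timestamps("/tmp/heartbeat.log"), 5.0) after the reboot. Which volume the log lives on matters: the heartbeat only tells you about stalls that affect that process or that disk.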
Sharko
 
Posts: 230
Joined: Thu May 12, 2016 12:19 pm

Re: Deadlocks: anyone else seeing this with 1.7.2?

Postby lundman » Fri Oct 12, 2018 1:30 pm

sudo spindump

It will show all threads in the kernel, and how long they've been stuck there - shows deadlocks quite well.
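
Once a report has been captured and saved as plain text, a small filter can help find the interesting lines in it. This is only a sketch: it does a case-insensitive substring search and assumes nothing about the report layout. The keyword "zfs" and the idea that ZFS-related frames contain that string are assumptions, and the function names are made up for illustration.

```python
def matching_lines(report_text, keywords=("zfs",)):
    """Return (line_number, line) for every line that contains any of
    the keywords, compared case-insensitively. Line numbers start at 1."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        if any(k in line.lower() for k in lowered):
            hits.append((number, line.rstrip()))
    return hits


def scan_report(path, keywords=("zfs",)):
    """Read a saved text report from `path` and filter it."""
    # errors="replace" keeps the scan going if the file has odd bytes.
    with open(path, errors="replace") as handle:
        return matching_lines(handle.read(), keywords)
```

Short keywords are noisy with substring matching (for example "spl" also matches "display"), so it is worth keeping them specific.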
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Deadlocks: anyone else seeing this with 1.7.2?

Postby jdwhite » Sat Oct 13, 2018 2:59 pm

Sharko wrote: OK, where was I...

The fans aren't going full tilt, and none of the CPU cores appear to be pegged. I can't swear to this, but I think I had the Memory tab open in Activity Monitor once while this was happening, and memory usage was low (i.e. normal; I have 32 GB of ECC RAM).

When I check Console, I find nothing interesting in the logs at the time of the deadlock. Maybe the file system is wedged so tight it can't write to a log? I've run Disk Utility's First Aid on the two HFS+ backup partitions, and both come back clean. I've run DriveDX to check the backup disk's SMART data, and that all comes back clean: no uncorrectable errors, no pending sector counts, no offline uncorrectable sector counts, raw read error rate is green and OK, etc.

So I'm wondering how best to debug this? I'm thinking to keep a terminal window open with top running as a start, but I'm wondering if there is somewhere I should be looking to see what ZFS is up to when this happens.


TL;DR highly recommend you try O3X 1.7.4.

I was happily running O3X (something pre-1.7.2) on Sierra with no major issues until I updated to High Sierra and O3X 1.7.2. After that I'd experience lockups, often with CPU fans spinning and, depending on when I caught it, the GUI might be responsive but disk access would lock up any app or shell I was using. I can't give you any quantitative figures on memory or CPU usage, but given the system fans were audible on my Mac mini I suspect CPU usage was elevated. I spent three months trying to figure out what was going on and never suspected it might be O3X. I found a thread on the Apple support forums about many people having issues with High Sierra and had thought that I was another statistic. However, I think that was a red herring. Like you, I use CCC (in my case to clone my system disk every hour), and I would tend to scrub about once a week. The more I exercised the pool, the sooner the system would lock up. Sometimes I went almost a week, others a couple of days -- depending on when I scrubbed or used the disk a lot.

I started suspecting it was O3X after reading about issues with 1.7.2 and other posts recommending that people with 1.7.2 issues try 1.7.4 due to significant improvements made since 1.7.2. I also knew that it probably wasn't High Sierra once I performed a fresh install of Sierra (not High Sierra) with O3X 1.7.2 and experienced the same hanging behavior described in the previous paragraph.

I've been running Mojave with O3X 1.7.4. Uptime 9 days 20 hrs and counting, and I've been giving the pool a real workout. So far, so good!

-Jason
jdwhite
 
Posts: 11
Joined: Sat May 10, 2014 6:04 pm

Re: Deadlocks: anyone else seeing this with 1.7.2?

Postby Sharko » Tue Jan 22, 2019 1:08 pm

I thought that I had this licked when I upgraded to 1.8.2 a few weeks ago, since I haven't seen Carbon Copy Cloner crashes in a while. However, last night I woke up my El Capitan Mac Pro system and it was unresponsive after sleep - couldn't log back in. So... that got me motivated to set up a hardened SSH system so that I can try and SSH into the box on the local network if/when this happens again. I really want to see what 'sudo spindump' reports!
Sharko
 
Posts: 230
Joined: Thu May 12, 2016 12:19 pm

