Sharko wrote: OK, where was I...
The fans aren't going full tilt, and none of the CPU cores appear to be pegged. I can't swear to this, but I think I had the memory tab open in Activity Monitor once while this was happening, and memory usage was low (i.e. normal; I have 32GB of ECC RAM).
When I check Console, I find nothing interesting in the logs at the time of the deadlock. Maybe the file system is wedged so tight it can't write to a log? I've run Disk Utility's First Aid on the two HFS+ backup partitions, and both come back clean. I've run DriveDX to check the backup disk's SMART data, and that all comes back clean: no uncorrectable errors, no pending sector counts, no offline uncorrectable sector counts, raw read error rate is green and OK, etc.
So I'm wondering how best to debug this. I'm thinking of keeping a terminal window open with top running as a start, but I'm wondering if there is somewhere I should be looking to see what ZFS is up to when this happens.
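One way to put a timestamp on the moment disk access wedges, as a rough sketch rather than a ZFS-specific tool: a small Python script that periodically writes and fsyncs a tiny file on the suspect volume from a worker thread, and reports a stall if the write does not finish within a timeout. The function names, the probe file name .io_probe, and the example path /Volumes/tank are all made up for illustration. It does not look at ZFS internals at all; it only shows when ordinary file I/O stopped completing.

```python
import os
import tempfile
import threading
import time


def probe_write(directory, result):
    """Write and fsync a tiny file in `directory`, recording elapsed seconds or an error."""
    start = time.monotonic()
    path = os.path.join(directory, ".io_probe")  # hypothetical probe file name
    try:
        with open(path, "wb") as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())
        os.remove(path)
        result["elapsed"] = time.monotonic() - start
    except OSError as exc:
        result["error"] = str(exc)


def check_once(directory, timeout=10.0):
    """Return ("ok", seconds), ("error", message) or ("stalled", None)."""
    result = {}
    # Daemon thread: if it blocks forever in a wedged syscall, it won't keep
    # the script from exiting.
    worker = threading.Thread(target=probe_write, args=(directory, result), daemon=True)
    worker.start()
    worker.join(timeout)
    if "elapsed" in result:
        return "ok", result["elapsed"]
    if "error" in result:
        return "error", result["error"]
    return "stalled", None


def watch(directory, interval=30.0, timeout=10.0, rounds=None):
    """Print one timestamped line per probe; rounds=None loops until interrupted."""
    count = 0
    while rounds is None or count < rounds:
        status, detail = check_once(directory, timeout)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if status == "ok":
            print(f"{stamp} ok: write+fsync took {detail:.3f}s", flush=True)
        elif status == "error":
            print(f"{stamp} error: {detail}", flush=True)
        else:
            print(f"{stamp} STALLED: no completion within {timeout:.0f}s", flush=True)
        count += 1
        time.sleep(interval)


if __name__ == "__main__":
    # Demo against a temporary directory. For real use, pass a directory on
    # the suspect pool instead (hypothetical example: /Volumes/tank).
    with tempfile.TemporaryDirectory() as demo_dir:
        watch(demo_dir, interval=0.1, timeout=5.0, rounds=3)
```

Run it in a terminal next to top. If you redirect the output to a file, put that file on a disk other than the one being probed, since a wedged file system could block the log as well. Each stalled probe leaves one blocked thread behind, which should be tolerable for a short debugging session.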
TL;DR: I highly recommend you try O3X 1.7.4.
I was happily running O3X (something pre-1.7.2) on Sierra with no major issues until I updated to High Sierra and O3X 1.7.2. After that I'd experience lockups, often with CPU fans spinning and, depending on when I caught it, the GUI might be responsive but disk access would lock up any app or shell I was using. I can't give you any quantitative figures on memory or CPU usage, but given the system fans were audible on my Mac mini I suspect CPU usage was elevated. I spent three months trying to figure out what was going on and never suspected it might be O3X. I found a thread on the Apple support forums about many people having issues with High Sierra and thought I was just another statistic. However, I think that was a red herring. Like you, I use CCC to clone my system disk every hour, and I tend to scrub about once a week. The more I exercised the pool, the sooner the system would lock up. Sometimes I went almost a week, other times a couple of days -- depending on when I scrubbed or used the disk a lot.
I started suspecting O3X after reading about issues with 1.7.2 and seeing other posts recommend that people with 1.7.2 problems try 1.7.4 because of the significant improvements made since 1.7.2. I also knew it probably wasn't High Sierra once I performed a fresh install of Sierra (not High Sierra) with O3X 1.7.2 and experienced the same hanging behavior described above.
I've been running Mojave with O3X 1.7.4. Uptime 9 days 20 hrs and counting, and I've been giving the pool a real workout. So far, so good!
-Jason