System-wide I/O hang - how to debug?


Postby athompson » Tue Dec 12, 2017 11:39 am

Accessing a ZFS pool (single-disk) under High Sierra (10.13.1) with O3X 1.7.0, I'm seeing a symptom where the pool mounts, I can cd(1) and ls(1) around without difficulty, but anything more than trivial activity causes the I/O subsystem to partly lock up.
The first thing I was trying to do was chown -R the entire volume, and that hung. Running "find ." from the top-level pool also hangs after about a screenful of activity.
Doing the same things under ZoL on CentOS does *not* result in a hang.
Scrub completes successfully without errors.
Nothing shows up in the system log at all when I/O freezes... possibly because I/O is partially frozen?
The drive is on a USB-to-SATA bridge, since this is a MacBookPro14,3. The bridge is an Innostor 0x1f75 0x0611 sold by StarTech (see https://usb-ids.gowdy.us/read/UD/1f75/0611).
Hot-unplugging the drive does *nothing* - it doesn't even interrupt the in-flight I/O!
The only odd thing I can think of is that I still had (I think) v1.6.x installed under Sierra (10.12.x) when I upgraded to High Sierra, but I did install 1.7.0 over top *before* reconnecting the pool.
After the operation on the ZFS pool hangs in an uninterruptible sleep, all I/O on the system seems kind of screwed - I can't even open a Finder window, although all other running apps seem to be OK until they need to hit the disk. *Some* I/O to the onboard NVMe disk seems to succeed, though, so ... ?!?

I don't even know how to start troubleshooting this on my Mac - no error messages means I've got no leads to chase down.

Any ideas?

Thanks,
-Adam Thompson

Re: System-wide I/O hang - how to debug?

Postby Brendon » Tue Dec 12, 2017 12:11 pm

Hi,

Your instincts are probably correct. It sounds as though the system is deadlocking, which will of course kill most or all I/O on the machine.

It may be a hardware issue, a software issue, or both.

You can assist in debugging this by:

1. Recreate the hang condition
2. Run "spindump" while the system is hung - per https://openzfsonosx.org/wiki/Getting_involved
3. Submit results for analysis
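
Steps 1 and 2 could be scripted roughly as follows. This is only a sketch under assumptions: the helper name run_with_hang_detection, the 120-second timeout, and the /Volumes/tank path in the usage note are all made up for illustration, and it assumes that plain "sudo spindump" with no extra arguments is enough to get a usable report. Check "man spindump" and the wiki page for what the developers actually want.

```python
import subprocess


def run_with_hang_detection(workload, timeout_s, on_hang):
    """Start `workload`; if it is still running after `timeout_s` seconds,
    treat that as a hang and run the `on_hang` command.

    Returns True if a hang was detected, False if the workload exited in time.
    Both commands are argument lists as accepted by subprocess.
    """
    proc = subprocess.Popen(
        workload, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return False
    except subprocess.TimeoutExpired:
        # Deliberately do not kill or wait on the workload here: the point is
        # to sample the system while it is still stuck, and a process in
        # uninterruptible sleep cannot be reaped anyway.
        subprocess.run(on_hang, check=False)
        return True
```

A call might look like run_with_hang_detection(["find", "/Volumes/tank"], 120, ["sudo", "spindump"]). A slow but healthy pool would also trip the timeout, so treat a True result as "probably hung" rather than proof.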

Cheers
Brendon

Re: System-wide I/O hang - how to debug?

Postby e8vww » Sat Dec 16, 2017 3:28 pm

athompson wrote:Accessing a ZFS pool (single-disk) under High Sierra (10.13.1) with O3X 1.7.0, I'm seeing a symptom where the pool mounts, I can cd(1) and ls(1) around without difficulty, but anything more than trivial activity causes the I/O subsystem to partly lock up.


You can't use OpenZFS on OS X over USB. I had the same question: viewtopic.php?f=11&t=3106

Disks should be in pairs over SATA. I'm surprised this was not in the FAQ; I had to find out the hard way too.

Re: System-wide I/O hang - how to debug?

Postby Brendon » Mon Dec 18, 2017 12:13 am

You most certainly can use ZFS on USB. The results might be a little variable due to hardware, but I have two pools on two disks on USB as well as my Thunderbolt ones.

Cheers
Brendon

Re: System-wide I/O hang - how to debug?

Postby athompson » Mon Dec 18, 2017 7:45 am

Well, USB 3.0/3.1 (with UASP, although I don't know if macOS uses it) certainly makes things quite a lot more reliable than USB 2.0.
But yeah, I've run entire RAID arrays off USB before on multiple platforms including macOS: it "just works", assuming you have half-decent USB-to-SATA adapters and good-quality cables.

On the other hand, a bad USB cable or a bad USB-SATA bridge will just leave you tearing your hair out.
-Adam

Re: System-wide I/O hang - how to debug?

Postby athompson » Mon Dec 18, 2017 7:48 am

athompson wrote:Accessing a ZFS pool (single-disk) under High Sierra (10.13.1) with O3X 1.7.0, I'm seeing a symptom where the pool mounts, I can cd(1) and ls(1) around without difficulty, but anything more than trivial activity causes the I/O subsystem to partly lock up.


Turned out to be a corrupt ZFS filesystem combined with a disk that's starting to show bad sectors. Using ZoL I was able to painstakingly narrow down the offending files to a single subdirectory by performing I/O like chown/chmod until Linux either hung or panicked. Deleted the entire subdirectory and - voila - O3X no longer has problems with it, either. Scrubs did *not* reveal the damage, fyi.
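
For anyone who has to repeat this kind of narrowing-down, it could be scripted along these lines. This is a sketch, not what was actually run: the names probe_dir and suspect_subdirs and the 60-second timeout are invented for illustration, and it only does read-only lstat() calls rather than chown/chmod, so it may not trigger damage that only shows up on writes. Each top-level subdirectory is walked in a separate child process so that a hang shows up as a timeout in the parent.

```python
import os
import subprocess
import sys

# Script run in a child process: lstat() every entry under one directory.
# Any listing or lstat error makes the child exit with a non-zero status.
PROBE = """
import os, sys

def fail(err):
    raise err

for root, dirs, files in os.walk(sys.argv[1], onerror=fail):
    for name in dirs + files:
        os.lstat(os.path.join(root, name))
"""


def probe_dir(path, timeout_s):
    """Return 'ok', 'error', or 'hang' for a metadata walk of `path`."""
    proc = subprocess.Popen(
        [sys.executable, "-c", PROBE, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible sleep will not
        # die, so we do not wait for it after the kill.
        proc.kill()
        return "hang"
    return "ok" if rc == 0 else "error"


def suspect_subdirs(top, timeout_s=60.0):
    """Probe each immediate subdirectory of `top`; list those not 'ok'."""
    suspects = []
    for entry in sorted(os.scandir(top), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            status = probe_dir(entry.path, timeout_s)
            if status != "ok":
                suspects.append((entry.path, status))
    return suspects
```

Running suspect_subdirs on a suspect directory and then again on whatever it reports narrows things down one level at a time. If the kernel really does deadlock or panic, as it did here, the script will not survive either, so expect to reboot and restart from the last directory that was being probed.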

-Adam

