OpenZFS on OS X

Posted: **Mon Jun 17, 2019 8:03 pm**

I have a 3 TB ZFS mirror using ZFS 1.9.0 under macOS 10.14.5. I'm rsyncing files to it while also running a virtual machine. After a while, I/O hangs. What's going on?

Posted: **Tue Jun 18, 2019 4:45 am**

Not much by the sounds of it…

What's going on? Who knows…
You switched it off? You closed your eyes?
Sorry for taking the piss… but mate… we're not mind readers.

Please provide a bit more info. Here are some suggestions:

- A hardware description of the Mac and how the zpool is connected.
- Output of:
  - `zpool status` and `zpool list`, so we can see how your pool is set up and what state it's in.
  - `zfs get all` on the dataset in question.
  - `zpool iostat -v 1 600` while running rsync, to see if any vdev has poor I/O.
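If you'd rather gather all of that in one go, here is a minimal Python sketch that runs the commands listed above and saves each output to its own file. The script, the output file names and the dataset argument are my own hypothetical choices; it only wraps the commands and doesn't interpret anything.

```python
import subprocess
import sys
from pathlib import Path


def diagnostic_commands(pool: str, dataset: str) -> dict[str, list[str]]:
    """Map an output file name to one of the commands suggested above."""
    return {
        "zpool_status.txt": ["zpool", "status", pool],
        "zpool_list.txt": ["zpool", "list", pool],
        "zfs_get_all.txt": ["zfs", "get", "all", dataset],
        # 600 one-second samples (~10 minutes); kept last because it blocks.
        # Start rsync in another terminal so the samples cover the copy.
        "zpool_iostat.txt": ["zpool", "iostat", "-v", pool, "1", "600"],
    }


def collect(pool: str, dataset: str, outdir: str = "zfs-diag") -> Path:
    """Run each command and save its stdout and stderr to its own file."""
    out = Path(outdir)
    out.mkdir(exist_ok=True)
    for name, argv in diagnostic_commands(pool, dataset).items():
        with open(out / name, "w") as fh:
            # check=False: keep going even if one command fails.
            subprocess.run(argv, stdout=fh, stderr=subprocess.STDOUT, check=False)
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 zfs_diag.py <pool> <dataset>
    print("saved to", collect(sys.argv[1], sys.argv[2]))
```

The files in the output directory can then be pasted into a reply as-is.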

By providing the above, the community will have a better chance to help you.

Cheers,

Posted: **Tue Jun 18, 2019 6:47 am**

Trash can Mac with a JMicron-based 2-bay disk enclosure connected via USB 3.

```
  pool: externalhd
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub in progress since Tue Jun 18 10:26:29 2019
        826G scanned at 810M/s, 26.0G issued at 25.5M/s, 2.41T total
        0 repaired, 1.05% done, 1 days 03:12:32 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        externalhd                                      ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            media-A1232949-F65F-A64B-B241-D4DBBA49E0A0  ONLINE       0     0     0
            media-448C5ECE-160D-7446-BF08-3BED3AE018B5  ONLINE       0     0     0

errors: No known data errors
```

As you can see, I'm running a scrub.

```
NAME         SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
externalhd  2.72T  2.41T   319G        -         -    16%    88%  1.00x    ONLINE  -
```

I saw this message when trying to reboot after it hung:

```
Warning: Pool 'externalhd' has encountered an uncorrectable I/O failure and has been suspended.
```

Bad USB cable? Bad controller electronics?

Posted: **Tue Jun 18, 2019 7:14 am**

I am separately running a Mac program, Drive Genius, which is supposed to monitor disk health. It's now reporting that one of the two disks, a Toshiba DT01ACA300, has a significant number of damaged areas. This implies the disk is very sick, but the scrub is still running and not detecting any errors. How are the errors apparent to it, but not zfs? Why should the errors of one disk hang the whole pool?

Posted: **Tue Jun 18, 2019 7:36 am**

mauricev wrote: ...one of the two disks, a Toshiba DT01ACA300, has a significant number of damaged areas. This implies the disk is very sick, but the scrub is still running and not detecting any errors. How are the errors apparent to it, but not zfs?

The disk's and the filesystem's points of view can be very different. Drives can attempt to remap bad blocks, etc. I think the more relevant point is that your scrub is only ~1% complete, so it's premature to think that ZFS won't find any errors. I don't think there is much point in waiting around to see if it does, though.
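To put numbers on "~1% complete": the status output reports 826G scanned but only 26.0G issued. A rough check, assuming binary units (1T = 1024G) and that zpool bases its percentage and ETA on the issued bytes and the 25.5M/s issue rate, reproduces its figures:

```python
# Binary multipliers, assuming zpool's K/M/G/T suffixes are powers of 1024.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> float:
    """Convert a zpool-style size such as '26.0G' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def scrub_progress(issued: str, total: str, issue_rate: str) -> tuple[float, float]:
    """Return (percent done, seconds remaining), assuming the rest of the
    data is issued at the current rate."""
    done = to_bytes(issued)
    whole = to_bytes(total)
    pct = 100.0 * done / whole
    eta = (whole - done) / to_bytes(issue_rate.rstrip("/s"))
    return pct, eta


# Figures copied from the 'scan:' line of the zpool status output.
pct, eta = scrub_progress("26.0G", "2.41T", "25.5M/s")
days, rem = divmod(int(eta), 86400)
hours, rem = divmod(rem, 3600)
print(f"{pct:.2f}% done, {days} days {hours:02d}:{rem // 60:02d} to go")
```

This should print `1.05% done, 1 days 03:14 to go`, within a couple of minutes of zpool's own estimate (the gap comes from 2.41T being a rounded display value). At that issue rate the scrub still had more than a day to run.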

mauricev wrote: Why should the errors of one disk hang the whole pool?

It doesn't take many resets and/or timeouts to send drive performance down the drain, and I suspect you're getting a LOT of them. Since you have a mirror, I would stop the scrub, pull the drive, put in a replacement, resilver, and carry on.
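As a sketch of that procedure, here is the sequence as a dry-run Python script. It assumes the sick disk can be identified in `zpool status`; the device names are placeholders, not the poster's actual disks. The `zpool offline` step is my addition so that ZFS stops using the disk before it is pulled. By default the script only prints the commands.

```python
from __future__ import annotations

import shlex
import subprocess


def replacement_plan(pool: str, old_dev: str, new_dev: str) -> list[list[str] | str]:
    """Steps for swapping one side of a mirror; strings are manual steps."""
    return [
        ["zpool", "scrub", "-s", pool],       # stop the running scrub
        ["zpool", "offline", pool, old_dev],  # stop ZFS from using the sick disk
        f"physically replace {old_dev} with the new disk",
        ["zpool", "replace", pool, old_dev, new_dev],  # starts the resilver
        ["zpool", "status", pool],            # check resilver progress
    ]


def run_plan(plan: list[list[str] | str], execute: bool = False) -> None:
    """Print each step; only run commands when execute=True."""
    for step in plan:
        if isinstance(step, str):
            print(f"# manual step: {step}")
            if execute:
                input("Press Enter when done... ")
            continue
        print("$", shlex.join(step))
        if execute:
            subprocess.run(step, check=True)


if __name__ == "__main__":
    # Hypothetical device names; substitute the real ones from `zpool status`.
    run_plan(replacement_plan("externalhd", "media-OLD-DISK-ID", "disk4"))
```

Keeping the default as a dry run means the printed commands can be checked against `zpool status` before anything is executed.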

Posted: **Tue Jun 18, 2019 3:47 pm**

"...an uncorrectable I/O failure and has been suspended."

ZFS detected that the disk more or less vanished and was forced to give up; you will not get any more data from ZFS after that. You can issue "zpool clear pool" and "zpool clear pool device" to ask it to retry talking to the disk, but it seems likely the disk will glitch again.

Posted: **Thu Jun 20, 2019 7:13 am**

I replaced the disk and the pool seems to be working normally.

Posted: **Tue Mar 10, 2020 12:07 am**

Okay, I had a similar but not identical experience (on 1.9.4), and I'd like some advice. I'm on an old MacBook Pro running High Sierra with FireWire, a dead Thunderbolt port and only USB 2, so I'm using FW800 for storage (slow but mostly adequate).

I've been using a mirrored pool for years and years now, but it's full and I'm moving on. I got a second-hand 4-drive bay, configured it as JBOD, loaded it with four brand spanking new Toshiba P300 3TB drives, and created a RAID-Z2 pool on them. Then I proceeded to rsync everything over from the old pool to the new one. At ~20 MB/s average transfer it could be far better, but whatev: I'm on old hardware and copying from FW800 to FW800 over a single connection.

However, today I woke up to find that the rsync was hanging. It was still showing "36% 17.76MB/s 0:04:55" for the progress on the last file, but judging by where the copy was when I went to bed, it seems to have been showing that for hours. I also waited around, and nothing happened. I could browse the old pool just fine, but when I navigated to the new ZFS filesystem I was copying to, Finder hung. I restarted Finder with option-command-esc, opened a terminal and tried to look at the new filesystem. Bash hung too.

I tried restarting the computer, but it hung during the restart. I swallowed and just powered down the machine. After restarting, I could successfully mount both pools, and everything copied to the new pool up until the time of the freeze seems okay. ZFS sees nothing wrong with the pools.

```
bash-3.2$ zfs --version
zfs-1.9.4-0
zfs-kmod-1.9.4-0
bash-3.2$ zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
Borogove     10.9T   997G  9.90T        -         -     0%     8%  1.00x    ONLINE  -
Jabberwocky  1.36T  1.25T   116G        -         -     7%    91%  1.00x    ONLINE  -
bash-3.2$ zpool status -v Borogove
  pool: Borogove
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        Borogove                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            media-A51B7409-55CB-F843-8965-056CB0EA2580  ONLINE       0     0     0
            media-E237643B-EFEF-3040-8338-50A9B97DC6CE  ONLINE       0     0     0
            media-B446407F-226D-3F42-BF90-45B23938F8EB  ONLINE       0     0     0
            media-671B767A-A3B1-7B45-8B72-A0AA31AD6726  ONLINE       0     0     0

errors: No known data errors
bash-3.2$ zpool status -v Jabberwocky
  pool: Jabberwocky
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        Jabberwocky                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            media-1878AB63-8C37-C543-9B55-62325B92CF75  ONLINE       0     0     0
            media-DA0E5735-D45E-2946-90EC-BE3B8E1BA013  ONLINE       0     0     0

errors: No known data errors
```

I tried looking at log entries but found nothing that could help me figure out what happened. I checked the power settings, and the machine shouldn't have gone to sleep or tried to turn off the hard drives.

Any idea where to look for pointers to the cause of the issue? Also, if any of the drives are faulty, I'd be really happy to return it, but I'd have to know which one... I don't know what kind of drive diagnostics work over FW800, and Disk Utility says SMART isn't supported on these drives.

Posted: **Tue Mar 10, 2020 2:55 am**

As for SMART diagnostics, you might try the SAT SMART Driver (https://binaryfruit.com/drivedx/usb-drive-support); it may be able to read SMART data from your external drive.
P.S.: I've never tried it with FW...

Posted: **Tue Mar 10, 2020 3:05 am**

Okay, I have something. It's flipping me out, honestly.

```
bash-3.2$ zpool status -v Borogove
  pool: Borogove
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        Borogove                                        UNAVAIL      0     0     0  insufficient replicas
          raidz2-0                                      UNAVAIL      2    32     0  insufficient replicas
            media-A51B7409-55CB-F843-8965-056CB0EA2580  REMOVED      0     0     0
            media-E237643B-EFEF-3040-8338-50A9B97DC6CE  REMOVED      0     0     0
            media-B446407F-226D-3F42-BF90-45B23938F8EB  REMOVED      0     0     0
            media-671B767A-A3B1-7B45-8B72-A0AA31AD6726  REMOVED      0     0     0

errors: List of errors unavailable (insufficient privileges)
```

Well, next time it happens I'll make sure to run it with sudo; unfortunately, I tried zpool clear first, and things froze up. It's weird that it says "REMOVED". Makes me wonder if it's a connectivity thing. The strange part is that the old pool is daisy-chained from the new box, and that was still up, so it's not a bus-wide issue.
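For the next occurrence, a small parser can pull out exactly which vdevs left the ONLINE state or logged errors, so the output can be compared between incidents. This is a minimal sketch that assumes the NAME/STATE/READ/WRITE/CKSUM column layout shown in the output above; `zpool status` itself may still need sudo for the full error list.

```python
import subprocess
import sys


def problem_devices(status_text: str) -> list[tuple[str, str, str, str, str]]:
    """Return (name, state, read, write, cksum) for every row of the config
    table that is not ONLINE or has a nonzero error count."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME" and fields[1:5] == ["STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header of the config table
            continue
        if fields[0] == "errors:":
            in_table = False  # the table ends before the errors line
            continue
        if in_table and len(fields) >= 5:
            name, state, read, write, cksum = fields[:5]
            if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
                rows.append((name, state, read, write, cksum))
    return rows


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 zpool_problems.py <pool>
    out = subprocess.run(["zpool", "status", "-v", sys.argv[1]],
                         capture_output=True, text=True).stdout
    for row in problem_devices(out):
        print("{:<46} {:<8} R={} W={} C={}".format(*row))
```

Run against the UNAVAIL output above, it would list the pool, `raidz2-0` (2 read and 32 write errors) and all four REMOVED disks; against the earlier healthy output it returns nothing.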

I'm pissed because I'm not prepared to drop a shitload of money on a new file server, or on a new Mac AND a Thunderbolt 3 enclosure, and I got the FireWire enclosure (OWC Mercury Elite Pro Qx2) second-hand, as these aren't made anymore. At this point I'm praying to god that one of the hard drives is faulty, and not the enclosure.

I started a scrub; we'll see what's up with that, but of course it only checks the parts of the media that hold data.

P.S. Strangely, it seems the transfer once again froze around the time the display blanked. At first I thought it might be a power management thing, but again, power management is set to never turn anything off. Now I've even turned off display blanking. It would be nice if I could see a log of FireWire subsystem messages, like on Linux, but I haven't found any such thing so far.

hanging on 1.9.0