ZFS reported errors across all 8-disks, zpool clear froze

All your general support questions for OpenZFS on OS X.


Postby dguisinger01 » Mon Jun 22, 2020 6:17 am

I set up a brand new raid this weekend.
14TB Toshiba Enterprise drives in a 8x14TB RaidZ-2 configuration in a Thunderbay 8 enclosure. Drives are in JBOD mode, I never installed SoftRAID.
The Thunderbay8 is connected via TB3 via a CalDigit dock to my MacBook Pro 16. All TB3 cabling is active cabling.
All brand new enclosure, drives, and cabling.

The Time Machine volume I set up was encrypting overnight; when I came back to my machine this morning, ZetaWatch was going crazy about ZFS having detected errors. According to ZetaWatch, the errors were spread across all 8 drives, leading me to think the drives momentarily dropped offline.
My understanding is that at that point all I/O should be blocked… but the array sounded quite active.

I ran "zpool status -x" and got:
Code: Select all
  pool: thunderbay8
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME                                           STATE   READ WRITE CKSUM
        thunderbay8                                    ONLINE   511 2.59K     0
          raidz2-0                                     ONLINE 1.00K    22     0
            media-1F05BCAF-68AB-114B-BD1A-BA1F22032F11 ONLINE   403     8     0
            media-994DC03E-6EC1-DA40-96EA-B5A5CDD416AB ONLINE   400     8     0
            media-93B9E6A5-033F-5C47-B353-D6B48E3974A3 ONLINE   413     8     0
            media-EA30491F-D5B0-2A42-A916-1190CA1AE5C0 ONLINE   404    10     0
            media-E71C76CC-FFE8-0F4E-AADB-D6CD251B3707 ONLINE   399     8     0
            media-A31C3A93-C451-7948-B896-BB6C0817EBF6 ONLINE   408     8     0
            media-734C3A59-F581-E546-8A08-C130BD886575 ONLINE   416     8     0
            media-03C12506-E20C-CB44-9E9F-2BE6B24F8360 ONLINE   403     8     0

I then ran "zpool clear thunderbay8", all sound from the drives stopped, and then my console froze.
After a few minutes I tried restarting my Mac, but it just sat there, so I finally physically powered off my MacBook Pro and restarted.

Any ideas what is going on? This is my first experience with ZFS on OS X....

Re: ZFS reported errors across all 8-disks, zpool clear froze

Postby tangles » Tue Jun 23, 2020 5:05 am

Bit weird that zpool status isn't telling you which disk is faulting…

That could mean the TB cable, flaky power to the Thunderbay, or (although rare) a backplane failure might be in the mix.

I have had an old HP box behave perfectly running FreeNAS, but as soon as there was any significant disk activity reading data off the pool, it would just reboot…
Turned out to be a faulty power supply…

Suss all your power… don't use power boards… plug directly into the wall!

The TB cable… how long is it? Passive Thunderbolt 3 cables only sustain the full 40 Gb/s up to about 0.5 m; anything longer needs an active cable.
Reseat the TB cable… ensure it's connected firmly and ensure the weight of "something" such as the cable itself is not pulling on the port and creating tension in any way…
I really dislike USBc/TB3 physical connections… they feel flimsy at best!! and feel like they can be "flicked" out of the socket with very minimal unintentional effort…

Don't have anything else connected to your Mac other than its power. (just to test)

If all seems in order, let's move onto the disks...

You want to rule out hardware issues before we look at software issues…
So… these drives… are they SMR (shingled)?
Linus (and Serve The Home) mention SMR drives are a "no go" for ZFS.

With that out of the way.

Recreate your pool, but zero out all your disks first… Warning! This is dangerous, so disconnect all other external disks you may have…

To do this, the command is:
Code: Select all
sudo dd if=/dev/zero of=/dev/diskn bs=1024 count=100

if = input file = zeros to be written to the disk
of = output file = diskn where n will most likely be 1 to 8. You need to check/verify this, so in terminal app type:
Code: Select all
diskutil list physical
to see your real/physical disks to be sure.
bs = block size
count = how many blocks to write zeros from the start of the disk.

We only want to wipe out the start of the disk where the partition map exists.
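To do that across all eight disks, a dry-run loop like this can help — the disk numbers 1–8 are an assumption on my part, so confirm every one against diskutil list physical before anything real runs, because dd will happily destroy the wrong disk:

```shell
# Dry run: PRINTS the dd command for each assumed disk number instead of
# executing it. Only run the printed commands after you have verified
# every /dev/diskN against `diskutil list physical`.
print_wipe_cmds() {
  for n in 1 2 3 4 5 6 7 8; do
    echo "sudo dd if=/dev/zero of=/dev/disk$n bs=1024 count=100"
  done
}
print_wipe_cmds
```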

Now recreate your pool:
Code: Select all
zpool create thunderbay8 raidz2 disk1 disk2 disk3 disk4 disk5 disk6 disk7 disk8

(you can add all the yummies such as atime, compression, checksum etc. later… right now, we just want to test the hardware)

As for casesensitivity=insensitive and normalization=formD, I'm not even sure what we should be opting for now when it comes to normalisation… formD was for HFS+, so I'm not sure what to choose to mimic APFS atm… Mr Lundy?
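For when you recreate the pool for keeps, a fuller create line with those properties might look like the sketch below — the ashift=12 and lz4 values are my assumptions, not advice from this thread, so adjust to taste:

```shell
# Hedged example only -- property values are assumptions, not gospel.
# Built as a string so you can eyeball it before running it for real.
create_cmd="sudo zpool create -o ashift=12 \
  -O compression=lz4 -O casesensitivity=insensitive -O normalization=formD \
  thunderbay8 raidz2 disk1 disk2 disk3 disk4 disk5 disk6 disk7 disk8"
echo "$create_cmd"
```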

Open a Terminal window and type:
Code: Select all
zpool iostat -v thunderbay8 1 100000

This will show you exactly how many I/O operations and how many megabytes are being read/written by each disk, the single vdev, and thus the pool… (handy to know at any time!)

Now in another/new Terminal window:
Install brew… https://brew.sh shows you how; it's just a one-liner command in the Terminal. (brew will become your friend!)

That's so you can install iozone:
Code: Select all
brew install iozone

Then… in the same/2nd Terminal window, change directory to your pool (cd /Volumes/thunderbay8) and run time iozone -a, as per the wiki here… https://openzfsonosx.org/wiki/Performance

This will give your new pool a good workout for a good couple of minutes…

In a third Terminal window, periodically run zpool status to see the health of your thunderbay8 pool… zpool iostat in the first Terminal window will also show you if you have an issue: all the values drop to zero… (not nice to see)
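That third-window check can be automated with a small poll loop; a sketch, where the command and interval are parameters purely so it's easy to adapt (the 10-second interval in the comment is an arbitrary choice of mine):

```shell
# Run a status command every <interval> seconds, <count> times.
# Real use would be something like: poll_status "zpool status thunderbay8" 10 360
poll_status() {
  _cmd=$1; _interval=$2; _count=$3
  i=0
  while [ "$i" -lt "$_count" ]; do
    $_cmd                  # e.g. zpool status thunderbay8
    sleep "$_interval"
    i=$((i + 1))
  done
}
```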

If you get a repeat of the errors you saw initially… it's time to observe the Terminal window where you're running zpool status to see if any disk has become "cactus"… if so… shut down, open up the Thunderbay, and reseat your SATA connections…

Check power connections also and it wouldn't hurt to reseat them as well.

I always write (via sticker or grey lead) the serial number on the disks in the array so that I can see them while the pool is up and running… this way it helps to identify which disk (in future) is causing you grief.
This might work for you given your external disks are connected via SATA:
Code: Select all
$ system_profiler SPSerialATADataType -detailLevel medium | grep Serial | sed -e 's/[\<\>\"\ ]//g' | awk -F':' '{print $2}'
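A way to sanity-check that sort of grep/sed/awk pipeline without the real hardware is to feed it canned text; the sample serial numbers below are made up, shaped only to match system_profiler's "Serial Number:" lines:

```shell
# Hypothetical sample shaped like `system_profiler SPSerialATADataType`
# output -- used purely to exercise the extraction pipeline.
sample='          Serial Number: WD-ABC123
          Serial Number: WD-DEF456'

# Strip angle brackets, quotes, and spaces, then take the field after ":".
extract_serials() {
  printf '%s\n' "$sample" | grep Serial | sed -e 's/[<>" ]//g' | awk -F':' '{print $2}'
}
extract_serials
```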

Boot up again, use Terminal to check the status of your pool, and hopefully all is well…

If the same disk (hopefully only one!) is still missing and not resolving now, then shut down and swap the disk to another bay…

Boot up again and see if you can identify if it's the disk still… or whether it's the bay…

If it's the bay, you should have a 2nd disk that's now cactus, and so it's time to contact the vendor and return it under warranty…

If it's just the same disk that's cactus, then it's also vendor time; test the disk in the array all by itself using ZFS and see if you get continual errors…

I have performed the above to identify an unhealthy disk and did an RMA with Seagate… they replaced the disk for me no probs, but it took a couple of weeks before I got a replacement (Australia), so it doesn't hurt to go buy another one anyway and keep it as a cold spare… or use it as a single disk in other areas if you're keen…

See how you go and report back.

(fingers crossed for you)

Re: ZFS reported errors across all 8-disks, zpool clear froze

Postby dguisinger01 » Tue Jun 23, 2020 7:10 am

So I found out later in the afternoon that I had lost power overnight. I suppose a UPS is in order. The array seems to be working fine once again.
I'm using active cables. A 1m cable to the dock, and a 2m cable from the dock to the array.

I don't think "zpool clear" should have locked my Terminal window and prevented me from rebooting, though.

Re: ZFS reported errors across all 8-disks, zpool clear froze

Postby tangles » Thu Jun 25, 2020 5:50 am

am glad to hear.

Got a link for the active 2 m cable you're using? Assuming it's TB3?

