[CRITICAL] zfs pool in DEGRADED condition: how to solve it?

Postby nixota » Sun Sep 29, 2024 1:17 pm

Hello Savvy Admins/Devs,

SYSTEM CONFIGS:
I'm in a very difficult situation with an OpenZFS data pool (two mirrors of two partitions each, 4 disks in total) on an external NAS holding four 3.5-inch mechanical hard disks (Seagate pro quality), driven by macOS 10.15 (Catalina) with ZFS version zfs-macOS-2.2.3-rc4.

PROBLEM:
After a first crash (with a few data errors), the zpool output pointed to the OpenZFS message at https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-HC/index.html. The partition from the 2nd mirror (mirror-1, partition 1 of 2) ended up in a 'REMOVED' state, and the pool as a whole in a 'SUSPENDED' state.

INITIAL SOLUTION ATTEMPT:
That got temporarily resolved with these CLI commands:

$ sudo zpool online [poolName] mirror-1
(reboot of the OS)
$ sudo zpool clear [poolName]

After another reboot, the pool was initially 'ONLINE' and accessible [SEE the attached image], and ZFS started a data resilvering process. But then, all of a sudden, the resilvering seems to stop and all I/O commands hang with no way to proceed further, leaving the 'mirror-1' partition in a 'REMOVED' state and the pool in a final 'DEGRADED' status.

I went more than 10 times through the cycle of rebooting, bringing 'mirror-1' online, clearing the pool and waiting for the resilvering to finish, but to no avail: every time the pool is initially accessible, but soon the resilvering stops and the I/O hangs indefinitely.

The final resulting state of the pool is shown in the attached image.

SOLUTION?
My (superficial) guess is that one of the hard drives (the one tagged as 'mirror-1 media-etc') is a faulty unit and crashes unexpectedly during the resilvering process. Should I substitute it with a new hard disk and resilver with the $ zpool replace command?
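A minimal sketch of what such a replacement could look like. The pool name `tank` and the device names `disk4s1` (faulty member) and `disk6` (new disk) are placeholders, not taken from this thread; the real names are whatever `zpool status` shows. The `run` helper is hypothetical and only prints the commands unless `DRY_RUN=0` is set, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: pool and device names are placeholders, not taken from this thread.
POOL="${POOL:-tank}"            # assumed pool name
OLD_DEV="${OLD_DEV:-disk4s1}"   # assumed name of the faulty mirror-1 member
NEW_DEV="${NEW_DEV:-disk6}"     # assumed name of the replacement disk

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

replace_faulty_disk() {
    # Confirm which device is reported as REMOVED or FAULTED.
    run sudo zpool status -v "$POOL"
    # Swap the faulty member for the new disk; this starts a resilver.
    run sudo zpool replace "$POOL" "$OLD_DEV" "$NEW_DEV"
    # Check resilver progress afterwards.
    run sudo zpool status -v "$POOL"
}

replace_faulty_disk
```

Run as-is it only lists the three commands; setting `DRY_RUN=0` would actually execute them.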

HELP
I kindly ask for your expertise and help. I'm unable to proceed and I'm very worried about the data. :shock:

Thank you in advance, with Best Regards
Attachments
pool-DEGRADED-status-resilvering-does-NOT-ent.png (103.66 KiB)
output from two successive '$ zpool status' calls on the CLI
nixota
 
Posts: 11
Joined: Tue Apr 04, 2023 10:12 pm

Re: [CRITICAL] zfs pool in DEGRADED condition: how to solve

Postby tangles » Tue Oct 01, 2024 5:14 am

destroy the pool.
recreate pool
put your data back on.

If you have the resources, buy spinrite from Steve Gibson and run it over your disks first before recreating your setup.

if you don't have a backup, :o :evil: pull two disks so it can't resilver, back up your data, and then go to line 1. Re-attempt with the other 2 disks if need be, and then off to line 1 as well.

if you're still getting issues, ditch the chassis and direct connect the disks some other way. ← spend money if you have to.
something like https://www.akitio.com/adapters/thunder-sata-go may help
or
if you have access to a desktop PC, install Ubuntu with the option of using ZFS for the boot disk; this way you have the zfs drivers installed. Connect up your disks and attempt to restore health this way. Don't upgrade the pool though, as you may make it unreadable back on macOS if you're unsure what you're doing.
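A hedged sketch of a cautious first look at the pool on such a machine. It uses a read-only import, which is a swapped-in precaution and not something stated above; the pool name `tank` is a placeholder and the `run` helper is hypothetical. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: the pool name is a placeholder, not taken from this thread.
POOL="${POOL:-tank}"   # assumed pool name

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

inspect_pool_readonly() {
    # List the pools that can be imported from the attached disks.
    run sudo zpool import
    # Import read-only; -f because the pool was last used by another host.
    run sudo zpool import -f -o readonly=on "$POOL"
    # Look at device states and any listed errors.
    run sudo zpool status -v "$POOL"
}

inspect_pool_readonly
```

With a read-only import nothing should be written to the disks, so the data can be copied off before any repair is attempted.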

At all times, never use USB. ever.

good luck.
tangles
 
Posts: 198
Joined: Tue Jun 17, 2014 6:54 am

Re: [CRITICAL] zfs pool in DEGRADED condition: how to solve

Postby nixota » Tue Oct 01, 2024 1:20 pm

Hello 'tangles'

thank you very much for your detailed assistance and suggestions. The situation looks dire, indeed!
I will take extreme care to understand all the countermeasures that you suggest.

PS. for the sake of general information: Seagate has a 'Data Recovery Plan' available for quality hard disks still covered by warranty. Luckily, I can rely on them for the things I cannot perform myself.

Let's hope for the best! Thanks again, Best Regards

Re: [CRITICAL] zfs pool in DEGRADED condition: how to solve

Postby jawbroken » Wed Oct 02, 2024 6:34 am

I'm not an expert on recovery, but before destroying your pools and going to drastic measures, I'm not sure why you can't just try to replace the drive with another new drive.

e.g. https://serverfault.com/questions/70987 ... ther-fault
https://www.truenas.com/community/threa ... hat.45648/
jawbroken
 
Posts: 92
Joined: Wed Apr 01, 2015 4:46 am

Re: [CRITICAL] zfs pool in DEGRADED condition: how to solve

Postby nixota » Fri Oct 04, 2024 9:50 am

Hello again everybody,

sure, I am following your suggestions. It's confirmed: the trouble was an HD that failed.

I have already performed a thorough backup from the working mirror-0, and I will try to resilver the other mirror-1 with another HD in place of the faulty one.
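For reference, one way such a backup could be taken with ZFS's own tools is a recursive snapshot followed by send/receive. This is only a sketch, not necessarily the method used here: the pool name `tank`, the destination dataset `backup/tank-copy` and the snapshot name `rescue` are all placeholders. The script just prints the commands.

```shell
#!/bin/sh
# Sketch only: all names below are placeholders, not taken from this thread.
SRC_POOL="${SRC_POOL:-tank}"       # assumed name of the degraded pool
DEST="${DEST:-backup/tank-copy}"   # assumed dataset on a separate, healthy pool
SNAP="${SNAP:-rescue}"             # assumed snapshot name

backup_commands() {
    # Recursive snapshot of every dataset in the pool.
    echo "sudo zfs snapshot -r ${SRC_POOL}@${SNAP}"
    # Replicate the whole hierarchy; -u leaves the received copy unmounted.
    echo "sudo zfs send -R ${SRC_POOL}@${SNAP} | sudo zfs receive -u ${DEST}"
}

backup_commands
```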

Thanks again everyone for the precious help, with Best Regards

