zfs send|rec mac-to-mac checksum mismatch

Postby incumbent » Sun Oct 31, 2021 9:29 am

Something weird happened with my NVMe pool, and it reports errors now:

Code: Select all
130 % sudo zpool status -v
  pool: cascade
 state: ONLINE
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
config:

   NAME                                            STATE     READ WRITE CKSUM
   cascade                                         ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       media-ADA20CBC-0425-C240-92F0-B5561C66D520  ONLINE       0     0     0
       media-91F056E7-163E-CB45-9538-114FCEF1A78B  ONLINE       0     0     0
     mirror-1                                      ONLINE       0     0     0
       media-38C73A7D-AE0C-9248-A9D4-7FA67B3BBD0B  ONLINE       0     0     0
       media-08E70174-71F3-6E47-A87F-99020D3A2AB0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x101>
        cascade/people/zemory/crypt:<0x8315>
        cascade/av:<0x0>
        cascade/av:<0x106ac>
        /cascade/people/zemory/research/notes/.Spotlight-V100/Store-V2/C6280536-0099-4C5F-B4B9-AF33EDB1B352/live.0.indexArrays
        /cascade/people/zemory/research/.Spotlight-V100/Store-V2/4B83275F-E9BA-41D6-AC24-5BA35D911DEB/0.indexArrays
        cascade/people/zemory:<0x170085>


I have no idea what `<metadata>:<0x0>` means. I did a `zpool clear` when it was in a faulted state, and that brought the pool back online.

I need to power down and move a pool over from another workstation to do a local send/recv, but before resorting to rewiring my desktop I decided to try it over ssh and over netcat. It fails in either scenario:

Code: Select all
emory@DreamOn ~ % sudo nc -w 120 -l 8023 | sudo /usr/local/zfs/bin/zfs receive -F -v pond/cascade/people
receiving full stream of cascade/people@3daysago into pond/cascade/people@3daysago
cannot receive new filesystem stream: checksum mismatch


Code: Select all
130 % sudo zfs send -v -R cascade/people@snapnow | nc -w 20 dreamon 8023
full send of cascade/people@3daysago estimated size is 1.90M
//snip
full send of cascade/people/zemory@3daysago estimated size is 312G
send from @3daysago to cascade/people/zemory@yesterday estimated size is 497K
send from @yesterday to cascade/people/zemory@today estimated size is 624B
send from @today to cascade/people/zemory@snapnow estimated size is 124K
//snip
TIME        SENT   SNAPSHOT cascade/people@3daysago
12:19:11    901K   cascade/people@3daysago
warning: cannot send 'cascade/people@3daysago': signal received
warning: cannot send 'cascade/people@yesterday': Broken pipe
warning: cannot send 'cascade/people@today': Broken pipe
warning: cannot send 'cascade/people@snapnow': Broken pipe

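The ssh attempt was essentially the same pipe without netcat in the middle, something like this (sketching it from memory, so treat it as approximate); it died with the same checksum mismatch:

Code: Select all
% sudo zfs send -v -R cascade/people@snapnow | \
    ssh dreamon sudo /usr/local/zfs/bin/zfs receive -F -v pond/cascade/people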

I do a three-day rotation; `today`, `yesterday`, and `3daysago` are my standard snapshot names. I made one called `snapnow` to use as my backup set.
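
Making the backup-set snapshot was the usual recursive snapshot, roughly this (from memory; the -r is what makes the child datasets get it too):

Code: Select all
% sudo zfs snapshot -r cascade/people@snapnow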

I don't know if I can roll back to a snapshot from before this pool had some sort of event (possibly it got unplugged? All four drives are in the same enclosure, an OWC Express 4M2, direct-attached to a Mac mini, but the mini also has an eGPU and it's possible I banged a cable while hot-plugging the display or something). I have no idea how to refer to

Code: Select all
        <metadata>:<0x0>
        <metadata>:<0x101>


to recover that data.
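
If rolling back is even the right move, my understanding is it would look something like this (just a sketch, and `-r` here destroys every snapshot newer than the one named, so it's a last resort):

Code: Select all
# roll back to the oldest snapshot, destroying @yesterday, @today,
# and @snapnow on that dataset in the process
% sudo zfs rollback -r cascade/people/zemory@3daysago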

In the meantime I'm running `rsync -azvP /cascade /Volumes/rescueMe/`, where rescueMe is a 4TB USB3 magnetic drive, and it's going about as fast as you'd imagine. But I really don't want to haul all of this down from B2 again like the last time an enclosure zapped my pool :facepalm:
incumbent
 
Posts: 40
Joined: Mon Apr 25, 2016 8:52 am

Re: zfs send|rec mac-to-mac checksum mismatch

Postby jawbroken » Sun Oct 31, 2021 10:28 pm

I don't have any particular expertise here, but I believe the `<metadata>` errors are in ZFS's own metadata. I found a discussion for OpenZFS on Linux suggesting that two scrubs may be required after `zpool clear` before these reports go away (apparently `zpool status` keeps the errors from the last two passes, so a single scrub isn't enough to empty the list?).
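
If that's right, the sequence would be something like this (untested on my part):

Code: Select all
% sudo zpool clear cascade
% sudo zpool scrub cascade
# wait for the scrub to finish (watch the "scan:" line), then scrub again
% sudo zpool status cascade
% sudo zpool scrub cascade
% sudo zpool status -v cascade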
jawbroken
 
Posts: 61
Joined: Wed Apr 01, 2015 4:46 am

Re: zfs send|rec mac-to-mac checksum mismatch

Postby incumbent » Mon Nov 01, 2021 5:15 am

Whew, okay, thank you. I'll read that while I wait for rsync to finish. I've had errors like this before, and a zpool clear/reboot/scrub made them vanish, but I never understood what happened or why. I'm up to just over 1TB written so far. I should have plugged in two drives and made a pool out of them, but I was in a hurry!
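
For the record, what I mean by that is something like the following (the disk names are hypothetical placeholders): a mirrored scratch pool plus send/receive instead of rsync.

Code: Select all
# mirror the two spare drives into a scratch pool, then replicate into it
% sudo zpool create rescue mirror disk4 disk5
% sudo zfs snapshot -r cascade@rescue
% sudo zfs send -R cascade@rescue | sudo zfs receive -F -d -u -v rescue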
incumbent
 
Posts: 40
Joined: Mon Apr 25, 2016 8:52 am

Re: zfs send|rec mac-to-mac checksum mismatch

Postby incumbent » Wed Nov 03, 2021 9:33 am

My mini has had a couple of kernel panics. They start with:

Code: Select all
panic(cpu 2 caller 0xffffff801a2c5b68): m_free: freeing an already freed mbuf @uipc_mbuf.c:4817

and are probably related to my eGPU.

But regardless, after a reboot I ran a scrub (which was delightfully fast) and everything is copacetic. I'm writing a script to handle backing up my zpool to single-drive pools. My off-site backups are done with Arq to Wasabi and B2, but I had to wait on B2 a lot when a pool got zapped by an enclosure; the restore ate probably half of my waking time for three days. Until I have something more elegant in place, I've got Arq backing up the pool to another external drive.
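
The script is still rough, but the core of it is just snapshot-then-send. A minimal sketch (the pool names are placeholders, and a real version would switch to an incremental `zfs send -I` after the first full copy):

Code: Select all
#!/bin/sh
# replicate the whole pool to a single-drive backup pool
SRC=cascade
DST=rescue                             # hypothetical backup pool
SNAP="backup-$(date +%Y%m%d-%H%M)"

zfs snapshot -r "${SRC}@${SNAP}" || exit 1
zfs send -R "${SRC}@${SNAP}" | zfs receive -F -d -u -v "${DST}"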

I was pretty nervous about those metadata faults; I'm glad I created the pool with enough redundancy to recover. Thanks for the encouraging words, I was filled with dread.
incumbent
 
Posts: 40
Joined: Mon Apr 25, 2016 8:52 am

