kernel panic on zpool import

Postby LATB » Tue Mar 01, 2016 4:35 pm

Hi -- hope this is the right place to report this issue:

Suddenly today my MacBook crashed with a kernel panic, and it would not come back up: it crashed again after every reboot.

I tracked it down to crashing in zpool during the startup sequence. So I disabled the org.openzfsonosx.zpool-import-all.plist and the laptop runs again.

On my macbook I'm running the official 1.4.5 release, and have (amongst others) my home directory mounted as a zfs file system.

Running zpool import -a by hand immediately crashed the machine again, reproducibly. I connected an external backup of this pool (which, however, is a couple of days old) and could zpool import that one without problems.

So, something is "broken" in that pool, making zpool import panic the kernel.

The traceback shows
panic(cpu 0 caller 0xffffff7f95920681): "VERIFY3(" "0" " " "==" " " "zap_remove_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)" ") " "failed (" "0" " " "==" " " "2" ")\n"@zfs_dir.c:763

Any help on how to get this zpool back would be welcome. If tracebacks or debug info are needed, I can provide them.

Thanks a lot and best regards, LatB
LATB
 
Posts: 7
Joined: Tue Jan 05, 2016 12:01 pm

Re: kernel panic on zpool import

Postby lundman » Wed Mar 02, 2016 5:03 pm

You can import the pool read-only, as the unlinked-drain list is not processed then. You might have to build a version without that VERIFY to let it get past the bad object on your unlinked-drain list; it is unclear how it got there in the first place. Unfortunately, this is not a place that uses zfs_recover.
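
A minimal sketch of the read-only import suggested above. The pool name `tank` is a placeholder (not from this thread), and the `import_readonly` helper and its `DRY_RUN` switch are purely illustrative; by default it only prints the command, so nothing touches the pool until you opt in.

```shell
# Hypothetical helper: import a pool read-only.
# With DRY_RUN unset or set to 1 the command is only printed, not executed.
import_readonly() {
    pool="$1"
    if [ -z "$pool" ]; then
        echo "usage: import_readonly POOL" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "zpool import -o readonly=on $pool"
    else
        zpool import -o readonly=on "$pool"
    fi
}
```

Calling `import_readonly tank` prints the command; `DRY_RUN=0 import_readonly tank` would actually run it.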
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: kernel panic on zpool import

Postby LATB » Wed Mar 02, 2016 7:23 pm

Thank you for the reply!

I installed from source and, interestingly enough, it no longer panicked on import, but it did panic after "a while" of running (5-10 mins). And it always panicked when exporting.

Fortunately I could send/receive most of the zfs filesystems to a backup pool (external USB), but it panicked when sending 2 specific ones (reproducibly).

Fortunately I had recent backups of those. I can only recommend the sanoid/syncoid tools from Jim Salter at sanoid.net!!

Anyway, I tried the read-only import, but again panicked when exporting. Destroy didn't work either.

All panics are on the same line, 764 (in the latest version), in zfs_dir.c.

Eventually I erased the partition and created a new pool.

BTW, no errors seen anywhere, always 0 errors in scrubbing etc.

I guess I'm back in business, but now very worried that something bad can happen anytime.

Cheers, LatB
LATB
 
Posts: 7
Joined: Tue Jan 05, 2016 12:01 pm

Re: kernel panic on zpool import

Postby lundman » Thu Mar 03, 2016 4:31 am

Earlier versions of o3x would place things on the unlinked-drain list but never process them. Depending on the size of this list, the panic can happen at import or a few minutes after, but it always happens on export, because the elements are loaded in from the list but not released until the system needs more room (or until the pool is exported and everything ZFS holds is released).

It is a known issue from the "before" days, and we could have fixed it without redoing the pool, but since you already have, that is one way too. The newer versions of o3x clean the unlinked-drain list properly and rarely place items on it.

After import, it will even tell you the size of the list: if you grep the /var/log/system.log file, you can see it. It is more for statistics, but you can confirm that the list is rarely used with the new version.
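
As a rough illustration of that grep, the sketch below pulls the number out of each drain message. It assumes the message has the form `ZFS: unlinked drain completed (N)`, which is the form quoted later in this thread; the function name is hypothetical and the log path is just the default argument.

```shell
# Hypothetical helper: print the list size reported by each
# "unlinked drain completed" message found in the given log file.
unlinked_drain_counts() {
    log="${1:-/var/log/system.log}"
    # -n suppresses default output; p prints only lines where the
    # substitution matched, leaving just the number in parentheses.
    sed -n 's/.*ZFS: unlinked drain completed (\([0-9][0-9]*\)).*/\1/p' "$log"
}
```

Run without arguments it reads /var/log/system.log; pass another path to inspect a rotated or copied log.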
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: kernel panic on zpool import

Postby LATB » Thu Mar 03, 2016 12:07 pm

Hmm, unfortunately it's getting worse: not only did re-creating the pool NOT fix it, but now my backup pool panics as well.

Before re-creating the affected pool on my laptop I had zfs sent/received the zfs filesystems from the bad pool over to an existing pool on an external drive.
Then I sent/received the file systems back on the re-created pool.

For 2 of the zfs file systems this didn't work (panic when sending/receiving) -- those I copied from an older backup elsewhere.

Bringing everything back up then allowed it to run for a while, but now the same kernel panics are back.

And worse, now I also get kernel panics when zpool exporting the backup pool, too!!

So, is this unlinked drain list something that "comes" with the zfs filesystem and "infected" the backup disk pool when I sent/received the laptop filesystems?

Is there a way to fix this, or am I losing both the laptop pool and the backup disk pool?

I searched for "drain" in system.log and see things like
ZFS: unlinked drain completed (9635)
SPL: Warning: ZFS replay transaction error 2, dataset zs/etc

Help is much appreciated! LatB
LATB
 
Posts: 7
Joined: Tue Jan 05, 2016 12:01 pm

Re: kernel panic on zpool import

Postby lundman » Thu Mar 03, 2016 4:17 pm

Sounds like you are able to compile your own zfs? I can make a branch for you to allow unlinked_drain remove to fail (which is what seems to be happening here).
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: kernel panic on zpool import

Postby LATB » Fri Mar 04, 2016 1:09 pm

That would be great! (Or should I just disable the VERIFY3U and recompile?)

Will this likely result in a fully working file system/pool again? Otherwise I would rather re-create everything from backup.

Thanks again for the help with this.

Cheers, LatB
LATB
 
Posts: 7
Joined: Tue Jan 05, 2016 12:01 pm

Re: kernel panic on zpool import

Postby lundman » Sun Mar 06, 2016 4:44 pm

OK, there is a branch called "latb" for you. Essentially it removes the VERIFY (which panics on error) and replaces it with zfs_panic_recover().

So after you load the zfs.kext (you don't need to change spl), you set the recover sysctl to 1, then import the pool:
Code:
sysctl -w kstat.zfs.darwin.tunable.zfs_recover=1


which changes the panic to a print. In particular, this one:

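Code:
   /* With zfs_recover=1, a failed removal from the unlinked set logs a warning instead of panicking. */
   if (zap_remove_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx) != 0) {
      zfs_panic_recover("zfs: zfs_rmnode(zap_remove_int(z_unlinkedobj, %llu))", (u_longlong_t)zp->z_id);
   }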
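
The sequence described above (set the recover tunable, then import) could be scripted roughly as follows. The sysctl name is the one quoted in this post; the pool name, the `run` helper, and the `DRY_RUN` switch are hypothetical. The sketch assumes the patched zfs.kext is already loaded.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable zfs_recover first, and only attempt the import if that succeeded.
import_with_recover() {
    pool="$1"
    run sysctl -w kstat.zfs.darwin.tunable.zfs_recover=1 || return 1
    run zpool import "$pool"
}
```

With the default dry-run setting, `import_with_recover tank` only prints the two commands in order, which makes it easy to check them before running with `DRY_RUN=0`.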
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: kernel panic on zpool import

Postby diekhans » Wed Mar 23, 2016 1:53 pm

If it's helpful, I appear to be having the same problem. It started after upgrading to 10.11.4. It only panics if the zpool imports during boot.
If I disconnect the drives, boot, and then connect them, all is well. See:

https://openzfsonosx.org/forum/viewtopic.php?f=26&t=2756
diekhans
 
Posts: 6
Joined: Tue Mar 22, 2016 10:56 pm

Re: kernel panic on zpool import

Postby warwickt » Tue Jun 25, 2019 1:23 am

Hi Guys, thanks for the advice.

Same issue here: macOS 10.12.6 kernel panicked whilst performing
zfs send -L -c -v -I zfsfilesystem01/dataset@snap001 zfsfilesystem01/dataset@snapnnn | pv | ssh warwick@macmini-08 /usr/local/bin/zfs receive -v remotefilesystem@dataset_mirror

The failing system was running openzfsonosx v1.8.1, with ARC limited to 8GB (of 16GB).

What worked for me:

    1) Re-installed macOS 10.12.6 Sierra from the Recovery Partition. No success with booting in SAFE MODE to clear caches
    2) Disabled SIP (csrutil disable in the recovery partition)
    3) Restarted the failed macmini without the external Thunderbolt zpool devices attached
    4) Plugged the offending Thunderbolt disk enclosure back into the macmini
    5) zpool import -o readonly=on zfsfilesystem01
    6) zfs unmount -a
    7) zpool export zfsfilesystem01 (or zpool export -a)
    8) Upgraded from OpenZFSonOSX v1.8.1 to OpenZFSonOSX 1.9.0
    9) Added kstat.zfs.darwin.tunable.zfs_recover=1 to /etc/zsysctl.conf (or ran sysctl kstat.zfs.darwin.tunable.zfs_recover=1)
    10) Restarted (without the disk enclosure attached, just to test)
    11) Plugged the enclosure back into the macmini
    12) zpool import zfsfilesystem01

Diagnostics / Observations:
    - Was able to successfully zpool import zfsfilesystem01 on a FreeBSD 11.2 system with no errors at all
    - Had the same symptom of macOS kernel panics on other macminis (macOS with openzfsonosx @ 1.8.1)
    - Also had the same VERIFY3 failures in the kernel panic dumps at openzfsonosx v1.8.1
So far looks stable again.

Running a scrub on this zfs pool zfsfilesystem01... probably won't have any effect...

Thanks again Lads

Warwickt
Hong Kong
warwickt
 
Posts: 2
Joined: Tue Jun 25, 2019 12:39 am
