After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by Wowfunhappy » Tue Apr 23, 2019 12:58 pm

I have been using OpenZFS 1.8.2 on macOS High Sierra since November. It has been nothing but stable and trouble-free!

Yesterday evening, I decided to turn off my computer for about an hour to avoid distractions. I did not do anything special before shutting down—I haven't installed any software updates recently, or done anything special with my pool.

When I booted my computer back up, it kernel panicked maybe 30 seconds after logging in (I now suspect this is when OpenZFS tried to mount my dataset). Same with the next five reboots.

A boot into Safe Mode and a look at the log pointed to OpenZFS as the likely culprit, so I uninstalled it, and my computer worked again! I tried reinstalling OpenZFS in case it was some weird fluke, but my computer kernel panicked immediately after the install completed.

Downgrading OpenZFS to 1.7.2 fixed the problem.

A panic log is below. Does it provide any insight into what happened? I should probably mention I'm on a Hackintosh, but I don't *think* that was the problem...

Code:
Anonymous UUID:       A9969654-7837-D6AD-6B7C-1BFD8AF13863

Tue Apr 23 00:15:13 2019

*** Panic Report ***
panic(cpu 0 caller 0xffffff801f188f6f): Kernel trap at 0xffffff7fa15c5702, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000170, CR3: 0x0000000036225000, CR4: 0x00000000003627e0
RAX: 0x0000000000000000, RBX: 0x0000000000000000, RCX: 0x0000000000000000, RDX: 0x0000000000000000
RSP: 0xffffffa438f53db0, RBP: 0xffffffa438f53e20, RSI: 0x0000000000000000, RDI: 0xffffff83dd65d000
R8:  0xffffff7fa16c3c30, R9:  0xffffff83dd64f000, R10: 0x0000000000000000, R11: 0xffffff801f9ebc80
R12: 0x0000000000000000, R13: 0x0000000000000000, R14: 0xffffff83dd65d000, R15: 0xffffff83db5a4000
RFL: 0x0000000000010202, RIP: 0xffffff7fa15c5702, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000170, Error code: 0x0000000000000000, Fault CPU: 0x0, PL: 0, VF: 1

Backtrace (CPU 0), Frame : Return Address
0xffffffa438f53880 : 0xffffff801f06e1c6
0xffffffa438f538d0 : 0xffffff801f196a74
0xffffffa438f53910 : 0xffffff801f188d44
0xffffffa438f53980 : 0xffffff801f0201e0
0xffffffa438f539a0 : 0xffffff801f06dc3c
0xffffffa438f53ad0 : 0xffffff801f06d9fc
0xffffffa438f53b30 : 0xffffff801f188f6f
0xffffffa438f53ca0 : 0xffffff801f0201e0
0xffffffa438f53cc0 : 0xffffff7fa15c5702
0xffffffa438f53e20 : 0xffffff7fa15c5697
0xffffffa438f53ea0 : 0xffffff7fa15c5697
0xffffffa438f53f20 : 0xffffff7fa15bd493
0xffffffa438f53f50 : 0xffffff7fa15b92ef
0xffffffa438f53fa0 : 0xffffff801f01f557
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.7.2)[0F708776-FDC2-39C5-87CE-42CFF3C5DD48]@0xffffff7fa1553000->0xffffff7fa1824fff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[F27A8A2A-6662-3608-83BD-415037509E01]@0xffffff7fa1523000
            dependency: net.lundman.spl(1.7.2)[0AB91572-CACF-39DF-86B3-116FF8CDCB8E]@0xffffff7fa032e000


Thanks! For now, I'm happily hanging out on 1.7.2! 8)
Last edited by Wowfunhappy on Thu Apr 25, 2019 6:51 am, edited 1 time in total.
Wowfunhappy
 
Posts: 14
Joined: Sat Jul 21, 2018 11:58 am

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by lundman » Tue Apr 23, 2019 9:10 pm

It's great that 1.7.2 works for you; that gives us a little breathing room. One thing that would help us is if you could turn on keepsyms, so that we get function names instead of addresses. Detailed instructions are in the wiki, but essentially you just add "keepsyms=1" to the nvram boot-args.

Another option is to disable the automatic import on boot. It is handled by zpool-autoimport.sh, which is run by launchctl: remove the plist, comment out the import line, or start the script with "exit 0".

Then you can try various kinds of imports. However, if you are OK with staying on 1.7.2, that is fine. We have a 1.9.0 RC coming out real soon, though, which you could test to make sure it doesn't panic.
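
For reference, the keepsyms step could look roughly like the sketch below. It assumes a real Mac, where boot-args live in NVRAM; on a Hackintosh the bootloader's own configuration usually supplies boot-args, so the nvram call may not stick. The helper name merge_boot_args is made up for illustration. It only appends keepsyms=1 to whatever is already set, so existing flags are not lost.

```shell
#!/bin/sh
# Hypothetical helper: add keepsyms=1 to an existing boot-args string
# without dropping flags that are already there.
merge_boot_args() {
    current="$1"
    case " $current " in
        *" keepsyms=1 "*) printf '%s\n' "$current" ;;            # already set
        "  ")             printf '%s\n' "keepsyms=1" ;;          # nothing set yet
        *)                printf '%s\n' "$current keepsyms=1" ;; # append to existing flags
    esac
}

# On macOS, `nvram boot-args` prints "boot-args<TAB>value"; keep only the value.
# Uncomment on the affected machine:
# current=$(nvram boot-args 2>/dev/null | cut -f2-)
# sudo nvram boot-args="$(merge_boot_args "$current")"
# A reboot is needed before panic logs show symbol names.

merge_boot_args "-v"    # prints: -v keepsyms=1
```

The real nvram lines are left commented out so the snippet can be run safely anywhere; only the string handling executes as written.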
lundman
 
Posts: 590
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by Wowfunhappy » Sat May 18, 2019 9:36 pm

Okay, so the bad news (for me) is that the issue has now started happening on 1.7.2 as well. The good news is that this gave me an opportunity to grab a panic log with keepsyms enabled! :D Note: this was generated with 1.7.2, not the latest.

Code:
Anonymous UUID:       A9969654-7837-D6AD-6B7C-1BFD8AF13863

Sun May 19 01:14:59 2019

*** Panic Report ***
panic(cpu 2 caller 0xffffff801b988f6f): Kernel trap at 0xffffff7f9ddc5702, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000170, CR3: 0x0000000032a26000, CR4: 0x00000000003627e0
RAX: 0x0000000000000000, RBX: 0x0000000000000000, RCX: 0x0000000000000000, RDX: 0x0000000000000000
RSP: 0xffffffa436a23db0, RBP: 0xffffffa436a23e20, RSI: 0x0000000000000000, RDI: 0xffffff83d9dd7000
R8:  0xffffff7f9dec3c30, R9:  0xffffff83d9dc9000, R10: 0x0000000000000000, R11: 0xffffff801c1ebc80
R12: 0x0000000000000000, R13: 0x0000000000000000, R14: 0xffffff83d9dd7000, R15: 0xffffff83d9dc2000
RFL: 0x0000000000010202, RIP: 0xffffff7f9ddc5702, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x0000000000000170, Error code: 0x0000000000000000, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffffa436a23880 : 0xffffff801b86e1c6 mach_kernel : _handle_debugger_trap + 0x4c6
0xffffffa436a238d0 : 0xffffff801b996a74 mach_kernel : _kdp_i386_trap + 0x114
0xffffffa436a23910 : 0xffffff801b988d44 mach_kernel : _kernel_trap + 0x4e4
0xffffffa436a23980 : 0xffffff801b8201e0 mach_kernel : _return_from_trap + 0xe0
0xffffffa436a239a0 : 0xffffff801b86dc3c mach_kernel : _panic_trap_to_debugger + 0x21c
0xffffffa436a23ad0 : 0xffffff801b86d9fc mach_kernel : _panic + 0x5c
0xffffffa436a23b30 : 0xffffff801b988f6f mach_kernel : _kernel_trap + 0x70f
0xffffffa436a23ca0 : 0xffffff801b8201e0 mach_kernel : _return_from_trap + 0xe0
0xffffffa436a23cc0 : 0xffffff7f9ddc5702 net.lundman.zfs : _vdev_dtl_reassess + 0xb6
0xffffffa436a23e20 : 0xffffff7f9ddc5697 net.lundman.zfs : _vdev_dtl_reassess + 0x4b
0xffffffa436a23ea0 : 0xffffff7f9ddc5697 net.lundman.zfs : _vdev_dtl_reassess + 0x4b
0xffffffa436a23f20 : 0xffffff7f9ddbd493 net.lundman.zfs : _spa_vdev_state_exit + 0x3e
0xffffffa436a23f50 : 0xffffff7f9ddb92ef net.lundman.zfs : _spa_async_thread + 0x14e
0xffffffa436a23fa0 : 0xffffff801b81f557 mach_kernel : _call_continuation + 0x17
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.7.2)[0F708776-FDC2-39C5-87CE-42CFF3C5DD48]@0xffffff7f9dd53000->0xffffff7f9e024fff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[F27A8A2A-6662-3608-83BD-415037509E01]@0xffffff7f9dd23000
            dependency: net.lundman.spl(1.7.2)[0AB91572-CACF-39DF-86B3-116FF8CDCB8E]@0xffffff7f9cb2e000


This time around, the problem began when I returned to my Mac partition, after booting into Windows for a few hours to play a game. I didn't realize before, but thinking back, the last time this problem began (with 1.8.2), I think the last OS I had booted was Windows.

This may be relevant, because—I should have mentioned this before—I'm also using the ZFS Windows driver. The Windows driver is set to read-only, so it shouldn't be making any changes to the drive, but it could be. Windows, however, has no trouble reading my pool, and has never blue screened on me.

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by lundman » Sat May 18, 2019 10:27 pm

Have you given 1.9rc a go? We've certainly seen the vdev_dtl_reassess panic before, but we have no idea what causes it or how to trigger it.

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by Wowfunhappy » Sat May 18, 2019 11:10 pm

I was able to fix the problem by temporarily disconnecting one of the drives in my mirrored pool, then adding it back and letting it resilver. Works on 1.8.2 too.

I'll try the RC, though not sure if it will tell us anything unless the problem comes up again...

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by Wowfunhappy » Sun May 19, 2019 9:25 am

Nope, still happens on the RC. Log from that version below just in case it's useful for some reason, although really nothing changed.

Code:
Anonymous UUID:       A9969654-7837-D6AD-6B7C-1BFD8AF13863

Sun May 19 13:16:19 2019

*** Panic Report ***
panic(cpu 1 caller 0xffffff8012388f6f): Kernel trap at 0xffffff7f947cfcd7, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000180, CR3: 0x0000000029479000, CR4: 0x00000000003627e0
RAX: 0x0000000000000000, RBX: 0x0000000000000000, RCX: 0x0000000000000000, RDX: 0x0000000000000000
RSP: 0xffffffa42c353db0, RBP: 0xffffffa42c353e20, RSI: 0x0000000000000000, RDI: 0xffffff83d015d000
R8:  0xffffff7f948f0ce0, R9:  0xffffff83d014d000, R10: 0x0000000000000000, R11: 0x0000000000000001
R12: 0x0000000000000000, R13: 0x0000000000000000, R14: 0xffffff83d015d000, R15: 0xffffff83ce0a2000
RFL: 0x0000000000010202, RIP: 0xffffff7f947cfcd7, CS:  0x0000000000000008, SS:  0x0000000000000000
Fault CR2: 0x0000000000000180, Error code: 0x0000000000000000, Fault CPU: 0x1, PL: 0, VF: 1

Backtrace (CPU 1), Frame : Return Address
0xffffffa42c353880 : 0xffffff801226e1c6 mach_kernel : _handle_debugger_trap + 0x4c6
0xffffffa42c3538d0 : 0xffffff8012396a74 mach_kernel : _kdp_i386_trap + 0x114
0xffffffa42c353910 : 0xffffff8012388d44 mach_kernel : _kernel_trap + 0x4e4
0xffffffa42c353980 : 0xffffff80122201e0 mach_kernel : _return_from_trap + 0xe0
0xffffffa42c3539a0 : 0xffffff801226dc3c mach_kernel : _panic_trap_to_debugger + 0x21c
0xffffffa42c353ad0 : 0xffffff801226d9fc mach_kernel : _panic + 0x5c
0xffffffa42c353b30 : 0xffffff8012388f6f mach_kernel : _kernel_trap + 0x70f
0xffffffa42c353ca0 : 0xffffff80122201e0 mach_kernel : _return_from_trap + 0xe0
0xffffffa42c353cc0 : 0xffffff7f947cfcd7 net.lundman.zfs : _vdev_dtl_reassess + 0xbf
0xffffffa42c353e20 : 0xffffff7f947cfc63 net.lundman.zfs : _vdev_dtl_reassess + 0x4b
0xffffffa42c353ea0 : 0xffffff7f947cfc63 net.lundman.zfs : _vdev_dtl_reassess + 0x4b
0xffffffa42c353f20 : 0xffffff7f947c6681 net.lundman.zfs : _spa_vdev_state_exit + 0x3e
0xffffffa42c353f50 : 0xffffff7f947c228a net.lundman.zfs : _spa_async_thread + 0x1b8
0xffffffa42c353fa0 : 0xffffff801221f557 mach_kernel : _call_continuation + 0x17
      Kernel Extensions in backtrace:
         net.lundman.zfs(1.9)[4768C5F0-353E-3BD4-B447-FC2BBA5FD152]@0xffffff7f94753000->0xffffff7f94a73fff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[F27A8A2A-6662-3608-83BD-415037509E01]@0xffffff7f94723000
            dependency: net.lundman.spl(1.9.0)[2A4540C6-04E9-3A37-AEC4-8B3D251D70B1]@0xffffff7f9352e000


I'm now wondering if this is a hardware issue; I may try buying a new drive. That would be odd, because the drive I need to disconnect to stop the panic is less than a year old, and it's a good HGST drive.

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by lundman » Sun May 19, 2019 9:04 pm

_vdev_dtl_reassess + 0xbf

Code:
(lldb) target create --no-dependents --arch x86_64 ../zfs/module/zfs/zfs
(lldb) image lookup --verbose --address vdev_dtl_reassess+0xbf
    LineEntry: [0x000000000007ccd7-0x000000000007ccde): /Users/lundman/Developer/mojave/zfs/module/zfs/vdev.c:2518:40

2518:   dsl_scan_t *scn = spa->spa_dsl_pool->dp_scan;


Peculiar, it doesn't have a pool associated with it? I went through all commits with "spa_dsl_pool" differences from ZOL, and found 4 (update: 5).
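
As a cross-check on the lookup above, the faulting RIP from the 1.9 panic report minus the load address of net.lundman.zfs gives the same offset that lldb prints at the start of its LineEntry range (0x7ccd7). The small fault address (CR2 = 0x180) is also consistent with reading a field through a NULL pointer, though that is an inference rather than something the log states. The sketch below does the subtraction; kext_offset is a made-up helper, and it only subtracts the low 32 bits after checking that the upper halves match, because full 64-bit kernel addresses overflow signed arithmetic in some shells.

```shell
#!/bin/sh
# Hypothetical helper: offset of a kernel address inside a loaded kext.
# Both arguments are 16-digit hex addresses as printed in a panic report.
kext_offset() {
    addr="${1#0x}"
    base="${2#0x}"
    # Split each address into its upper and lower 8 hex digits.
    addr_hi=$(printf '%s' "$addr" | cut -c1-8)
    base_hi=$(printf '%s' "$base" | cut -c1-8)
    addr_lo=$(printf '%s' "$addr" | cut -c9-16)
    base_lo=$(printf '%s' "$base" | cut -c9-16)
    if [ "$addr_hi" != "$base_hi" ]; then
        echo "upper halves differ; use a 64-bit aware tool" >&2
        return 1
    fi
    # Safe to subtract: both operands now fit comfortably in 64 bits.
    printf '0x%x\n' $(( 0x$addr_lo - 0x$base_lo ))
}

# RIP and kext load address copied from the 1.9 panic report in this thread.
kext_offset 0xffffff7f947cfcd7 0xffffff7f94753000    # prints: 0x7ccd7
```

The same subtraction on the 1.7.2 report (RIP 0xffffff7f9ddc5702, load address 0xffffff7f9dd53000) gives 0x72702, which is the address to look up against a 1.7.2 binary.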
I produced a branch of these differences, as well as
Code:
      if (spa->spa_dsl_pool == NULL) {
         printf("%s: spa->spa_dsl_pool == NULL\n", __func__);


So if you could try that as well, it would be interesting to know if it still can happen.
If you have a Terminal open running
Code:
log stream --source --predicate 'sender == "zfs" OR sender == "spl"' --style compact


and keep an eye out for that print above. Handling the NULL case there is unlikely to fix it; it will probably just die somewhere else a bit later.
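
If watching the stream live is not practical, one possible alternative is to search the log afterwards for the exact message. This is only a sketch: find_null_pool_msgs is a made-up helper name, the sample lines in the demo are invented, and whether the message survives a panic and reboot in the unified log is an assumption.

```shell
#!/bin/sh
# The message text comes from the printf in the test build:
#   "%s: spa->spa_dsl_pool == NULL\n", with __func__ as the prefix.
find_null_pool_msgs() {
    # -F: match the string literally, not as a regular expression.
    grep -F 'spa->spa_dsl_pool == NULL'
}

# After the fact, search what the unified log kept from the last hour
# (run on the affected Mac):
# log show --last 1h --predicate 'sender == "zfs" OR sender == "spl"' --style compact | find_null_pool_msgs

# Self-contained demo on invented sample lines:
printf '%s\n' \
    'kernel: (zfs) some unrelated line' \
    'kernel: (zfs) vdev_dtl_reassess: spa->spa_dsl_pool == NULL' \
    | find_null_pool_msgs
```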

Attachment: OpenZFSonOsX-1.9.0-vdev-10.13.pkg (8.55 MiB)

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by Wowfunhappy » Fri May 24, 2019 6:26 am

Thanks, just saw this now! I'll give this a spin and let you know if anything happens.

I am increasingly convinced there's a hardware problem involved here—the panic seems to re-appear a few days after re-adding the problem drive. Sometimes ZFS will just say it can't find the drive, other times it will kernel panic and I'll need to disconnect the drive in order to boot. Of course, it would be great if ZFS didn't kernel panic regardless!

Re: After working for 6 months, OpenZFS 1.8.2 now KP's on boot

Post by Wowfunhappy » Thu May 30, 2019 5:41 am

It's really hard to say for sure, but I really do think something changed with this build. I've rebooted many times since installing and have not experienced any kernel panics.