Import of external pool causes kernel panic Mojave/O3X 1.9.2


Import of external pool causes kernel panic Mojave/O3X 1.9.2

Postby Sharko » Tue Nov 24, 2020 9:32 pm

Hi, so it appears that my main external backup enclosure has a problem. It has one zpool and several datasets, and each dataset has a couple dozen snapshots. I (accidentally, as it turns out, grrr) deleted four of the oldest snapshots on one of the datasets using the % notation to specify a range of snapshots, and that triggered a kernel panic. Now, each time I reboot and try to import that same zpool on the enclosure, I get a kernel panic. I did manage to import it successfully once by specifying readonly=on during import, but otherwise it has been pretty repeatable about crashing the machine.
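(For readers unfamiliar with the % notation: a hedged sketch with made-up pool and snapshot names, showing the range-destroy form described above.)

```shell
# Hypothetical names for illustration only (tank/data, snapA, snapD).
# The % range form destroys snapA through snapD inclusive:
zfs destroy tank/data@snapA%snapD

# A dry run (-n) with verbose output (-v) lists what would be destroyed
# without actually doing it -- worth running first:
zfs destroy -nv tank/data@snapA%snapD
```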

Some facts about the situation:
Mac Pro 5,1, 32GB ram, with ZFS limited to 8GB
MacOS Mojave 10.14.6 up to date with security patches as of today
Running OpenZFSOnOSX 1.9.2
Enclosure is an OWC Mercury Elite Dual, with two 4TB spinning rust disks in a mirror zpool, connected via ESATA port
Structure of the pool is ELITE/ENCRYPTED/SHOME_BACKUP, where ELITE is zpool name, ENCRYPTED is an encrypted container dataset (zfs native encryption), and SHOME_BACKUP is the dataset that I deleted snapshots from.
As I recall, ELITE had about 500GB of free space on it prior to this episode.

So, what do you think my next steps should be to either debug and capture diagnostic data, or recover from the error? Would updating to 1.9.4 help, do you think? This is all just backup data, so I suppose I could import read-only and then destroy the pool to start over... yuck.
Sharko
 
Posts: 146
Joined: Thu May 12, 2016 12:19 pm

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby Sharko » Wed Nov 25, 2020 8:31 am

Turns out that there was some unique data on the pool, so I'm replicating that to a spare drive as first priority. Then, I wondered if anyone has any opinions about whether using the -F flag on import might return the pool to an operational state? The last command I issued to the pool while it was read/write was that destroy command, which applied to four snapshots; does that imply that "-F 4" would be the correct syntax to try? The only thing that gives me pause is this statement under the -F option in the man page: "This option is ignored if the pool is importable or already imported." My pool might not be functionally importable (it crashes the machine), but it might not appear to be non-importable.

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby lundman » Wed Nov 25, 2020 3:44 pm

You are supposed to get the last (highest) txg for your pool, usually from zdb, then import with -T txg, where you subtract one from the txg and try, then two, and try again, and so on.

You should also try setting "zfs_recover=1", then "zpool import -N" to see if you can import it without mounting anything. If that works, try "zfs mount dataset" for the dataset you need.
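A rough sketch of that txg-walking procedure, using the ELITE pool name from this thread; the txg values and device node are examples, and the real txg must come from zdb:

```shell
# Get the most recent txg recorded for the pool.
zdb -u ELITE          # if the pool isn't visible to zdb,
zdb -l /dev/disk2s2   # read a vdev label directly (device is an example)

# Then walk backwards one txg at a time until an import succeeds.
# 215825, 215824, ... stand in for "last txg minus one, minus two, ...".
zpool import -N -T 215825 ELITE
zpool import -N -T 215824 ELITE
```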
lundman
 
Posts: 1030
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby Sharko » Wed Nov 25, 2020 4:25 pm

OK, thanks for that. I haven't used zdb yet, but there's always a first time!

Where would I be setting "zfs_recover=1"? Is that another option on the zpool import command line? I'm away from my Mac, so I can't do a man zpool.

Kurt

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby lundman » Wed Nov 25, 2020 5:13 pm

sysctl kstat.zfs.darwin.tunable.zfs_recover=1
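Putting the pieces together, the suggested sequence would presumably look like this (pool and dataset names are taken from earlier in the thread):

```shell
# Let ZFS log and continue past recoverable inconsistencies
# instead of panicking.
sudo sysctl kstat.zfs.darwin.tunable.zfs_recover=1

# Import without mounting any datasets.
sudo zpool import -N ELITE

# If the import holds, mount only the dataset you need.
sudo zfs mount ELITE/ENCRYPTED/SHOME_BACKUP
```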

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby Sharko » Wed Nov 25, 2020 11:06 pm

zdb doesn't appear to want to work with a readonly pool - I can get diagnostic information from my working pools using it, but not the bad one:

Code:
sh-3.2# zdb -u MICRON

Uberblock:
   magic = 0000000000bab10c
   version = 5000
   txg = 215826
   guid_sum = 8429642594950362400
   timestamp = 1606369886 UTC = Wed Nov 25 21:51:26 2020
   mmp_magic = 00000000a11cea11
   mmp_delay = 0
   checkpoint_txg = 0

sh-3.2#


Code:
sh-3.2# zdb -u ELITE
zdb: can't open 'ELITE': No such file or directory
sh-3.2#


Here is the original crash report:
Code:
Tue Nov 24 20:45:53 2020

*** Panic Report ***
panic(cpu 0 caller 0xffffff7f9ac0022e): zfs: allocating allocated segment(offset=970732429312 size=4096) of (offset=970732376064 size=229376)

Backtrace (CPU 0), Frame : Return Address
0xffffffa428c631f0 : 0xffffff8019fa965d mach_kernel : _handle_debugger_trap + 0x47d
0xffffffa428c63240 : 0xffffff801a0e5235 mach_kernel : _kdp_i386_trap + 0x155
0xffffffa428c63280 : 0xffffff801a0d696a mach_kernel : _kernel_trap + 0x50a
0xffffffa428c632f0 : 0xffffff8019f569d0 mach_kernel : _return_from_trap + 0xe0
0xffffffa428c63310 : 0xffffff8019fa9077 mach_kernel : _panic_trap_to_debugger + 0x197
0xffffffa428c63430 : 0xffffff8019fa8ec3 mach_kernel : _panic + 0x63
0xffffffa428c634a0 : 0xffffff7f9ac0022e net.lundman.spl : _vcmn_err + 0x8e
0xffffffa428c635c0 : 0xffffff7f9becc03a net.lundman.zfs : _zfs_panic_recover + 0x6a
0xffffffa428c63620 : 0xffffff7f9be81955 net.lundman.zfs : _range_tree_add_impl + 0x1fc
0xffffffa428c636c0 : 0xffffff7f9be7e856 net.lundman.zfs : _metaslab_free_concrete + 0x193
0xffffffa428c63710 : 0xffffff7f9be7f686 net.lundman.zfs : _metaslab_free + 0x113
0xffffffa428c63750 : 0xffffff7f9bf002bd net.lundman.zfs : _zio_dva_free + 0x23
0xffffffa428c63760 : 0xffffff7f9befc503 net.lundman.zfs : _zio_nowait + 0x133
0xffffffa428c637c0 : 0xffffff7f9be6d2ab net.lundman.zfs : _dsl_scan_free_block_cb + 0x91
0xffffffa428c63800 : 0xffffff7f9be3054f net.lundman.zfs : _bpobj_iterate_impl + 0xe6
0xffffffa428c63950 : 0xffffff7f9be308d2 net.lundman.zfs : _bpobj_iterate_impl + 0x469
0xffffffa428c63aa0 : 0xffffff7f9be6a57d net.lundman.zfs : _dsl_scan_sync + 0x290
0xffffffa428c63c90 : 0xffffff7f9be8f754 net.lundman.zfs : _spa_sync + 0xa5b
0xffffffa428c63ed0 : 0xffffff7f9be9b485 net.lundman.zfs : _txg_sync_thread + 0x273
0xffffffa428c63fa0 : 0xffffff8019f560ce mach_kernel : _call_continuation + 0x2e
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[EAA28CC7-9F6A-3C7B-BB90-691EBDC3A258]@0xffffff7f9abfe000->0xffffff7f9bdf2fff
         net.lundman.zfs(1.9.2)[4C34A112-866A-3499-B5EA-4DF7CDF1FFF4]@0xffffff7f9be23000->0xffffff7f9c1dcfff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[DFD9596C-E596-376A-8A00-3B74A06C2D02]@0xffffff7f9bdf3000
            dependency: net.lundman.spl(1.9.2)[EAA28CC7-9F6A-3C7B-BB90-691EBDC3A258]@0xffffff7f9abfe000

BSD process name corresponding to current thread: kernel_task
Boot args: keepsyms=y cwae=2

Mac OS version:
18G6042

Kernel version:
Darwin Kernel Version 18.7.0: Fri Oct 30 12:37:06 PDT 2020; root:xnu-4903.278.44.0.2~1/RELEASE_X86_64
Kernel UUID: 86E800F2-0020-3EB1-84F8-4054B2DF7E61
Kernel slide:     0x0000000019c00000
Kernel text base: 0xffffff8019e00000
__HIB  text base: 0xffffff8019d00000
System model name: MacPro5,1 (Mac-F221BEC8)

System uptime in nanoseconds: 762215523913
last loaded kext at 252380180267: com.apple.filesystems.msdosfs   1.10 (addr 0xffffff7fa00df000, size 69632)


And here is crash report for the last time I tried to import the pool non-readonly:

Code:
Tue Nov 24 21:02:36 2020

*** Panic Report ***
panic(cpu 0 caller 0xffffff7f9d00022e): zfs: allocating allocated segment(offset=970732429312 size=4096) of (offset=970732376064 size=229376)

Backtrace (CPU 0), Frame : Return Address
0xffffffa424e0b1f0 : 0xffffff801c3a965d mach_kernel : _handle_debugger_trap + 0x47d
0xffffffa424e0b240 : 0xffffff801c4e5235 mach_kernel : _kdp_i386_trap + 0x155
0xffffffa424e0b280 : 0xffffff801c4d696a mach_kernel : _kernel_trap + 0x50a
0xffffffa424e0b2f0 : 0xffffff801c3569d0 mach_kernel : _return_from_trap + 0xe0
0xffffffa424e0b310 : 0xffffff801c3a9077 mach_kernel : _panic_trap_to_debugger + 0x197
0xffffffa424e0b430 : 0xffffff801c3a8ec3 mach_kernel : _panic + 0x63
0xffffffa424e0b4a0 : 0xffffff7f9d00022e net.lundman.spl : _vcmn_err + 0x8e
0xffffffa424e0b5c0 : 0xffffff7f9e2cc03a net.lundman.zfs : _zfs_panic_recover + 0x6a
0xffffffa424e0b620 : 0xffffff7f9e281955 net.lundman.zfs : _range_tree_add_impl + 0x1fc
0xffffffa424e0b6c0 : 0xffffff7f9e27e856 net.lundman.zfs : _metaslab_free_concrete + 0x193
0xffffffa424e0b710 : 0xffffff7f9e27f686 net.lundman.zfs : _metaslab_free + 0x113
0xffffffa424e0b750 : 0xffffff7f9e3002bd net.lundman.zfs : _zio_dva_free + 0x23
0xffffffa424e0b760 : 0xffffff7f9e2fc503 net.lundman.zfs : _zio_nowait + 0x133
0xffffffa424e0b7c0 : 0xffffff7f9e26d2ab net.lundman.zfs : _dsl_scan_free_block_cb + 0x91
0xffffffa424e0b800 : 0xffffff7f9e23054f net.lundman.zfs : _bpobj_iterate_impl + 0xe6
0xffffffa424e0b950 : 0xffffff7f9e2308d2 net.lundman.zfs : _bpobj_iterate_impl + 0x469
0xffffffa424e0baa0 : 0xffffff7f9e26a57d net.lundman.zfs : _dsl_scan_sync + 0x290
0xffffffa424e0bc90 : 0xffffff7f9e28f754 net.lundman.zfs : _spa_sync + 0xa5b
0xffffffa424e0bed0 : 0xffffff7f9e29b485 net.lundman.zfs : _txg_sync_thread + 0x273
0xffffffa424e0bfa0 : 0xffffff801c3560ce mach_kernel : _call_continuation + 0x2e
      Kernel Extensions in backtrace:
         net.lundman.spl(1.9.2)[EAA28CC7-9F6A-3C7B-BB90-691EBDC3A258]@0xffffff7f9cffe000->0xffffff7f9e1f2fff
         net.lundman.zfs(1.9.2)[4C34A112-866A-3499-B5EA-4DF7CDF1FFF4]@0xffffff7f9e223000->0xffffff7f9e5dcfff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[DFD9596C-E596-376A-8A00-3B74A06C2D02]@0xffffff7f9e1f3000
            dependency: net.lundman.spl(1.9.2)[EAA28CC7-9F6A-3C7B-BB90-691EBDC3A258]@0xffffff7f9cffe000

BSD process name corresponding to current thread: kernel_task
Boot args: keepsyms=y cwae=2

Mac OS version:
18G6042

Kernel version:
Darwin Kernel Version 18.7.0: Fri Oct 30 12:37:06 PDT 2020; root:xnu-4903.278.44.0.2~1/RELEASE_X86_64
Kernel UUID: 86E800F2-0020-3EB1-84F8-4054B2DF7E61
Kernel slide:     0x000000001c000000
Kernel text base: 0xffffff801c200000
__HIB  text base: 0xffffff801c100000
System model name: MacPro5,1 (Mac-F221BEC8)

System uptime in nanoseconds: 48086723995
last loaded kext at 17764739550: com.apple.driver.AppleBluetoothMultitouch   96 (addr 0xffffff7fa0ead000, size 61440)
loaded kexts:


The two reports look nearly identical.

Not a lot to say about this pool; I created it from scratch under 1.9.2 when I got the error message about bookmark_v2 rendering my snapshots risky on an old pool that I had upgraded from O3X 1.5.x. The history is pretty standard: I created the pool, created the zfs native encryption container dataset, and since then I've been sending it datasets with zfs send | zfs receive. About the only potentially non-standard thing I've done with it is that I have automatic snapshots set up on my source pool, but I don't send all of them over to this backup pool; I manually send over the monthly snapshots, mostly.

I've copied all the unique data off this pool, but I'll keep it in its present state for a day or two in case you'd like me to do any tests with it, Mr. Lundman. Thank you for your assistance, as always.

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby lundman » Wed Nov 25, 2020 11:25 pm

You can use "zdb -l /dev/diskX" to just read the label and get the txg.

However, your panic has:
zfs_panic_recover

which means you can get past it by setting zfs_recover - which is worth trying before using -T txg.
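For example (the device node below is a placeholder; find the actual one with diskutil list):

```shell
# Reads the vdev labels straight off the disk, no import required.
# Each label dump includes a txg field that can be fed to zpool import -T.
sudo zdb -l /dev/disk3s2
```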

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby Sharko » Sun Nov 29, 2020 9:05 am

OK, setting zfs_recover via sysctl seems to have worked. I can import the pool normally! How much should I trust it, do you think? Should I roll back any txg just to be safe? Should I run a scrub, and assume that if that passes all is well?

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby lundman » Sun Nov 29, 2020 2:14 pm

The problem is still there, so you need zfs_recover on. But it's not a particularly bad problem. Long term, you'd probably want to rebuild the pool, but it's not a drop-everything-right-now thing.

Re: Import of external pool causes kernel panic Mojave/O3X 1

Postby Sharko » Fri Jan 01, 2021 4:13 pm

So, I rebuilt that backup pool from scratch on my OWC ELITE dual disk enclosure. Should I turn off zfs_recover now, since leaving it on might sort of cover up problems in the future?
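(Presumably, turning the tunable off would mirror lundman's earlier sysctl; this is an assumption based on that command, not a confirmed answer from the thread:)

```shell
# Restore the default behavior (panic on inconsistency) -- assumes the
# same tunable path lundman gave earlier in the thread.
sudo sysctl kstat.zfs.darwin.tunable.zfs_recover=0
```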

Kurt
