Difference between revisions of "Panic"

From OpenZFS on OS X
Jump to: navigation, search
m (Best effort for M1 (arm64))
(Best effort for M1 (arm64))
Line 53: Line 53:
  
 
=== Best effort for M1 (arm64) ===
 
=== Best effort for M1 (arm64) ===
 +
 +
 +
# xcrun llvm-objdump --disassemble --arch arm64e /Library/Extensions/zfs.kext/Contents/MacOS/zfs|head
 +
/Library/Extensions/zfs.kext/Contents/MacOS/zfs:
 +
(__TEXT_EXEC,__text) section
 +
_atomic_cas_ptr:
 +
  94000: 02 fc e1 c8 casal x1, x2, [x0]
  
 
  org.openzfsonosx.zfs(2.1)[0BF8CB05-9B3B-3182-8DE6-AF14261D75B8]@ 0xfffffe0022410000->0xfffffe00226fffff
 
  org.openzfsonosx.zfs(2.1)[0BF8CB05-9B3B-3182-8DE6-AF14261D75B8]@ 0xfffffe0022410000->0xfffffe00226fffff
  0xfffffe0022410000 - 0x88000 = 0xFFFFFE0022388000
+
  0xfffffe0022410000 - 0x94000 = 0xFFFFFE0022388000
 
   
 
   
 
    lr: 0xfffffe002348abe4  fp: 0xfffffe3feaf526f0
 
    lr: 0xfffffe002348abe4  fp: 0xfffffe3feaf526f0

Revision as of 05:28, 30 May 2023

Kernel panics

One of the most useful settings to assist with debugging is telling Darwin kernel to keep the symbols from kexts. This can be set using the nvram command, and requires a reboot.

First check to see if you have any special boot-args set and add the new keepsyms instruction.

# nvram boot-args="keepsyms=y debug=0x144"

and reboot the machine for it to take effect.

[Table 20-1] in Apple's Kernel Programming Guide has a summary of the meaning of the debug options.


Panic decoding

If you get a panic but you do not have symbols enabled, it can be decoded using the atos command.

In a standard panic log, you will see something like:

# cd /Library/Logs/DiagnosticReports/
# less Kernel_2014-03-13-093629_OSX109.panic
Backtrace (CPU 0), Frame : Return Address
0xffffff8088843b10 : 0xffffff7f85e25759  : 0xffffff7f80dcf760 
0xffffff8088843b40 : 0xffffff7f85e25560  : 0xffffff7f80dcf423 
0xffffff8088843be0 : 0xffffff7f85e08f27  : 0xffffff7f80dc491a

      Kernel Extensions in backtrace:
        net.lundman.spl(1.0)[7F69C13B-C730-3475-99E9-53861AC6C54E]@0xffffff7f85d2a000->0xffffff7f85d36fff
        net.lundman.zfs(1.0)[5637421D-EE17-33F1-ACB2-8FA38BC5A5A6]@0xffffff7f80d54000->0xffffff7f85f38fff

We can then run the command

 # xcrun atos -arch x86_64 -l 0xffffff7f80d54000 -o ../zfs/module/zfs/zfs.kext/Contents/MacOS/zfs   0xffffff7f80dcf760 0xffffff7f80dcf423 0xffffff7f80dc491a
got symbolicator for ../zfs/module/zfs/zfs.kext/Contents/MacOS/zfs, base address 0
spa_taskqs_init (in zfs) (spa.c:888)
spa_create_zio_taskqs (in zfs) (spa.c:972)
spa_activate (in zfs) (spa.c:1094)

Which can be repeated for spl, and spl load address as well, if needed.

And for kernel addresses, look for "kernel slide:" value, I assumed 0 in this example

xcrun atos -arch x86_64 -d -o /Volumes/KernelDebugKit/mach_kernel -s 0   0xffffff8000222f79 0xffffff80002dc24e 0xffffff80002f3746 


If you are not panicking, but would like to print the stack at a certain point in the kext, you can use

OSReportWithBacktrace("I am here: vp %p\n", vp);

in `printf` style notation.


Best effort for M1 (arm64)

# xcrun llvm-objdump --disassemble --arch arm64e /Library/Extensions/zfs.kext/Contents/MacOS/zfs|head
/Library/Extensions/zfs.kext/Contents/MacOS/zfs:
(__TEXT_EXEC,__text) section
_atomic_cas_ptr:
  94000:	02 fc e1 c8	casal	x1, x2, [x0]
org.openzfsonosx.zfs(2.1)[0BF8CB05-9B3B-3182-8DE6-AF14261D75B8]@ 0xfffffe0022410000->0xfffffe00226fffff
0xfffffe0022410000 - 0x94000 = 0xFFFFFE0022388000

		  lr: 0xfffffe002348abe4  fp: 0xfffffe3feaf526f0
		  lr: 0xfffffe002348a9c8  fp: 0xfffffe3feaf52760

# atos -o module/os/macos/zfs/zfs -arch arm64e -l 0xFFFFFE0022388000 0xfffffe002348abe4 0xfffffe002348a9c8 ....

Alternate symbol lookup with lldb

Panic:

panic(cpu 5 caller 0xffffff80088d1066): trying to interlock destroyed mutex (0xffffff8029196000)
Backtrace (CPU 5), Frame : Return Address
0xffffff81f49fba80 : 0xffffff8008822fa9 
0xffffff81f49fbb00 : 0xffffff80088d1066 
0xffffff81f49fbb10 : 0xffffff800889c75e 
0xffffff81f49fbbe0 : 0xffffff80088ae60c 
0xffffff81f49fbc00 : 0xffffff7f8a4252e0
0xffffff81f49fbdf0 : 0xffffff80089ffea9 
        net.lundman.zfs(1.0)[0EC79B06-3C9F-3529-8450-42222507F310]@0xffffff7f8a33c000->0xffffff7f8a545fff

Assuming you have the same build as panic report, in this case 1.2.7

# lldb
(lldb) target create --no-dependents --arch x86_64 module/zfs/zfs   #Binary before moved into zfs.kext
(lldb) target modules load --file zfs __TEXT 0xffffff7f8a33c000
(lldb) image lookup --verbose --address 0xffffff7f8a4252e0

     Address: zfs[0x00000000000e92e0] (zfs.__TEXT.__text + 950160)
     Summary: zfs`zfs_vnop_pageout + 1264 at zfs_vnops_osx.c:1236
      Module: file = "/Users/lundman/x/zfs/module/zfs/zfs", arch = "x86_64"
 CompileUnit: id = {0x00000000}, file = "/Users/lundman/x/zfs/module/zfs/zfs_vnops_osx.c", language = "c89"
   LineEntry: [0xffffff7f8a4252da-0xffffff7f8a4252f0): /Users/lundman/x/zfs/module/zfs/zfs_vnops_osx.c:1236

zfs_vnops_osx.c:1236

    tx = dmu_tx_create(zfsvfs->z_os);
   dmu_tx_hold_write(tx, zp->z_id, off, len);
   dmu_tx_hold_bonus(tx, zp->z_id);
   err = dmu_tx_assign(tx, TXG_NOWAIT);

Or just for the kernel

(lldb) target create --no-dependents --arch x86_64 mach_kernel
(lldb) target modules load --file mach_kernel --slide 0x000000000b600000
(lldb) image lookup -a 0xffffff800b8d6aa7

Kernel to Userland

Connect to crashed kernel:

# Load the compiled binary, just makes it easier for lldb to find it when connecting
(lldb) target create --no-dependents --arch x86_64 module/os/macos/zfs/zfs.kext/Contents/MacOS/zfs                   
          Current executable set to 'module/os/macos/zfs/zfs.kext/Contents/MacOS/zfs' (x86_64).
# Connect to crashed VM
(lldb) kdp-remote 172.16.248.128
Version: Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/DEVELOPMENT_X86_64; UUID=12647AE2-57FB-35DB-AF57-4B25D060D845; stext=0xffffff802f200000

In this case we have died when trying to copyinstr() from userland:

(lldb) up
frame #1: 0xffffff7f8db91238 zfs`history_str_get(zc=0xffffff90a8dec000) at zfs_ioctl.c:295:6
   292 			return (NULL);
   293
   294 		buf = kmem_alloc(HIS_MAX_RECORD_LEN, KM_SLEEP);
-> 295 		if (copyinstr((void *)(uintptr_t)zc->zc_history,
   296 		    buf, HIS_MAX_RECORD_LEN, NULL) != 0) {
   297 			history_str_free(buf);
   298 			return (NULL);
(lldb) p zc->zc_history
(uint64_t) $1 = 4413168320

Find the task ptr for userprocess, a few ways;

(lldb) showproc 0xffffff804afb3700
task                 vm_map               ipc_space            #acts flags    pid       process             io_policy  wq_state  command
0xffffff803bb76498   0xffffff804056cf00   0xffffff803bbd4200       2 D        687   0xffffff804afb3700                 1  1  0    zpool

(lldb) showpid 687
task                 vm_map               ipc_space            #acts flags    pid       process             io_policy  wq_state  command
0xffffff803bb76498   0xffffff804056cf00   0xffffff803bbd4200       2 D        687   0xffffff804afb3700                 1  1  0    zpool

(lldb) showprocinfo 0xffffff804afb3700
Process 0xffffff804afb3700
	name zpool
	pid:687    task:0xffffff803bb76498   p_stat:2      parent pid: 686
Cred: euid 0 ruid 0 svuid 0
Flags: 0x4006
	0x00000002 - has a controlling tty
	0x00000004 - process is 64 bit
	0x00004000 - process has called exec

You can view user process threads:

(lldb) showtaskuserstacks 0xffffff803bb76498
Process:         zpool [687]
Path:            zpool
Identifier:      zpool
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  bash [686]

Date/Time:       2020-05-08 09:20:50.000 -0800
OS Version:      Mac OS X 10.8 (18G103)
Report Version:  8

Exception Type:  n/a
Exception Codes: n/a
Crashed Thread:  0

Application Specific Information:
Synthetic crash log generated from Kernel userstacks

Binary Images:
Unknown dyld all_image_infos version number 15
0x10efd9000 - 0x10f018fff zpool                                              
 (??? - ???) <7234AC69-6CE8-321D-9ACC-4F73BB964650> /Users/lundman/src/zfs/openzfs/scripts/..//cmd/zpool/.libs/zpool

[snip]

Create a gdb-server to debug zpool process:

(lldb) beginusertaskdebugging 0xffffff803bb76498
Starting debug session for zpool at localhost:7936.

From another shell, start to debug the zpool process, using the convenience wrapper script: All it does is run "lldb zpool" after setting DYLD_LIBRARY_PATH so you can run it in the build directory.

± ./scripts/debug-macos.sh zpool
(lldb) target create "/Users/lundman/src/zfs/openzfs/scripts/..//cmd/zpool/.libs/zpool"
Current executable set to '/Users/lundman/src/zfs/openzfs/scripts/..//cmd/zpool/.libs/zpool' (x86_64).
(lldb) gdb-remote localhost:7936
Process 687 stopped
* thread #1, stop reason = signal SIGINT
    frame #0: 0x00007fff63fbfb5a libsystem_kernel.dylib
->  0x7fff63fbfb5a: jae    0x7fff63fbfb64
    0x7fff63fbfb5c: movq   %rax, %rdi
    0x7fff63fbfb5f: jmp    0x7fff63fbf421
    0x7fff63fbfb64: retq

(lldb) up
frame #6: 0x000000010efdc434 zpool`zpool_export_one(zhp=0x00007ff51440b750, data=0x00007ffee0c25718) at zpool_main.c:1706:13
   1703         if (cb->hardforce) {
   1704                 if (zpool_export_force(zhp, history_str) != 0)
   1705                         return (1);
-> 1706         } else if (zpool_export(zhp, cb->force, history_str) != 0) {
   1707                 return (1);
   1708         }
   1709
(lldb) p history_str
(char [2049]) $0 = "zpool export BOOM"

Links

https://developer.apple.com/library/mac/qa/qa1264/_index.html

https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming/build/build.html#//apple_ref/doc/uid/TP30000905-CH221-BABDGEGF

https://developer.apple.com/library/mac/documentation/Darwin/Reference/Manpages/man8/kext_logging.8.html