Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by xenophon » Thu Mar 25, 2021 7:06 am

Hi, and greetings from Athens; thanks for this great OS X port!

I've been running OpenZFS for years, and installed the development version for Big Sur in December - updating it as newer pkgs became available. A recurrent problem with my -otherwise stable- OSX86 build has been this kernel panic upon wake from sleep, after about 13-14 days of uptime:

Code:
panic(cpu 4 caller 0xffffff80161ee1e6): Kernel trap at 0xffffff7fb834f495, type 13=general protection, registers:
CR0: 0x0000000080010033, CR2: 0x000000010d2580f0, CR3: 0x0000000f0ce6121a, CR4: 0x00000000003626e0
RAX: 0x0100010001000100, RBX: 0x0000000000000002, RCX: 0x0000000200000000, RDX: 0x0000000000000002
RSP: 0xffffffc1f041b1d0, RBP: 0xffffffc1f041b430, RSI: 0x0000000000000000, RDI: 0x0000000000000001
R8:  0x0000000000000010, R9:  0x000000000000002c, R10: 0x0000000000000000, R11: 0x0000000000000000
R12: 0xffffffc1f041b4c0, R13: 0x0000000000000001, R14: 0x0000000000000001, R15: 0x0000000000000194
RFL: 0x0000000000010246, RIP: 0xffffff7fb834f495, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x000000010d2580f0, Error code: 0x0000000000000000, Fault CPU: 0x4, PL: 0, VF: 0

Backtrace (CPU 4), Frame : Return Address
0xffffff8015f581e0 : 0xffffff80160bab4d mach_kernel : _handle_debugger_trap + 0x3dd
0xffffff8015f58230 : 0xffffff80161fd7e3 mach_kernel : _kdp_i386_trap + 0x143
0xffffff8015f58270 : 0xffffff80161ede1a mach_kernel : _kernel_trap + 0x55a
0xffffff8015f582c0 : 0xffffff801605fa2f mach_kernel : _return_from_trap + 0xff
0xffffff8015f582e0 : 0xffffff80160ba3ed mach_kernel : _DebuggerTrapWithState + 0xad
0xffffff8015f58400 : 0xffffff80160ba6d8 mach_kernel : _panic_trap_to_debugger + 0x268
0xffffff8015f58470 : 0xffffff80168bef9a mach_kernel : _panic + 0x54
0xffffff8015f584e0 : 0xffffff80161ee1e6 mach_kernel : _sync_iss_to_iks + 0x2c6
0xffffff8015f58660 : 0xffffff80161edecd mach_kernel : _kernel_trap + 0x60d
0xffffff8015f586b0 : 0xffffff801605fa2f mach_kernel : _return_from_trap + 0xff
0xffffff8015f586d0 : 0xffffff7fb834f495 net.lundman.zfs : _collect_a_seq + 0x375
0xffffffc1f041b430 : 0x100010001000100
No mapping exists for frame pointer
Backtrace terminated-invalid frame pointer 0x100010001000100
      Kernel Extensions in backtrace:
         net.lundman.zfs(2.0)[ECE8A16E-4291-3907-B14F-C8A732F68C6E]@0xffffff7fb820a000->0xffffff7fb8532fff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[B5300908-BF34-3D47-8776-FB154A6DEE4C]@0xffffff8018b3f000->0xffffff8018b50fff

Process name corresponding to current thread: ChronoSync
Boot args: keepsyms=1 darkwake=3 shikigva=80 debug=0x100 igfxonln=1 igfxfw=2 forceRenderStandby=0 alcid=11

Mac OS version:
20D91

Kernel version:
Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64
Kernel UUID: C86236B2-4976-3542-80CA-74A6B8B4BA03
KernelCache slide: 0x0000000015e00000
KernelCache base:  0xffffff8016000000
Kernel slide:      0x0000000015e10000
Kernel text base:  0xffffff8016010000
__HIB  text base: 0xffffff8015f00000
System model name: iMac19,1 (Mac-AA95B1DDAB278B95)
System shutdown begun: NO
Panic diags file available: YES (0x0)
Hibernation exit count: 0

System uptime in nanoseconds: 528898779578654
Last Sleep:           absolute           base_tsc          base_nano
  Uptime  : 0x0001e107d7e10456
  Sleep   : 0x0001dfef827f92eb 0x000000012b30d3ef 0x0001df54cc4e5292
  Wake    : 0x0001dfef9cabe137 0x000000012b2e8c8f 0x0001dfef8bcc6890


(ChronoSync launches a scheduled daily 2 am script to back up a folder within a two-SSD zpool, and this run immediately crashed the system; all prior runs had been uneventful.)
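
For reference, the "System uptime in nanoseconds" field in the panic log above can be converted to days with a small helper. This is only a unit-conversion sketch; the function name is made up for illustration and is not part of any ZFS or macOS tool.

```python
# Convert the "System uptime in nanoseconds" field of a panic log to days.
NANOSECONDS_PER_DAY = 86_400 * 1_000_000_000


def uptime_days(uptime_ns: int) -> float:
    """Return the uptime in days for a nanosecond counter value."""
    return uptime_ns / NANOSECONDS_PER_DAY


# Value copied from the panic log in this post.
print(f"{uptime_days(528898779578654):.2f} days")
```

The result is roughly 6.1 days, which is less than the 13-14 days of wall-clock time mentioned above. One possible explanation, which is an assumption here and not something the log states, is that this counter does not advance while the machine is asleep.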

Thought you might find the info useful. Once again, many thanks!

Xen
xenophon
 
Posts: 14
Joined: Tue Jul 28, 2015 11:58 pm

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by lundman » Fri Mar 26, 2021 9:47 pm

Neat, what on earth is _collect_a_seq? That is interesting.
lundman
 
Posts: 1030
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by xenophon » Fri Mar 26, 2021 11:17 pm

This was with the early March Big Sur build.
Updated yesterday (funny "media ejected" message when exporting pools prior to installation) and will post in two weeks' time ;-)
Once again, thanks, just thanks.

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by xenophon » Fri Apr 16, 2021 6:23 am

Had another kernel panic last night (after the daily rsync scripts had run uneventfully for 18 days of uptime).

Will install the rc4 Big Sur package and report again.

Once again, thanks for this awesome project.

Xen

Code:
panic(cpu 6 caller 0xffffff80155ee1e6): Kernel trap at 0xffffff7fb774e425, type 13=general protection, registers:
CR0: 0x0000000080010033, CR2: 0x00000001074b4000, CR3: 0x00000009084ee0fc, CR4: 0x00000000003626e0
RAX: 0x0100010001000100, RBX: 0x0000000000000002, RCX: 0x0000000200000000, RDX: 0x0000000000000002
RSP: 0xffffffc1ef3bb1d0, RBP: 0xffffffc1ef3bb430, RSI: 0x0000000000000000, RDI: 0x0000000000000001
R8:  0x0000000000000010, R9:  0x000000000000002c, R10: 0x0000000000000000, R11: 0x0000000000000000
R12: 0xffffffc1ef3bb4c0, R13: 0x0000000000000001, R14: 0x0000000000000001, R15: 0x0000000000000194
RFL: 0x0000000000010246, RIP: 0xffffff7fb774e425, CS:  0x0000000000000008, SS:  0x0000000000000010
Fault CR2: 0x00000001074b4000, Error code: 0x0000000000000000, Fault CPU: 0x6, PL: 0, VF: 0

Backtrace (CPU 6), Frame : Return Address
0xffffff80153591e0 : 0xffffff80154bab4d mach_kernel : _handle_debugger_trap + 0x3dd
0xffffff8015359230 : 0xffffff80155fd7e3 mach_kernel : _kdp_i386_trap + 0x143
0xffffff8015359270 : 0xffffff80155ede1a mach_kernel : _kernel_trap + 0x55a
0xffffff80153592c0 : 0xffffff801545fa2f mach_kernel : _return_from_trap + 0xff
0xffffff80153592e0 : 0xffffff80154ba3ed mach_kernel : _DebuggerTrapWithState + 0xad
0xffffff8015359400 : 0xffffff80154ba6d8 mach_kernel : _panic_trap_to_debugger + 0x268
0xffffff8015359470 : 0xffffff8015cbef9a mach_kernel : _panic + 0x54
0xffffff80153594e0 : 0xffffff80155ee1e6 mach_kernel : _sync_iss_to_iks + 0x2c6
0xffffff8015359660 : 0xffffff80155edecd mach_kernel : _kernel_trap + 0x60d
0xffffff80153596b0 : 0xffffff801545fa2f mach_kernel : _return_from_trap + 0xff
0xffffff80153596d0 : 0xffffff7fb774e425 net.lundman.zfs : _collect_a_seq + 0x375
0xffffffc1ef3bb430 : 0x100010001000100
No mapping exists for frame pointer
Backtrace terminated-invalid frame pointer 0x100010001000100
      Kernel Extensions in backtrace:
         net.lundman.zfs(2.0)[72483E1E-DD77-3873-BC77-C1B8AFF10A58]@0xffffff7fb760a000->0xffffff7fb7931fff
            dependency: com.apple.iokit.IOStorageFamily(2.1)[B5300908-BF34-3D47-8776-FB154A6DEE4C]@0xffffff8017f3f000->0xffffff8017f50fff

Process name corresponding to current thread: ChronoSync
Boot args: keepsyms=1 darkwake=3 shikigva=80 debug=0x100 igfxonln=1 igfxfw=2 forceRenderStandby=0 alcid=11

Mac OS version:
20D91

Kernel version:
Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64
Kernel UUID: C86236B2-4976-3542-80CA-74A6B8B4BA03
KernelCache slide: 0x0000000015200000
KernelCache base:  0xffffff8015400000
Kernel slide:      0x0000000015210000
Kernel text base:  0xffffff8015410000
__HIB  text base: 0xffffff8015300000
System model name: iMac19,1 (Mac-AA95B1DDAB278B95)
System shutdown begun: NO
Panic diags file available: YES (0x0)
Hibernation exit count: 0

System uptime in nanoseconds: 977249427128799
Last Sleep:           absolute           base_tsc          base_nano
  Uptime  : 0x000378cd9cf6066d
  Sleep   : 0x000377e3fdc2e8f1 0x000000012b5d11d5 0x00037749ab277687
  Wake    : 0x000377e41597ad3d 0x000000012b339c89 0x000377e40716f959
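
Since both logs end in the same symbol, it can help to compare where the faulting instruction sits inside the kext image. The sketch below subtracts the kext load address from the faulting RIP for each panic; all addresses are copied from the two logs in this thread, and the helper name is hypothetical.

```python
# Offset of the faulting instruction inside the net.lundman.zfs image,
# computed as RIP minus the kext load address from each panic log.
PANICS = {
    "first panic": {"rip": 0xFFFFFF7FB834F495, "kext_base": 0xFFFFFF7FB820A000},
    "second panic": {"rip": 0xFFFFFF7FB774E425, "kext_base": 0xFFFFFF7FB760A000},
}


def kext_offset(rip: int, kext_base: int) -> int:
    """Return how far the faulting address lies past the kext load address."""
    return rip - kext_base


for label, addrs in PANICS.items():
    print(label, hex(kext_offset(addrs["rip"], addrs["kext_base"])))
```

The two offsets differ slightly, which fits the different kext UUIDs in the two logs (different builds), while both backtraces resolve to _collect_a_seq + 0x375.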

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by xenophon » Tue Jun 22, 2021 9:00 am

Hi to all; this is an update with the release version on Big Sur (Intel).

I think that file operations involving large (e.g. 40 GB) files cause huge memory usage, perhaps ballooning the kernel_task process. Since I rsync a particular FileMaker database (a single 41 GB file) daily, perhaps this eventually causes the ZFS extension to crash (see the panic logs above).

For example, after a clean Big Sur 11.4 startup, rsyncing said file gradually increases system memory usage, as illustrated below:

[Attachment: memory.png]

(translation: physical memory 64 GB; memory used started at around 14 GB on startup and crept up to 36.79 GB during the rsync operation)

I'm not that knowledgeable, but it seems that kernel_task balloons accordingly. In my usage scenario, after exactly 15 days of uptime (with kernel_task hovering around 40 GB from day 2), the system panics.

Does OpenZFSonOSX require memory tweaking (as discussed, say, in viewtopic.php?f=26&t=3457)?

I would appreciate any help!

Greetings from Athens and, as always, thanks for an indispensable project!

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by lundman » Tue Jun 22, 2021 4:08 pm

Does it work to manually ask ZFS to release memory?

sudo sysctl -w kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure=megabytes_to_free

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by tangles » Tue Jun 22, 2021 4:10 pm

Hi,

On a side note, I’d seriously consider setting up another ZFS pool if possible to receive your backups.

Pushing > 40 GB each time is a bit… well, mad.

An incremental ZFS send/receive would transfer only the blocks of the 40 GB file that changed since the previous snapshot, and would be done almost instantly…

Maybe you have no control over the other end, and that's fair enough, but I felt compelled to mention it, that's all.
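
As a rough illustration of that suggestion, the sketch below only builds the command lines for a snapshot followed by an incremental send piped into a receive; it does not run anything. The dataset, snapshot and target names are hypothetical placeholders, and it assumes the target pool already holds the earlier snapshot from a previous full send.

```python
import shlex


def incremental_backup_commands(dataset, prev_snap, new_snap, target):
    """Return shell command strings for a snapshot plus incremental send/receive.

    All names passed in are placeholders; nothing is executed here.
    """
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    receive = ["zfs", "receive", target]
    return [shlex.join(snapshot), f"{shlex.join(send)} | {shlex.join(receive)}"]


# Hypothetical names, for illustration only.
for cmd in incremental_backup_commands("tank/db", "monday", "tuesday", "backup/db"):
    print(cmd)
```

Printing the commands instead of executing them keeps the sketch safe to try; they would normally need to be run with root privileges.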
tangles
 
Posts: 185
Joined: Tue Jun 17, 2014 6:54 am

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by lundman » Tue Jun 22, 2021 4:26 pm

After discussing it with rottegift, it is probably not a memory leak; you have plenty of memory, after all. However, there are the Greek characters (presumably in your filenames), there is that 256-character filename panic bug we just fixed, and your stack goes in a straight line to:

zfs_lookup( zfs_dirlook( zfs_dirent_lock( u8_strcmp( do_norm_compare( collect_a_seq( ...

https://github.com/openzfsonosx/openzfs ... 1dcc23be08

It could be related; the 2.0.1 version will definitely panic if you give it a filename longer than 256 ASCII characters (that's about 76 UTF-8 characters).
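
To check whether a backup set contains names anywhere near that size, a sketch along these lines could walk a directory tree and report entries whose UTF-8 encoding is long. The `limit` default is a placeholder and not the exact threshold of the bug, and the sketch does not model any Unicode normalisation that ZFS may apply before comparing names.

```python
import os


def utf8_len(name: str) -> int:
    """Number of bytes in the UTF-8 encoding of a file name."""
    return len(name.encode("utf-8", errors="surrogateescape"))


def long_names(root: str, limit: int = 255):
    """Yield (path, byte_length) for entries under root whose name exceeds limit bytes.

    The limit is an illustrative assumption, not a value taken from the ZFS source.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            size = utf8_len(name)
            if size > limit:
                yield os.path.join(dirpath, name), size


# Greek letters take two bytes each in UTF-8, ASCII letters take one.
print(utf8_len("α" * 10), utf8_len("a" * 10))
```

Calling `list(long_names("/Volumes/pool"))` with a real mount point (the path here is hypothetical) would list any candidates.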

We aren't far off pushing out 2.1.0rc, so it would be interesting for you to try that.

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by xenophon » Tue Jun 22, 2021 10:04 pm

Thank you all. Will do some more reproducible testing and report back.

Re: Big Sur 2.0 pkg - kernel panics after 2 weeks of uptime

Post by xenophon » Wed Jun 23, 2021 6:46 am

Thanks, once again. While I realize that manipulating huge files with legacy methods (cp, rsync, Finder) is suboptimal, it still has to be done every now and then ;-)

So here's what I did, for troubleshooting purposes:

Code:
cp single_large40Gb_nonunicode_file /Volumes/HFS/


Sure enough, memory usage crept upwards until it reached 50 GB. Contrary to my earlier diagnosis, the kernel_task memory footprint was unaffected. Then I tried to reclaim memory as per the post above:

Code:
% periodic memory checks during 'cp'
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 7538044928
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 9005330432
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 9654071296
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 15730307072
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 33401139200
% now let's reclaim the memory
xenophoncosteas@Altair87 ~ % sudo sysctl -w kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure=1000
Password:
kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure: 0 -> 1048576000
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 33399599104
xenophoncosteas@Altair87 ~ % sudo sysctl -w kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure=10000
kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure: 0 -> 10485760000
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 26151055360
xenophoncosteas@Altair87 ~ % sudo sysctl -w kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure=10000
kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure: 0 -> 10485760000
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 15797055488
% reclaiming 10Gb at a time
xenophoncosteas@Altair87 ~ % sudo sysctl -w kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure=10000
kstat.spl.misc.spl_misc.spl_spl_free_manual_pressure: 0 -> 10485760000
xenophoncosteas@Altair87 ~ % sysctl kstat.spl.misc.spl_misc.os_mem_alloc
kstat.spl.misc.spl_misc.os_mem_alloc: 7417655296
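
The numbers in the session above can be lined up to compare each requested amount with the drop in os_mem_alloc that followed it. This is a small arithmetic sketch using only values copied from that session; the helper name is made up, and the drops may also reflect other activity going on at the time.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# os_mem_alloc readings in bytes: the last one before reclaiming,
# then the reading taken after each manual pressure request.
READINGS = [33401139200, 33399599104, 26151055360, 15797055488, 7417655296]
# Values written to spl_spl_free_manual_pressure, in megabytes.
REQUESTS_MB = [1000, 10000, 10000, 10000]


def summarize(readings, requests_mb):
    """Return (requested_bytes, observed_drop_bytes) for each request."""
    return [
        (mb * MIB, before - after)
        for before, after, mb in zip(readings, readings[1:], requests_mb)
    ]


for requested, dropped in summarize(READINGS, REQUESTS_MB):
    print(f"requested {requested / GIB:.2f} GiB, dropped {dropped / GIB:.2f} GiB")
```

The requested byte counts match the values sysctl echoed back (1048576000 and 10485760000). The 1000 MB request was followed by a drop of only about 1.5 MB, while the three 10000 MB requests were followed by drops of roughly 6.8, 9.6 and 7.8 GiB.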


I know (from prior comments by lundman) that high memory usage is benign and not a bug per se. Still, I thought you might like to know that the Unicode/UTF bug is probably not responsible in my case. My OS X / ZFS systems have been crashing at around the 15-day mark for years, and I have always suspected some Apple housekeeping procedure to be responsible.

Ah. You can't win them all. Cheers (and thanks)!

Xen