Unstable system with no traces of cause.

Post by FunMiles » Sun Oct 03, 2021 12:43 pm

Whenever I import a particular pool, the system crashes after a delay that depends on which operations are running. It rarely stays up for more than two hours if any activity (e.g. a scrub or zfs send) is going on.

However, I have not gotten any log indicating the cause of the crash, except once, when I got the log at the end of this post.
Now I am wondering whether I could import my pool under OpenZFS on Linux and, if so, what steps I would need to take. (zpool upgrade?)

If it is not possible to import it under Linux, what are the next steps to address this?

The only log I got:
panic(cpu 2 caller 0xffffff8007e45d0a): Kernel trap at 0xffffff7f8b88ce93, type 14=page fault, registers:
CR0: 0x000000008001003b, CR2: 0x0000000000000000, CR3: 0x000000000ac3c000, CR4: 0x00000000001626e0
RAX: 0x0000000000000000, RBX: 0xffffffa4021c7cd0, RCX: 0x0000000000000000, RDX: 0x0000000003000000
RSP: 0xffffff83b47dbf20, RBP: 0xffffff83b47dbfa0, RSI: 0xffffffa4021c7cf8, RDI: 0x0000000000000000
R8: 0x0000000000000000, R9: 0x0000000000989680, R10: 0x5555555555555555, R11: 0x0000000000000000
R12: 0x0000000000000001, R13: 0xffffff7f8bc4d048, R14: 0xffffffa4021c7cf0, R15: 0xffffffa52f7f54a0
RFL: 0x0000000000010202, RIP: 0xffffff7f8b88ce93, CS: 0x0000000000000008, SS: 0x0000000000000010
Fault CR2: 0x0000000000000000, Error code: 0x0000000000000002, Fault CPU: 0x2, PL: 0, VF: 1

Backtrace (CPU 2), Frame : Return Address
0xffffff83b47db980 : 0xffffff8007d1966d
0xffffff83b47db9d0 : 0xffffff8007e53dd5
0xffffff83b47dba10 : 0xffffff8007e4595e
0xffffff83b47dba60 : 0xffffff8007cbfa40
0xffffff83b47dba80 : 0xffffff8007d18d37
0xffffff83b47dbb80 : 0xffffff8007d19127
0xffffff83b47dbbd0 : 0xffffff80084be38c
0xffffff83b47dbc40 : 0xffffff8007e45d0a
0xffffff83b47dbdc0 : 0xffffff8007e45a08
0xffffff83b47dbe10 : 0xffffff8007cbfa40
0xffffff83b47dbe30 : 0xffffff7f8b88ce93
0xffffff83b47dbfa0 : 0xffffff8007cbf13e
Kernel Extensions in backtrace:
org.openzfsonosx.zfs(2.1)[D8D37025-5993-3A02-8E85-9537CACFBA27]@0xffffff7f8b617000->0xffffff7f8cdbcfff
dependency: com.apple.iokit.IOStorageFamily(2.1)[952AB52A-016F-3ED1-BFA5-5AF400169BD9]@0xffffff7f88665000

BSD process name corresponding to current thread: kernel_task

Mac OS version:
19H1419
FunMiles
 
Posts: 27
Joined: Sat Sep 30, 2017 12:05 pm

Re: Unstable system with no traces of cause.

Post by FunMiles » Sun Oct 03, 2021 4:43 pm

I tried another approach. I bought a Thunderbolt 3 to Thunderbolt 2 adapter and connected my disk enclosure to my latest laptop, which runs Big Sur.
The scrub, which was in progress whenever the pool was mounted and which would crash the machine running Catalina, produced some error information in the output of the zpool status command:
pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub in progress since Sun Oct 3 11:30:22 2021
4.67T scanned at 625M/s, 3.80T issued at 65.6M/s, 8.40T total
84K repaired, 45.19% done, 20:25:50 to go
config:

NAME                                            STATE    READ WRITE CKSUM
tank                                            ONLINE      0     0     0
  raidz2-0                                      ONLINE      0     0     0
    media-A6052B79-D9D0-3D47-962E-5EC8BBA07067  ONLINE     68    87     3  (repairing)
    media-C0923150-FBB0-F84B-9A65-0E3D6188FE46  ONLINE      0     0     0
    media-0518C44D-A367-F54B-B406-3021CAB6C00C  ONLINE      0     0     0
    ST2000DM001-1ER164-Z4Z0B8GN                 ONLINE      0     0     0
    ST2000DM001-1ER164-Z4Z0BTDY                 ONLINE      0     0     0

errors: No known data errors
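
To pick out the affected device without reading the table by eye, the config rows above can be filtered for nonzero counters. This is only a rough sketch: the helper name `devices_with_errors` is made up for illustration, the rows are copied from the output above, and it assumes the counters are plain integers (zpool may abbreviate large counts, which this does not handle).

```python
# Config rows copied from the zpool status output above.
STATUS_CONFIG = """\
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
media-A6052B79-D9D0-3D47-962E-5EC8BBA07067 ONLINE 68 87 3 (repairing)
media-C0923150-FBB0-F84B-9A65-0E3D6188FE46 ONLINE 0 0 0
media-0518C44D-A367-F54B-B406-3021CAB6C00C ONLINE 0 0 0
ST2000DM001-1ER164-Z4Z0B8GN ONLINE 0 0 0
ST2000DM001-1ER164-Z4Z0BTDY ONLINE 0 0 0
"""


def devices_with_errors(config_text):
    """Return {name: (read, write, cksum)} for rows with any nonzero counter.

    Hypothetical helper; assumes every counter is a plain integer.
    """
    flagged = {}
    for line in config_text.splitlines():
        fields = line.split()
        # Data rows have a name, a state and three counters; skip the header.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        counts = tuple(int(value) for value in fields[2:5])
        if any(counts):
            flagged[fields[0]] = counts
    return flagged


for name, (read, write, cksum) in devices_with_errors(STATUS_CONFIG).items():
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

Only the first media-A6052B79 device is flagged here, which is the one marked "(repairing)".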


The computer crashed right after I pasted this data.
When logging back in, I got a system crash notification this time. It is not very informative out of the box, however.
Nonetheless, it could indicate that the repair process is what is causing the crash.
Since the repair seems to be confined to one of the drives, would replacing that drive and resilvering have a chance of fixing the issue?
panic(cpu 1 caller 0xfffffff01e174350): x86 CPU CATERR detected
Debugger message: panic
Memory ID: 0x6
OS release type: User
OS version: 18P4759a
macOS version: 20G165
Kernel version: Darwin Kernel Version 20.6.0: Tue Jun 22 21:55:04 PDT 2021; root:xnu-7195.141.2~1/RELEASE_ARM64_T8010
Kernel UUID: E6B23446-BF1C-3A39-9311-EDEAEB220BF9
iBoot version: iBoot-6723.140.2
secure boot?: YES
x86 EFI Boot State: 0x16
x86 System State: 0x0
x86 Power State: 0x0
x86 Shutdown Cause: 0x1
x86 Previous Power Transitions: 0x10001000100
PCIeUp link state: 0x89473611
Paniclog version: 13
Kernel slide: 0x0000000016214000
Kernel text base: 0xfffffff01d218000
mach_absolute_time: 0x158664e1b9da
Epoch Time: sec usec
Boot : 0x614a77c3 0x00003f25
Sleep : 0x61546f3d 0x00042a5c
Wake : 0x61547640 0x000e3609
Calendar: 0x615a4bd1 0x0003f9ac

CORE 0: PC=0xfffffff01dee2338, LR=0xfffffff01dee2338, FP=0xffffffe804543ec0
CORE 1 is the one that panicked. Check the full backtrace for details.
Panicked task 0xffffffe199d68630: 3586 pages, 227 threads: pid 0: kernel_task
Panicked thread: 0xffffffe199ff1a00, backtrace: 0xffffffe80400b700, tid: 419
lr: 0xfffffff01d9674f0 fp: 0xffffffe80400b750
lr: 0xfffffff01d967348 fp: 0xffffffe80400b7c0
lr: 0xfffffff01da92d90 fp: 0xffffffe80400b890
lr: 0xfffffff01df8d5fc fp: 0xffffffe80400b8a0
lr: 0xfffffff01d96707c fp: 0xffffffe80400bc20
lr: 0xfffffff01d96707c fp: 0xffffffe80400bc80
lr: 0xfffffff01e997200 fp: 0xffffffe80400bca0
lr: 0xfffffff01e174350 fp: 0xffffffe80400bcd0
lr: 0xfffffff01e1626e0 fp: 0xffffffe80400bd30
lr: 0xfffffff01e1645dc fp: 0xffffffe80400bdc0
lr: 0xfffffff01e161d84 fp: 0xffffffe80400be50
lr: 0xfffffff01e066894 fp: 0xffffffe80400be80
lr: 0xfffffff01dee2338 fp: 0xffffffe80400bec0
lr: 0xfffffff01dee1bb8 fp: 0xffffffe80400bf00
lr: 0xfffffff01df985a0 fp: 0x0000000000000000
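
The "Epoch Time" values in the log above can be turned into readable times, which may help line up the crash with what the pool was doing. This is a minimal sketch, assuming the first hex column of each row is seconds since the Unix epoch (the decoded dates are consistent with that reading). The values are copied from the log; the `decode` helper is made up for illustration.

```python
from datetime import datetime, timezone

# Seconds column of the "Epoch Time" block, copied from the panic log above.
EPOCH_FIELDS = {
    "Boot": 0x614A77C3,
    "Sleep": 0x61546F3D,
    "Wake": 0x61547640,
    "Calendar": 0x615A4BD1,
}


def decode(seconds):
    """Interpret a value as seconds since the Unix epoch (an assumption), in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


for name, seconds in EPOCH_FIELDS.items():
    print(f"{name:<8} {decode(seconds).isoformat()}")
```

Under that assumption, the Calendar entry (presumably the time of the panic) decodes to 2021-10-04T00:33:21+00:00.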


Re: Unstable system with no traces of cause.

Post by FunMiles » Sun Oct 03, 2021 8:54 pm

I pulled out the drive reporting errors, and it didn't help. Doing a send of a snapshot crashed the OS again.

Does anybody have an idea? Is going the Linux route a possibility? I am not sure how to get the disks read by my Linux box, though, as they are in a Thunderbolt 2 enclosure.

Re: Unstable system with no traces of cause.

Post by nodarkthings » Mon Oct 04, 2021 1:16 am

Hi!
I'm not able to answer you (I'm still using 10.11 with v1.9.4 as my main system...), but I had many crashes on my 10.14 partition, with system crash notifications that were not informative at all. I also initially suspected ZFS, as it was sometimes in the backtrace, but I finally found that it was Little Snitch instead!
Same result with LuLu, but no problem with Hands Off!
So, just in case...
nodarkthings
 
Posts: 174
Joined: Mon Jan 26, 2015 10:32 am

Re: Unstable system with no traces of cause.

Post by FunMiles » Mon Oct 04, 2021 4:55 am

nodarkthings wrote: Hi!
I'm not able to answer you (I'm still using 10.11 with v1.9.4 as my main system...), but I had many crashes on my 10.14 partition, with system crash notifications that were not informative at all. I also initially suspected ZFS, as it was sometimes in the backtrace, but I finally found that it was Little Snitch instead!
Same result with LuLu, but no problem with Hands Off!
So, just in case...

I understand what you're saying, but in my case there are a few things that point to ZFS. The two strongest are that the crashes follow the import of the pool, on either Catalina or Big Sur, and that different actions on the filesystems can trigger a crash.

Re: Unstable system with no traces of cause.

Post by jawbroken » Mon Oct 04, 2021 5:21 am

I think it would help if you turned on keepsyms so that the backtrace is more informative. I believe it is:
sudo nvram boot-args="-v keepsyms=1"

followed by a reboot.
jawbroken
 
Posts: 61
Joined: Wed Apr 01, 2015 4:46 am

Re: Unstable system with no traces of cause.

Post by FunMiles » Mon Oct 04, 2021 6:36 am

jawbroken wrote: I think it would help if you turned on keepsyms so that the backtrace is more informative. I believe it is:
sudo nvram boot-args="-v keepsyms=1"

followed by a reboot.

I did that. Lots of tiny-font text scrolled by at boot but, unfortunately, the panic info is exactly the same.

I would still like to know whether OpenZFS on Linux would be able to import my pool. The initial pool was created about 7 or 8 years ago.
If it can, I have two options. The first would be to try cstor under Docker (I am likely to try it today, as it looks like I can do so without additional hardware); the second would be to buy a USB-based docking station for the disks and hope they show up on my Linux box as JBOD.

panic(cpu 0 caller 0xfffffff017b48350): x86 CPU CATERR detected
Debugger message: panic
Memory ID: 0x6
OS release type: User
OS version: 18P4759a
macOS version: 20G165
Kernel version: Darwin Kernel Version 20.6.0: Tue Jun 22 21:55:04 PDT 2021; root:xnu-7195.141.2~1/RELEASE_ARM64_T8010
Kernel UUID: E6B23446-BF1C-3A39-9311-EDEAEB220BF9
iBoot version: iBoot-6723.140.2
secure boot?: YES
x86 EFI Boot State: 0x16
x86 System State: 0x0
x86 Power State: 0x0
x86 Shutdown Cause: 0x1
x86 Previous Power Transitions: 0x405060400
PCIeUp link state: 0x89271614
Paniclog version: 13
Kernel slide: 0x000000000fbe8000
Kernel text base: 0xfffffff016bec000
mach_absolute_time: 0xe79dd83a41
Epoch Time: sec usec
Boot : 0x615a6d78 0x000d16a0
Sleep : 0x615b09c1 0x000a1641
Wake : 0x615b09c3 0x000e68f3
Calendar: 0x615b0f60 0x0001fef5

CORE 0 is the one that panicked. Check the full backtrace for details.
CORE 1: PC=0xfffffff017368080, LR=0xfffffff01736806c, FP=0xffffffe810e9bee0
Panicked task 0xffffffe19a004630: 3640 pages, 227 threads: pid 0: kernel_task
Panicked thread: 0xffffffe19a1bb800, backtrace: 0xffffffe811003700, tid: 394
lr: 0xfffffff01733b4f0 fp: 0xffffffe811003750
lr: 0xfffffff01733b348 fp: 0xffffffe8110037c0
lr: 0xfffffff017466d90 fp: 0xffffffe811003890
lr: 0xfffffff0179615fc fp: 0xffffffe8110038a0
lr: 0xfffffff01733b07c fp: 0xffffffe811003c20
lr: 0xfffffff01733b07c fp: 0xffffffe811003c80
lr: 0xfffffff01836b200 fp: 0xffffffe811003ca0
lr: 0xfffffff017b48350 fp: 0xffffffe811003cd0
lr: 0xfffffff017b366e0 fp: 0xffffffe811003d30
lr: 0xfffffff017b385dc fp: 0xffffffe811003dc0
lr: 0xfffffff017b35d84 fp: 0xffffffe811003e50
lr: 0xfffffff017a3a894 fp: 0xffffffe811003e80
lr: 0xfffffff0178b6338 fp: 0xffffffe811003ec0
lr: 0xfffffff0178b5bb8 fp: 0xffffffe811003f00
lr: 0xfffffff01796c5a0 fp: 0x0000000000000000

Re: Unstable system with no traces of cause.

Post by lundman » Mon Oct 04, 2021 7:41 pm

Hmm, the panic logs look like ARM ones, but it's clearly x86. I wonder if newer Big Sur releases removed keepsyms support; that would be tedious.

I need to run the panic report through atos to get names.
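
For reference, a minimal sketch of what that symbolication step could look like for the first (Catalina) panic in this thread, which does list the zfs kext's load address. The addresses are copied from that log. The kext binary path is an assumption (adjust it to wherever zfs.kext is actually installed), and the script only prints an atos command line rather than running it.

```python
# Values copied from the first (Catalina) panic log in this thread.
KEXT_LOAD_ADDR = 0xFFFFFF7F8B617000  # org.openzfsonosx.zfs load address
KEXT_END_ADDR = 0xFFFFFF7F8CDBCFFF   # end of the kext's address range
FAULT_RIP = 0xFFFFFF7F8B88CE93       # RIP at the time of the page fault

# ASSUMPTION: location of the kext binary; not taken from the thread.
KEXT_BINARY = "/Library/Extensions/zfs.kext/Contents/MacOS/zfs"


def in_kext(addr):
    """True if addr lies inside the zfs kext's loaded address range."""
    return KEXT_LOAD_ADDR <= addr <= KEXT_END_ADDR


def atos_command(addresses):
    """Build an atos command line for the given runtime addresses."""
    addrs = " ".join(hex(a) for a in addresses)
    return f"atos -o {KEXT_BINARY} -arch x86_64 -l {hex(KEXT_LOAD_ADDR)} {addrs}"


offset = FAULT_RIP - KEXT_LOAD_ADDR
print(f"fault inside zfs kext: {in_kext(FAULT_RIP)}")
print(f"offset into kext: {hex(offset)}")
print(atos_command([FAULT_RIP]))
```

The faulting address falls inside the zfs kext's range, at offset 0x275e93 from its load address. The result is only meaningful if the binary passed to atos is the same build that produced the panic (the UUID in the log can be used to check).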
lundman
 
Posts: 1335
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Unstable system with no traces of cause.

Post by FunMiles » Tue Oct 05, 2021 5:54 pm

lundman wrote: Hmm, the panic logs look like ARM ones, but it's clearly x86. I wonder if newer Big Sur releases removed keepsyms support; that would be tedious.

I need to run the panic report through atos to get names.

Just to make sure that I didn't mistype the arguments, I did:

michel@Michels-MacBook-Pro ~ % nvram boot-args
boot-args -v keepsyms=1

So it appears that the arguments are correct.

