2.1.6 kernel panics on degraded raidz2 (also affects v2.2)

Re: 2.1.6 kernel panics on degraded raidz2 (also affects v2.

Post by Haravikk » Sat Mar 09, 2024 6:58 am

So I finally had time to sit and mess around with the different _impl tunables to see if I could narrow down the problem and I've found it.

The issue lies specifically with setting either of the following two implementations:

Code:
kstat.zfs.darwin.tunable.icp_aes_impl=aesni
kstat.zfs.darwin.tunable.icp_gcm_impl=pclmulqdq


Setting either tunable to these values (or to fastest, which seems to select them) causes a kernel panic the moment a dataset is mounted (or a zvol is unlocked). All other implementations seem fine when left on fastest for now; only these two cause a Macmini4,1 or an iMac10,1 to crash.

Both models are Intel Core 2 Duo systems from roughly 2009/2010 with the same basic CPU architecture; I believe the Mac Mini has a P8600 (2.66 GHz dual core) and the iMac an E7600 (3.06 GHz).

I'm guessing these CPUs don't support the instructions required by the aesni and pclmulqdq implementations, which raises the question of why ZFS identifies them as supported. Either that, or the compiled implementations use some other instruction that these CPUs lack. Either way, these implementations need to be disabled for these and similar CPUs, or the instruction set detection needs to be adjusted to exclude them.

I'm attaching the crash logs from the kernel panics caused when I set each of these options explicitly during testing. For these tests, all other _impl tunables were set to generic, original or scalar as appropriate; IIRC that was zfs_vdev_raidz_impl=original, zfs_fletcher_4_impl=scalar and everything else set to generic. I have since verified that with icp_aes_impl=x86_64 and icp_gcm_impl=generic, it is safe for all the others to be set to fastest.

I'm going to report this issue on the main ZFS GitHub as well, to see whether anyone else can verify if it is macOS-specific or not (I've never gotten Linux to run reliably on either of these machines, so I can't test there).

Update: Here is the GitHub issue if you'd like to track it or comment.
Attachments
Crashes.zip (7.04 KiB)

Post by lundman » Sat Mar 16, 2024 2:49 pm

invalid opcode - sounds like the cpuid problem, try 2.2.3 ?

Post by Haravikk » Sat Mar 16, 2024 2:54 pm

lundman wrote:invalid opcode - sounds like the cpuid problem, try 2.2.3 ?

Sorry, forgot to update this thread; I'm the one who was posting on GitHub as well. The Mac Mini I've mainly been testing against is currently running an excruciatingly slow copy operation that should finish some time tomorrow, so I'll try to test after it finishes, or on Monday.
