It's no immediate disaster since I have 6 cores / 12 threads, but there is definitely something going on, and I think it has to do with ZFS.
I previously had a problem with instant reboots when trying to scrub, but after that scrubbing worked again.
Importing the pool, or at least some of the file systems on it, seems to be problematic; it takes much longer than it used to.
It also seems to be ballooning in memory usage. I don't think I have used the file system much since rebooting, but kernel_task is now at 12.67 GB out of the 24 GB I have.
I would be grateful if someone could advise on how to gather the right logs, etc. to try to figure out what's going on.
I'm running High Sierra and installed ZFS from source a few days ago.
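For reference, here are the diagnostic commands I would start with. This is a sketch: `tank` and `/dev/disk2` are placeholders for the real pool and device names, and `smartctl` assumes smartmontools is installed (e.g. via Homebrew).

```shell
# Pool health and any per-device read/write/checksum error counts
sudo zpool status -v

# Per-device I/O statistics every 5 seconds; a failing disk often stands out
sudo zpool iostat -v 5

# SMART health of an underlying disk (/dev/disk2 is a placeholder)
smartctl -a /dev/disk2

# Kernel log messages since boot that mention ZFS (unified logging, 10.12+)
log show --last boot --predicate 'eventMessage CONTAINS[c] "zfs"'

# Confirm which spl/zfs kexts are loaded and their versions
kextstat | grep lundman
```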
Code:
138 1 0xffffff7f82e55000 0x3f8 0x3f8 net.lundman.kernel.dependencies.30 (12.5.0) EABE2046-57AE-4F8D-8EEB-15176843E226
139 1 0xffffff7f82e56000 0x11f5000 0x11f5000 net.lundman.spl (1.6.2) 47DF1BD7-98FE-38D1-88DF-2BB62C86AAC9 <138 7 5 4 3 1>
140 1 0xffffff7f8404b000 0x2b4000 0x2b4000 net.lundman.zfs (1.6.2) E26C25E4-D481-3968-B88C-BD1B70BA4403 <139 26 7 5 4 3 1>
Code:
Processes: 413 total, 2 running, 2 stuck, 409 sleeping, 2246 threads 01:58:33
Load Avg: 2.73, 2.89, 2.72 CPU usage: 1.50% user, 8.96% sys, 89.52% idle SharedLibs: 248M resident, 60M data, 30M linkedit.
MemRegions: 66402 total, 5064M resident, 287M private, 1932M shared. PhysMem: 23G used (13G wired), 986M unused.
VM: 1893G vsize, 1091M framework vsize, 0(0) swapins, 0(0) swapouts. Networks: packets: 7305008/2689M in, 8801675/9351M out.
Disks: 1190676/22G read, 1567230/19G written.
PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPR PGRP PPID STATE BOOSTS %CPU_ME %CPU_OTHRS UID FAULTS COW
0 kernel_task 101.6 14:07:10 585/13 0 2 13G+ 0B 0B 0 0 running 0[0] 0.00000 0.00000 0 128834+ 0
302 WindowServer 4.7 19:11.18 5 2 626 219M 97M 0B 302 1 sleeping *0[1] 0.00167 0.00246 88 1577465+ 25156
Edit:
I tried exporting the pool but got nowhere; the export never finished.
After that, even zpool status got nowhere.
Edit 2:
It seems it was one of the disks going bad.
The SMART check failed a couple of times during BIOS POST, and ZFS has now removed the drive from the pool, degrading it.
No more 100% kernel_task.
The drive will be replaced.
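For anyone following along, replacing the failed drive should look roughly like this. The pool name `tank` and the device names `disk3`/`disk5` are placeholders, not my actual setup:

```shell
# Identify the FAULTED/UNAVAIL device in the degraded pool
sudo zpool status tank

# After physically installing the new disk, resilver onto it
sudo zpool replace tank disk3 disk5

# Watch resilver progress; the pool returns to ONLINE when it completes
sudo zpool status tank
```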