Frequent Kernel Panics "zero io_children" with Snow Leopard

Post by BjoKa » Fri Mar 15, 2013 8:30 pm

Using ZEVO on a MacBook Pro with Snow Leopard 10.6.8, I see frequent kernel panics, about 2-3 per day. Approximately half of these are of type "Assertion failed: zero io children" (this report) and the other half is about "Page Fault" (see other topic).

The panics occur without any clearly visible pattern, although running Thunderbird and performing IMAP synchronization with large mailboxes (> 3 GB per mailbox file) seems to increase the probability of an immediate kernel panic. However, panics also occur when Thunderbird is not running.

Another program significantly increasing the crash risk is Spotlight (more precisely: mds and mdimporter) trying to rebuild the Spotlight index after a crash. I think Spotlight is responsible for most if not all crashes shortly after boot after a previous crash.

Relatively often (approx. 1/4 of all cases) I see two panics in quick succession, usually a page fault first, followed by a failed assertion. See below for the full list. The panics started around 6 days after installing ZEVO.

All panics of type "zero io children" show similar but not identical backtraces.

System configuration:
  1. MacBook Pro from late 2010
  2. Mac OS X Snow Leopard 10.6.8, fully updated
  3. 8 GB of RAM
  4. internal SSD disk with four partitions (five, counting the EFI partition):
    s2 is system / boot partition,
    s3 is ZFS pool
    s4 is L2ARC for that pool,
    s5 is another HFS+ volume
  5. ZEVO community edition 1.1.1 (build 2012-09-23)
  6. MacPorts version 2.1.3

The ZFS pool has three child file systems, with mount points set to /Developer, /Users/bj (my main account) and /opt (for holding MacPorts).
On all datasets I have enabled compression. Dedup is off. Copies is set to 1 (default); normalization is formD.

Here are all non-default properties:
Code:
bj $ zfs get all  | grep -v -e '@2013' | grep -v -e 'default$' -e '-$'
NAME                           PROPERTY              VALUE                  SOURCE
ZFSStore/Developer             mountpoint            /Developer             local
ZFSStore/Developer             compression           on                     local
ZFSStore/bj                    mountpoint            /Users/bj              local
ZFSStore/bj                    compression           on                     local
ZFSStore/bj                    snapdir               visible                local
ZFSStore/opt                   mountpoint            /opt                   local
ZFSStore/opt                   compression           on                     local


A cron job takes automatic snapshots of the ZFSStore/bj dataset (mounted at /Users/bj) once every hour between 7:00 and 23:00. Currently the system has 454 snapshots.
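
A setup along the following lines would produce that schedule. This is only a sketch: the script path, the crontab line, the snapshot naming scheme and the helper names (snapshot_name, take_snapshot) are assumptions for illustration, not the actual job running on this machine. The date-first snapshot name is chosen so that a filter such as grep -v -e '@2013' (used above) would hide the snapshots.

```shell
#!/bin/sh
# Hypothetical hourly snapshot helper (all names and paths are assumptions).
#
# Example crontab entry, minute 0 of every hour from 07:00 through 23:00:
#   0 7-23 * * * /usr/local/bin/zfs-hourly-snapshot.sh ZFSStore/bj

# Build a snapshot name of the form <dataset>@YYYY-MM-DD_HHMM.
snapshot_name() {
    printf '%s@%s\n' "$1" "$(date '+%Y-%m-%d_%H%M')"
}

# Take one snapshot of the given dataset.
take_snapshot() {
    zfs snapshot "$(snapshot_name "$1")"
}

# Only act when a dataset name was passed on the command line.
if [ "$#" -gt 0 ]; then
    take_snapshot "$1"
fi
```

Keeping the date formatting inside the script avoids having to escape % characters in the crontab line itself.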

The pool size is 288 GB, with 188 GB allocated, i.e. 65% used. 115 GB is used by file systems; the rest is used by snapshots.

List of panics, including link to panic file

  1. zevo-crash-2013-02-25.txt uptime 157h58 Kernel trap at 0xffffff7f8135a768, type 14=page fault
  2. zevo-crash-2013-02-25_02.txt uptime 00h54 zio.c:474 ZFS assertion failed: *countp > 0 (0x0 > 0x0)
  3. zevo-crash-2013-02-26_01.txt uptime 11h37 Kernel trap at 0xffffff7f81a7c768, type 14=page fault
  4. zevo-crash-2013-02-26_02.txt uptime 00h11 zio.c:474 ZFS assertion failed: *countp > 0 (0x0 > 0x0)
  5. zevo-crash-2013-02-26_03.txt uptime 03h07 Kernel trap at 0xffffff7f80c740a6, type 14=page fault
  6. zevo-crash-2013-02-26_04.txt uptime 02h44 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  7. zevo-crash-2013-02-26_05.txt uptime 00h00'26 Kernel trap at 0xffffff7f81a3e365, type 14=page fault
  8. zevo-crash-2013-02-27_01.rtf uptime 15h39 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  9. zevo-crash-2013-02-27_02.rtf uptime 00h16 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  10. zevo-crash-2013-03-01_01.rtf uptime 23h44 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf576718 0 0xffffff80bf5769a0 ... zio.c:478
  11. zevo-crash-2013-03-04_01.txt uptime 43h27 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  12. zevo-crash-2013-03-04_02.txt uptime 00h00'7 metaslab.c:1428 ZFS assertion failed: DVA_IS_VALID(dva)
  13. zevo-crash-2013-03-04_03.txt uptime 01h57 Kernel trap at 0xffffff7f819ff55d, type 14=page fault
  14. zevo-crash-2013-03-04_04.txt uptime 08h53 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  15. zevo-crash-2013-03-04_05.txt uptime 00h15 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  16. zevo-crash-2013-03-05_01.txt uptime 22h07 Kernel trap at 0xffffff7f819ff55d, type 14=page fault
  17. zevo-crash-2013-03-06_01.txt uptime 14h42 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf444400 0 0xffffff80bf444688 ... zio.c:478
  18. zevo-crash-2013-03-07_01.txt uptime 14h26 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf5db720 0xffffff80bf5db9d8 0xffffff80bf5db9a8 ... zio.c:478
  19. zevo-crash-2013-03-07_02.txt uptime 01h23 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf63ca58 0xffffff80bf63cd10 0xffffff80bf63cce0 ... zio.c:478
  20. zevo-crash-2013-03-07_03.txt uptime 09h05 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff810824a568 0 0xffffff810824a7f0 ... zio.c:478
  21. zevo-crash-2013-03-08_01.txt uptime 06h13 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf57e730 0 0xffffff80bf57e9b8 ... zio.c:478
  22. zevo-crash-2013-03-08_02.txt uptime 00h25 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  23. zevo-crash-2013-03-08_03.txt uptime 00h18 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf4fb0b8 0xffffff80bf4fb378 0xffffff80bf4fb340 ... zio.c:478
  24. zevo-crash-2013-03-09_01.txt uptime 05h02 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf8cbd60 0 0xffffff80bf8cbfe8 ... zio.c:478
  25. zevo-crash-2013-03-09_02.txt uptime 08h43 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80cd4bd7f8 0 0xffffff80cd4bda80 ... zio.c:478
  26. zevo-crash-2013-03-13_01.txt uptime 38h46 Kernel trap at 0xffffff7f819e2a96, type 14=page fault
  27. zevo-crash-2013-03-13_02.txt uptime 10h04 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf7c20a0 0 0xffffff80bf7c2328 ... zio.c:478
  28. zevo-crash-2013-03-13_03.txt uptime 04h23 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf613720 0xffffff80bf6139e0 0xffffff80bf6139a8 ... zio.c:478
  29. zevo-crash-2013-03-13_04.txt uptime 04h04 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  30. zevo-crash-2013-03-14_01.txt uptime 15h07 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  31. zevo-crash-2013-03-14_02.txt uptime 04h20 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  32. zevo-crash-2013-03-15_01.txt uptime 12h17 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf4ab708 0 0xffffff80bf4ab990 ... zio.c:478
  33. zevo-crash-2013-03-15_02.txt uptime 02h24 metaslab.c:1428 ZFS assertion failed: DVA_IS_VALID(dva)
  34. zevo-crash-2013-03-15_03.txt uptime 00h00'6 metaslab.c:1428 ZFS assertion failed: DVA_IS_VALID(dva)
  35. zevo-crash-2013-03-15_04.txt uptime 00h00'5 metaslab.c:1428 ZFS assertion failed: DVA_IS_VALID(dva)
  36. zevo-crash-2013-03-15_05.txt uptime 03h55 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff811fbf2ae0 0xffffff811fbf2d98 0xffffff811fbf2d68 ... zio.c:478
  37. zevo-crash-2013-03-15_06.txt uptime 00h01'3 Kernel trap at 0xffffff7f81a7c0a6, type 14=page fault
  38. zevo-crash-2013-03-15_07.txt uptime 01h50 metaslab.c:1428 ZFS assertion failed: DVA_IS_VALID(dva)
  39. zevo-crash-2013-03-15_08.txt uptime 00h13 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf1bc400 0xffffff80bf1bc6c0 0xffffff80bf1bc688 ... zio.c:478
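
To check the split between panic types, the summary lines above can be tallied with a small filter. This is a minimal sketch: the function name tally_panics and the category labels are made up for illustration, and it only matches the four message patterns that appear in this list.

```shell
#!/bin/sh
# Tally panic summary lines (one per line on stdin) by panic type.
# The category labels are arbitrary names chosen for this sketch.
tally_panics() {
    awk '
        /type 14=page fault/ { n["page_fault"]++;       next }
        /zero io_children/   { n["zero_io_children"]++; next }
        /\*countp > 0/       { n["countp_assert"]++;    next }
        /DVA_IS_VALID/       { n["dva_is_valid"]++;     next }
                             { n["other"]++ }
        END { for (k in n) print k, n[k] }
    ' | sort
}

# Small demonstration using three lines copied from the list above.
tally_panics <<'EOF'
1. zevo-crash-2013-02-25.txt uptime 157h58 Kernel trap at 0xffffff7f8135a768, type 14=page fault
2. zevo-crash-2013-02-25_02.txt uptime 00h54 zio.c:474 ZFS assertion failed: *countp > 0 (0x0 > 0x0)
10. zevo-crash-2013-03-01_01.rtf uptime 23h44 zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf576718 0 0xffffff80bf5769a0 ... zio.c:478
EOF
```

Counting the 39 entries above by hand gives 18 page faults, 14 "zero io_children" panics, 2 "*countp > 0" assertions and 5 DVA_IS_VALID assertions, so feeding the full list to the function (for example from a saved text file) should report the same split.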

Almost 40 crashes in 18 days render ZEVO pretty much unusable. Even the old Z-410 beta was more stable.

@Don Brady et al.: If you need more information, please let me know. I'll also be happy to run any extra debug version you may have, or to help solve this in any other way I can. I have several years of programming experience and some experience in Mac OS X kernel debugging. If needed, I can set up a remote machine for interactive kernel tracing and provide remote access. Feel free to contact me.


The 14 crashes with "zero io_children" are distributed over 3 different backtraces:

Code:
panic(cpu 0 caller 0xffffff7f81a7c0da): "zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf576718 0 0xffffff80bf5769a0\n"@/staging/zevo/src/uts/common/fs/zfs/zio.c:478
Backtrace (CPU 0), Frame : Return Address
0xffffff80c0263d70 : 0xffffff8000204d15
0xffffff80c0263e70 : 0xffffff7f81a7c0da
0xffffff80c0263ec0 : 0xffffff7f81a7ad31
0xffffff80c0263ef0 : 0xffffff7f81a78369
0xffffff80c0263f30 : 0xffffff7f819e8c39
0xffffff80c0263fa0 : 0xffffff80002c8527
      Kernel Extensions in backtrace (with dependencies):
         com.getgreenbytes.filesystem.zfs(2012.09.23)@0xffffff7f819db000->0xffffff7f81b1afff
            dependency: com.apple.iokit.IOStorageFamily(1.6.3)@0xffffff7f810f0000



Code:
panic(cpu 1 caller 0xffffff7f81a7c0da): "zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf5db720 0xffffff80bf5db9d8 0xffffff80bf5db9a8\n"@/staging/zevo/src/uts/common/fs/zfs/zio.c:478
Backtrace (CPU 1), Frame : Return Address
0xffffff80c2163d70 : 0xffffff8000204d15
0xffffff80c2163e70 : 0xffffff7f81a7c0da
0xffffff80c2163ec0 : 0xffffff7f81a7ad31
0xffffff80c2163ef0 : 0xffffff7f81a78369
0xffffff80c2163f30 : 0xffffff7f819e8c39
0xffffff80c2163fa0 : 0xffffff80002c8527
      Kernel Extensions in backtrace (with dependencies):
         com.getgreenbytes.filesystem.zfs(2012.09.23)@0xffffff7f819db000->0xffffff7f81b1afff
            dependency: com.apple.iokit.IOStorageFamily(1.6.3)@0xffffff7f810f0000



Code:
panic(cpu 0 caller 0xffffff7f81a7c0da): "zio_notify_parent: zero io_children [0, 0], err 0, 0xffffff80bf444400 0 0xffffff80bf444688\n"@/staging/zevo/src/uts/common/fs/zfs/zio.c:478
Backtrace (CPU 0), Frame : Return Address
0xffffff80aa4dbd60 : 0xffffff8000204d15
0xffffff80aa4dbe60 : 0xffffff7f81a7c0da
0xffffff80aa4dbeb0 : 0xffffff7f81a7ad31
0xffffff80aa4dbee0 : 0xffffff7f81a78369
0xffffff80aa4dbf20 : 0xffffff7f819e968c
0xffffff80aa4dbfa0 : 0xffffff80002c8527
      Kernel Extensions in backtrace (with dependencies):
         com.getgreenbytes.filesystem.zfs(2012.09.23)@0xffffff7f819db000->0xffffff7f81b1afff
            dependency: com.apple.iokit.IOStorageFamily(1.6.3)@0xffffff7f810f0000



Best regards

Björn

Cross reference

Post by grahamperrin » Sun Mar 17, 2013 9:01 am

BjoKa wrote:… half is about "Page Fault" (see other topic). …


For convenience: Frequent Kernel Panics "Page Fault" with Snow Leopard

Links

Post by grahamperrin » Sun Mar 17, 2013 9:08 am

In general discussion under Kernel Panic (idle state) (2012-09-28), beginning with a zio_notify_parent: zero io_children kernel panic on Snow Leopard:


(I have flagged that topic to be moved from general discussion, to troubleshooting.)

Does that November 2012 description fit with this topic?

Re: Frequent Kernel Panics "zero io_children" with Snow Leop

Post by BjoKa » Sun Mar 17, 2013 9:50 am

I've checked the two topics linked above.

Some of my crashes also happened while the box was idle; however, I can't really say whether they have the same root cause as in Kernel Panic (idle state).

Regarding the November 22 post (viewtopic.php?p=3154#p3154), it may be related. I tried to grep my kext for compiler signatures, but "grep -i -e gnu -e clang -e gcc -e llvm -r /System/Library/Extensions/ZFS*" came back empty-handed. So I don't know which compiler was used to build 1.1.1. Judging from the dates, it is probably the LLVM/Clang compiler.

Given that it is almost four months since Don confirmed a possible compiler incompatibility, it leaves a bit of a bad taste that no re-release of 1.1.1 has been announced so far. :(

Link

Post by grahamperrin » Sun Mar 17, 2013 9:58 am

BjoKa wrote:… no re-release of 1.1.1 …


editions and versions of ZEVO, community and open source – in particular the roundup on page two.

Re: Frequent Kernel Panics "zero io_children" with Snow Leop

Post by raattgift » Thu Mar 21, 2013 9:45 am

"internal SSD disk with four partitions (five, counting the EFI partition):
"s2 is system / boot partition,
"s3 is ZFS pool
"s4 is L2ARC for that pool,"

You won't gain any benefit from that L2ARC. L2ARC is primarily of use to absorb seek delays, and the seek delay is pretty much guaranteed to be the same on two slices of the same medium (for ANY medium). While L2ARC isn't going to worsen seek delays on an SSD (as it would on a rotating medium), its mere existence may cause you some performance problems. Each L2ARC entry eats ~256 bytes of system memory, and if the L2ARC grows large it will squeeze out space for in-memory ARC pages.
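
To put a rough number on that memory cost, here is a back-of-the-envelope sketch. The ~256 bytes per entry is the figure quoted above; the L2ARC size and the average block size are made-up inputs, since neither is stated in this thread, and the function name l2arc_overhead_mib is only for illustration.

```shell
#!/bin/sh
# Estimate RAM consumed by L2ARC headers, in MiB (integer arithmetic).
# Usage: l2arc_overhead_mib L2ARC_SIZE_GIB AVG_BLOCK_KIB [BYTES_PER_ENTRY]
# BYTES_PER_ENTRY defaults to the ~256 bytes mentioned above.
l2arc_overhead_mib() {
    l2arc_gib=$1
    block_kib=$2
    per_entry=${3:-256}
    # Number of cached blocks = cache size / average block size.
    entries=$(( l2arc_gib * 1024 * 1024 / block_kib ))
    # Header bytes converted to MiB.
    echo $(( entries * per_entry / 1024 / 1024 ))
}

# Hypothetical 20 GiB L2ARC slice with small (8 KiB) average blocks:
l2arc_overhead_mib 20 8
# Same slice with 128 KiB blocks (the default ZFS recordsize):
l2arc_overhead_mib 20 128
```

Under these assumed inputs the estimate comes to 640 MiB for 8 KiB blocks and 40 MiB for 128 KiB blocks, so the cost depends heavily on how small the cached blocks are. With compression enabled and many small files, the average block may sit well below 128 KiB.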

Out of curiosity, have you used any of the third-party patches that adjust the check for "APPLE SSD" when deciding whether the SSD has TRIM support? What's the make and model of the SSD?