Thunderbolt DAS issue?

All your general support questions for OpenZFS on OS X.

Thunderbolt DAS issue?

Postby zisper » Sat Aug 13, 2016 2:40 pm

Hi;

I've been having some issues with ZFS when disks are in a Thunderbolt DAS enclosure - a HighPoint RocketStor 6324LS. My computer is a Mac mini 5,1. The pool is set up as a three-way mirror, though only two of the disks are currently in use. It works fine (albeit slowly, as I'm stuck with USB 2 on the Mini) with the disks in a USB dock. However, as soon as I move any (or all) of the disks into the Thunderbolt enclosure, I get I/O hangs on the pool roughly every 30 seconds. The pool is currently resilvering, so I'd expect to see constant I/O - which I do on USB, but not with any (or all) of the disks in the Thunderbolt enclosure. See the iostat output below:

Code:
sudo zpool iostat 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
PlayPool     423G   508G    205     44  19.7M   216K
PlayPool     423G   508G      2      0   156K      0
PlayPool     423G   508G      2      0  6.89K      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0  14.4K      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0  91.0K  1.59K
PlayPool     423G   508G     46    118   203K   481K
PlayPool     423G   508G     60    112   222K   408K
PlayPool     423G   508G     43    103   167K   421K
PlayPool     423G   508G     49    103   186K   411K
PlayPool     423G   508G     19     40  71.0K   149K
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G     24     69  85.3K   268K
PlayPool     423G   508G     98    200   376K   791K
PlayPool     423G   508G     10     21  38.2K  76.1K
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G     58    121   218K   479K
PlayPool     423G   508G    103    204   399K   807K
PlayPool     423G   508G    102    217   394K   856K
PlayPool     423G   508G    110    225   674K  1.36M
PlayPool     423G   508G    106    171   272K   463K
PlayPool     423G   508G    194    236  1.69M  1.10M
PlayPool     423G   508G    199     57  16.3M   278K
PlayPool     423G   508G    211     34  25.5M   161K
PlayPool     423G   508G    218     29  26.8M   142K
PlayPool     423G   508G     92      1  11.4M  8.80K
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G     41     23  4.86M   116K
PlayPool     423G   508G    129      0  16.2M  3.30K
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      8     20   781K  97.1K
PlayPool     423G   508G    218      4  27.4M  20.8K
PlayPool     423G   508G    100     25  12.1M   124K
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G     26     24  3.01M   118K
PlayPool     423G   508G     16      0  2.12M      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G    107     24  13.2M   117K
PlayPool     423G   508G      0      0      0      0


I've also got another disk in the DAS, formatted as a standard HFS+ volume. If I try to do any significant operations on the HFS+ volume (rsync a directory, or copy a single large file onto the disk), the volume gets ejected, with the usual warnings from OS X. I'd put it down to a problem with the DAS, except that when I run the same sort of operations with _only_ HFS+ disks in the DAS I haven't been able to get it to fail (though I haven't had time to really push it yet).
I'm not sure where to go with investigating this further - are there specific log files I should be looking at? I've been checking /var/log/system.log, but can't see anything related to the ~30-second timeouts.
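
In case it helps, something like the commands below should show whether anything in the ZFS event stream lines up with the pauses. Treat it as a sketch - I haven't dug into these tools much yet, and I'm not sure every option is available in this build.

Code:
# Dump the ZFS event log with full payloads
sudo zpool events -v

# Keep an eye on the system log at the same time
tail -f /var/log/system.log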
zisper
 
Posts: 16
Joined: Wed Jul 20, 2016 2:48 am

Re: Thunderbolt DAS issue?

Postby Brendon » Sat Aug 13, 2016 5:54 pm

I would suggest that if you know how to build the code, you give the current master a go. There are some changes that improve memory performance and relieve some of the odd glitching behavior, which sounds similar to what you are experiencing.
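
Roughly, it's the usual clone/autogen/configure/make flow against the spl and zfs repositories - but treat the below as a sketch and follow the wiki for the exact, supported steps, since the configure options can vary by branch.

Code:
# Sketch only - the wiki has the authoritative build instructions
git clone https://github.com/openzfsonosx/spl.git
git clone https://github.com/openzfsonosx/zfs.git
(cd spl && ./autogen.sh && ./configure && make && sudo make install)
(cd zfs && ./autogen.sh && ./configure && make && sudo make install)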

Brendon
Brendon
 
Posts: 286
Joined: Thu Mar 06, 2014 12:51 pm

Re: Thunderbolt DAS issue?

Postby zisper » Sun Aug 14, 2016 4:35 am

I don't think that's helped - I'm seeing the same I/O pauses:

Code:
zpool iostat 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
PlayPool     423G   508G    899    910   108M   108M
PlayPool     423G   508G     87     83  7.23M  5.73M
PlayPool     423G   508G     48     91  5.39M  6.28M
PlayPool     423G   508G     49     89  5.58M  5.98M
PlayPool     423G   508G    956    971   117M   117M
PlayPool     423G   508G    941    968   109M   110M
PlayPool     423G   508G  1.12K  1.09K   114M   114M
PlayPool     423G   508G    967    880  74.0M  74.4M
PlayPool     423G   508G     74     69  6.32M  6.38M
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G    485    432  28.2M  28.4M
PlayPool     423G   508G  1.10K    973  87.4M  87.4M
PlayPool     423G   508G  1.03K    935  82.8M  82.2M
PlayPool     423G   508G    453    382  21.9M  22.6M
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G    198    191  9.86M  10.1M
PlayPool     423G   508G  1.08K    872  44.7M  44.6M
PlayPool     423G   508G    579    549  22.9M  22.8M
PlayPool     423G   508G    196    198  22.9M  22.9M
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0  4.00K      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G     16     37  1.33M  1.56M
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G    478    499  59.2M  59.5M
PlayPool     423G   508G      2      1   282K   256K
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      0      0      0      0
PlayPool     423G   508G      6      0   793K      0
PlayPool     423G   508G    150    182  18.3M  19.3M


But I'm not 100% sure that the old "released" kexts haven't been reloaded rather than the ones I built, as the version numbers seem the same:

Code:
kextstat | grep lundman
  118    1 0xffffff7f80dac000 0x3f8      0x3f8      net.lundman.kernel.dependencies.25 (12.5.0) E91EDA87-1C24-4964-9C4B-9DFCD40C05FF
  119    1 0xffffff7f80dad000 0x50000    0x50000    net.lundman.spl (1.5.2) FA239C8B-1F56-36B5-8FF1-7FE08E204EF5 <118 7 5 4 3 1>
  120    1 0xffffff7f80dfd000 0x247000   0x247000   net.lundman.zfs (1.5.2) 9953BBEF-2212-3EFF-90C4-649D148938E9 <119 16 7 5 4 3 1>


I did follow the uninstall process, so I should be running the current master. I've also just noticed that I'm getting a bunch of new messages in /var/log/system.log; this is a sample of what I get each time the pool I/O resumes for a few seconds:

Code:
Aug 14 22:30:39 Mini-i5 zed[1780]: eid=544 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1782]: eid=545 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1784]: eid=546 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1786]: eid=547 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1788]: eid=548 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1790]: eid=549 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1792]: eid=550 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1794]: eid=551 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1796]: eid=552 class=delay pool=PlayPool
Aug 14 22:30:39 Mini-i5 zed[1798]: eid=553 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1800]: eid=554 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1802]: eid=555 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1804]: eid=556 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1806]: eid=557 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1808]: eid=558 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1810]: eid=559 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1812]: eid=560 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1814]: eid=561 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1816]: eid=562 class=delay pool=PlayPool
Aug 14 22:30:40 Mini-i5 zed[1818]: eid=563 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1820]: eid=564 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1822]: eid=565 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1824]: eid=566 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1826]: eid=567 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1828]: eid=568 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1830]: eid=569 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1832]: eid=570 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1834]: eid=571 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1836]: eid=572 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1838]: eid=573 class=delay pool=PlayPool
Aug 14 22:30:41 Mini-i5 zed[1840]: eid=574 class=delay pool=PlayPool
Aug 14 22:30:42 Mini-i5 zed[1842]: eid=575 class=delay pool=PlayPool
Aug 14 22:30:42 Mini-i5 zed[1844]: eid=576 class=delay pool=PlayPool
Aug 14 22:30:42 Mini-i5 zed[1846]: eid=577 class=delay pool=PlayPool
Aug 14 22:30:42 Mini-i5 zed[1848]: eid=578 class=delay pool=PlayPool


There are also a _lot_ of messages in the ZFS events log - look how close those timestamps are:

Code:
Aug 14 2016 22:31:45.292312000 ereport.fs.zfs.delay
        class = "ereport.fs.zfs.delay"
        ena = 0xf54612a99800001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xc49be25784c8b17f
                vdev = 0x8c51c468134e04a7
        (end detector)
        pool = "PlayPool"
        pool_guid = 0xc49be25784c8b17f
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x8c51c468134e04a7
        vdev_type = "mirror"
        vdev_ashift = 0xc
        vdev_complete_ts = 0x0
        vdev_delta_ts = 0x0
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        parent_guid = 0xc49be25784c8b17f
        parent_type = "root"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x0
        zio_flags = 0xca8
        zio_stage = 0x800000
        zio_pipeline = 0xb80000
        zio_delay = 0x7755988dc
        zio_timestamp = 0x0
        zio_delta = 0x0
        zio_offset = 0x107f6ee000
        zio_size = 0x1b200
        zio_objset = 0x159
        zio_object = 0x9fe5
        zio_level = 0x0
        zio_blkid = 0x945
        time = 0x57b064b1 0x116c53c0
        eid = 0x2cb

Aug 14 2016 22:31:45.293076000 ereport.fs.zfs.delay
        class = "ereport.fs.zfs.delay"
        ena = 0xf5461e3c1600001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xc49be25784c8b17f
                vdev = 0x8c51c468134e04a7
        (end detector)
        pool = "PlayPool"
        pool_guid = 0xc49be25784c8b17f
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x8c51c468134e04a7
        vdev_type = "mirror"
        vdev_ashift = 0xc
        vdev_complete_ts = 0x0
        vdev_delta_ts = 0x0
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        parent_guid = 0xc49be25784c8b17f
        parent_type = "root"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x0
        zio_flags = 0xca8
        zio_stage = 0x800000
        zio_pipeline = 0xb80000
        zio_delay = 0x775532e3a
        zio_timestamp = 0x0
        zio_delta = 0x0
        zio_offset = 0x107f70a000
        zio_size = 0x1b800
        zio_objset = 0x159
        zio_object = 0x9fe5
        zio_level = 0x0
        zio_blkid = 0x947
        time = 0x57b064b1 0x1177fc20
        eid = 0x2cc

Aug 14 2016 22:31:45.293632000 ereport.fs.zfs.delay
        class = "ereport.fs.zfs.delay"
        ena = 0xf54626cb1a00001
        detector = (embedded nvlist)
                version = 0x0
                scheme = "zfs"
                pool = 0xc49be25784c8b17f
                vdev = 0x8c51c468134e04a7
        (end detector)
        pool = "PlayPool"
        pool_guid = 0xc49be25784c8b17f
        pool_context = 0x0
        pool_failmode = "wait"
        vdev_guid = 0x8c51c468134e04a7
        vdev_type = "mirror"
        vdev_ashift = 0xc
        vdev_complete_ts = 0x0
        vdev_delta_ts = 0x0
        vdev_read_errors = 0x0
        vdev_write_errors = 0x0
        vdev_cksum_errors = 0x0
        parent_guid = 0xc49be25784c8b17f
        parent_type = "root"
        vdev_spare_paths =
        vdev_spare_guids =
        zio_err = 0x0
        zio_flags = 0xca8
        zio_stage = 0x800000
        zio_pipeline = 0xb80000
        zio_delay = 0x7755efabd
        zio_timestamp = 0x0
        zio_delta = 0x0
        zio_offset = 0x107f726000
        zio_size = 0x1b000
        zio_objset = 0x159
        zio_object = 0x9fe5
        zio_level = 0x0
        zio_blkid = 0x946
        time = 0x57b064b1 0x11807800
        eid = 0x2cd
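
If zio_delay is in nanoseconds - an assumption on my part, I haven't checked the source - then the first event above works out to roughly 32 seconds, which is suspiciously close to the ~30 second pauses:

Code:
# zio_delay from eid 0x2cb, converted from hex (assuming nanoseconds)
$ printf '%d\n' 0x7755988dc
32033573084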



Any suggestions?
zisper
 
Posts: 16
Joined: Wed Jul 20, 2016 2:48 am

Re: Thunderbolt DAS issue?

Postby Brendon » Wed Aug 17, 2016 2:45 am

Please report the output of sysctl -a | grep kext when running the code you compiled. The reason I want to check this is that the only known cause of big I/O pauses like that is overloading of the OS memory allocator. None of us have your enclosure, so we can't comment on its compatibility. Also make sure none of your disks are having problems - slow performance can be caused by a failing disk, though that seems a little unlikely given it works on USB. And check that your Thunderbolt cable is good.
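
Something along these lines would do for the disk check - the disk identifier below is only an example, and smartctl needs smartmontools installed (possibly with extra -d options depending on the enclosure's bridge):

Code:
# Per-vdev error counters and resilver progress
zpool status -v PlayPool

# SMART health for one of the member disks (identifier is an example)
smartctl -a /dev/disk2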

Beyond that I suggest you join the IRC channel and have a chat with lundman.

Cheers
Brendon
Brendon
 
Posts: 286
Joined: Thu Mar 06, 2014 12:51 pm

Re: Thunderbolt DAS issue?

Postby zisper » Wed Aug 17, 2016 4:20 am

Here you go:

Code:
sysctl -a | grep kext
debug.kextlog: 4083
debug.swd_kext_name:
spl.kext_version: 1.5.2-4_g79d69d3
zfs.kext_version: 1.5.2-46_g8800155


So I think it's back to the 512 KB allocator? (That's definitely what's in the SPL codebase.) I'm getting another Thunderbolt cable to give that a try. I do now think the enclosure issues are unrelated to ZFS, though - I can reliably get a standard HFS+-formatted disk in the Thunderbolt DAS to be ejected under heavy "sustained" IO. (It'll hold up for a few GB of transfer and then drop out.) I'm running through some troubleshooting with the supplier. Fingers crossed.
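
For anyone wanting to reproduce it, copying a single large file or rsyncing a big directory onto the HFS+ disk is enough; something along these lines should do it (the volume name is just an example - any sustained multi-GB write works):

Code:
# Write ~20 GB of zeros to the HFS+ volume to force sustained I/O
dd if=/dev/zero of=/Volumes/HFSTest/ddtest bs=1m count=20000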

Thanks for the advice.
zisper
 
Posts: 16
Joined: Wed Jul 20, 2016 2:48 am

