Extreme Performance Issues with v2.1.6 (and v2.2.3)


Re: Extreme Performance Issues with v2.1.6 (and v2.2.2)

Postby Haravikk » Wed Mar 20, 2024 2:56 am

armdn wrote: Nope, you are not the only one with this trouble. I have the same problem on two different machines: a Mac Pro 2008 with 32GB of RAM and a Mac Pro 2010 with 128GB of RAM, and it is the same thing whenever I start write operations on the pools. The pools themselves are also different. At first I thought it was hard drive corruption somewhere, but no, there isn't any. I downgraded OpenZFS on the MP2008 from 2.2.2 to 2.1.0 and... everything started to work smoothly. Of course, I recreated the pool.

On 2.1.6 through 2.2.2 the system hit I/O locks and became almost unusable as soon as I started writing anything... After several gigabytes, write speed drops to 1-3 MB/s and huge system lockups appear.

While that's not good news it's nice to know it's not just somehow my problem alone! You mention system locks as an issue, how are you monitoring these? I'm hoping my spindumps might show where the time is being wasted, but it can't hurt to get as much information as possible!
Haravikk
 
Posts: 82
Joined: Tue Mar 17, 2015 4:52 am

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby armdn » Sun Mar 24, 2024 5:56 am

Furthermore, today I started moving data from the pool to another disk in order to recreate the pool under version 2.1.0... And rsync showed a gradual drop in performance when reading from a pool running ZFS version 2.2.2. A few hours later, the system had slowed down so much that I had to force a reboot. This is some kind of nightmare. I then ran a short rsync copy that showed the drop in performance and took a spindump in parallel.

Update: I disabled the primary and secondary caches and then rsynced the files.

Code:
zfs set primarycache=none RAIDZ
zfs set secondarycache=none RAIDZ


Performance now seems normal, so the caches appear to be broken.
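For anyone wanting to check or undo this workaround later, here is a minimal sketch. The pool name RAIDZ comes from the commands above; the helper names (run, show_cache_props, restore_cache_props) and the DRY_RUN switch are made up for illustration. By default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 to actually execute them (needs root and a real pool).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the current cache settings and whether they are set locally,
# inherited, or left at the default.
show_cache_props() {
    run zfs get -o name,property,value,source primarycache,secondarycache "$1"
}

# Undo the workaround: drop the local override so the default value
# ("all") applies again.
restore_cache_props() {
    run zfs inherit primarycache "$1"
    run zfs inherit secondarycache "$1"
}

show_cache_props RAIDZ
restore_cache_props RAIDZ
```

Using zfs inherit rather than zfs set primarycache=all clears the local setting instead of pinning a new one, which seems the tidier way to back out of a temporary test.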

Attachment: spindump.txt.zip (1.21 MiB)
armdn
 
Posts: 16
Joined: Mon Mar 24, 2014 9:05 am

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby Haravikk » Sun Mar 24, 2024 8:40 am

Woah, so we have an actual possible culprit at last, awesome work!

I've just done the same on my affected system and I can confirm that it's much, much more responsive with primarycache and secondarycache both set to none, though of course this means it's not quite as responsive as v2.1.0 (as no cache means more requests hitting the drives).

I really wish I'd seen your post before spending the last two and a bit hours trying everything except disabling caching while trying to narrow this down. All I succeeded in doing was confirming that encryption and compression are definitely not the problem, by wasting time doing a huge send/receive of one of my affected datasets, removing encryption and compression so I could test it with neither of these enabled.

But yeah, the problem is definitely something to do with ARC. The fact that there aren't a load of Linux OpenZFS users complaining about the same issue makes me think this is macOS specific, either that or whatever the issue is affects macOS worse than Linux.

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby cgiard » Sun Mar 24, 2024 8:44 am

The note in the 2.2.0 release post mentioned "Fully adaptive ARC" as one of the changes, so there were big changes related to the ARC.
cgiard
 
Posts: 22
Joined: Sat Dec 20, 2014 8:10 am

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby cgiard » Sun Mar 24, 2024 8:45 am

And also "Persistent L2ARC fixes" in 2.1.6's release post.

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby Haravikk » Sun Mar 24, 2024 10:04 am

Since we have a culprit and a reproducible workaround for the problem, I've posted a GitHub issue for this to try and narrow it down to the important bits; let me know if you think there's anything I've missed. There's an issue on the main OpenZFS GitHub that I'm currently tracking that sounds suspiciously similar, though their problem only affects an encrypted dataset so could be unrelated. Having an issue for the macOS problem will make it easier to reference it if it's confirmed to be the same problem (or at least similar enough to be noteworthy).
Last edited by Haravikk on Sun Mar 24, 2024 10:10 am, edited 1 time in total.

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby Haravikk » Sun Mar 24, 2024 10:10 am

cgiard wrote:And also "Persistent L2ARC fixes" in 2.1.6's release post.

Hmm, while I do have an L2ARC configured for one of my pools, would fixes for persistence be likely to cause this kind of problem? Unfortunately it's getting too late in the day now for me to try testing with my L2ARC removed. @armdn, do you have an L2ARC device configured on any of your pools? Might confirm it as a possibility if we're both using one, could also give a hint as to why not everyone is seeing this problem as most probably don't bother with one.

It sounds like a possibility at least, I'll have to find some time to do additional testing.

Another, weaker possible cause from the v2.1.6 changelog that might also be worth looking into is the updates to lz4 and zstd; I know that the ARC used to only store decompressed records, but I don't remember which ZFS version changed it to retain the compressed records instead (to maximise space). But if records in the ARC were being decompressed often and it wasn't performing well, or there's some weird interaction, that could also be a possible cause? I just realised that when I was testing a decrypted dataset I removed zstd compression but probably would have still been using lz4, so if that's part of the problem somehow I wouldn't have seen an improvement. That's pure speculation though; it'll take quite a while to set up new test datasets, and I probably won't have time for a few days. I don't see any options for configuring lz4 or zstd implementations, are there any or is it just one default implementation each?
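To avoid the "was it still using lz4?" uncertainty on the next test run, something like the sketch below could help. It is only an illustration: the dataset name RAIDZ/arctest and the helper names are hypothetical, and it prints the commands by default instead of running them. Note that encryption is normally inherited from the parent dataset, so it is worth checking the reported value rather than assuming.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Report the effective compression and encryption settings for a
# dataset and everything below it, including where each value comes from.
check_dataset() {
    run zfs get -r -o name,property,value,source compression,encryption "$1"
}

# Create a scratch dataset with compression explicitly switched off,
# so an inherited lz4/zstd setting cannot sneak into the test.
make_plain_dataset() {
    run zfs create -o compression=off "$1"
}

check_dataset RAIDZ
make_plain_dataset RAIDZ/arctest   # hypothetical dataset name
```

The source column in the zfs get output shows whether a value is local, inherited or default, which should make it clear whether a test dataset really had compression disabled.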

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby armdn » Mon Mar 25, 2024 10:06 am

Haravikk wrote: Since we have a culprit and a reproducible workaround for the problem, I've posted a GitHub issue for this to try and narrow it down to the important bits; let me know if you think there's anything I've missed. There's an issue on the main OpenZFS GitHub that I'm currently tracking that sounds suspiciously similar, though their problem only affects an encrypted dataset so could be unrelated. Having an issue for the macOS problem will make it easier to reference it if it's confirmed to be the same problem (or at least similar enough to be noteworthy).


The affected macOS versions cover a wide range: from 10.13.6 to 10.15.6 at least.

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby armdn » Mon Mar 25, 2024 10:45 am

Haravikk wrote:
cgiard wrote:And also "Persistent L2ARC fixes" in 2.1.6's release post.

Hmm, while I do have an L2ARC configured for one of my pools, would fixes for persistence be likely to cause this kind of problem? Unfortunately it's getting too late in the day now for me to try testing with my L2ARC removed. @armdn, do you have an L2ARC device configured on any of your pools? Might confirm it as a possibility if we're both using one, could also give a hint as to why not everyone is seeing this problem as most probably don't bother with one.

It sounds like a possibility at least, I'll have to find some time to do additional testing.

Another, weaker possible cause from the v2.1.6 changelog that might also be worth looking into is the updates to lz4 and zstd; I know that the ARC used to only store decompressed records, but I don't remember which ZFS version changed it to retain the compressed records instead (to maximise space). But if records in the ARC were being decompressed often and it wasn't performing well, or there's some weird interaction, that could also be a possible cause? I just realised that when I was testing a decrypted dataset I removed zstd compression but probably would have still been using lz4, so if that's part of the problem somehow I wouldn't have seen an improvement. That's pure speculation though; it'll take quite a while to set up new test datasets, and I probably won't have time for a few days. I don't see any options for configuring lz4 or zstd implementations, are there any or is it just one default implementation each?


Yes, I use L2ARC and ZIL devices (SAS SLC SSD). But I also tried without them: I removed the L2ARC/ZIL devices from the pool and nothing changed. Apparently this is a direct problem with the primary ARC and ZIL caches.
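For others who want to repeat this test, removing a cache (L2ARC) or log device can be sketched roughly as below. The pool name RAIDZ is taken from earlier in the thread; the device name disk9s1 and the helper names are purely hypothetical, so substitute whatever zpool status lists under "cache" and "logs". The script prints the commands by default instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the commands by default (DRY_RUN=1).
# Set DRY_RUN=0 to actually execute them (needs root and a real pool).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List the pool layout; cache and log devices appear in their own sections.
list_pool_devices() {
    run zpool status "$1"
}

# Detach a cache or log device from the pool.
remove_aux_device() {
    run zpool remove "$1" "$2"
}

# Put a cache device back once testing is finished.
readd_cache_device() {
    run zpool add "$1" cache "$2"
}

list_pool_devices RAIDZ
remove_aux_device RAIDZ disk9s1    # hypothetical device name
```

Once testing is done, readd_cache_device RAIDZ disk9s1 would print (or, with DRY_RUN=0, run) the matching zpool add command.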

Re: Extreme Performance Issues with v2.1.6 (and v2.2.3)

Postby armdn » Tue Mar 26, 2024 12:55 am

