Are these speeds out of the norm?

All your general support questions for OpenZFS on OS X.

Re: Are these speeds out of the norm?

Postby e8wwv » Sun Oct 21, 2018 3:47 pm

For what it's worth, I am still running into a speed drop with multiple pools. I've had this issue since at least December 2017.

[Image: speed drop after 1 hour while copying 750 GB from one ZFS mirror to another (1-hour scale)]

[Image: same copy as above (24-hour scale)]

[Image: copying between mirrors no. 1 and no. 2 produces a speed drop on the internet -> mirror no. 3 transfer]

There are still many performance improvements needed for this port to catch up.
e8wwv
 
Posts: 14
Joined: Sat Apr 21, 2018 3:38 am

Re: Are these speeds out of the norm?

Postby lundman » Sun Oct 21, 2018 4:49 pm

So are we mostly talking about write performance dropping? Or both reads and writes?

Or to put it another way: which performance problem should we focus on first?
lundman
 
Posts: 1337
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Are these speeds out of the norm?

Postby jdwhite » Sun Oct 21, 2018 6:24 pm

I wrote a simple benchmark that illustrates my issue with deleting files taking a painfully long time.

Code: Select all
#!/bin/bash
# Create 100 tiny files, sync, then time how long it takes to remove them all.
mkdir rmbench >/dev/null 2>&1
cd rmbench || exit
echo -n "Creating files..."
for f in {1..100}; do dd if=/dev/zero of=rmbenchfile.${f} bs=512 count=1 >/dev/null 2>&1; done
echo; echo "Removing files..."
sync
time rm rmbenchfile.*
exit

All the disks I'm testing on are in the same set of 4-bay OWC Thunderbolt 2 enclosures chained off the Mac mini (specs in the original post).
I ran the test three times and took the average.
I created APFS and HFS partitions on a single disk, a single-disk ZFS pool, and a 6-disk RAIDZ2 pool from the rest.

APFS: 0m0.012s
HFS: 0m0.11s
ZFS single disk pool (zfstest): 0m1.056s
ZFS RAIDZ2 pool (spool): 0m25.268s

I'll accept that the zfstest pool (single disk) took roughly 9x as long to delete the same files as APFS/HFS, but the RAIDZ2 pool took ~25x as long as the zfstest pool, and that's what I'm trying to explain.
Both pools have sync left at the default:
Code: Select all
zfstest  sync                   standard               default
spool    sync                   standard               default
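That's just the sync property on each pool, i.e. roughly:
Code: Select all
zfs get sync zfstest spool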

I have also tested using the unlink() system call from Perl; it's just as slow as rm(1).
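For reference, the Perl check was along these lines (the exact one-liner here is illustrative, not the actual test code):
Code: Select all
# illustrative only: time Perl's unlink() on the same files, run from inside rmbench/
time perl -e 'unlink glob "rmbenchfile.*"'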

I created a variant of this test that renames the 100 files to something else in the same filesystem, and that took about 0.25 seconds on average to rename all 100, so I'm really wondering what's going on with rm/unlink.
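The rename variant looked roughly like this (paraphrased sketch, not the exact script):
Code: Select all
#!/bin/bash
# illustrative rename variant: time renaming the 100 benchmark files in place
cd rmbench || exit
time for f in {1..100}; do mv rmbenchfile.${f} renamed.${f}; done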
jdwhite
 
Posts: 11
Joined: Sat May 10, 2014 6:04 pm

Re: Are these speeds out of the norm?

Postby lundman » Mon Oct 22, 2018 4:24 pm

I'm reasonably sure I know the issue with unlink. VFS will call reclaim on everything, and as part of that, VNOP_FSYNC. Here O3X always calls zil_commit() when we shouldn't.

I.e.:
Code: Select all
diff --git a/module/zfs/zfs_vnops.c b/module/zfs/zfs_vnops.c
index c5f201a..a8f1c0b 100644
--- a/module/zfs/zfs_vnops.c
+++ b/module/zfs/zfs_vnops.c
@@ -2927,7 +2927,8 @@ zfs_fsync(vnode_t *vp, int syncflag, cred_t *cr, caller_context_t *ct)

        (void) tsd_set(zfs_fsyncer_key, (void *)zfs_fsync_sync_cnt);

-       if (zfsvfs->z_os->os_sync != ZFS_SYNC_DISABLED) {
+       if (zfsvfs->z_os->os_sync != ZFS_SYNC_DISABLED &&
+                       !vnode_isrecycled(vp)) {
                ZFS_ENTER(zfsvfs);
                ZFS_VERIFY_ZP(zp);
                zil_commit(zfsvfs->z_log, zp->z_id);


Which looks like this:
master:
real 0m0.677s
real 0m0.663s

patched:
real 0m0.132s
real 0m0.132s
lundman
 
Posts: 1337
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Are these speeds out of the norm?

Postby jdwhite » Mon Oct 22, 2018 6:03 pm

lundman wrote:I'm reasonably sure I know the issue with unlink. VFS will call reclaim on everything, and as part of that, VNOP_FSYNC. Here O3X always calls zil_commit() when we shouldn't.


Thanks for looking at this. I was concerned there might be an issue with my 6-disk RAIDZ2 pool (not that I'm ruling that out), which has otherwise seemed to be working well for three years now.
jdwhite
 
Posts: 11
Joined: Sat May 10, 2014 6:04 pm

Re: Are these speeds out of the norm?

Postby leeb » Mon Oct 22, 2018 6:39 pm

jdwhite wrote:
lundman wrote:I'm reasonably sure I know the issue with unlink. VFS will call reclaim on everything, and as part of that, VNOP_FSYNC. Here O3X always calls zil_commit() when we shouldn't.


Thanks for looking at this. I was concerned there might be an issue with my 6-disk RAIDZ2 pool (not that I'm ruling that out), which has otherwise seemed to be working well for three years now.


Echoing the thanks for looking at this; I think it has affected a few operations I've been trying recently, just more subtly. Also thanks to jdwhite for going to the trouble of making a good, concise test case. I wonder if it might be worth putting together a small, simple synthetic test suite of scripts to stress different parts of a filesystem. It could be helpful for trying out different options and looking for regressions between versions.
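As a very rough sketch of what I mean (names and structure here are just illustrative, not an existing tool):
Code: Select all
#!/bin/bash
# illustrative skeleton for a tiny fs micro-benchmark: time create, rename and remove
DIR=fsbench; COUNT=100
mkdir -p "$DIR"; cd "$DIR" || exit 1

echo "create:"
time for f in $(seq 1 $COUNT); do dd if=/dev/zero of=file.$f bs=512 count=1 2>/dev/null; done
sync

echo "rename:"
time for f in $(seq 1 $COUNT); do mv file.$f renamed.$f; done

echo "remove:"
time rm renamed.*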

Incidentally, I tried it on two of my main pools too:
Striped mirror 4xSSD: real 0m0.070s
Simple mirror 6TB HDD: real 0m8.234s

Interesting to see the variation between different pool types here. I'll have to be more careful about how I test things. I built my main pool by just throwing hardware at some earlier problems, because I wanted to stick with ZFS no matter what, but I guess that can throw off being a good tester.
leeb
 
Posts: 43
Joined: Thu May 15, 2014 12:10 pm

Re: Are these speeds out of the norm?

Postby lundman » Mon Oct 22, 2018 9:42 pm

If anyone can test that patch and show the before/after timings, that would be neat. I will most likely commit it to master anyway, as it is more correct. The file data isn't any safer just because we txg-sync it twice ("sync; sync", anyone?).
lundman
 
Posts: 1337
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Are these speeds out of the norm?

Postby tangles » Tue Oct 23, 2018 12:04 am

Hi Lundy,

With that patch a couple of posts up, can you please show me the syntax for applying it to the source that zfsadm pulls down from master?

I manually edited the file with vi, then ran make clean and sudo make install, which compiled okay, but I was wondering how to do it programmatically, as I've never used diff before and the google machine isn't quite getting me over the line.

I take it I save the patch above as a file before applying it? But I don't understand the git paths with "a/module/..." and "b/module/...".

ta.

Oh, and with the patch applied:
Code: Select all
cMacPro:sammy madmin$ ./test.bash
Creating files...
Removing files...

real   0m5.403s
user   0m0.004s
sys   0m0.026s
cMacPro:sammy madmin$

On this pool:
Code: Select all
cMacPro:sammy madmin$ zpool status
  pool: sammy
 state: ONLINE
  scan: none requested
config:

   NAME                                            STATE     READ WRITE CKSUM
   sammy                                           ONLINE       0     0     0
     raidz1-0                                      ONLINE       0     0     0
       media-85EBB63A-D32C-B946-87CC-3AE463D7DDB9  ONLINE       0     0     0
       media-B9CFDE35-10AB-4D4C-8D24-E3336563A4A6  ONLINE       0     0     0
       media-1459AFC7-CFAE-F846-BE89-B912BB43B6CC  ONLINE       0     0     0
       media-E860D9D4-2D15-854E-86F1-FC2E227A5D44  ONLINE       0     0     0
       media-3D60691D-D1F7-7248-ABA0-52C9383700C5  ONLINE       0     0     0

errors: No known data errors


which are 5 x 1TB Samsung 5400rpm SATA disks
tangles
 
Posts: 195
Joined: Tue Jun 17, 2014 6:54 am

Re: Are these speeds out of the norm?

Postby lundman » Tue Oct 23, 2018 1:36 am

> real 0m5.403s

Is that the run that corresponds to the earlier 25s? Or how does it compare?


Now the classic way with patches is to use "patch". Test it first:

patch --dry-run -p1 < thetextfile

where "-p1" tells it to strip one leading path component, i.e. "a/module/zfs" -> "module/zfs". So -p2 would be "a/module/zfs" -> "zfs".
It is nearly always -p1.

Then apply it for real by dropping "--dry-run".

Now there is also a git way, I believe "git am", but damned if it ever works for me.
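So as a sketch, the whole sequence might be something like this (the source path and patch filename are placeholders, use wherever zfsadm checked things out):
Code: Select all
# illustrative only: paths and the patch filename are placeholders
cd ~/zfs                                  # wherever the zfs source tree lives
patch --dry-run -p1 < ~/zil_fsync.patch   # check that it applies cleanly
patch -p1 < ~/zil_fsync.patch             # apply it for real
make clean && sudo make install           # rebuild and install as before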
lundman
 
Posts: 1337
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Are these speeds out of the norm?

Postby tangles » Tue Oct 23, 2018 3:15 am

I ran jdwhite's test.

zfsadm install with your patch applied: 0m5.403s

zfsadm install without the patch: 0m5.169s

So that's comparing your patch on the test pool I use, the 5 x 1TB Samsung SATA 5400rpm disks…

Changing jdwhite's script to loop to 1000 instead of just 100… will post new times shortly…
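The tweak is just the loop bound in jdwhite's script, i.e. something like:
Code: Select all
# bump the file count in test.bash from 100 to 1000
for f in {1..1000}; do dd if=/dev/zero of=rmbenchfile.${f} bs=512 count=1 >/dev/null 2>&1; done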

Whoa!

No patch applied, looping 1000 times now instead of 100:
Code: Select all
cMacPro:sammy madmin$ ./test.bash
Creating files...
Removing files...

real   0m55.479s
user   0m0.026s
sys   0m0.240s
cMacPro:sammy madmin$


Deleted all traces of ZFS, restarted, and ran zfsadm with the patch applied:
Code: Select all
cMacPro:sammy madmin$ ./test.bash
Creating files...
Removing files...

real   0m0.134s
user   0m0.021s
sys   0m0.109s
cMacPro:sammy madmin$ vi test.bash


Not sure what's going on here…

How can the 1000-file loop be faster than the 100-file loop… I must have done something wrong...
Last edited by tangles on Tue Oct 23, 2018 3:56 am, edited 1 time in total.
tangles
 
Posts: 195
Joined: Tue Jun 17, 2014 6:54 am
