Is the following possible?

Postby tangles » Thu Jul 11, 2019 5:37 am

Hi,

Here's the setup:
I have 48GB of RAM in an X58 system with Xeon CPU.
2 pairs of mirrored 12TB disks. (4 total).
Server does nothing but serve up files via SMB over 10 Gigabit link.

Here's what I'm seeing:
Using the Finder to copy data via SMB (so async writes) from a Mac Mini with a 10GbE NIC to the server with a 10GbE NIC, I see 1.1GB/s for about 5 seconds, and then the transfer drops to about 170MB/sec.
So that means roughly 6GB of data is written to RAM before ZFS decides to commit it to the disks? After this, the transfer drops to the sum of the vdev spindle speeds for the remaining data until the transfer is complete.

Here's what I'm wanting:
Because I have 48GB of RAM, I want the RAM to fill to at least 20GB before ZFS starts writing from RAM to the disks.
The reason is that almost all the files I send to the server are under 20GB. Plus it would be nice to use the RAM for async writes because there's no need for any cache devices with this setup.

I have played with all sorts of sysctl commands in an attempt to make this happen, but no matter what I try, it's always slowing down after ~5 seconds.

What are the sysctl settings to achieve the above? Or is this 5 seconds (or the equivalent of 5 seconds of data being written) hard coded somewhere?

I've found heaps of weblinks such as https://www.freebsd.org/doc/handbook/zfs-advanced.html describing tunables such as vfs.zfs.txg.timeout but I'm just not having any luck achieving a 15-20sec duration of writing to RAM or ~20GB ram fill equivalent.

Can anyone help with this?

ta.
tangles
 
Posts: 149
Joined: Tue Jun 17, 2014 6:54 am

Re: Is the following possible?

Postby tangles » Mon Jul 15, 2019 1:29 am

So I take it that it's not possible to configure ZFS to have a "Write ARC" of a desired amount?

I'm not finding anything with the Google Machine… :cry:
tangles
 

Re: Is the following possible?

Postby lundman » Mon Jul 15, 2019 3:45 pm

Many tunables are involved. There is arc_max, of course, but that should already be big enough; then there is the pool_sync that runs every now and then, the txg sync that runs every 5s, and so on. You would have to make the in-flight data limit much bigger to force it to skip writing each txg - which almost sounds unsafe :)

The real solution is to solve why writes are occasionally so poor - but that has proven to be quite the challenge.
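
For anyone following along, here is a rough back-of-the-envelope sketch of why the burst lasts about 5 seconds. It assumes the "in-flight data" limit is the upstream OpenZFS zfs_dirty_data_max tunable and that it defaults to 10% of RAM; both are assumptions about this O3X build, not something confirmed in this thread. The 48GB, 1.1GB/s and 170MB/s figures come from the first post.

```shell
#!/bin/sh
# Ballpark estimate only. The 10% default is taken from upstream OpenZFS
# (zfs_dirty_data_max) and is an ASSUMPTION for this O3X build.
ram_mib=$((48 * 1024))   # 48GB of RAM in the server
dirty_pct=10             # assumed default dirty-data percentage
link_mib_s=1100          # ~1.1GB/s arriving over the 10GbE link
disk_mib_s=170           # ~170MB/s sustained by the spindles

# Amount of not-yet-written data ZFS would hold in RAM.
dirty_mib=$((ram_mib * dirty_pct / 100))

# RAM fills at (incoming rate - drain rate) until the limit is hit.
fill_s=$((dirty_mib / (link_mib_s - disk_mib_s)))

echo "estimated dirty data limit: ${dirty_mib} MiB"
echo "estimated time at full link speed: ~${fill_s} s"
```

This ignores the write throttle, which starts delaying writes before the limit is reached, so treat the numbers as ballpark only.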
lundman
 
Posts: 601
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Is the following possible?

Postby tangles » Tue Jul 16, 2019 3:17 am

Hi Lundy,

Thank you for replying.

I had a go with sysctl using:
vfs.zfs.txg.timeout   15

vfs.zfs.vdev.async_read_max_active   32
vfs.zfs.vdev.async_read_min_active   8

vfs.zfs.vdev.async_write_max_active   32
vfs.zfs.vdev.async_write_min_active   8

vfs.zfs.vdev.scrub_max_active   64
vfs.zfs.vdev.scrub_min_active   24

vfs.zfs.vdev.sync_read_max_active   32
vfs.zfs.vdev.sync_read_min_active   8

Made no change to push more than the default of ~5 seconds/~4GB of async writes into RAM before starting to write to disks.
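
One thing worth ruling out is whether the settings above actually exist and took effect on this system. The sketch below reads a few of them back with `sysctl -n`. The names are the FreeBSD-style ones from the list above; whether O3X exposes them under those names is an assumption, and `show_tunable` is just a hypothetical helper name.

```shell
#!/bin/sh
# Read a tunable back and report whether it is present at all.
# show_tunable is a hypothetical helper, not part of any ZFS tooling.
show_tunable() {
    if value=$(sysctl -n "$1" 2>/dev/null); then
        echo "$1: $value"
    else
        echo "$1: not present on this system"
    fi
}

# Names copied from the post above; they may differ on O3X.
for name in \
    vfs.zfs.txg.timeout \
    vfs.zfs.vdev.async_write_max_active \
    vfs.zfs.vdev.async_write_min_active
do
    show_tunable "$name"
done
```

If a name comes back as not present, setting it would have had no effect, which would explain seeing no change.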

I just wish there was a tunable to specify how much RAM could be allocated to async writes before it's flushed to disk. It could have a hardcoded limit of 0.75xtotal_ram to ensure things remain safe.

cheers,
tangles
 

Re: Is the following possible?

Postby kingneutron » Fri Jul 26, 2019 1:12 pm

> Using the Finder to copy data via SMB

--Finder could be a bottleneck. Can you try using tar and 'pv' or 'buffer'? pv is available via both Homebrew and MacPorts

alias pvv='pv -t -r -b -W -i 2 -B 200M ' # try 50M if it won't do 200

Example: (using bash shell)

cd sourcedir
date; time tar cpf - * |pvv |(cd sambadir && tar xpvf -); date


Edit: You could also try copying the fileset to a RAMdisk first, and then troubleshoot the receiving side:

https://www.techjunkie.com/how-to-creat ... -mac-os-x/

--Just be advised that the OSX ZFS project is currently not geared toward speed ( as stated on their website** ) and you might want to try dual-booting Linux on the Samba destination side...

** https://openzfsonosx.org/wiki/Performance
kingneutron
 
Posts: 5
Joined: Sat Mar 16, 2019 4:37 pm

Re: Is the following possible?

Postby tangles » Mon Aug 26, 2019 5:09 pm

So I tested with two PCIe flash storage devices as well as two SAS SSDs in the same server.

Because they were able to commit writes to the disks faster than the 12TB spindles, I was able to maintain 1.1GB/sec in the Finder for longer. So Finder is not the bottleneck.

The PCIe flash didn't slow down at all… 1.1GB/sec for files up to 20GB no probs. The SAS SSDs did eventually fall behind as their I/O is not as fast as 10Gb network I/O. The sad thing is that the 48GB of RAM was hardly used and I had almost 30GB free.

So ZFS is geared towards using memory for reads more so than for writes.

It would be nice to have a parameter that configures ZFS to favour a write environment or a read environment for a server.

Or even better to assign a mirrored (and very fast) vdev to be dedicated to write buffering. (Fusion drive anyone?)

<anyone-game?>Would be fun to see if it's possible to make a Fusion disk using ZFS spindles for the slow bit, and PCIe flash for the fast bit…</anyone-game?>

oh well.
tangles
 

Re: Is the following possible?

Postby tangles » Thu Sep 12, 2019 8:33 pm

So I had a play with freeNAS.

I created a mirror pool of 2 x 2TB disks

I added another pair to the pool being 2 x PCIe 128GB flash sticks

zpool iostat -v $pool shows that the PCIe flash storage is being written to more so than the spindles.
This will obviously peter out though given the size discrepancies between the vdevs.

I used zpool remove $pool $vdev to remove the flash storage, which took about 5 minutes to move all the data (about 60GB) over to the 2x2TB vdev.

I could see that an extra 5MB of memory is now used to "reference" the pointers of the now relocated data.

I added the PCIe flash back to the pool and sure enough, my write speeds increased again.

So. This is definitely a way to mimic a Mac Fusion drive, but I'm not sure I would want to perform the above many times…
Using larger M.2 cards would help in not having to remove/add as frequently.
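
The add/remove cycle described above can be sketched roughly as follows. The pool name, device paths and vdev name are placeholders (assumptions, not the ones used in this test), and the script only prints the commands unless DRY_RUN=0 is set, since `zpool add` and `zpool remove` change the pool layout.

```shell
#!/bin/sh
# Sketch of the "poor man's Fusion drive" cycle. All names below are
# hypothetical placeholders; substitute your own before running for real.
DRY_RUN=${DRY_RUN:-1}

# Print each command; only execute it when DRY_RUN is not 1.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

pool=tank              # placeholder pool name
fast_a=/dev/disk10     # placeholder PCIe flash device
fast_b=/dev/disk11     # placeholder PCIe flash device
fast_vdev=mirror-1     # name of the flash mirror as shown by zpool status

run zpool add "$pool" mirror "$fast_a" "$fast_b"   # add the fast mirror
run zpool iostat -v "$pool"                        # see where writes land
run zpool remove "$pool" "$fast_vdev"              # evacuate data to spindles
run zpool status "$pool"                           # watch removal progress
```

Run it once as is to review the commands, then with DRY_RUN=0 once the placeholders match your pool.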

The interesting thing is that the RAM buffer does not have this issue of reserving memory for pointers when data moves from RAM to the spindles as it's flushed.

It would be really nice to have the ability to tell ZFS to use a (fast) vdev instead of RAM to keep I/O speeds high.
tangles
 

Re: Is the following possible?

Postby lundman » Thu Sep 12, 2019 11:49 pm

Is that something Don Brady's work on "Pool allocation classes" helps with? You could at least set metadata to go to fast storage?
lundman
 

Re: Is the following possible?

Postby tangles » Fri Sep 13, 2019 3:22 pm

Interesting stuff Don is working on.
I looked at his slides. Am I correct that he's mainly focusing on metadata and only touches on application data for small (32K) blocks?
This would still speed things up, but I'm not sure how beneficial it would be in environments with large async writes.

Need to read more.
tangles
 

