
treat zfs like a fusion drive...

PostPosted: Sat Feb 02, 2019 11:31 pm
by tangles
Once vdev removal arrives:

1. Add PCIe flash to a pool that is just mirrored spindles, which should speed up writes.

2. At any time, remove the PCIe vdev to push its data onto the spindles.

3. Re-add the PCIe flash for fast writes again…
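Roughly, the cycle would look something like this (the pool name `tank`, the NVMe device paths, and the vdev name are all made up for illustration):

```shell
# 1. Add a PCIe flash mirror to a pool of mirrored spindles:
zpool add tank mirror /dev/nvme0n1 /dev/nvme1n1

# 2. At any time, evacuate the flash vdev; ZFS migrates its blocks
#    back onto the spindles (top-level vdev name per `zpool status`):
zpool remove tank mirror-2

# 3. Once the evacuation completes, re-add the flash for fast writes:
zpool add tank mirror /dev/nvme0n1 /dev/nvme1n1
```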

does this sound like a plan?

Re: treat zfs like a fusion drive...

PostPosted: Sun Feb 03, 2019 12:40 am
by abc123
Wouldn’t using the flash as l2arc suffice?

Re: treat zfs like a fusion drive...

PostPosted: Wed Feb 06, 2019 3:30 am
by tangles
In the current climate of macOS' poor I/O performance… I'm not seeing that advantage…

Wouldn't this suggestion improve I/O on any platform?

Just saying…

I'm using FreeNAS now for everything everywhere… but I do wish I could go back to macOS where I can...

Re: treat zfs like a fusion drive...

PostPosted: Wed Feb 06, 2019 4:33 am
by chrryd
I don't know the details of how ZFS works, but I would assume that writes to a mirror complete at the rate of the *slowest* device in the mirror (otherwise you risk that data isn't really mirrored...)

Re: treat zfs like a fusion drive...

PostPosted: Thu Feb 07, 2019 12:53 am
by tangles
ARC is for reads, not writes.

ZFS will send more write transactions to the vdevs that are the least full. (i.e. fewest allocated blocks)
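You can actually watch that allocation bias (pool name `tank` is assumed here):

```shell
# Per-vdev capacity and allocation. ZFS biases new writes toward
# top-level vdevs with proportionally more free space, so the
# emptiest vdev shows the most activity under load:
zpool list -v tank
```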

ZIL is for writes, but with my pool I see no advantage in putting the ZIL onto a fast/dedicated device for large/huge files.
It's a different story for small files/writes, because the ZIL can turn all those small (random) writes into nice long sequential writes.
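For reference, a dedicated log device is added like this (device paths are hypothetical); note it only ever helps synchronous writes:

```shell
# Add a mirrored SLOG (separate log device) to hold the ZIL.
# Only synchronous writes land here; async writes bypass the log.
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1
```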

ARC/ZIL aside, I guess you could say a striped pool is only as fast as its slowest device for both reads and writes, as all disks receive the same quantity of blocks when being written to.

You'd probably also notice a big drop in I/O if you added a new vdev of 2 x 4200 rpm disks to a mirror-configured pool that has nothing but 15000 rpm disks already in it. ouch!

I'm running 10 x 4TB disks (mirror config) here at home now, and the pool originally was created with 6 x 4TB disks.

When the pool was getting to 85%, I added a pair of 4TB disks to the pool. Using zpool iostat -v 1 1000, you can see that any new writes were predominantly being sent to the new vdev containing the 2 newest 4TB disks. ZFS is still protecting my data because each block is still mirrored.

Putting ZIL & ARC aside for now, this effectively limits a pool's write speed to the I/O capability of the newest vdev(s), as the existing vdevs in the pool aren't really being pushed to their maximum anymore until the pool's distribution of blocks normalises across all disks again.

I help the normalisation along by copying a zfs dataset onto another/different pool and then deleting that dataset from my main pool. (Think backup pool here, so it's no hassle to do this.)

This frees up space on the existing/older vdevs within my main pool.

I then copy the same zfs dataset back onto the main pool so that it gets redistributed better across all vdevs, rather than just the newest vdev.
The more I do this task, the more I even the distribution of writes across my pool's vdevs and so attain faster write speeds.
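In command terms, the round trip is something like this (pool and dataset names are hypothetical):

```shell
# Rebalance a dataset by round-tripping it through a backup pool,
# so its blocks get rewritten across all of the main pool's vdevs.
zfs snapshot tank/media@rebal
zfs send tank/media@rebal | zfs recv backup/media
zfs destroy -r tank/media                  # frees space on the older vdevs
zfs send backup/media@rebal | zfs recv tank/media
```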

In fact, I can run this task at any time, as you don't have to have new disks recently added. It's actually more effective if you perform this task without a new disk/vdev recently added.
Because most of the files are large, I'm not seeing any fragmentation to slow down write speeds afterwards either.
The net result for my setup is faster transfers over the network because all my vdevs are writing at similar speeds.

Ideally, you would want to delete all data off your main pool and copy it back on again every time you add a new vdev. I have done this in the past to compare the write speeds of a non-fragmented pool versus a fragmented pool. I/O was not affected that I could see.

This is why I'm looking forward to vdev removal because I won't have to zfs send → delete dataset → zfs recv anymore… I'll just script my two PCIe flash devices to be removed and re-added each night.

As long as my flash-based vdev has more free space than my spindle vdevs, I should see faster writes than not having the PCIe flash at all.

I can see I'm about to purchase 2 or 4 x M.2 cards, put them on a 2- or 4-slot PCIe adapter… :ugeek: and script something like zpool remove pool_name vdev-n; verify the identity of the flash media; zpool add pool_name mirror flash1 flash2. (And include some serious device path/size checking to ensure I add the correct devices!)
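A minimal sketch of that nightly cycle might look like this; the pool name, vdev name, and device paths are all assumptions to verify against `zpool status` first, and the real script would need the serious path/size checking mentioned above:

```shell
#!/bin/sh
set -eu

POOL=tank
FLASH_VDEV=mirror-2                    # top-level name of the flash mirror
FLASH1=/dev/disk/by-id/nvme-flash1     # hypothetical stable device paths
FLASH2=/dev/disk/by-id/nvme-flash2

# Evacuate the flash vdev; its blocks migrate onto the spindle vdevs.
zpool remove "$POOL" "$FLASH_VDEV"

# Block until the evacuation finishes (needs a ZFS with `zpool wait`).
zpool wait -t remove "$POOL"

# Device verification belongs here: confirm these really are the flash
# media and no longer belong to any pool before re-adding them.
zpool add "$POOL" mirror "$FLASH1" "$FLASH2"
```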

Re: treat zfs like a fusion drive...

PostPosted: Fri Feb 08, 2019 4:08 pm
by Jimbo
Actually, ARC is used for writes. ZIL is only used for synchronous writes. This is why ZIL is often happy being quite small (synchronous writes don’t happen a lot in many use cases).

Ultimately, when ZFS has to marshal all the writes to the back-end vdevs, you're limited to the speed of your spinning-rust vdevs, particularly when your ARC is full of writes (sync/async, whatever) - can't dump data that hasn't been written yet!

Not sure that trying to use vdev removal in this way is a great solution - it might be a stopgap while kinks are worked out in o3x - but ultimately it's a question of understanding the I/O workload and designing the pool to suit (ARC, L2ARC, ZIL, vdev type and spinning-rust performance).


Re: treat zfs like a fusion drive...

PostPosted: Sat Feb 09, 2019 3:55 am
by haer22
Eh, the ARC is the Adaptive Replacement Cache. It will only enhance reading; writing, not at all.

Re: treat zfs like a fusion drive...

PostPosted: Fri Feb 15, 2019 12:18 am
by Jimbo
Indeed, my bad. Writes into DRAM/buffers, not ARC. Not sure where that brain fart of mine originated from.

Anyhow, back to workload and hardware and needing to understand how the two relate, particularly when starting to push the limits of the tin.

Re: treat zfs like a fusion drive...

PostPosted: Tue Feb 19, 2019 4:28 am
by tangles
I gotta rethink this too.

From what I've read, vdev removal writes a table of block redirects to track the blocks being moved off any vdev that gets removed.

I would not like to see what happens to the performance of a pool if it had a vdev removed regularly, such as weekly. That could negate the whole purpose of removing and re-adding a flash-based vdev in the first place.