De-duplication

De-duplication

Post by grahamperrin » Mon Jan 14, 2013 1:39 am

At http://getgreenbytes.com/solutions/zevo/

No Deduplication

We all know that deduplication is a toxic feature in standard ZFS and will be worse in systems with limited RAM (like Macs). Thus, we are removing it to protect the viability of ZFS on the Mac.


At viewtopic.php?p=3746#p3746

mk01 wrote: I need dedup …

Re: De-duplication

Post by mk01 » Mon Jan 14, 2013 8:50 am

It seems we have to accept the decision, though it is more or less a business decision.

Costs, benefits

Post by grahamperrin » Wed Jan 16, 2013 9:26 pm

We can wish for the feature in a future edition of ZEVO :-)

In the meantime: what's the most cost-effective way of using a de-duplicating ZFS setup with a Mac? Maybe FreeNAS with masses of memory?

Re: De-duplication

Post by mk01 » Thu Jan 17, 2013 6:40 am

Until you get a dedup ratio of at least 2 (and more is the main criterion here), it's not cost-effective at all. This simply needs to be said, because of the memory requirements of the DDT (deduplication table). Consider that the DDT has to be consulted constantly during writes (no matter whether the data dedupes or not): you need roughly 3-5 GB of DDT information at hand for each 1 TB of data. Imagine three pairs of 2 TB mirrored disks (6 TB in total); depending on the block size, you can end up with 30 GB of DDT. And that's not a fairy tale, that's life. On top of that, the standard implementation and setup allows only 25% of the ARC to be used for metadata (I don't know the specifics of ZEVO CE), which means you would need 120 GB of memory installed for such a setup.
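
A back-of-the-envelope sketch of that arithmetic (the figures are this post's rules of thumb, not exact numbers):

    #!/bin/sh
    # Rough DDT sizing, per the estimates above.
    DATA_TB=6                            # three mirrored pairs of 2 TB disks
    DDT_GB_PER_TB=5                      # upper end of the 3-5 GB/TB estimate
    DDT_GB=$((DATA_TB * DDT_GB_PER_TB))  # ~30 GB of DDT
    RAM_GB=$((DDT_GB * 4))               # only 25% of ARC may hold metadata
    echo "~${DDT_GB} GB of DDT, ~${RAM_GB} GB of RAM to keep it in ARC"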

Who thinks of that at the very moment they issue zfs set dedup=on on a filesystem??? That's why I can understand GreenBytes' (business) decision not to allow it. Who wants to be blamed for the stupid idea of a do-it-first, think-later customer?
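
For what it's worth, you can estimate the ratio before issuing that command: zdb can simulate a DDT for data that is already in a pool. A minimal sketch, assuming a pool named tank (the name is just an example):

    # Simulate deduplication for an existing pool; prints a DDT
    # histogram and an estimated dedup ratio on the last line.
    sudo zdb -S tank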

If you build a cheap NAS, used as media storage or as a backup target for snapshots from other systems, the only usable solution is to use an SSD as a cache; for most systems you can't even fit that much memory into the machine.
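
Attaching the SSD is a one-liner; a sketch, again assuming a pool named tank and an SSD partition at /dev/disk2s2 (both names are examples, and device paths differ between platforms):

    # Add an SSD partition to the pool as a cache (L2ARC) device.
    sudo zpool add tank cache /dev/disk2s2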

Otherwise you can't get reliable write speed and latency once 1-2 TB of data is stored (on a standard system with 8-16 GB of RAM and no SSD cache).

And why did I want dedup? On the server where everything in my house ends up (network account data, backups, media files, virtual machines for development), I currently get a dedup ratio of about 4. At first I wanted a Mac OS X-only setup, but because of the lack of dedup in CE I had to buy an additional non-Mac machine and move the storage off the Mac server; it was no longer possible to put more memory or disks into it.
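
For anyone who wants to check their own number, the achieved ratio is a read-only pool property (tank again stands in for the pool name):

    # Report the deduplication ratio achieved on the pool.
    zpool get dedupratio tank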

