by mk01 » Thu Jan 17, 2013 6:40 am
Until you get a dedup ratio of at least 2 (and the more the better - that's the main criterion here), it's not cost effective at all. This simply needs to be said, because of the memory requirements of the DDT tables. Consider that ZFS has to consult the DDT constantly during writes (no matter whether the data actually dedups or not), and you need roughly 3-5GB of DDT information at hand for each 1TB of data. Imagine 3 pairs of 2TB mirrored disks (6TB in total): depending on the block size, you can end up with 30GB of DDT tables. And that's not a fairy tale, that's life. On top of that, the standard implementation and setup only allows 25% of the ARC to be used for metadata (I don't know the specifics of ZEVO CE), which means you would need 120GB of memory installed for such a setup.
Who thinks of that at the very moment they issue zfs set dedup=on on a filesystem??? That's why I can understand the (business) decision of GreenBytes not to allow it. Who wants to take the blame for the stupid idea of a (do-it-first, think-later) customer?
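(If you do want to think first: on platforms where zdb supports it, you can simulate the DDT for an existing pool before ever enabling dedup and see what you'd be in for. I don't know whether ZEVO CE ships this, and "tank" below is just an example pool name.)

    # simulate building the DDT for the data already in the pool;
    # prints a histogram, per-entry sizes and the dedup ratio you
    # would get - no changes are made to the pool
    zdb -S tank

    # on a pool that already has dedup enabled, show the real DDT
    # statistics, including the "in core" size (that's your RAM cost)
    zdb -DD tank

Multiply the in-core bytes per entry by the number of blocks in the pool and you land roughly at the 3-5GB per 1TB figure from above.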
If you build a cheap NAS, used as media storage or as a backup target for snapshots from other systems, the only usable solution is to use an SSD as a cache (L2ARC) - for most systems you can't even fit that much memory into the machine anyway.
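(For reference, attaching the SSD as a cache device is a one-liner with standard zpool syntax; whether ZEVO CE accepts it I can't say, and the device path is only an example.)

    # add an SSD (or SSD partition) as a read cache to the pool "tank"
    zpool add tank cache /dev/disk2s2

    # verify the cache device shows up and is taking reads
    zpool iostat -v tank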
Otherwise you can't get reliable write speed and latency out of the system once you're past 1-2TB of stored data (on a standard system with 8-16GB RAM and no SSD cache).
And why did I want dedup? On the server where everything in my house ends up (network account data, backups, media files, virtual machines for development), I currently have a dedup ratio of ~4. At first I wanted to stay with a Mac OS X-only setup, but due to the lack of dedup in CE I had to buy an additional non-Mac machine and move the storage off the Mac server - it was no longer possible to put more memory or disks into it.
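(Checking the ratio itself is trivial once dedup is running - zpool reports it directly; "tank" again stands in for your pool name.)

    # the DEDUP column shows the current overall dedup ratio
    zpool list tank

    # the same number as a single property
    zpool get dedupratio tank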