When I set up my ZFS storage I initially dismissed deduplication, since the bulk of my data (in terms of capacity used) is unique, so there would be little or no saving from deduplication.
However, I do have a working directory where I use tools that have a tendency to copy a lot of data with only small modifications (usually in header blocks), so I'm thinking of enabling deduplication only for this directory (which I'll put in its own new dataset).
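For reference, the setup I have in mind is simply this ("tank/work" is just a placeholder name):

    # dedicated dataset for the working directory, with dedup enabled only here
    zfs create -o dedup=on tank/work

    # confirm dedup is on for the new dataset and still off for the rest of the pool
    zfs get dedup tank tank/work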
What I'm wondering, though, is what the expected memory impact will be, given that files are only in this directory temporarily (while being worked on; once complete they're copied elsewhere).
For example, let's say I have a 10 GB file and it's copied five times. With deduplication enabled this should only result in 10 GB of data in the dataset (plus change) with, if I'm working it out correctly, a deduplication table of roughly 25 MB (320 bytes per dedup record multiplied by the number of unique records: 10 GB at the default 128 KB recordsize is about 82,000 blocks). Now let's say I move the file out of the working dataset and into its final location; will that ~25 MB of RAM be freed from the deduplication table (assuming no snapshots etc.) since the records no longer exist in that dataset?
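This is how I was planning to sanity-check that estimate once it's running, assuming the default 128 KB recordsize ("tank" is again a placeholder pool name):

    # rough estimate: 10 GiB / 128 KiB recordsize = ~82,000 unique blocks
    #                 82,000 blocks x ~320 bytes per DDT entry = ~25 MiB of dedup table
    # check the actual dedup table size and ratio on the live pool:
    zpool status -D tank

    # more detailed DDT statistics:
    zdb -D tank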
My hope is that I can essentially have a deduplication setup in which the memory usage rises while active, then drops back to near zero when done. If that's not how it works, are there alternatives? For example, are there any drawbacks to enabling deduplication only temporarily, i.e. turning it on when I start a task, then turning it off again once I've completed it?
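From what I understand the dedup property only applies to data written while it's enabled, so by "temporarily" I mean something along these lines (again, "tank/work" is just a placeholder):

    # before starting a task
    zfs set dedup=on tank/work

    # ...work on the files, then move them to their final location...

    # once the task is finished
    zfs set dedup=off tank/work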