File-based vdevs are useful for off-site backups. Create the files at whatever size you want, with whatever replication level you want, send in whatever data you want, export the pool, and back up the files. You should be able to import the pool again on any platform supporting at least the same zpool and zfs versions (see "zpool upgrade"), and recover the datasets whatever way you want.
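One wrinkle on the import: "zpool import" scans only device directories by default, so on the destination you point it at the directory holding the vdev files with -d. A minimal sketch (the directory name here is just an assumption):

Code:
cla:restore # zpool import -d /Volumes/restored-files filepool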
The key here is that you can set a replication level and other pool and dataset properties, and that you can use zfs send/receive, including for incremental backups. You can of course also create whatever datasets you like and copy data in using tools like "/usr/bin/rsync -avhPE /Volumes/HFS-data/ /Volumes/filepool/HFS-data-archive/".
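For the incremental case, a follow-up send carries only the changes between two snapshots. A sketch, with placeholder snapshot names:

Code:
cla:ssdpool # zfs send -v -i Donkey@2013-04-11-055900 Donkey@2013-04-18-060000 | zfs receive -v -u filepool/Donkey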
Here's an example, using a small amount of data. Imagine the mkfile sized for DVD-Rs, for example, so that after export you could burn the files, send them away in a couple of envelopes, and at the destination copy the data back from the DVD-Rs into files. Lose an envelope? Well, if the pool is raidz3, no problem: you can still import a degraded pool (there's a degraded-import sketch after the main example below). Scratches on one or more DVDs leading to lost blocks? Also no problem; a pool with sufficient replication will likely self-repair.
The vdev files are also suitable for network transfers of whatever type you want; there's nothing special about the file vdevs in terms of metadata - it's only the data in the files themselves that's needed.
Code:
cla:ssdpool # mkfile 256m d1
cla:ssdpool # mkfile 256m d2
cla:ssdpool # mkfile 256m d3
cla:ssdpool # mkfile 256m d4
cla:ssdpool # zpool create -O checksum=sha256 filepool raidz2 /Volumes/ssdpool/d1 /Volumes/ssdpool/d2 /Volumes/ssdpool/d3 /Volumes/ssdpool/d4
cla:ssdpool # zpool status -v filepool
  pool: filepool
 state: ONLINE
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        filepool                 ONLINE       0     0     0
          raidz2-0               ONLINE       0     0     0
            /Volumes/ssdpool/d1  ONLINE       0     0     0
            /Volumes/ssdpool/d2  ONLINE       0     0     0
            /Volumes/ssdpool/d3  ONLINE       0     0     0
            /Volumes/ssdpool/d4  ONLINE       0     0     0

errors: No known data errors
cla:ssdpool # zpool iostat -v filepool
                            capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
------------------------  ------  -----  -----  -----  -----  -----
filepool                  1.11Mi  999Mi      0     16    369  157Ki
  raidz2                  1.11Mi  999Mi      0     16    369  157Ki
    /Volumes/ssdpool/d1        -      -      0     11    529  109Ki
    /Volumes/ssdpool/d2        -      -      0     11    639  109Ki
    /Volumes/ssdpool/d3        -      -      0     11    569  109Ki
    /Volumes/ssdpool/d4        -      -      0     11    549  109Ki
------------------------  ------  -----  -----  -----  -----  -----
cla:ssdpool # zfs send -v Donkey@2013-04-11-055900 | zfs receive -v -u filepool/Donkey
sending from @ to Donkey@2013-04-11-055900
receiving full stream of Donkey@2013-04-11-055900 into filepool/Donkey@2013-04-11-055900
received 176KiB stream in 1 seconds (176KiB/sec)
cla:ssdpool # zfs list -o space -t all -r filepool
NAME                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
filepool                           465Mi  725Ki         0   432Ki              0      293Ki
filepool/Donkey                    465Mi  176Ki         0   176Ki              0          0
filepool/Donkey@2013-04-11-055900      -      0         -       -              -          -
cla:ssdpool # zpool list filepool
NAME       SIZE   ALLOC   FREE  CAP  HEALTH  ALTROOT
filepool  1000Mi  1.49Mi  999Mi   0%  ONLINE  -
cla:ssdpool # zpool export filepool
cla:ssdpool # ls -l d1 d2 d3 d4
-rw------T 1 root wheel 268435456 14 Apr 12:46 d1
-rw------T 1 root wheel 268435456 14 Apr 12:46 d2
-rw------T 1 root wheel 268435456 14 Apr 12:46 d3
-rw------T 1 root wheel 268435456 14 Apr 12:46 d4
cla:ssdpool # /usr/bin/rsync -avhPE d[1-4] some.backup.host:/Volumes/archive
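Coming back to the lost-envelope scenario: a degraded import needs no special handling. A sketch, assuming d2 never arrived and the surviving files were copied to /Volumes/archive on the destination machine (hostname and paths are assumptions):

Code:
cla:archive # zpool import -d /Volumes/archive filepool
cla:archive # zpool status -x filepool
cla:archive # zpool scrub filepool

With raidz2 and one backing file missing, the pool imports in a degraded state and all data remains readable; the scrub then verifies the surviving copies.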
Naturally you could keep the exported files in a directory that is backed up by your favourite "cloud" system. It would even be reasonably efficient if the "cloud" system archives only changed ranges within files, like rsync can.
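For example, plain rsync already sends only the changed ranges over the wire, and --inplace makes it update the remote files in place rather than rewriting them from scratch; a sketch, with the host and path carried over from above:

Code:
cla:ssdpool # /usr/bin/rsync -avhP --inplace d[1-4] some.backup.host:/Volumes/archive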
Of course, retrieving *one* file would require a bit of work - get the files, zpool import, and copy from the datasets directly, or alternatively, zfs clone a snapshot first and copy from the clone.
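A sketch of that clone route, assuming the backing files were fetched to /Volumes/restored and the wanted file (name hypothetical) lives in the Donkey dataset:

Code:
cla:restore # zpool import -d /Volumes/restored filepool
cla:restore # zfs clone filepool/Donkey@2013-04-11-055900 filepool/donkey-restore
cla:restore # cp /Volumes/filepool/donkey-restore/lost-file.txt /Volumes/HFS-data/
cla:restore # zfs destroy filepool/donkey-restore
cla:restore # zpool export filepool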
There is thus value in also using a file-based "cloud" backup system from which you can retrieve the one file that you've just damaged or destroyed. File-based backups have their deficiencies too, however, as you have been discussing in the thread. They are also horrendous for recovering large numbers of files.
Multiple, overlapping backup strategies are useful because you can turn to whichever backup set has the best-fitting restoration/recovery scheme for each kind of emergency.