The trouble here
In as few words as possible:
The most recent scrub repaired 0 with 0 errors on 2013-04-05 at 06:15.
I used fsck_hfs to verify the HFS Plus file system of a .sparsebundle that's stored in a ZFS child file system:
Volumes/tall/com.apple.backupd/macbookpro08-centrim.sparsebundle
fsck_hfs found no problems but, considering recent incidents involving the pool, I decided to experiment with DiskWarrior 4.4.
The first and second runs of DiskWarrior were disrupted by Time Machine routines, so neither run replaced the directory.
Before or during a third run of DiskWarrior, I switched off Time Machine to avoid further disruption.
Trouble began at 2013-04-06 10:07:56.000:
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 12288)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 12288)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 12288)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 12288)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 20480)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 20480)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 8192)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 8192)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 8192)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 16384)
2013-04-06 10:07:56.000 kernel[0]: ZFSLabelScheme:willTerminate: this 0xffffff803430f200 provider 0xffffff80320df200 'zfs vdev for 'tall''
2013-04-06 10:07:56.000 kernel[0]: ________________________________________
2013-04-06 10:07:56.000 kernel[0]: ZFS WARNING: 'error from: fs.zfs.probe_failure'
2013-04-06 10:07:56.000 kernel[0]: pool: 'tall'
2013-04-06 10:07:56.000 kernel[0]: vdev_type: 'disk'
2013-04-06 10:07:56.000 kernel[0]: vdev_path: '/dev/dsk/GPTE_99056308-F5E2-4314-852C-4DA04732A2D0'
2013-04-06 10:07:56.000 kernel[0]: parent_type: 'root'
2013-04-06 10:07:56.000 kernel[0]: prev_state: 0
2013-04-06 10:07:56.000 kernel[0]: ________________________________________
2013-04-06 10:07:56.000 kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
2013-04-06 10:07:56.000 kernel[0]: pool: 'tall'
2013-04-06 10:07:56.000 kernel[0]: ________________________________________
2013-04-06 10:07:56.000 kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
2013-04-06 10:07:56.000 kernel[0]: pool: 'tall'
…
GPTE_99056308-F5E2-4314-852C-4DA04732A2D0 links to the larger and newer of the two disks.
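For anyone comparing similar logs, here is a minimal sketch (Python, not part of ZEVO or any ZFS tooling) that tallies the incomplete transfers by requested size and counts the ZFS warning classes. The sample lines are copied from the excerpt above; the regular expressions and the function name `summarize` are my own assumptions about the line layout, based only on what is visible here.

```python
import re
from collections import Counter

# Sample lines copied from the kernel log excerpt in this post.
SAMPLE = """\
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 12288)
2013-04-06 10:07:56.000 kernel[0]: zfs_vdm_completion: media 0xffffff80320df200, device/channel is not attached (6), 0 bytes (of 20480)
2013-04-06 10:07:56.000 kernel[0]: ZFS WARNING: 'error from: fs.zfs.probe_failure'
2013-04-06 10:07:56.000 kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
2013-04-06 10:07:56.000 kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
"""

# Assumed layout of a zfs_vdm_completion line, inferred from the excerpt:
# media address, message, code in parentheses, bytes done, bytes requested.
IO_RE = re.compile(
    r"zfs_vdm_completion: media (0x[0-9a-f]+), (.+?) \((\d+)\), "
    r"(\d+) bytes \(of (\d+)\)"
)
# Assumed layout of a warning line, inferred from the excerpt.
WARN_RE = re.compile(r"ZFS WARNING: 'error from: ([\w.]+)'")


def summarize(log_text):
    """Return (failed, warnings) counters for a block of kernel log text.

    failed maps requested size in bytes to the number of transfers that
    completed fewer bytes than requested; warnings maps warning class to count.
    """
    failed = Counter()
    warnings = Counter()
    for line in log_text.splitlines():
        m = IO_RE.search(line)
        if m:
            done, wanted = int(m.group(4)), int(m.group(5))
            if done < wanted:
                failed[wanted] += 1
            continue
        m = WARN_RE.search(line)
        if m:
            warnings[m.group(1)] += 1
    return failed, warnings


if __name__ == "__main__":
    failed, warnings = summarize(SAMPLE)
    for size, count in sorted(failed.items()):
        print(f"{count} incomplete transfer(s) of {size} bytes")
    for name, count in sorted(warnings.items()):
        print(f"{count} x {name}")
```

Feeding it the full log rather than the sample should show whether every request at 10:07:56 returned 0 bytes, which would fit a device that vanished outright rather than one with scattered bad blocks.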
DiskWarrior seemed to complete its preparations a few minutes later; within its report is a 10:16 time stamp.
It was impossible to export the pool. A normal restart routine brought the Mac close to a halt (observed in verbose mode), but a stray zpool process remained that the operating system could not kill, so a forced restart was ultimately required.
The pool is left with some errors; I can probably roll back to rid myself of most of them.
Critically
I did nothing to disconnect or disturb the disks, so the reported detachment is a puzzle.
Considerations
No external hub for these disks at the time. Both were directly connected to USB 2.0 ports of one hi-speed bus in the MacBookPro5,2, nothing else on that bus.
For at least some of the time, the Mac was extraordinarily busy with at least:
- concurrent runs of two commands that probably took a few hours to traverse my ZEVO home directory gjp22 (afterthought: those commands were long-running but probably not significant contributors to the busyness)
- DiskWarrior itself (example)
The first debug build of OS X SAT SMART Driver 0.6 was present – but missing 64-bit objects.
The File Anti-Virus feature of Kaspersky Security was enabled, so the comparisons by DiskWarrior took longer.
Time Machine backups within the sparse bundle disk image might include an EICAR test file … which (in combination with protection by Kaspersky Security) might have affected comparisons at the HFS Plus level, but should not affect the state of the pool at the ZFS level.
Additional details
Screenshots and other files at http://www.wuala.com/grahamperrin/publi ... allery&id= may be of interest, but I haven't finished arranging things in and around that folder so please treat that collection as volatile.
A highlight from the detailed report produced by DiskWarrior:
Disk: "Time Machine Backups"
Detected Media Errors in Time Machine Backups
For this, the report without detail seems clearer:
- Media errors were encountered during rebuilding. Some data may be missing because it could not be read from disk.
I'm not sure about that description by DiskWarrior. Considering the timing of the reported detachment of the device/channel, I suspect that media errors were truly encountered:
- after the rebuilt directory was prepared (but not committed)
- whilst comparing with the existing directory.
Comparisons
device/channel is not attached
– appears in three other topics:
- ZEVO related system hang (2012-10-03)
- System hang using CCC with ZFS after upgrade from SL to ML (2012-10-05)
- a 2012-10-31 post within the fourteen-page topic Mac hard resets when mounting external HD
In the last of these there's suspicion of imperfect hardware.
In my case I reckon that hardware is good, but I'm open to suggestions.
Now
I'm scrubbing the pool …