my devices with known imperfections or suspected issues

Moderators: jhartley, MSR734, nola

Post by grahamperrin » Fri Sep 28, 2012 1:58 am

For reference only … to avoid repeating myself in other topics. A shortlist.

500 GB LaCie Big Disk Extreme (300794EK)

Years ago this device (costly and very new at the time) was knocked off a desk, whilst running TechTool Pro 4.1.2 to check for bad blocks. What a twit. I recall a need to reinstall firmware as a result of the accident. I treat all data on these disks – two in the one enclosure – as disposable.

How imperfect? It's the sort of disk that consistently appears error-free when given to HFS Plus and checked with fsck_hfs. When the disk is full or nearly full and then scrubbed by ZEVO, it's typical to find less than 2 MB repaired, with zero errors. In the real world people do unknowingly use hard disk drives with comparable imperfections, so for test purposes (exposing bugs etc.) I reckon it's reasonable to have this LaCie device sometimes in the mix.
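
To put a number on "less than 2 MB repaired with zero errors", the scan line of zpool status can be picked apart. A rough sketch only: the function name is made up for illustration, and the pattern assumes the scan line looks like the one in the status output quoted later in this topic.

```shell
#!/bin/sh
# Hypothetical helper: summarise the "scan:" line of `zpool status` output.
# Reads status text on stdin; prints "<amount repaired> <error count>".
scrub_summary() {
    sed -n 's/^ *scan: scrub repaired \([^ ]*\) in [^ ]* with \([0-9][0-9]*\) errors.*/\1 \2/p'
}

# Sample line copied from the status output later in this topic.
sample=' scan: scrub repaired 1.02Mi in 10h14m with 2 errors on Sun Mar  3 21:28:19 2013'
printf '%s\n' "$sample" | scrub_summary    # prints: 1.02Mi 2
```

Against a live pool it would be something like `zpool status twoz | scrub_summary`. Other ZFS implementations may word the scan line differently, in which case the pattern needs adjusting.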

OT: I now want to 'break away' from that LaCie firmware … but knocking it off a desk is probably not the answer ;)
Last edited by grahamperrin on Fri Mar 29, 2013 11:44 am, edited 1 time in total.
grahamperrin Offline

Posts: 1596
Joined: Fri Sep 14, 2012 10:21 pm
Location: Brighton and Hove, United Kingdom

Room 101?

Post by grahamperrin » Fri Sep 28, 2012 2:03 am

Spooky – topic 101, viewtopic.php?t=101 … an omen? What or who should be sent to Room 101?

cross reference

Post by grahamperrin » Sun Oct 14, 2012 9:27 am

Today, unexpectedly, the old 500 GB LaCie Big Disk Extreme (300794EK) appeared – for a while – as two disks instead of one. An extremely rare incident, so I took the opportunity to experiment …

insufficient replicas following temporary 'split' of a LaCie
Last edited by grahamperrin on Fri Mar 29, 2013 11:44 am, edited 1 time in total.

thoughts, some anecdotal

Post by grahamperrin » Sat Nov 10, 2012 2:02 am

Borrowing from viewtopic.php?p=2739#p2739

redwards wrote: Western Digital USB drives hide drive failures and SMART status when hooked up via USB. I have had drives that seemed to work just fine, but when pulled from their cases and attached directly to a computer using SATA, they fail manufacturers' SMART testing and such. The enclosures I was using had 2TB WD Green drives and I even had problems with them after running badblocks on them for weeks.

I would recommend ensuring that your drives get scanned!


… and on the subject of luck, under the topic "is Zevo safe enough?"

Rewind a year or so. Everything that I saw and read (shared experiences) before Lion was released pointed to trouble-free installations, and trouble-free upgrades from Snow Leopard, for the masses. I rarely bother with Apple Support Communities (bugs with the forum drove me away) but I decided to drop in, for a few days, immediately after Lion was released. Anecdotally, many of the installation-related problems screamed one thing to me: silent hard disk failure! So I left the poor users wallowing in a mire of "Me too!" and "I have exactly the same problem!", uppercase shouting with five exclamation marks in a row and so on, when in fact most people's problems were not the same :geek:

I can't estimate what percentage of installations were truly bugged by silent hard disk failures (we'll never know) but …

… the mismatch – between (a) pre-release test experiences (Developer Preview etc.) and (b) post-release observation of other users' experiences – was great enough for me to think: there must be a better way. So here I am.

Consider: the multiple checksumming routines within a full installation of OS X probably do not extend to checksumming the end results. With so much written and deleted during and after a full installation, it's little surprise: disks that are only marginally good may be pushed over the edge.

This is not to bash HFS Plus – echoing a sentiment in an Ars Technica review, the file system does what it says on the box. But modern uses of file systems are so much more strenuous, and users' expectations so much higher, that we need approaches (such as ZFS) that can proactively detect problems in good time.

Then, for example, Customer Reviews: Western Digital 1TB My Book for Mac - Apple Store (U.S.) paint not a pretty picture – two of five stars, based on 168 reviews. Some of the low marks are unjustifiable but overall, it ain't purdy.

Again, I'm reluctant to product-bash. How many of those reviews might be different if the users enjoyed an Apple-integrated method of properly assessing the state of a hard disk?

ZFS alone is not the panacea – we need complementary approaches to check unused space, and so on. But ZFS should be a huge step away from a past where imperfections remain unknown until it's too late.

badblocks

The observation from redwards is appreciated. I have in Ask Different a question that's unanswered, possibly redundant (I know not enough about CoreStorage):

What free or open source software can I use with Mac hardware to verify integrity of every block of a disk where CoreStorage is used?

Re: my devices with known imperfections or suspected issues

Post by grahamperrin » Sun Nov 25, 2012 4:47 am

grahamperrin wrote: 500 GB LaCie Big Disk Extreme (300794EK)

Years ago this 500 GB device (costly and very new at the time) was knocked off a desk, probably whilst writing. What a twit. I recall a need to reinstall firmware as a result of the accident. I treat all data on these disks – two in the one enclosure – as disposable.


Recent history for that 500 GB LaCie Big Disk Extreme (300794EK):

Last edited by grahamperrin on Fri Mar 29, 2013 11:45 am, edited 1 time in total.

Another device

Post by grahamperrin » Mon Mar 04, 2013 9:37 pm

Another device for the shortlist:

1 TB LaCie Big Disk Extreme (300797EK)

Quoting from viewtopic.php?p=4165#p4165

… usually well-treated, but sometimes does not spin up properly. My habit, after making a FireWire 800 connection to the drive (or after starting the MacBookPro5,2 with a connection in place):

  1. see whether its two main volumes (one HFS Plus, one ZFS) appear in Finder
  2. if volumes do not appear, then I press the power button on the front of the drive twice (once to switch off, once to switch on) with only a second or so between the presses.


If I sleep the MacBookPro5,2 with a FireWire 800 connection to this drive, then on wake there'll be a red alert from the system for an improper eject, and then:

  • with luck, its two volumes (one HFS Plus, one ZFS) will disappear from Finder; and ZFS-related commands will remain usable
  • if I'm unlucky, both volumes will remain in Finder; and zfs and zpool commands will be unusable.

Last but not least, occasional errors such as these:

Code:
sh-3.2$ sudo zpool status -v twoz
  pool: twoz
 state: ONLINE
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
 scan: scrub repaired 1.02Mi in 10h14m with 2 errors on Sun Mar  3 21:28:19 2013
config:

   NAME                                         STATE     READ WRITE CKSUM
   twoz                                         ONLINE       0     0     0
     GPTE_34E7E852-7E88-4FAD-B162-2AEF6D300D42  ONLINE       0     0     0  at disk5s4

errors: Permanent errors have been detected in the following files:

        twoz:/macbookpro08-centrim.sparsebundle/bands/3252
        twoz@2013-01-02-201022:/macbookpro08-centrim.sparsebundle/bands/3252
        twoz@2013-01-02-201022:/macbookpro08-centrim.sparsebundle/bands/4aca
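
To keep track of which files are affected from one scrub to the next, the list at the end of that output can be pulled out and the snapshot entries counted separately. A rough sketch: the function name is made up, and it assumes the "Permanent errors" list is the last thing in the status text, as it is above.

```shell
#!/bin/sh
# Hypothetical helper: list the entries under "Permanent errors" in
# `zpool status -v` output (read from stdin), one per line.
permanent_errors() {
    awk '
        /^errors: Permanent errors/ { listing = 1; next }    # list starts after this line
        listing && NF { sub(/^[ \t]+/, ""); print }          # drop indentation, skip blanks
    '
}

# Sample text copied from the status output above.
status_text='errors: Permanent errors have been detected in the following files:

        twoz:/macbookpro08-centrim.sparsebundle/bands/3252
        twoz@2013-01-02-201022:/macbookpro08-centrim.sparsebundle/bands/3252
        twoz@2013-01-02-201022:/macbookpro08-centrim.sparsebundle/bands/4aca'

printf '%s\n' "$status_text" | permanent_errors
# Entries with "@" before the first ":" belong to a snapshot.
printf '%s\n' "$status_text" | permanent_errors | grep -c '^[^:/]*@'    # prints: 2
```

With a live pool the input would come from `sudo zpool status -v twoz` instead of the sample text.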


For the spin issue I could try an alternative power supply. No rush.

For the data errors recently I increased from copies=2 to copies=3 … not the sanest approach, and certainly no substitute for a multi-disk pool with true redundancy but for my current use case with the disk in its current state: copies=3 is good enough. (Nothing of great importance is stored in the pool.) If the disk worsens significantly, then I'll move the data elsewhere and keep the disk for test purposes.

Code:
macbookpro08-centrim:~ gjp22$ zpool history twoz | grep copies
2012-12-09.17:15:30 zpool create -o ashift=12 -O casesensitivity=insensitive -O compression=gzip-9 -O atime=off -O snapdir=visible -O copies=2 twoz /dev/disk8s4
2013-03-03.18:12:47 zfs set copies=3 twoz
Last edited by grahamperrin on Fri Mar 29, 2013 11:48 am, edited 3 times in total.

Afterthought

Post by grahamperrin » Wed Mar 13, 2013 1:19 am

grahamperrin wrote:
  • with luck, its two volumes (one HFS Plus, one ZFS) will disappear from Finder; and ZFS-related commands will remain usable
  • if I'm unlucky, both volumes will remain in Finder; and zfs and zpool commands will be unusable.


I wonder whether one or the other is more likely when:

  • the red alert from the operating system coincides with a scrub of the pool.

LaCie 1 TB Big Disk Extreme, a growing number of errors

Post by grahamperrin » Sun Mar 17, 2013 10:28 am

grahamperrin wrote:Another device for the shortlist:

LaCie 1 TB Big Disk Extreme (300797EK)


For reference only, read this post alongside the errors outlined at viewtopic.php?p=4286#p4286

A summary of recent Time Machine writes to a sparse bundle disk image on this pool, with copies=3 at the dataset, plus boot times:

Code:
2013-03-17 06:17:16.000 bootlog[0]: BOOT_TIME 1363501036 0
2013-03-17 06:26:51.000 bootlog[0]: BOOT_TIME 1363501611 0
2013-03-17 08:03:41.269 com.apple.backupd[392]: Copied 329351 files (3.67 GB) from volume OS.
2013-03-17 08:08:02.326 com.apple.backupd[392]: Copied 20290 files (89 MB) from volume OS.
2013-03-17 08:28:44.178 com.apple.backupd[5799]: Copied 1272 files (2 MB) from volume OS.
2013-03-17 08:29:02.570 com.apple.backupd[5799]: Copied 192 files (93 bytes) from volume OS.
2013-03-17 09:29:30.336 com.apple.backupd[8241]: Copied 2341 files (4.5 MB) from volume OS.
2013-03-17 09:29:58.117 com.apple.backupd[8241]: Copied 192 files (93 bytes) from volume OS.
2013-03-17 10:29:39.280 com.apple.backupd[10735]: Copied 5119 files (155.9 MB) from volume OS.
2013-03-17 10:30:32.245 com.apple.backupd[10735]: Copied 193 files (101 bytes) from volume OS.
2013-03-17 11:10:57.000 bootlog[0]: BOOT_TIME 1363518657 0
2013-03-17 13:49:20.283 com.apple.backupd[1206]: Copied 63142 files (821.1 MB) from volume OS.
2013-03-17 13:52:48.834 com.apple.backupd[1206]: Copied 48103 files (150 KB) from volume OS.
2013-03-17 14:15:11.613 com.apple.backupd[67885]: Copied 4916 files (303.7 MB) from volume OS.
2013-03-17 14:15:26.591 com.apple.backupd[67885]: Copied 192 files (93 bytes) from volume OS.
2013-03-17 15:13:58.511 com.apple.backupd[68450]: Copied 2756 files (3.2 MB) from volume OS.
2013-03-17 15:14:15.538 com.apple.backupd[68450]: Copied 539 files (93 bytes) from volume OS.
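
For a quick feel of how much was written to the pool, the "Copied" lines can be totalled. A rough sketch: the function name is made up, and it assumes that backupd reports decimal units (1 GB = 1000 MB), which I have not verified.

```shell
#!/bin/sh
# Hypothetical helper: total the "Copied N files (SIZE UNIT)" lines that
# backupd writes. Reads log text on stdin; prints line count, files and MB.
backupd_totals() {
    awk '
        / Copied [0-9]+ files \(/ {
            # Line shape: ... Copied 329351 files (3.67 GB) from volume OS.
            for (i = 1; i <= NF; i++) if ($i == "Copied") break
            files += $(i + 1)
            size = $(i + 3); sub(/^\(/, "", size)
            unit = $(i + 4); sub(/\)$/, "", unit)
            # Assumption: decimal units.
            if      (unit == "GB")    mb += size * 1000
            else if (unit == "MB")    mb += size
            else if (unit == "KB")    mb += size / 1000
            else if (unit == "bytes") mb += size / 1000000
            lines++
        }
        END { printf "%d lines, %d files, %.1f MB\n", lines, files, mb }
    '
}

# The first two "Copied" lines from the log above.
printf '%s\n' \
    '2013-03-17 08:03:41.269 com.apple.backupd[392]: Copied 329351 files (3.67 GB) from volume OS.' \
    '2013-03-17 08:08:02.326 com.apple.backupd[392]: Copied 20290 files (89 MB) from volume OS.' |
    backupd_totals    # prints: 2 lines, 349641 files, 3759.0 MB
```

Note that each backup appears to log two "Copied" lines, so the line count is not the number of backups.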


Amongst the full set of messages from backupd, nothing to indicate a problem. But qualitatively, there was a remarkable delay (a few minutes) between:

  1. the time when I chose Back Up Now from the Time Machine menu; and
  2. the start of the backup:

Code:
2013-03-17 15:11:56.795 com.apple.backupd[68450]: Starting manual backup


Then whilst hdiutil debugged an attached image –

Code:
sudo hdiutil attach -debug /Volumes/twoz/macbookpro08-centrim.sparsebundle


– I performed a live verification with options for extra debugging information:

Code:
sh-3.2$ sw_vers
ProductName:   Mac OS X
ProductVersion:   10.8.3
BuildVersion:   12D78
sh-3.2$ sudo /sbin/fsck_hfs -l -D 0x0001 -D 0x0002 -D 0x0010 -D 0x0020 /dev/disk12s2

/dev/rdisk12s2: fsck_hfs run at Sun Mar 17 15:39:40 2013
** /dev/rdisk12s2 (NO WRITE)
   Using cacheBlockSize=32K cacheTotalBlock=32768 cacheSize=1048576K.
   Executing fsck_hfs (version diskdev_cmds-557.3~1).
** Performing live verification.
** Checking Journaled HFS Plus volume.
** Detected a case-sensitive volume.
   The volume name is Time Machine Backups
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking multi-linked directories.
** Checking volume bitmap.
** Checking volume information.
** The volume Time Machine Backups appears to be OK.
sh-3.2$ date
Sun 17 Mar 2013 17:35:22 GMT
sh-3.2$ hdiutil detach /Volumes/Time\ Machine\ Backups
"disk12" unmounted.
"disk12" ejected.
sh-3.2$ date
Sun 17 Mar 2013 17:35:49 GMT
sh-3.2$


At the HFS Plus level with fsck_hfs the volume appears to be OK. At an underlying level the number of errors is alarming.
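
For what it's worth, the timestamps in that transcript give an upper bound on how long the live verification took: fsck_hfs started at 15:39:40 and the first date command ran at 17:35:22. A rough sketch of the arithmetic, assuming both times are in the same time zone and on the same day; the function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: seconds between two same-day HH:MM:SS clock times.
elapsed_seconds() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ":"); split($2, b, ":")
        print (b[1]*3600 + b[2]*60 + b[3]) - (a[1]*3600 + a[2]*60 + a[3])
    }'
}

secs=$(elapsed_seconds 15:39:40 17:35:22)
printf '%d s = %dh%02dm%02ds\n' "$secs" $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
# prints: 6942 s = 1h55m42s
```

The date command was typed some time after fsck_hfs finished, so the real run time was no more than this.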

Other notes to follow. The output from hdiutil was very verbose – 768.6 MB when saved as a .txt file. Related, in Ask Different:

Last edited by grahamperrin on Fri Mar 29, 2013 11:38 am, edited 1 time in total.

Re: my devices with known imperfections or suspected issues

Post by raattgift » Mon Mar 18, 2013 1:25 pm

fsck_hfs only does a consistency test on the metadata, and repairs broken b*-trees, references from one into an empty slot in the other, etc. *All* of the user data could be corrupt, and fsck_hfs would say nothing.

As it's a Time Machine backup internally, you could use "tmutil compare -a" (you probably want to look at the man page and use the -I option). This subsumes the "-d" flag of tmutil compare, which will force a read of both the archived data and the original. It will take a long time.

Alternatively, use script(1) or otherwise save the output of rsync -x --dry-run --checksum -avhE / "`tmutil latestbackup`/bootvolumename" ; --checksum forces a read of everything in both directory trees, no matter what the metadata says in terms of dates and sizes. Again, this will take a long time, but has lots of ways to log the differences and any errors that arise.
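
The same idea (read every byte on both sides, ignore dates and sizes) can be sketched without rsync. This is only an illustration, not what tmutil or rsync do internally: the function names are made up, cksum is a plain CRC rather than a cryptographic hash, and metadata, extended attributes and resource forks are ignored.

```shell
#!/bin/sh
# Hypothetical helper: checksum every regular file under a directory.
# Prints "<crc> <size> <relative path>" lines, sorted by path.
tree_checksums() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# Hypothetical helper: compare two trees by content only.
# Prints differing checksum lines; returns 0 when the trees match.
compare_trees() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    tree_checksums "$1" > "$tmp_a"
    tree_checksums "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage (paths are placeholders):
#   compare_trees /some/source "/path/to/latest/backup/of/that/source"
```

As with the rsync approach, expect it to take a long time on a full backup, because every file is read in full on both sides.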

Your previous errors all seemed to be in snapshots. The blocks stay in the snapshot; any data added to the band by the DMG layer will cause a COW difference, and any on the fly repairing by ZFS (during a normal read or a scrub) will also cause a COW difference, *iff* there is a good copy available.
raattgift Offline


 
Posts: 98
Joined: Mon Sep 24, 2012 11:18 pm

fsck_hfs -dfRace and other forced rebuilds

Post by grahamperrin » Sun Mar 24, 2013 12:59 pm

Re viewtopic.php?p=4295#p4295

From past experience with orderly (and debatably less orderly) forced rebuilds of B-tree files, I shouldn't rush to apply force.

Where fsck_hfs reports that it can't repair a disk, I tend to trust that first report. In edge cases I find that a subsequent run (not necessarily the second) will report successful repairs, where previously the file system was reportedly irreparable, but then I might not trust the reported repair.

Where fsck_hfs reports that the file system is irreparable, DiskWarrior might do better – if there's enough memory. With a maximum of 8 GB in my MacBookPro5,2 I usually found that DiskWarrior simply couldn't handle the file systems that I wished to repair … never massive amounts of data (no disk larger than 1 TB at the time) but enough to present an obstacle to DiskWarrior.

ilovezfs wrote:… I might do fsck_hfs -dfRa, then -dfRc, and then -dfRe …


If I should force a trio of rebuilds, I'd do so in a different order:

  1. extents overflow B-tree (flag e)
  2. catalog B-tree (flag c)
  3. extended attributes B-tree (flag a).

YMMV.
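
Spelled out as commands, that order would look like the output of this dry-run sketch. Nothing is executed; the function name is made up and the device node is only a placeholder, since disk numbers change between attachments.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the forced rebuild commands for one
# device in the order listed above. It only prints; it runs nothing.
rebuild_plan() {
    device=$1
    for flag in e c a; do    # extents overflow, catalog, extended attributes
        printf 'sudo /sbin/fsck_hfs -dfR%s %s\n' "$flag" "$device"
    done
}

rebuild_plan /dev/disk12s2    # placeholder device node
```

Printing first and pasting one command at a time leaves room to stop if an earlier rebuild reports something unexpected.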
Last edited by grahamperrin on Fri Mar 29, 2013 8:28 pm, edited 2 times in total.
