Zpool checksum errors


Zpool checksum errors

Post by mnos3c » Sat Mar 30, 2013 3:25 pm

I'm testing a migration of data.
I ran badblocks four times on a new 4 TB disk before doing this, with zero errors (and zero reallocated sectors in SMART).
Now, during the transfer, I noticed this:

Code: Select all
sudo zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
 scan: none requested
config:

   NAME                                         STATE     READ WRITE CKSUM
   tank                                         ONLINE       0     0     3
     GPTE_ECE4AF23-88D9-4D2D-8F70-D7F4F129BFAC  ONLINE       0     0    12  at disk3s2

errors: Permanent errors have been detected in the following files:

        tank:/My Disk/Docs/Java SE 1.5.0 docs/tooldocs/solaris/jdb-TIDY.html
        tank:/My Disk/Docs/Java SE 1.5.0 docs/tooldocs/solaris/jdb.html
        tank:/My Disk/Docs/Java SE 1.5.0 docs/guide/security/jgss/spec/com/sun/security/jgss/GSSUtil.html



What does this mean? My plan was to migrate all the data to this disk, then attach the 2 × 2 TB disks, concatenated, to form a mirror, but at the moment the pool consists of a single disk (I still have the original files on the 2 × 2 TB HFS disks, by the way).
Every disk involved is FireWire 800 (2 × 2 TB and 1 × 4 TB).
The migration is being done with Carbon Copy Cloner (CCC).
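
For context, the attach step I have in mind would look something like this (just a sketch; NEWDISK is a placeholder for however the second device appears, and the GPTE_… identifier is the one from the status output above):

Code: Select all
# turn the single-disk pool into a two-way mirror by attaching a second device
# of at least the same size as the existing one
sudo zpool attach tank GPTE_ECE4AF23-88D9-4D2D-8F70-D7F4F129BFAC NEWDISK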

Clarification please

Post by grahamperrin » Sat Mar 30, 2013 9:30 pm

Is this the mystery third disk from the topic where you hot-unplugged the last one?

If so, did you export this pool tank before the disruption?

How was each of the four badblocks commands structured?

If you allowed the default (read-only mode), the runs might complete relatively quickly; but if I understand correctly, simple readability of a block by badblocks is not comparable to the checksumming that ZFS performs.

If you applied option -n (non-destructive read-write mode) or -w (write-mode) then I'd expect runs to take considerably longer.
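
For reference, the three modes look roughly like this (a sketch only; /dev/sdX is a placeholder, and -w erases the disk, so it must never be pointed at a device holding data you care about):

Code: Select all
# default read-only test: relatively quick, only checks that each block can be read
badblocks -sv /dev/sdX
# -n, non-destructive read-write test: much slower, preserves existing data
badblocks -nsv /dev/sdX
# -w, destructive write-mode test: slowest; writes and verifies test patterns, erasing everything
badblocks -wsv /dev/sdX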

Consider this partial Drive Genius extended scan of all blocks of a 1 TB drive with a FireWire 800 connection. Eleven days for 1 TB, so four extended scans of a 4 TB disk might take around one hundred and seventy six days, more than five months.

From [ZFS] Random ZFS corruption - The FreeBSD Forums (2012-08-01, unanswered):

… pass several runs of badblocks and ONLY give me errors on ZFS metadata …


Reference: Ubuntu Manpage: badblocks - search a device for bad blocks

Re: Clarification please

Post by mnos3c » Sun Mar 31, 2013 1:03 am

grahamperrin wrote:… did you export this pool tank before the disruption?

How was each of the four badblocks commands structured? …



After unboxing I ran a surface scan with TechTool Pro (38 hours). Then I read that badblocks was a better choice, so I repeated the test.
I ran badblocks on a dedicated PC booted from an Ubuntu live image, with the disk connected via SATA 6 Gb/s.
I opened four terminals and started two instances with -n and two instances with -wsv, at ten-minute intervals (started the first, then the second ten minutes later, and so on).
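
Roughly, and from memory (the device path here is just a placeholder), the four invocations were along these lines:

Code: Select all
# all four instances point at the same device, so they contend with one another
# and the write-mode runs overwrite each other's test patterns
sudo badblocks -n /dev/sdX     # terminal 1, started first
sudo badblocks -n /dev/sdX     # terminal 2, ten minutes later
sudo badblocks -wsv /dev/sdX   # terminal 3, ten minutes after that
sudo badblocks -wsv /dev/sdX   # terminal 4, ten minutes after that
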
After 70 hours the -n runs were at around 75% and the -w runs at around 50%.
Yesterday I decided to stop badblocks and check the SMART values: nothing had changed (I compared the output with the first smartctl report taken after unboxing; it was a new disk).
I don't remember whether I exported the pool or just destroyed it... I've run several ZFS tests with many command combinations in the last 40 hours.
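
The comparison was along these lines (the device path is a placeholder, and smart-unboxing.txt stands for wherever the first report was saved):

Code: Select all
# capture a fresh SMART report and compare it with the one taken after unboxing
sudo smartctl -a /dev/sdX > smart-after-badblocks.txt
diff smart-unboxing.txt smart-after-badblocks.txt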

Re: Zpool checksum errors

Post by mnos3c » Sun Mar 31, 2013 2:13 am

The migration with CCC failed because the system crashed/hung.
In Console I found this:

Code: Select all
http://pastebin.com/W2PMyU7W


After rebooting the machine, the pool was clear (no more errors).
But on the destination disk, instead of the 330 GB transferred (as reported by CCC), I found just 1.27 GB of data...
I've destroyed the pool and abandoned the ZFS test for now; there is too much instability.
I'll retry after upgrading the system to 10.8.3 with a clean install (I suspect that the previous MacZFS installation and FUSE cause some sort of instability).
I had never had a kernel panic or a hanging system before (NEVER).
I checked the disk with SMART after this event and no bad sector reallocation occurred. The disk (according to smartctl) is healthy.
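
For anyone wanting to cross-check how much data a pool actually holds, the usual commands are (pool name as in the earlier output):

Code: Select all
zpool list tank    # pool-level size, allocated and free space
zfs list -r tank   # used and referenced space for each dataset in the pool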

Thanks, and thoughts

Post by grahamperrin » Sun Mar 31, 2013 2:37 am

I like your approach to badblocks – it never occurred to me that for a single disk, multiple concurrent runs might be possible. I adapted your example to complement an answer to something in Ask Different – I hope you don't mind.

Thanks for the extensive testing around ZEVO!

Recalling your first post to the forum, I imagine that you have more than a few FireWire cables. Consider the possibility of a bad/marginal cable, although … 

… in my cases (800–800, 800–400, 400–800 cables, and three LaCie hard disk drives): whilst I have set aside at least one suspect cable, I blame nearly all past errors (usually revealed by zpool status or zpool scrub) on the relatively old LaCie hardware.
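
For completeness, that routine is nothing more exotic than the following, with the pool name as in your output:

Code: Select all
sudo zpool scrub tank    # read and verify the checksum of every block in the pool
zpool status -v tank     # afterwards, review the READ, WRITE and CKSUM columns and any listed files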

For yourself, add Apple Hardware Test (AHT) to the mix.

I, too, use FUSE for OS X (recently updated from 2.5.4 to 2.5.5 beta) and whilst some uses may be associated with kernel panics, I don't sense that this distribution is contributory to panics.
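
If it's useful, a quick way to see which non-Apple kernel extensions (FUSE, ZEVO, any MacZFS leftovers) are actually loaded – just a suggestion on my part:

Code: Select all
# list loaded kernel extensions whose bundle identifiers are not Apple's
kextstat | grep -v com.apple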

I, too, wonder about MacZFS. Please see:


Glancing at your paste (below, for posterity), my first thought is of a hardware issue, so think beyond the newness and apparent goodness of the disk.

Consider: whilst the kernel may be less likely to panic without the demands of ZFS (see below, ZFS is known to be useful for revealing hardware issues), if there is a hardware issue then with HFS Plus it may be difficult or impossible to detect any corruption.

Code: Select all
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9415
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9389
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9412
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9410
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9406
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9404
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9399
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9396
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9391
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 9387
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 1
30/03/13 22:44:09,000 kernel: zio_blkid: 73
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176237
30/03/13 22:44:09,000 kernel: zio_level: 2
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 5507
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 1
30/03/13 22:44:09,000 kernel: zio_blkid: 43
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 2
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 3
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 4
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 5
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: 6
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 28
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 0
30/03/13 22:44:09,000 kernel: zio_level: -1
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.data'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: zio_err: 6
30/03/13 22:44:09,000 kernel: zio_objset: 21
30/03/13 22:44:09,000 kernel: zio_object: 176236
30/03/13 22:44:09,000 kernel: zio_level: 0
30/03/13 22:44:09,000 kernel: zio_blkid: 0
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.io_failure'
30/03/13 22:44:09,000 kernel: pool: 'tank'
30/03/13 22:44:09,000 kernel: ________________________________________
30/03/13 22:44:09,000 kernel: ZFS WARNING: 'error from: fs.zfs.io_failure'
30/03/13 22:44:09,000 kernel: pool: 'tank'


At viewtopic.php?p=4243#p4243

raattgift wrote:… ZFS (when used correctly) mainly lets you know you should throw a drive or cable or power supply away and replace it with a new or known-working one, and buys you time to do that without having to take your system or your data offline. I.e., it keeps your data available in the face of common problems. …


Good luck …

Disk make and model?

Post by grahamperrin » Sun Mar 31, 2013 4:00 am

mnos3c wrote:… new 4tb disk …


What's the make and model?

Was it in your ICY BOX IB-111StUEb-Wh dock using FireWire, or is the FireWire interface integral to the enclosure of the disk?

Re: Disk make and model?

Post by mnos3c » Sun Mar 31, 2013 8:59 am

grahamperrin wrote:… What's the make and model?

Was it in your ICY BOX IB-111StUEb-Wh dock using FireWire, or is the FireWire interface integral to the enclosure of the disk?

Thanks for your effort.
The 4 TB disk is an HGST Deskstar, 7200 rpm, 64 MB cache, SATA 6 Gb/s.
It was in the ICY BOX because the Oxford chip in my old WD My Book Studio FW800 enclosure (the 2009 model, without the AES chip) does not recognize it (I have used that enclosure with several disks up to 2 TB without problems).
I also have another WD Studio (the 2011 model, with AES), but that enclosure works only with the disk it shipped with.
Updated situation:
I've formatted the 4 TB disk as HFS+ and copied the data over to it with a dedicated PC (triple-boot Hackintosh, Ubuntu, Win7). I'm now testing with IntegrityChecker whether the data was written correctly.
In the meantime I've created a two-way mirror from 2 × 1 TB disks, one internal and one external in the ICY BOX, connected over FireWire 800 through the WD Studio with the same cable as in the previous test.
I've copied over 650 GB and am now checking with IntegrityChecker.
If no errors occur, can I say that the cable, enclosure and disk are working properly?
I tested the RAM on the MacBook with the TechTool Pro test and it passed.
I don't have a machine with ECC RAM, so I have to be satisfied with this...

Hardware docks

Post by grahamperrin » Mon Apr 01, 2013 4:07 am

Generally, I'd think very carefully about the dock.

With a different dock, my work colleagues and I have had both bad and good experiences. I couldn't tie the bad experiences to bad disks alone, so my thoughts stray to USB 2.0 and/or the ability of a particular Mac, with its installed software and firmware, to drive a combination of external hardware.

In System Preferences, do you allow hard disks or the Mac to sleep?
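
The same settings can also be read from the command line, if that's easier than checking the System Preferences panes:

Code: Select all
pmset -g | grep -i sleep   # shows the disksleep, displaysleep and sleep timers; 0 means never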

Re: Hardware docks

Post by mnos3c » Mon Apr 01, 2013 4:35 am

grahamperrin wrote:… In System Preferences, do you allow hard disks or the Mac to sleep?

No sleep for my disks :-)
By the way, I've finished checking with IntegrityChecker: two runs, two failures, both due to error -36 (an I/O error, after 8-9 hours each).
I think there's some problem with the cable or the dock... the disk seems healthy (according to the smartctl reports).

Sleep, a link

Post by grahamperrin » Mon Apr 01, 2013 11:30 am

mnos3c wrote:… No sleep for my disks :-)


Are you certain?

Please see my mention of a dock under:


