Error detected but no checksum problem

Post by haer22 » Wed Jan 16, 2019 5:14 am

The following pool status confuses me. No checksum errors are reported on the individual disks, yet the raidz2 vdev and the pool both report checksum errors.
Was bad data written to the disks in the first place, i.e. this is not bitrot but data that was already corrupt when it was written?

Running zfs 1.8.1-1 on Sierra 10.12.6

Code:
[ihecc:~] root# zstatus -v
  pool: time
 state: ONLINE
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Wed Jan 16 12:23:06 2019
   1,55T scanned out of 5,22T at 270M/s, 3h57m to go
    0 repaired, 29,65% done
config:

   NAME                                            STATE     READ WRITE CKSUM
   time                                            ONLINE       0     0     1
     raidz2-0                                      ONLINE       0     0     2
       disk11                                      ONLINE       0     0     0
       disk6                                       ONLINE       0     0     0
       disk8                                       ONLINE       0     0     0
       disk9                                       ONLINE       0     0     0
   logs
     disk5s2                                       ONLINE       0     0     0
   cache
     disk5s3                                       ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        time/zJulius@2018-12-04_01.01.00--9w:/Shared Items/Backups/julius.sparsebundle/bands/98f
haer22
 
Posts: 123
Joined: Sun Mar 23, 2014 2:13 am

Re: Error detected but no checksum problem

Post by lundman » Thu Jan 17, 2019 4:15 pm

It is curious that it says "permanent" when the scrub hasn't finished yet, so has it tried to repair it or not? It will be interesting to see what it says once it finishes.
lundman
 
Posts: 564
Joined: Thu Mar 06, 2014 2:05 pm
Location: Tokyo, Japan

Re: Error detected but no checksum problem

Post by haer22 » Fri Jan 18, 2019 12:43 am

The scrub found a total of 4 errors of the same kind, i.e. no errors on the disks, just checksum errors on the vdev and at the top (pool) level. Another pool of mine shows the same strange problem: 4 files with errors but no checksum errors on the disks.

I removed the snapshots containing the offending files, so now it looks like this:
Code:
errors: Permanent errors have been detected in the following files:
        <0x1b326>:<0x192>
        <0x1b35e>:<0x192>
        <0x1b467>:<0x192>
        <0x1b373>:<0x192>

Started another scrub. Hopefully it will remove the permanent errors. We will see later this evening.

I'll be back...

P.S. I am running 1.8.2 on Mojave.

Re: Error detected but no checksum problem

Post by haer22 » Sun Jan 20, 2019 2:29 am

OK, now I have isolated a scrub that shows the strange behaviour. First a zpool status taken right as the scrub started, then one taken directly after it finished. No checksum errors are logged on the disks, but errors are logged at the vdev and pool level.
There is only one raidz2 vdev in the pool, yet it has twice the error count of the pool.
Any ideas?
Are the cache or log devices involved?

Code:
2019-01-20_01:05:00_(Sun) SCRUBDAY!
  pool: time
 state: ONLINE
  scan: scrub in progress since Sun Jan 20 01:05:00 2019
   32.3M scanned out of 5.32T at 4.04M/s, 383h16m to go
    0 repaired, 0.00% done
config:
   NAME                                            STATE     READ WRITE CKSUM
   time                                            ONLINE       0     0     0
     raidz2-0                                      ONLINE       0     0     0
       media-00AF7F6B-1E6F-9C41-9742-F81B5F5560EA  ONLINE       0     0     0
       media-EBF19290-FBC4-CA47-A48D-DC407469D3B7  ONLINE       0     0     0
       media-5FADEE17-7329-C14C-B250-2E757D275283  ONLINE       0     0     0
       media-02DFE210-0239-BC4C-A0C3-60D543916517  ONLINE       0     0     0
   logs
     media-FF82AA63-ADF4-4EE2-BECE-3F3DAA586D30    ONLINE       0     0     0
     media-0EE5CCB9-B62D-48AD-9A8C-D9597513348F    ONLINE       0     0     0
   cache
     media-E4973EF4-4F94-4BE2-AED4-AF9EECE8C46A    ONLINE       0     0     0
errors: No known data errors

Code:
2019-01-20 06:30:00 (Sun) time scrub repaired 0 in 5h10m with 3 errors on Sun Jan 20 06:15:28 2019
  pool: time
 state: ONLINE
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h10m with 3 errors on Sun Jan 20 06:15:28 2019
config:
   NAME                                            STATE     READ WRITE CKSUM
   time                                            ONLINE       0     0     3
     raidz2-0                                      ONLINE       0     0     6
       media-00AF7F6B-1E6F-9C41-9742-F81B5F5560EA  ONLINE       0     0     0
       media-EBF19290-FBC4-CA47-A48D-DC407469D3B7  ONLINE       0     0     0
       media-5FADEE17-7329-C14C-B250-2E757D275283  ONLINE       0     0     0
       media-02DFE210-0239-BC4C-A0C3-60D543916517  ONLINE       0     0     0
   logs
     media-FF82AA63-ADF4-4EE2-BECE-3F3DAA586D30    ONLINE       0     0     0
     media-0EE5CCB9-B62D-48AD-9A8C-D9597513348F    ONLINE       0     0     0
   cache
     media-E4973EF4-4F94-4BE2-AED4-AF9EECE8C46A    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        time/zTime@2019-01-02_00.01.00--9w:/OLDIES/localTM.sparsebundle/bands/1013
        time/zTime@2019-01-02_00.01.00--9w:/OLDIES/localTM.sparsebundle/bands/10d3
        time/zTime@2019-01-02_00.01.00--9w:/Shared Items/Backups/Sture.sparsebundle/bands/84
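
For anyone who wants to spot this pattern automatically, here is a minimal Python sketch that reads the config table from a saved zpool status output and lists every entry whose CKSUM count is nonzero while everything nested under it shows zero. The function names (parse_config, silent_parents) are made up for illustration, and the sketch assumes the counters are plain integers and that nesting can be read from the indentation, as in the output above. It only detects the pattern; it says nothing about the cause.

```python
# Config rows copied from the post-scrub status above.
SAMPLE = """\
   NAME                                            STATE     READ WRITE CKSUM
   time                                            ONLINE       0     0     3
     raidz2-0                                      ONLINE       0     0     6
       media-00AF7F6B-1E6F-9C41-9742-F81B5F5560EA  ONLINE       0     0     0
       media-EBF19290-FBC4-CA47-A48D-DC407469D3B7  ONLINE       0     0     0
       media-5FADEE17-7329-C14C-B250-2E757D275283  ONLINE       0     0     0
       media-02DFE210-0239-BC4C-A0C3-60D543916517  ONLINE       0     0     0
   logs
     media-FF82AA63-ADF4-4EE2-BECE-3F3DAA586D30    ONLINE       0     0     0
     media-0EE5CCB9-B62D-48AD-9A8C-D9597513348F    ONLINE       0     0     0
   cache
     media-E4973EF4-4F94-4BE2-AED4-AF9EECE8C46A    ONLINE       0     0     0
"""


def parse_config(text):
    """Return (indent, name, cksum) for every row that carries counters."""
    rows = []
    for raw in text.splitlines():
        line = raw.expandtabs()
        fields = line.split()
        # Counter rows have five columns; skip the header and the
        # bare "logs" / "cache" section labels.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Assumption: CKSUM is a plain integer (no "1.2K" style suffix).
        rows.append((indent, fields[0], int(fields[4])))
    return rows


def silent_parents(rows):
    """Names with CKSUM > 0 whose nested rows all report CKSUM == 0."""
    flagged = []
    for i, (indent, name, cksum) in enumerate(rows):
        nested = []
        for child_indent, _child_name, child_cksum in rows[i + 1:]:
            if child_indent <= indent:
                break  # left this entry's subtree
            nested.append(child_cksum)
        if cksum > 0 and nested and all(c == 0 for c in nested):
            flagged.append(name)
    return flagged


print(silent_parents(parse_config(SAMPLE)))
```

On the table above this flags only raidz2-0: the pool entry is not flagged because the raidz2 vdev under it is itself nonzero.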

Re: Error detected but no checksum problem

Post by haer22 » Sun Jan 20, 2019 2:40 am

When I try to read one of the offending files I get an "Input/output error". BUT the error counts in zpool status do not change! Is the error being detected somewhere in the system other than ZFS? If ZFS had detected it, I assume some error count would have increased. Or not?

Code:
[ihecc:~] root# sum /Volumes/zTime/Shared\ Items/Backups/Sture.sparsebundle/bands/84
sum: /Volumes/zTime/Shared Items/Backups/Sture.sparsebundle/bands/84: Input/output error
[ihecc:~] root# zpool status -v time
  pool: time
 state: ONLINE
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h10m with 3 errors on Sun Jan 20 06:15:28 2019
config:
   NAME                                            STATE     READ WRITE CKSUM
   time                                            ONLINE       0     0     3
     raidz2-0                                      ONLINE       0     0     6
       media-00AF7F6B-1E6F-9C41-9742-F81B5F5560EA  ONLINE       0     0     0
       media-EBF19290-FBC4-CA47-A48D-DC407469D3B7  ONLINE       0     0     0
       media-5FADEE17-7329-C14C-B250-2E757D275283  ONLINE       0     0     0
       media-02DFE210-0239-BC4C-A0C3-60D543916517  ONLINE       0     0     0
   logs
     media-FF82AA63-ADF4-4EE2-BECE-3F3DAA586D30    ONLINE       0     0     0
     media-0EE5CCB9-B62D-48AD-9A8C-D9597513348F    ONLINE       0     0     0
   cache
     media-E4973EF4-4F94-4BE2-AED4-AF9EECE8C46A    ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
        time/zTime@2019-01-02_00.01.00--9w:/OLDIES/localTM.sparsebundle/bands/1013
        time/zTime@2019-01-02_00.01.00--9w:/OLDIES/localTM.sparsebundle/bands/10d3
        time/zTime@2019-01-02_00.01.00--9w:/Shared Items/Backups/Sture.sparsebundle/bands/84
[ihecc:~] root#
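
To narrow down where in such a file the read fails, something like the following Python sketch could be used instead of sum. It reads the file in chunks and reports the offset of the first chunk that raises EIO, which is the errno behind "Input/output error". The function name first_unreadable_offset is hypothetical, and the 128 KiB chunk size is only an assumption chosen to match the default ZFS recordsize; the dataset in question may use a different value.

```python
import errno
import os


def first_unreadable_offset(path, chunk_size=128 * 1024):
    """Return the byte offset of the first chunk that fails with EIO.

    Returns None if the whole file can be read. Any other OSError
    (permissions, missing file, ...) is re-raised unchanged.
    """
    offset = 0
    # buffering=0 gives raw reads, so a failure maps to the chunk just requested.
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(chunk_size)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    return offset
                raise
            if not chunk:
                return None  # reached end of file without an I/O error
            offset += len(chunk)
```

Called on the mounted path of one of the listed band files, this would return an offset for a damaged file and None for a file that reads cleanly.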

