Degraded pool - what to do?

Post by hergum » Thu Feb 21, 2013 4:16 pm

I have a simple mirrored pool on external FW800 drives that just became degraded and won't mount. What should I do? Should I replace the drive (it's only a couple of months old)? Can ZEVO/ZFS fix it?

Code:
 sudo zpool status -v
Password:
  pool: Studio
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
 scan: scrub repaired 0 in 7h58m with 0 errors on Wed Jan  9 04:26:53 2013
config:

   NAME                                           STATE     READ WRITE CKSUM
   Studio                                         DEGRADED     1     0     0
     mirror-0                                     DEGRADED     6     0     0
       GPTE_2EC26F7B-2461-4D7A-9C66-BFB5B828B25F  ONLINE       0     0     6  at disk2s2
       GPTE_01D345B9-E47B-42AB-9453-537F6ECE29D7  FAULTED      0     0     0  too many errors

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x1d>


I tried "sudo zpool clear Studio" but it seems like nothing is happening. The command does not return.

I am also getting multiple Growl notifications like
Code:
Error: 92 (checksum error) , offset 1090926067712
with various offsets.

Post by hergum » Sun Feb 24, 2013 3:50 pm

I didn't get much help here, so I tried "zpool clear", "zpool export" and so on a few more times, all of which just got stuck. I tried to reboot, which got stuck too. After a hard restart, ZFS started resilvering all by itself. Good. I have no idea what caused the errors, and even less idea why all my command-line efforts got stuck. Let's hope it never happens again.

Post by shuman » Sun Feb 24, 2013 4:17 pm

If you have not performed a scrub recently, you might run
Code:
zpool scrub Studio

to have it check all your data. It looks like your last scrub was Jan 9th.

You can observe the progress of the scrub by running
Code:
zpool status
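
If you would rather not re-run the command by hand, here is a rough sketch of a polling loop. The function names are made up for illustration, and it assumes the scan line of the status output contains the text "scrub in progress" while a scrub is running, which may vary between ZFS versions.

```shell
# Sketch only: succeed while the status text on stdin reports a running scrub.
# Assumes the scan line contains "scrub in progress" (check your own output).
scrub_running() {
  grep -q 'scrub in progress'
}

# Poll the pool named in $1 once a minute, then print the final status.
wait_for_scrub() {
  while zpool status "$1" | scrub_running; do
    sleep 60
  done
  zpool status "$1"
}
```

With the functions loaded into your shell, "wait_for_scrub Studio" would return once the scrub is no longer reported as running.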


How do you have your mirror configured? What type/interface of drives? Internal/external? ZEVO version? Snow Leopard, Lion or Mountain Lion?
- Mac Mini (Late 2012), 10.8.5, 16GB memory, pool - 2 Mirrored 3TB USB 3.0 External Drives

Post by hergum » Mon Feb 25, 2013 4:53 pm

I am on Mac OS X 10.8.2 with Zevo 1.1.1, with two external FW800 drives.

After the resilvering was complete I was informed I had one file with permanent errors. I did not need this file any more, so I deleted it, and performed a scrub.

Should I worry about the repairs and all the checksum errors?
Code:
$ zpool status
  pool: Studio
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
 scan: scrub repaired 34,5Ki in 9h27m with 0 errors on Mon Feb 25 21:16:39 2013
config:

   NAME                                           STATE     READ WRITE CKSUM
   Studio                                         ONLINE       0     0    19
     mirror-0                                     ONLINE       0     0    38
       GPTE_2EC26F7B-2461-4D7A-9C66-BFB5B828B25F  ONLINE       0     0    45  at disk3s2
       GPTE_01D345B9-E47B-42AB-9453-537F6ECE29D7  ONLINE       0     0    41  at disk2s2

errors: No known data errors

Post by grahamperrin » Wed Feb 27, 2013 2:08 pm

hergum wrote:… Should I worry about the repairs and all the checksum errors? …


It's remarkable that checksum errors affect both disks/drives.

You may have been lucky that the most recent scrub completed its repairs without error.

Consider the possibility that both disks are failing, or some other hardware problem.

Aaron Toponce: ZFS Administration, Part VI - Scrub and Resilver
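
To keep an eye on whether the counters keep growing on both disks, something like the following sketch could help. It filters "zpool status" output down to the rows with a non-zero READ, WRITE or CKSUM counter. The function name is hypothetical, and it assumes the counters are printed as plain integers; very large counts may be abbreviated in the real output and would then be skipped.

```shell
# Sketch only: list rows of `zpool status` output (read from stdin) whose
# READ, WRITE or CKSUM counter is non-zero.
# Assumes counters are plain integers; abbreviated values are not matched.
nonzero_errors() {
  awk '
    # Device rows: a name, a state, then three numeric counters.
    $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ &&
    $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
      if ($3 + $4 + $5 > 0)
        printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
    }
  '
}
```

Usage would be along the lines of "zpool status Studio | nonzero_errors", run before and after a scrub to compare.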

Post by raattgift » Wed Feb 27, 2013 4:22 pm

In particular, check (and if possible swap out) your FW 800 cables. Marginal cables are very common and will often degrade the performance of devices downstream from them. Occasionally that degradation will lead to read errors, including checksum errors as in this case. More usually one or more devices on the bus appear "slow" -- at least slower than similar devices on the same bus -- when timed with e.g. dd if=/dev/rdiskXX bs=1m count=1k of=/dev/null.
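
A minimal sketch of that timing comparison, for reading the same amount from each drive and comparing the elapsed seconds. The function name is made up, the device names in the usage note are only examples (check yours with "diskutil list"), and reading raw disk devices normally needs root. Timing is in whole seconds, so use a count large enough for the read to take several seconds.

```shell
# Sketch only: time a sequential read of COUNT MiB from each device given.
# Usage: time_reads COUNT DEVICE [DEVICE...]
time_reads() {
  count=$1; shift
  for dev in "$@"; do
    start=$(date +%s)
    # bs is given in bytes so the command works with both BSD and GNU dd.
    dd if="$dev" of=/dev/null bs=1048576 count="$count" 2>/dev/null ||
      { echo "$dev: read failed" >&2; continue; }
    end=$(date +%s)
    echo "$dev $((end - start)) s"
  done
}
```

For example, "time_reads 1024 /dev/rdisk2 /dev/rdisk3" in a root shell reads 1 GiB from each; a drive that is consistently much slower than its twin on the same bus is the one whose cable to suspect first.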