ZEVO related system hang


ZEVO related system hang

Post by pbeyersdorf » Wed Oct 03, 2012 11:31 am

I am finding that accessing my ZEVO volume causes my system to become unresponsive (I'm using ZEVO CE 1.1.1). I have a striped three-disk pool in a TowerRAID USB enclosure. I had no issues with it connected to my home computer, where I seeded a backup to it, but connected to my work computer, where I intended to house it for remote backups (Mac Pro, Mid 2010, 16 GB RAM), accessing the pool via the Terminal will frequently cause the terminal session to freeze, and accessing it via the Finder causes the spinning beachball. I can close the terminal window and launch a new one, but I'm unable to force-quit the Finder and can only recover via a hard reset. After rebooting I can usually access the pool a couple of times before the issue resurfaces. I'd be suspicious of the enclosure except that I had no problems on my home machine using the same enclosure.
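
In case it's useful, here is a rough sketch of how the pool is set up and what I run to check it while things are still responsive (the disk identifiers below are placeholders rather than my actual devices):

Code: Select all
# Rough sketch of the three-disk stripe (disk identifiers are placeholders)
sudo zpool create Backup-Stripe /dev/disk4 /dev/disk5 /dev/disk6

# Checks run while the pool is still responsive
zpool status -v Backup-Stripe
zpool list Backup-Stripe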

Here are some console messages from one of the events:


10/3/12 9:44:34.000 AM kernel[0]: IOUSBMassStorageClass[0xffffff804676c000]: The device is still unresponsive after 6 consecutive USB Device Resets; it will be terminated.
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046d1fc00, device/channel is not attached (6), 0 bytes (of 512)
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046cc1000, device/channel is not attached (6), 0 bytes (of 512)
10/3/12 9:44:34.000 AM kernel[0]: [0xffffff804676c000](6)/(5) Device not responding
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046d1fc00, device/channel is not attached (6), 0 bytes (of 8192)
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046cc1000, device/channel is not attached (6), 0 bytes (of 8192)
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046d1fc00, device/channel is not attached (6), 0 bytes (of 8192)
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046cc1000, device/channel is not attached (6), 0 bytes (of 8192)
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046d1fc00, device/channel is not attached (6), 0 bytes (of 8192)
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046cc1000, device/channel is not attached (6), 0 bytes (of 8192)
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.probe_failure'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: vdev_type: 'disk'
10/3/12 9:44:34.000 AM kernel[0]: vdev_path: '/dev/dsk/GPTE_F8D440E9-A239-4AC2-9295-BB291EBB3AAE'
10/3/12 9:44:34.000 AM kernel[0]: parent_type: 'root'
10/3/12 9:44:34.000 AM kernel[0]: prev_state: 0
10/3/12 9:44:34.000 AM kernel[0]: zfs_vdm_completion: media 0xffffff8046cc1000, media is not present (6), 0 bytes (of 512)
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: zio_err: 6
10/3/12 9:44:34.000 AM kernel[0]: zio_objset: 21
10/3/12 9:44:34.000 AM kernel[0]: zio_object: 9
10/3/12 9:44:34.000 AM kernel[0]: zio_level: 0
10/3/12 9:44:34.000 AM kernel[0]: zio_blkid: 0
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.io_failure'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: ZFSLabelScheme:willTerminate: this 0xffffff8047855300 provider 0xffffff8046cc1000 'zfs vdev for 'Backup-Stripe''
10/3/12 9:44:34.000 AM kernel[0]: ZFSLabelScheme:willTerminate: this 0xffffff8047855300 provider 0xffffff8046d1e100 'zfs vdev for 'Backup-Stripe''
10/3/12 9:44:34.000 AM kernel[0]: ZFSLabelScheme:willTerminate: this 0xffffff8047855300 provider 0xffffff8046d1fc00 'zfs vdev for 'Backup-Stripe''
10/3/12 9:44:34.000 AM kernel[0]: zfsx_terminate_pool: found top-level fs for 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.probe_failure'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: vdev_type: 'disk'
10/3/12 9:44:34.000 AM kernel[0]: vdev_path: '/dev/dsk/GPTE_0F4E9278-CBCC-45F1-BF36-A14A386DD866'
10/3/12 9:44:34.000 AM kernel[0]: parent_type: 'root'
10/3/12 9:44:34.000 AM kernel[0]: prev_state: 0
10/3/12 9:44:34.000 AM kernel[0]: zfsx_vdm_strategy: 'disk6s2' no longer open!
10/3/12 9:44:34.000 AM kernel[0]: zfsx_vdm_strategy: 'disk6s2' no longer open!
10/3/12 9:44:34.000 AM kernel[0]: ZFSLabelScheme:stop: 0xffffff8047855300 goodbye 'zfs vdev for 'Backup-Stripe''
10/3/12 9:44:34.000 AM kernel[0]: zfsx_vdm_strategy: 'disk6s2' no longer open!
10/3/12 9:44:34.000 AM kernel[0]: zfsx_vdm_strategy: 'disk6s2' no longer open!
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.probe_failure'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: vdev_type: 'disk'
10/3/12 9:44:34.000 AM kernel[0]: vdev_path: '/dev/dsk/GPTE_E44D27ED-7744-4728-A6F3-6702515242C1'
10/3/12 9:44:34.000 AM kernel[0]: parent_type: 'root'
10/3/12 9:44:34.000 AM kernel[0]: prev_state: 0
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:34.000 AM kernel[0]: zio_err: 6
10/3/12 9:44:34.000 AM kernel[0]: zio_objset: 21
10/3/12 9:44:34.000 AM kernel[0]: zio_object: 99112
10/3/12 9:44:34.000 AM kernel[0]: zio_level: 0
10/3/12 9:44:34.000 AM kernel[0]: zio_blkid: 0
10/3/12 9:44:34.000 AM kernel[0]: ________________________________________
10/3/12 9:44:34.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.io_failure'
10/3/12 9:44:34.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:38.000 AM kernel[0]: ________________________________________
10/3/12 9:44:38.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.data'
10/3/12 9:44:38.000 AM kernel[0]: pool: 'Backup-Stripe'
10/3/12 9:44:38.000 AM kernel[0]: zio_err: 6
10/3/12 9:44:38.000 AM kernel[0]: zio_objset: 0
10/3/12 9:44:38.000 AM kernel[0]: zio_object: 1
10/3/12 9:44:38.000 AM kernel[0]: zio_level: 0
10/3/12 9:44:38.000 AM kernel[0]: zio_blkid: 0
10/3/12 9:44:38.000 AM kernel[0]: ________________________________________
10/3/12 9:44:38.000 AM kernel[0]: ZFS WARNING: 'error from: fs.zfs.io_failure'

Re: ZEVO related system hang

Post by grahamperrin » Wed Oct 03, 2012 2:36 pm

Please, what model is the home computer where you had no issues?

On each computer, which operating system?

With the affected device connected, before the problem presents, please run the following command, then paste the result (as code) into this topic:

Code: Select all
zpool list

… for remote backups …


Is file sharing enabled? AFP?

Re: ZEVO related system hang

Post by pbeyersdorf » Wed Oct 03, 2012 6:05 pm

OK, running "zpool list" before the issue arises I get:

Code: Select all
NAME             SIZE   ALLOC    FREE     CAP  HEALTH  ALTROOT
Backup-Stripe  5.00Ti  2.22Ti  2.78Ti     44%  ONLINE  -


I do not have file sharing for the pool turned on; I'm backing up from a FreeNAS box at home using rsync over ssh. The home computer that can connect to this pool without issue is a Mid 2007 24" iMac with 4 GB RAM. Both the home and work computers are running OS X 10.8.2.
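
For reference, the backup itself is driven from the FreeNAS side with something along these lines (user, host and paths are placeholders rather than my actual layout):

Code: Select all
# Run on the FreeNAS box; pushes the data to the ZFS pool on the Mac Pro over ssh
# (user, host and paths are placeholders)
rsync -avz --delete -e ssh /mnt/tank/data/ user@work-mac.example.com:/Volumes/Backup-Stripe/data/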

Re: ZEVO related system hang

Post by pbeyersdorf » Fri Oct 12, 2012 5:16 pm

I'm beginning to think the system hang is triggered by an intermittent failure of the 3TB WD Caviar in my striped array - the error logs I see after a hang seem to point at a device failure for disk6, which is this (brand new) WD drive. I've been able to set up a striped array using different disks and haven't had any problems. So I'm going to throw in the towel and replace the potentially bad disk, but assuming this is the cause of the trouble, it would be good if ZEVO could respond in a more graceful way.
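
In case it helps anyone else chasing a flaky disk, these are the sort of checks I've been relying on (the pool name is mine; the five-second interval is arbitrary):

Code: Select all
# Per-vdev read/write/checksum error counts for the pool
zpool status -v Backup-Stripe

# Per-device I/O while exercising the pool; a stalled disk shows no throughput
zpool iostat -v Backup-Stripe 5

# Map the suspect vdev back to a BSD disk (e.g. disk6)
diskutil list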

Re: ZEVO related system hang

Post by grahamperrin » Fri Oct 12, 2012 10:14 pm

Thanks.

If this is RAID 0 (without fault tolerance) and a device becomes unresponsive, would you like ZEVO to let the system respond quickly, as if an improper eject had occurred?

Something like the usual red alert –

[attachment: semi red alert.png]


– but without the "OS X will attempt to repair" promise.

If not quickly, how long would you like ZEVO to wait before assuming that the device will never respond?

reference

Post by grahamperrin » Wed Nov 21, 2012 3:55 am


Re: ZEVO related system hang

Post by pbeyersdorf » Wed Nov 21, 2012 12:00 pm

I discovered the issue is related to a bug in my enclosure that prevents it from waking up the disks after they have spun down. I didn't see this when I seeded my backup because the disks never spun down. When a disk formatted as HFS+ encounters this problem, the OS gives me a dialog box saying the disk was improperly ejected, but the OS remains responsive. This would be the desired behavior for ZFS as well.

problem with spin-up of disks in a TowerRAID USB enclosure

Post by grahamperrin » Wed Nov 21, 2012 2:39 pm

Thanks for the follow-up. Is the enclosure amongst those listed at TowerRAID?

Is the problem reproducible only when you remotely access (ssh) the Mac Pro that hosts the enclosure?

If you remotely run (say) the command below – shortly before attempting backup – then do all disks in the enclosure spin up?

Code: Select all
mdutil -as
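
A cruder alternative, if mdutil makes no difference (just a sketch - the disk identifiers are examples, so use whatever diskutil list reports for the enclosure): a small raw read from each member disk should force it to spin up.

Code: Select all
# Read one block from each member disk to force spin-up (identifiers are examples)
for d in disk4 disk5 disk6; do
    sudo dd if=/dev/r$d of=/dev/null bs=512 count=1
done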

Re: ZEVO related system hang

Post by pbeyersdorf » Thu Nov 22, 2012 12:31 pm

Yes, the enclosure is an 8-bay USB TowerRAID (TR8U). I've recreated my volume using HFS+ and disk spanning, rather than ZFS, because of this problem, so I'm not able to test whether it is ssh access only or whether the mdutil -as command fixes the issue. While the enclosure still has the problem of not spinning up the disks, at least it doesn't result in a system crash. I've turned off the "put hard disks to sleep" option in Energy Saver preferences as a temporary workaround.
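
For the record, the same workaround can be applied from the command line - this is just the standard pmset equivalent of the Energy Saver checkbox, nothing ZEVO-specific:

Code: Select all
# Show the current disk-sleep setting in minutes (0 means never sleep)
pmset -g | grep disksleep

# Disable "Put hard disks to sleep when possible" for all power sources
sudo pmset -a disksleep 0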

Re: ZEVO related system hang

Post by grahamperrin » Thu Nov 22, 2012 2:01 pm

TowerRAID TR8U - 8 Bay SATA to USB 2.0 JBOD / Spanning Enclosure – interesting to note that the problem with spin-up is reproduced with HFS+ in lieu of ZEVO ZFS – thanks.

Should you decide to revert to ZFS for the disks in that enclosure, I'd expect your chosen workaround to be as good with ZEVO Community Edition 1.1.1 as it is with HFS+ in OS X 10.8.2.

(USB aside: I sometimes use spin as a workaround to problems with a multi-disk enclosure on FireWire.)
