grahamperrin wrote: How closely can you relate (a) the time of the panic to (b) the time of the fibre channel link loss?
The most recent panic occurred at "Mon Oct 1 14:46:54 2012" (server time).
One of the Xserve RAID units has link errors in that time frame:
Warning Lower Controller 10/01/12 02:47:47 PM RAID Controller 2 Fibre LIP
Lower Controller 10/01/12 02:47:47 PM RAID Controller 2 Fibre Link Up
Warning Lower Controller 10/01/12 02:47:47 PM RAID Controller 2 Fibre LIP
Warning Upper Controller 10/01/12 02:47:39 PM RAID Controller 1 Fibre LIP
Upper Controller 10/01/12 02:47:39 PM RAID Controller 1 Fibre Link Up
Warning Upper Controller 10/01/12 02:47:39 PM RAID Controller 1 Fibre LIP
Warning Lower Controller 10/01/12 02:47:13 PM RAID Controller 2 Fibre Link Down
Warning Upper Controller 10/01/12 02:47:11 PM RAID Controller 1 Fibre Link Down
Warning Lower Controller 10/01/12 02:47:11 PM RAID Controller 2 Fibre LIP
Warning Upper Controller 10/01/12 02:47:09 PM RAID Controller 1 Fibre LIP
These units don't use a time server; they are synchronized to my local time (via RAID Admin), so the offset between my computer and the server matters. As I write this, my clock reads 5:15:03 while the server reads 5:14:04, so the server is about a minute (59 seconds) behind the RAID.
Correcting for that offset, the link events in the RAID log all happened within the minute before the kernel panic, the last of them only a few seconds before it.
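For anyone wanting to redo this correlation, here is a minimal Python sketch of the offset arithmetic. It assumes the RAID clocks match my workstation clock exactly and that the 59-second skew I measured today was the same on Oct 1; the afternoon hour for the clock readings is also an assumption, though only their difference matters. The event list is transcribed from the log excerpt (upper/lower labels dropped), and all variable and function names are just illustrative.

```python
from datetime import datetime, timedelta

# Timestamp format used in the Xserve RAID event log excerpt.
RAID_FMT = "%m/%d/%y %I:%M:%S %p"

# Simultaneous clock readings from the post. The workstation clock is what the
# RAID units are synchronized to. The hour (PM) is assumed; only the
# difference between the two readings matters.
workstation_clock = datetime(2012, 10, 1, 17, 15, 3)
server_clock = datetime(2012, 10, 1, 17, 14, 4)
skew = workstation_clock - server_clock  # how far the server lags the RAID

# Panic time as reported by the server (server clock).
panic_time = datetime(2012, 10, 1, 14, 46, 54)

# Link events transcribed from the RAID log (RAID clock), oldest first.
raid_events = [
    ("10/01/12 02:47:09 PM", "Controller 1 Fibre LIP"),
    ("10/01/12 02:47:11 PM", "Controller 2 Fibre LIP"),
    ("10/01/12 02:47:11 PM", "Controller 1 Fibre Link Down"),
    ("10/01/12 02:47:13 PM", "Controller 2 Fibre Link Down"),
    ("10/01/12 02:47:39 PM", "Controller 1 Fibre LIP"),
    ("10/01/12 02:47:39 PM", "Controller 1 Fibre Link Up"),
    ("10/01/12 02:47:39 PM", "Controller 1 Fibre LIP"),
    ("10/01/12 02:47:47 PM", "Controller 2 Fibre LIP"),
    ("10/01/12 02:47:47 PM", "Controller 2 Fibre Link Up"),
    ("10/01/12 02:47:47 PM", "Controller 2 Fibre LIP"),
]


def seconds_before_panic(raid_stamp: str) -> float:
    """Convert a RAID timestamp to server time; return how long before the panic it was."""
    raid_time = datetime.strptime(raid_stamp, RAID_FMT)
    server_time = raid_time - skew  # same instant, expressed on the server's clock
    return (panic_time - server_time).total_seconds()


if __name__ == "__main__":
    print(f"Server lags RAID by {skew.total_seconds():.0f} s")
    for stamp, what in raid_events:
        print(f"{what:<30} {seconds_before_panic(stamp):>4.0f} s before panic")
```

With the numbers above, the first LIP lands about 44 seconds before the panic and the final LIP about 6 seconds before it.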
grahamperrin wrote: Very loosely speaking, with ZEVO CE on Mountain Lion … if I ungracefully disrupt the physical connection to part of a pool, then: the OS does continue to run; subsequent zfs and zpool commands may not run (and not respond to Control-Z); an attempt to shut down the OS may get to near completion but ultimately require force; if instead of shut down I attempt to eject/unmount an affected volume, then I *might* be wary of a panic.
Thanks for sharing your experience with Mountain Lion, Graham. Most OS X builds since Tiger have had shutdown/restart issues with drives that aren't properly attached. I remember early testing with eSATA cards in Mac Pros that refused to shut down because the drive controller dropped the hard drive for no apparent reason. This seems similar, so I'm not surprised by your experience.
As I said, more graceful handling of link inconsistencies would be great, but I'm not too fussed about it, especially in this case, where the link went down, then up, then down again within a span of only a few seconds. The root of the problem here is obviously the link, which I've remedied.
Can you believe it was a simple cable incompatibility? Although the Xserve RAID units themselves top out at 2Gb/s, because they're plugged into a 4Gb/s FC card, the Apple firmware requires a 4Gb-rated cable!
So anyway, I replaced the cables and I'll keep an eye on it.