Crash when importing


Postby haer22 » Tue Apr 19, 2016 11:58 am

Suddenly my server started to act up. I saw that it had restarted, and then restarted again after 2-3 minutes. After bringing things back up piece by piece until it survived, one of my pools seems to be the source of the problem.

I have two pools, gaia and zeus, about 12 TB each. Importing gaia is no problem. When I import zeus, the server first freezes after roughly 20 seconds, then reboots some 10-15 seconds later.

I have tried the pools on another Mac mini: same symptoms. I have also moved the disks around between the cabinets, so none of zeus's disks are in their original positions. It still reboots when I try to import.

I do not see anything spectacular in the system.log.

Is there any debug flag I can turn on before I attempt the import? Anything else I should check?
haer22
 
Posts: 123
Joined: Sun Mar 23, 2014 2:13 am

Re: Crash when importing

Postby Brendon » Tue Apr 19, 2016 1:06 pm

You will need to report to the IRC channel for support, crash log in hand. One thing I notice about this forum is that users rarely report anything useful about their configuration, software, and hardware, which makes diagnosis nigh on impossible.

Cheers
Brendon
Brendon
 
Posts: 286
Joined: Thu Mar 06, 2014 12:51 pm

Re: Crash when importing

Postby haer22 » Wed Apr 20, 2016 10:06 am

Well, I knew I had very little data to offer; that is why I asked whether there was a debug flag I could turn on.

The setup looks like this:
Code: Select all
[ihecc:~] root# zpool import
   pool: zeus
     id: 15661326811550779214
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
   devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

   zeus                                            UNAVAIL  missing device
     raidz1-0                                      ONLINE
       disk10                                      ONLINE
       disk11                                      ONLINE
       media-982056C2-2802-5F47-9797-6D618E3D7DF9  ONLINE
       media-B1CEAE5F-42EB-2949-8DED-FF7AFDCEDEF1  ONLINE
     raidz1-1                                      ONLINE
       media-BAE6145A-00CE-AF43-87B2-E7D340783418  ONLINE
       media-916A0386-F7E4-8740-A7BA-98655E710CFE  ONLINE
       media-F43B0AEF-A118-B245-86E3-EC18131D2AE4  ONLINE
       media-67994AD7-F775-7B4E-9ED0-D1B0670BE302  ONLINE
   cache
     disk3s5

   Additional devices are known to be part of this pool, though their
   exact configuration cannot be determined.
[ihecc:~] root# zpool import zeus

The missing device is the log device. I did see it referenced once in a log entry about an I/O error. Only once, but I disconnected it anyway. I imported with -m, but the system still freezes after about 20-30 seconds and then reboots.
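For anyone in a similar spot, a lower-risk variant worth trying (a sketch only; it assumes the panic is triggered by something the import writes, which may not hold here) is to combine -m with a read-only import:

```shell
# Import without the missing log device AND without opening the pool for
# writes; a read-only import skips log replay and other on-import writes,
# which can sometimes avoid a panic triggered by the import itself.
zpool import -m -o readonly=on zeus

# If that survives, the pool can be inspected and data copied off:
zpool status -v zeus
```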

After I issue:
Code: Select all
[ihecc:~] root# date;zpool import zeus;date
Ons 20 Apr 2016 19:52:50 CEST
The devices below are missing, use '-m' to import the pool anyway:
       11665668891433955486 [log]

I get the following in /var/log/system.log (disk3s5 is the missing log-device):
Code: Select all
Apr 20 19:52:54 ihecc kernel[0]: ZFS: vdev_disk_open('/dev/disk3s5') failed error 2
Apr 20 19:53:00 --- last message repeated 1 time ---
Apr 20 19:53:00 ihecc com.apple.xpc.launchd[1] (com.apple.screensharing[1041]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.screensharing.server
Apr 20 19:53:01 ihecc zed[1045]: eid=131 class=statechange
Apr 20 19:53:01 ihecc zed[1047]: eid=132 class=statechange
Apr 20 19:53:01 ihecc zed[1049]: eid=133 class=statechange
Apr 20 19:53:01 ihecc zed[1051]: eid=134 class=statechange
Apr 20 19:53:02 ihecc zed[1053]: eid=135 class=statechange
Apr 20 19:53:02 ihecc zed[1055]: eid=136 class=statechange
Apr 20 19:53:02 ihecc zed[1057]: eid=137 class=statechange
Apr 20 19:53:02 ihecc zed[1059]: eid=138 class=statechange
Apr 20 19:53:02 ihecc zed[1061]: eid=139 class=statechange
Apr 20 19:53:02 ihecc kernel[0]: ZFS: vdev_disk_open('/dev/disk3s5') failed error 2
Apr 20 19:53:05 ihecc zed[1064]: eid=140 class=zpool pool=zeus
Apr 20 19:53:05 ihecc zed[1068]: Pool export zeus
Apr 20 19:53:05 ihecc sudo[1070]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:05 ihecc zed[1075]: Pool export zeus
Apr 20 19:53:05 ihecc sudo[1077]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:06 ihecc zed[1080]: eid=141 class=statechange
Apr 20 19:53:06 ihecc zed[1082]: eid=142 class=statechange
Apr 20 19:53:06 ihecc zed[1084]: eid=143 class=statechange
Apr 20 19:53:06 ihecc zed[1086]: eid=144 class=statechange
Apr 20 19:53:06 ihecc zed[1088]: eid=145 class=statechange
Apr 20 19:53:06 ihecc zed[1090]: eid=146 class=statechange
Apr 20 19:53:06 ihecc zed[1092]: eid=147 class=statechange
Apr 20 19:53:06 ihecc zed[1094]: eid=148 class=statechange
Apr 20 19:53:06 ihecc zed[1096]: eid=149 class=statechange
Apr 20 19:53:07 ihecc zed[1098]: eid=150 class=zpool pool=zeus
Apr 20 19:53:07 ihecc zed[1102]: Pool export zeus
Apr 20 19:53:07 ihecc sudo[1104]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:07 ihecc zed[1109]: Pool export zeus
Apr 20 19:53:07 ihecc sudo[1111]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:07 ihecc zed[1114]: eid=151 class=statechange
Apr 20 19:53:07 ihecc zed[1116]: eid=152 class=statechange
Apr 20 19:53:07 ihecc zed[1118]: eid=153 class=statechange
Apr 20 19:53:07 ihecc zed[1120]: eid=154 class=statechange
Apr 20 19:53:07 ihecc zed[1122]: eid=155 class=statechange
Apr 20 19:53:07 ihecc zed[1124]: eid=156 class=statechange
Apr 20 19:53:07 ihecc zed[1126]: eid=157 class=statechange
Apr 20 19:53:07 ihecc zed[1128]: eid=158 class=statechange
Apr 20 19:53:07 ihecc zed[1130]: eid=159 class=statechange
Apr 20 19:53:07 ihecc zed[1132]: eid=160 class=zpool pool=zeus
Apr 20 19:53:07 ihecc zed[1136]: Pool export zeus
Apr 20 19:53:08 ihecc sudo[1139]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:08 ihecc zed[1144]: Pool export zeus
Apr 20 19:53:08 ihecc sudo[1146]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:08 ihecc zed[1149]: eid=161 class=statechange
Apr 20 19:53:08 ihecc zed[1151]: eid=162 class=statechange
Apr 20 19:53:08 ihecc zed[1153]: eid=163 class=statechange
Apr 20 19:53:08 ihecc zed[1155]: eid=164 class=statechange
Apr 20 19:53:08 ihecc zed[1157]: eid=165 class=statechange
Apr 20 19:53:08 ihecc zed[1159]: eid=166 class=statechange
Apr 20 19:53:08 ihecc zed[1161]: eid=167 class=statechange
Apr 20 19:53:08 ihecc zed[1163]: eid=168 class=statechange
Apr 20 19:53:08 ihecc zed[1165]: eid=169 class=statechange
Apr 20 19:53:08 ihecc zed[1167]: eid=170 class=zpool pool=zeus
Apr 20 19:53:08 ihecc zed[1171]: Pool export zeus
Apr 20 19:53:08 ihecc sudo[1173]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:09 ihecc zed[1178]: Pool export zeus
Apr 20 19:53:09 ihecc sudo[1180]:     root : TTY=unknown ; PWD=/ ; USER=hans ; COMMAND=/usr/bin/osascript -e display notification "Pool zeus exported." with title "Pool export"
Apr 20 19:53:42 ihecc com.apple.xpc.launchd[1] (com.apple.screensharing[1185]): Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.screensharing.server
^C
[ihecc:~] hans%

There are a lot of "eid=NNN class=statechange" messages. What happened there?
And why "Pool export"? I am trying to import the pool.

If I now issue the "import -m" command, the system writes roughly the same stuff to /var/log/system.log, then freezes and reboots.

No disk errors apart from that single I/O error (and I have had many reboots while debugging this).
The setup has worked fine for months. I even went back 2 days in Time Machine and restored the system as it was then, just to make sure nothing had changed in any software. Same problems. That points to hardware. But which disk? And what failed and made the system reboot?

More logs please.
Last edited by haer22 on Wed Apr 20, 2016 12:21 pm, edited 1 time in total.
haer22
 
Posts: 123
Joined: Sun Mar 23, 2014 2:13 am

Re: Crash when importing

Postby haer22 » Wed Apr 20, 2016 10:07 am

I am sitting in the IRC channel, but it seems a bit empty. I hope I am in the correct one; I do see familiar names among the members.


Brendon wrote:You will need to report to the IRC channel for support, crash log in hand. One thing I notice about this forum is that users rarely report anything useful about their configuration, software and hardware. Makes diagnosis nigh on impossible.

Cheers
Brendon
haer22
 
Posts: 123
Joined: Sun Mar 23, 2014 2:13 am

Re: Crash when importing

Postby Brendon » Wed Apr 20, 2016 5:28 pm

There's usually someone there 24/7. Peak activity is probably 01:00-12:00 UTC or thereabouts.

At any rate, the only information we are likely to want is your crash dump. Almost no other logs matter. Please ensure the crash is captured with symbols, in accordance with the installation FAQ - https://openzfsonosx.org/wiki/Install (look for the nvram commands).
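From memory, the symbol-keeping setup is along these lines; verify the exact boot-args values against the wiki page above before relying on them:

```shell
# Keep kernel symbols in panic logs so the panic stack trace is readable
# instead of a list of bare addresses. "-v keepsyms=1" is standard XNU;
# the wiki may list additional debug flags.
sudo nvram boot-args="-v keepsyms=1"

# boot-args only take effect after a reboot.
sudo reboot
```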

Brendon
Brendon
 
Posts: 286
Joined: Thu Mar 06, 2014 12:51 pm

Re: Crash when importing

Postby patmuk » Fri May 13, 2016 3:08 pm

Hi,

I have a similar problem: importing a pool crashes my machine.
I did not bother IRC because I am currently backing up my files - which will still take 24 hours.
And this thread might be helpful to others.

I don't know where to find a crash dump (nothing is written in the wiki about it either).
The system.log and the o3x logs in /var/log are empty.

My setup:
I am running version 1.5.2 (recently upgraded)
on a MacPro 5,1 (2010), so ECC RAM is used.
I have 3 zpools, all with mirrored disks:
two pools with two disks each, one with three.
One disk of each pool sits in an external case connected via eSATA; the rest are connected internally (SATA).
The external case is attached through a PCI eSATA controller.

Since I upgraded OpenZFS I have set up a bunch of LaunchDaemons that take regular (hourly) snapshots and run weekly scrubs.
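(For context, such a LaunchDaemon typically just runs a small script on an interval. A minimal sketch of the script side, with the pool name and the "auto-" prefix as placeholder assumptions, not details from this thread:)

```shell
#!/bin/sh
# Minimal hourly-snapshot script for a LaunchDaemon to invoke.
# POOL is a placeholder; adjust to your own pool names.
POOL="tank"

# Timestamped snapshot name, e.g. tank@auto-2016-05-13-1400
STAMP=$(date -u +%Y-%m-%d-%H%M)

# -r snapshots the pool and all descendant datasets in one atomic step.
zfs snapshot -r "${POOL}@auto-${STAMP}"
```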

Since upgrading OpenZFS I have hit this issue for the second time.
Last time it was on a different pool than now.

I forgot what helped last time ... it was either zpool import -F or -FX, or I simply restored a snapshot.

The machine suddenly crashed, and crashed again after rebooting. I figured it was the auto-install script and disabled it.
I can import two of the pools fine, but it repeatedly crashes when importing the third.

This time import -F did not help - I did not want to try -FX yet.
I could import the pool read-only with:
Code: Select all
zpool import -f -o readonly=on <poolname>

All pools were running a scrub when my machine crashed. I guess the faulty pool just had bad luck and the crash corrupted something.

As I said, I am currently backing up all files before trying -X or restoring a snapshot.

So - I actually have two questions:
1) How do I fix my current problem, that a read-write import of the pool crashes my machine?
I assume it might be a bug in version 1.5.2 - I am happy to help with further info for debugging.

2) I am rethinking my backup strategy. Currently it consists of automatic snapshots within the pool.
But could the pool get so damaged that the snapshots are not mountable either?
I am thinking about either zfs send-ing the snapshots from one local pool to another, or to an online backup.
Surely an offsite backup adds another level of protection, but it involves additional monthly costs. And I am not sure my zipped snapshots would be safe from data corruption there (as I don't know whether the provider uses ZFS ;) ).
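(The pool-to-pool option can be scripted with incremental sends. A sketch, with hypothetical pool and dataset names:)

```shell
# One-time full copy of a snapshot into another pool:
zfs send tank/data@snap1 | zfs receive backup/data

# From then on, only the delta between two snapshots needs to travel:
zfs send -i tank/data@snap1 tank/data@snap2 | zfs receive backup/data
```

The received copy is a normal dataset on the backup pool, so it stays browsable and scrub-protected there, unlike a zipped stream parked with a third-party provider.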

If I opt for online, altdrive.com looks like the cheapest option, with monthly costs of $4.45 (or even $3.71 if one pays for a year), unlimited storage space, and no other fees (for uploading/downloading, etc.).
Any experience with this? I assume I could just point their client software at the .zfs snapshot directory ... or locally zip incremental snapshots and store them there.

Thanks for reading my long post!
Please tell me where/how to get a dump, give me tips, and feel free to comment on my setup :)
Thanks in advance :)
patmuk
 
Posts: 19
Joined: Fri May 13, 2016 2:41 pm

Re: Crash when importing

Postby Brendon » Fri May 13, 2016 11:02 pm

My 2c worth:

1. Your backup strategy sounds weak. You should back up your data outside of your pool. Nothing prevents you from losing your pool to hardware or software failure, and nothing prevents you from deleting your data through administrative action.

2. My guess is that you are panicking due to data corruption on the offending pool. Panicking seems to be ZFS's most common way of expressing displeasure. Causes may include hardware failure or, potentially, bugs in ZFS. We are aware of a period around the 1.3 series when we were corrupting some data structures; this resulted in the "unlinked drain" panics. The cure was to mount the pool read-only and transfer the data off before recreating it. That strategy may also apply to your circumstances.

3. Crash dumps etc. are visible in Console.app on your Mac. In rare cases a panic can turn into a "double panic", in which case there may be no log entries.

- Brendon
Brendon
 
Posts: 286
Joined: Thu Mar 06, 2014 12:51 pm

Re: Crash when importing

Postby patmuk » Sat May 14, 2016 4:23 pm

Hey,

thanks a lot for your hints.
1) Agreed - though the only things I could do to improve my safety would be:
a) adding more drives to the mirror
=> but my important files are in a 3-drive mirror - a 4th, 5th, ... will not improve reliability much.
b) backing up the snapshots elsewhere:
i) online
=> has additional costs, and who knows what file system/backup strategy is used there.
ii) external drive
=> involves the manual work of swapping it out.
iii) another zpool
=> would protect against the failure of one pool - but if something is systematically wrong with a ZFS release, all pools would be broken.
===> though that is unlikely, as I assume the releases are well tested :)

What is your backup strategy?
I might opt for 1biii)

2) Oh, that would be great :) It would mean I could restore a snapshot. But yes, once I have copied all the data off I could rebuild the pool.

3) You mean the reports in /Library/Logs/DiagnosticReports?
There is no new dump there. I doubt there is always a double panic - maybe the Mac crashes before it can write a crash dump?
patmuk
 
Posts: 19
Joined: Fri May 13, 2016 2:41 pm

Re: Crash when importing

Postby patmuk » Sat May 14, 2016 4:28 pm

As for 2) ... I just noticed that one cannot roll back a snapshot on a read-only pool.
patmuk
 
Posts: 19
Joined: Fri May 13, 2016 2:41 pm

Re: Crash when importing

Postby Brendon » Sat May 14, 2016 4:43 pm

Maybe you can send the snapshot to another dataset.
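(That should work even while the source pool is only imported read-only, since zfs send only reads from it. A sketch with hypothetical names:)

```shell
# Source pool is imported read-only; sending still works because it
# never writes to the source. The copy lands on a healthy pool.
zfs send brokenpool/fs@lastgood | zfs receive goodpool/fs-recovered
```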
Brendon
 
Posts: 286
Joined: Thu Mar 06, 2014 12:51 pm
