
Rebooting loses one pool

PostPosted: Mon May 23, 2016 3:17 am
by fustbariclation
I've got two pools. They seem to be working well - performance is great.

The problem is that, when I reboot, one disappears.

When I try to import the missing pool again, it has problems reading the cache disk. I couldn't get it back, so I had to re-create it:

# zpool status jupiter
  pool: jupiter
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        jupiter     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            disk8   ONLINE       0     0     0
            disk9   ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            disk11  ONLINE       0     0     0
            disk14  ONLINE       0     0     0
        cache
          disk13    ONLINE       0     0     0

errors: No known data errors

What should I do to prevent this from happening in the future? Would it make sense to export both pools at shutdown and re-import them?

Or is there another solution?

Re: Rebooting loses one pool

PostPosted: Mon May 23, 2016 5:05 pm
by lundman
You should be very careful with cache disks on OS X when using /dev/disk names. If the disks get renumbered, ZFS will just use whatever disk happens to have the old name.

It is recommended that you use the InvariantDisks paths in your import, i.e. zpool import -d /var/run/disk/by-id
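For example, to switch an existing pool over, something along these lines should work (a sketch, using the pool name from the first post; repeat for the second pool):

# zpool export jupiter
# zpool import -d /var/run/disk/by-id jupiter

/var/run/disk/by-id is maintained by the InvariantDisks helper, with one stable name per device, so it no longer matters how the /dev/diskN numbers get shuffled at boot.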

Re: Rebooting loses one pool

PostPosted: Mon May 23, 2016 9:56 pm
by fustbariclation
Thank you! That fits perfectly, and makes sense.

I wonder how I can change the existing pools to be identified by ID...

Re: Rebooting loses one pool

PostPosted: Mon May 23, 2016 10:24 pm
by Brendon
What lundman said -> zpool import -d /var/run/disk/by-id

Re: Rebooting loses one pool

PostPosted: Tue May 24, 2016 3:47 am
by fustbariclation
Yes, I understand that - for the future.

However, I have a running pool at the moment. It takes quite a long time to move all the data out and then move it all back again. There must be a config file somewhere where this information is kept, and it would be a lot quicker to edit that.

I think this is a bug, actually, and it should be easy to fix: all that's necessary is for zpool to convert the /dev device names to unique IDs, and the problem would disappear.
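Presumably something like zdb would show which paths a pool's configuration currently records (a sketch; I'm assuming zdb is installed alongside zpool):

# zdb -C jupiter | grep path

If I've read this right, the stale entries should come back as /dev/diskNsM-style strings rather than by-id references.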

Re: Rebooting loses one pool

PostPosted: Tue May 24, 2016 5:20 am
by fustbariclation
I think I can see the problem.

In the file:

/etc/zfs/zpool.cache

It refers to devices as either:

"
disk
guid
path
F/private/var/run/disk/by-id/media-38B45B4D-5617-754A-B307-A305948EE3A5
whole_disk
create_txg
"

or

"
disk
guid
path
/dev/disk14s1
whole_disk
"

This is what's wrong. All that should be necessary is to replace the /dev/disk14s1-style entries with the correct references in /private/var/run/disk/by-id/.

It's probably best to do this from the recovery system, just to be sure nothing gets confused, though it shouldn't really: the zpool.cache file looks like just a flat file, not XML or anything nastier, so it should just be read sequentially. I can't see any evidence of a checksum being kept for this file.

It'll save a day of copying, with just a quick reboot, so it should do the trick.

Has anybody here done this? It would be good to know whether it's safe to change it on a live system. I've got the source, so I could check, but it would be nice if somebody here knows whether zpool.cache is usually left alone when no changes are being made (my guess is that it is).

Re: Rebooting loses one pool

PostPosted: Tue May 24, 2016 12:42 pm
by Brendon
If you do the import, the cache file will be rewritten.
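No hand-editing needed. Per pool, something like this should do it (a sketch, reusing the pool name from this thread):

# zpool export jupiter
# zpool import -d /var/run/disk/by-id jupiter

After that, zdb -C jupiter should show the by-id paths in the rewritten cache file.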

- Brendon