NFD: normalization=formD (normalisation form D)

Post by grahamperrin » Sun Oct 14, 2012 1:49 am

Spun off from topics such as "Recommended zpool creation options?" and "Importing zpool from Linux ZFS-Fuse".

http://www.ustream.tv/recorded/25862520 around 00:10:33 on the timeline:

… subtle bugs that, I feel like no-one else would appreciate my pain. Like in the Unicode space there's actually two different ways to store several characters – like an é on the Mac traditionally is stored as an e and an ´ character. When it's rendered they composite them.

On any other platform … store composite characters … one click, one character.

So on the Mac, without intervention, you can get into some nasty problems because the Finder stores it one way, Terminal chose a different way. So you can actually go into the Finder and create a directory – café – then go into the Terminal and
Code:
touch café

then you have two objects – you have a directory and a file with exactly the same name, which is, it leads to all kinds of … (!) … it looks the same but unlike … where you have differentiator, there's nothing, it's like, and in the Finder, depending on the Finder view you get different experiences. Sometimes you see two folders, sometimes you see a folder and a file, sometimes you see one folder. It's like, it's bizarre. So unfortunately …

… there's a formD-explicit setting so, on the Mac we highly recommend and in fact that's the default, you should use formD so then that problem, you can't do that – when you do the touch it'll actually map it back to the correct way.

You pay a little bit of an overhead but you can keep your sanity. It's crazy to have different stacks using different variants of the encoding.


– Don Brady, amongst panel members at the 2012 Illumos ZFS Day.
Last edited by grahamperrin on Sun Oct 14, 2012 3:07 am, edited 3 times in total.
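A minimal sketch, not from the talk, of what Don describes: the two encodings of "é" written as raw UTF-8 bytes (so the result does not depend on how an editor or terminal normalises typed text), then the café directory and file created side by side in a scratch directory from mktemp -d. It is plain POSIX sh. The entry count it prints depends on the file system that holds the scratch directory, which is an assumption of the sketch: 1 where the two spellings are treated as the same name (HFS+, or ZFS with normalization=formD), 2 where names are compared byte for byte (ZFS with normalization=none, typical Linux file systems).

```shell
#!/bin/sh
# Two encodings of "é", given as raw UTF-8 bytes.
nfc=$(printf '\303\251')   # NFC: U+00E9 LATIN SMALL LETTER E WITH ACUTE (bytes c3 a9)
nfd=$(printf 'e\314\201')  # NFD: "e" + U+0301 COMBINING ACUTE ACCENT (bytes 65 cc 81)

# They render alike but are different byte strings.
nfc_bytes=$(printf '%s' "$nfc" | wc -c | tr -d ' ')
nfd_bytes=$(printf '%s' "$nfd" | wc -c | tr -d ' ')
echo "NFC: $nfc_bytes bytes, NFD: $nfd_bytes bytes"
if [ "$nfc" = "$nfd" ]; then
  echo "equal as strings"
else
  echo "different as strings"
fi

# The cafe experiment: a directory named with NFC "é", then touch with NFD "é".
dir=$(mktemp -d)
mkdir "$dir/caf$nfc"
touch "$dir/caf$nfd"   # on a normalising file system this just updates the directory
entries=$(ls -A "$dir" | wc -l | tr -d ' ')
echo "entries: $entries"   # 1 if names are normalised on lookup, 2 if compared byte for byte
rm -rf "$dir"
```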
grahamperrin
Posts: 1596
Joined: Fri Sep 14, 2012 10:21 pm
Location: Brighton and Hove, United Kingdom

Re: NFD: normalization=formD (normalisation form D)

Post by grahamperrin » Sun Oct 14, 2012 2:03 am

At viewtopic.php?p=904#p904

daniel.jozsef wrote:… likely that the filenames are in normal form C, as they were created by Linux.

However, a weird thing happened when I used convmv to change the normal form of the filenames. They started appearing in Finder, but when I try to access these files from the command line, bash no longer seems to recognize them when I type their name on the keyboard.
Also, even weirder, the latin2 characters in NFD register as TWO characters for bash (e.g. to access "hé.txt", I need to type "h??.txt" instead of "h?.txt").

It's as if bash itself were using the wrong normalization...


Whilst convmv (man) "converts filenames …, directories, and even whole filesystems to a different encoding …" I doubt that it can change the normalisation property at the ZFS file system level.

Here with Terminal I find it normal to type two characters for the é composition.

Re: NFD: normalization=formD (normalisation form D)

Post by daniel.jozsef » Sun Oct 14, 2012 4:39 am

grahamperrin wrote:Whilst convmv (man) "converts filenames …, directories, and even whole filesystems to a different encoding …" I doubt that it can change the normalisation property at the ZFS file system level.

Here with Terminal I find it normal to type two characters for the é composition.

You misunderstood. :) There is no normalization set for my zfs volumes (if there was, I wouldn't have all this trouble).
I actually wrote a script to scour the entire volume, and run the following for all items:
Code:
convmv --notest -f UTF-8 -t UTF-8 --nfd "${f}"

And the problem I am experiencing is that while it seems HFS does some normalization inside, zfs doesn't, and as I use the console a lot, it is extremely uncomfortable that any native text I write in the console is saved (or compared) as NFC, while all other system components expect NFD.

dbrady wrote:… there's a formD-explicit setting so, on the Mac we highly recommend and in fact that's the default, you should use formD so then that problem, you can't do that – when you do the touch it'll actually map it back to the correct way.

This sounds like what I need, but where exactly do I set this?

EDIT:
Here's the entire script; there are some less trivial things to account for, so I thought it might help to include it. :)
Code:
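#!/bin/bash
# Rename every entry under the directory given as $1 to NFD, in place.
# Reverse sort lists children before their parents, so renaming a
# directory never invalidates a path that is still to be processed.
IFS=$'\n'   # split find's output on newlines only ($'...' is a bash feature)
set -f      # don't glob-expand names that contain * ? [
for f in $(find "$1" | sort --reverse) ; do
   convmv --notest -f UTF-8 -t UTF-8 --nfd "$f"
done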
daniel.jozsef
Posts: 14
Joined: Sun Sep 30, 2012 2:26 pm

Re: NFD: normalization=formD (normalisation form D)

Post by grahamperrin » Sun Oct 14, 2012 8:50 am

The property:

  • is set at time of creation of a file system
  • cannot be changed after the file system is created.
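As a hedged sketch of where to set it, answering the question above: the property is given with -o to zfs create, or with -O to zpool create for the pool's top-level file system. These option spellings are the ones in the illumos/OpenZFS zpool(8) and zfs(8) manual pages; whether ZEVO's tools accept exactly the same ones is an assumption worth checking against ZEVO's own man pages. The pool, disk and file system names are hypothetical, and the run wrapper only prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Placeholders (hypothetical): pool "tank", disk "disk2", file system "tank/shared".
# With DRY_RUN=1 (the default here) the commands are only printed, not run.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# At pool creation: -O sets a property on the pool's top-level file system.
run zpool create -O normalization=formD tank disk2

# Or when creating one file system inside an existing pool.
run zfs create -o normalization=formD tank/shared

# Read the value back; it cannot be changed later with "zfs set".
run zfs get normalization tank/shared
```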

Re: NFD: normalization=formD (normalisation form D)

Post by grahamperrin » Fri Nov 09, 2012 2:32 pm

Oracle JDK Bug ID: 7130915 File.equals does not give expected results when path contains Non-English characters on Mac OS X with both NFC and NFD in its evaluation

… and a dataloss-related conversation on Twitter.

Should there be a caution against using any Java 7-based file synchronisation app on OS X with a file system that is not NFD?

(Or is that overly cautious?)

Re: NFD: normalization=formD (normalisation form D)

Post by shuman » Fri Nov 09, 2012 6:26 pm

Off topic. . . Don definitely looked like the Mac guy. That's funny.
- Mac Mini (Late 2012), 10.8.5, 16GB memory, pool - 2 Mirrored 3TB USB 3.0 External Drives
shuman
Posts: 96
Joined: Mon Sep 17, 2012 8:15 am

OT, nice work

Post by grahamperrin » Sat Nov 10, 2012 1:02 am

Yeah, I particularly liked the Mac laptop – on lap and sometimes in use – during the panel. (Checking IRC and USTREAM to see who else felt the pain? That's my guess.)

I still haven't played the whole day (some of it was sleepy night time for me) but everything that I saw live brought a smile to my face. If there's a ZFS Day in 2013 it'll be interesting to see how many Mac users tune in. Who knows; maybe by then Apple will allow ZFS of a certain level to be an entirely normal part of using OS X (nudge nudge …).

Re: NFD: normalization=formD (normalisation form D)

Post by shuman » Sat Nov 10, 2012 3:47 pm

There still seems to be some confusion regarding "and in fact that's the default". Does that mean that, when using ZEVO on the Mac, it's unnecessary to set the normalization option because it's already the "default"? I added it as part of my pool creation, but I'm still not sure whether it's required to state it explicitly during creation.

Re: NFD: normalization=formD (normalisation form D)

Post by si-ghan-bi » Sat Nov 10, 2012 7:25 pm

It's not required.
si-ghan-bi
Posts: 145
Joined: Sat Sep 15, 2012 5:55 am

a four-point summary for newcomers

Post by grahamperrin » Sat Nov 10, 2012 11:22 pm

Maybe the simplest way to sum up, for newcomers:

  • normalisation form D is required for greatest compatibility with Mac OS X and OS X
  • the Apple-oriented defaults of ZEVO Community Edition 1.1 and 1.1.1 make it unnecessary to set normalization=formD when using either version of ZEVO to create a pool or file system
  • other forms of normalisation with Mac OS X and OS X may lead to unexpected behaviours – be aware of this if you use a file system that was created by any other implementation of ZFS
  • once set for a ZFS file system, the normalization property cannot be changed.
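For the third point, a small check can flag file systems that are not formD before they are used with OS X. This is a sketch assuming the zfs get options documented for illumos/OpenZFS (-H for tab-separated output without a header, -o name,value, -t filesystem, -r for recursion); report_non_formd is a hypothetical helper, and the canned input stands in for real zfs output.

```shell
#!/bin/sh
# Filter for the output of
#   zfs get -H -o name,value -t filesystem -r normalization POOL
# Prints every file system whose normalization is not formD.
report_non_formd() {
  awk -F '\t' '$2 != "formD" { printf "%s: normalization=%s (not formD)\n", $1, $2 }'
}

# Canned input ("tank" and "tank/from-linux" are hypothetical names):
printf 'tank\tformD\ntank/from-linux\tnone\n' | report_non_formd
# prints: tank/from-linux: normalization=none (not formD)

# On a machine with ZFS installed:
#   zfs get -H -o name,value -t filesystem -r normalization tank | report_non_formd
```

Because the property cannot be changed afterwards, a file system flagged here would have to be recreated with normalization=formD and its data copied across, rather than adjusted in place.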