Windows port


Windows port of Open ZFS

The ZOW (ZFS On Windows) port does not yet have its own home, but I wanted to jot down some things I have come across while doing the port. There will be a mixture of notes for myself, the occasional difference that surprised me, and other out-of-left-field experiences. Since my background and knowledge come from Unix, this is mostly about looking at the way Windows does things, so it is unlikely a Windows developer would say anything but "Of course it does it this way!". Still, it has been an interesting journey. This is the journey of my naive "young" self (past lundman was younger!), making assumptions that were just wrong.

The first brick wall I hit was actually in the very first couple of weeks. Yes, there is a "Hello World" kernel (Windows driver) example, which I tried to compile and "run". This was a surprisingly complex task; a lot of stale information led me down the path you would take if you were still running Windows XP. There is just too much information out there. Eventually I figured out that the current best way is to deploy from Visual Studio, where VS copies the compiled binary over and "loads" it into the running kernel. When I created the first project file, I called it "Open ZFS on Windows". Each time I had to re-create the project in frustration (as nothing worked, not even rebooting) I deleted one of the characters. In the end, it was "ZFSin" that finally had some progress. I feel I got close to giving up there, before I even started.

At first, the porting consisted of changing over the SPL primitives: atomics, mutexes, condvars, rwlocks, threads, taskqs, and all that. It is pretty straightforward porting work, though at this point you never know if any of it will actually work.
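
For the curious, here is a minimal sketch of what one of those shims could look like: a `kmutex_t` built on the Windows KMUTEX. The struct layout and names just follow the usual SPL convention; this is illustrative, not the actual ZFSin code.

```c
#include <ntddk.h>

/* Sketch of an SPL mutex shim over the Windows kernel KMUTEX. */
typedef struct kmutex {
	KMUTEX	m_lock;		/* kernel dispatcher mutex */
} kmutex_t;

void
mutex_init(kmutex_t *mp)
{
	KeInitializeMutex(&mp->m_lock, 0);
}

void
mutex_enter(kmutex_t *mp)
{
	/* Block at PASSIVE_LEVEL until the mutex is acquired. */
	KeWaitForSingleObject(&mp->m_lock, Executive, KernelMode,
	    FALSE, NULL);
}

void
mutex_exit(kmutex_t *mp)
{
	KeReleaseMutex(&mp->m_lock, FALSE);
}
```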

The first real port of a function was the Unix `panic()` call, i.e. things have gone so bad that we want to purposely terminate the kernel. It is used by the VERIFY macros throughout the ZFS sources.
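
On Windows, the natural way to purposely terminate the kernel is `KeBugCheckEx()`, which is exactly the BSOD. Something along these lines; the bug check code here is made up purely for illustration:

```c
#include <ntddk.h>

/*
 * Sketch of a Unix-style panic() for Windows. 0xDEADBEEF is a made-up
 * bug check code for illustration; the four extra parameters end up
 * in the crash dump / debugger.
 */
static DECLSPEC_NORETURN void
panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	/* Get the reason out to the debugger before we go down. */
	vDbgPrintEx(DPFLTR_DEFAULT_ID, DPFLTR_ERROR_LEVEL, fmt, ap);
	va_end(ap);

	/* No return from here - this is the BSOD. */
	KeBugCheckEx(0xDEADBEEF, 0, 0, 0, 0);
}
```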

During my first ZFS porting work, to OS X, the biggest annoyance when Googling for information was the lack of said information. There just are not many kernel devs on OS X, especially in the filesystem genre.

With Windows, I quickly found the opposite to be true. When trying to Google for how to trigger a BSOD (Blue Screen of Death), the first 1,000 or so hits are about "Troubleshooting: How to fix your BSOD!". As a side note here, Windows people: when the steps to troubleshoot include "try re-installing Windows", that is not troubleshooting, that is giving up.

Anyway, the next lot of a thousand hits are "How to abuse BSOD for malware" - oh great, at least we are touching on development. A few thousand hits later, you do get some suggestions, but always first for XP or some similarly obsolete system.

Just too much information.

Eventually, I got everything to compile (albeit with thousands of warnings).

And it did not take long to get the SPL up and ticking, all taskqs running and firing when needed. After that, ZFS loads and ticks along. Which meant I needed a short detour to port over userland, so that I could eventually run the zpool command to talk to the kernel. Userland already has a "soft" kernel shim layer, so it is already pretty portable. One of the big changes is that Windows "file descriptors" - i.e. the integer files used by POSIX - are very limited, and there were quite a few things I could not really do with them. So the userland porting included changing the integer file descriptors to the Windows HANDLE type, replacing `open` and its siblings with `CreateFile` equivalents. Trivial.
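
To give a flavor of it, here is roughly the kind of change involved, with hypothetical file names - swap `open()`/`read()`/`close()` for `CreateFileA()`/`ReadFile()`/`CloseHandle()`:

```c
#include <windows.h>
#include <stdio.h>

int
main(void)
{
	/* POSIX: int fd = open("C:\\pool.cache", O_RDONLY); */
	HANDLE h = CreateFileA("C:\\pool.cache", GENERIC_READ,
	    FILE_SHARE_READ, NULL, OPEN_EXISTING,
	    FILE_ATTRIBUTE_NORMAL, NULL);
	if (h == INVALID_HANDLE_VALUE) {
		fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
		return (1);
	}

	char buf[512];
	DWORD got = 0;
	/* POSIX: read(fd, buf, sizeof (buf)); */
	if (!ReadFile(h, buf, sizeof (buf), &got, NULL))
		fprintf(stderr, "ReadFile failed: %lu\n", GetLastError());

	CloseHandle(h);		/* POSIX: close(fd); */
	return (0);
}
```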

At this point, ZFS and the zpool/zfs commands worked; you could do everything (!) but mount the filesystem. So create, destroy, snapshot, rename, get/set properties and all that. Naturally, ZFS without mounts is not really all that exciting, so next up was to handle mount requests to the kernel.

Of course, Windows does not have mount requests. Rats.

Looking around at other solutions, in particular Dokan and btrfs for Windows, the standard approach seems to be to create a new "virtual" disk, which you then attach your filesystem to. For ZFS, I created new ioctls from userland for mount and unmount. That way, how ZFS userland works - controlling what is mounted, where, and when - can all stay the same.
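
Sketching the userland side of that idea: a mount becomes a `DeviceIoControl()` call into the driver. The device name, ioctl number and request struct below are all hypothetical, not the real ZFSin interface.

```c
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

/* Hypothetical ioctl number and request layout, for illustration. */
#define	ZFS_IOC_MOUNT \
	CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct zfs_mount_req {
	char dataset[256];	/* e.g. "tank/home" */
	char mountpoint[256];	/* e.g. "E:\\" */
} zfs_mount_req_t;

int
zfs_mount(const char *dataset, const char *mountpoint)
{
	zfs_mount_req_t req = { 0 };
	DWORD returned = 0;

	HANDLE dev = CreateFileA("\\\\.\\ZFS", GENERIC_READ | GENERIC_WRITE,
	    0, NULL, OPEN_EXISTING, 0, NULL);
	if (dev == INVALID_HANDLE_VALUE)
		return (-1);

	snprintf(req.dataset, sizeof (req.dataset), "%s", dataset);
	snprintf(req.mountpoint, sizeof (req.mountpoint), "%s", mountpoint);

	/* Userland stays in charge of what gets mounted, where and when. */
	BOOL ok = DeviceIoControl(dev, ZFS_IOC_MOUNT, &req, sizeof (req),
	    NULL, 0, &returned, NULL);
	CloseHandle(dev);
	return (ok ? 0 : -1);
}
```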

The mounting problem was the third brick wall I came across, and I spent weeks trying to find a way to make it work.

But finally, after about three months from when I started, I could do the actual porting - i.e. the Unix vnops to Windows... whatever they are.

So in Windows, they are IRPs (I/O Request Packets), in the form of MaJor and MiNor function codes - for example, IRP_MJ_CREATE.
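
A driver receives these by filling in a dispatch table in DriverEntry, one slot per major function code. A minimal sketch, with the handler reduced to a stub:

```c
#include <ntddk.h>

/* Stub handler: complete the IRP and report success. Sketch only. */
static NTSTATUS
dispatch_stub(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
	UNREFERENCED_PARAMETER(DeviceObject);
	Irp->IoStatus.Status = STATUS_SUCCESS;
	Irp->IoStatus.Information = 0;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return (STATUS_SUCCESS);
}

NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
	UNREFERENCED_PARAMETER(RegistryPath);

	/* Each IRP_MJ_* slot is roughly a vnop entry point. */
	DriverObject->MajorFunction[IRP_MJ_CREATE] = dispatch_stub;
	DriverObject->MajorFunction[IRP_MJ_READ]   = dispatch_stub;
	DriverObject->MajorFunction[IRP_MJ_WRITE]  = dispatch_stub;
	/* ... IRP_MJ_CLEANUP, IRP_MJ_CLOSE, and so on ... */

	return (STATUS_SUCCESS);
}
```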

Naive lundman went and counted the Unix VNOPs under OS X, and there are roughly some 30-40 of them. Things like vnop_mkdir, vnop_remove and vnop_lookup. Familiar, relaxing... each refers to a single operation, almost like an atomic transaction: a tight bit of code to do just that one thing, like creating a directory.

Under Windows, when I counted up the IRP_MJ_ entries, I found there were more than 100 of them! Wow, I thought, they must be even more fine-grained - perhaps several calls make up a single transaction! Hah!

I started with IRP_MJ_CREATE, which was surprising. I was looking for a vnop_lookup, but I assumed (young past lundman) that IRP_MJ_CREATE was vnop_create. Really, the way to think about it is that it creates a handle to an object, which can either already exist, or be created.

So IRP_MJ_CREATE can open existing files and directories, which makes it in fact vnop_lookup. Great!

But of course, you can create new files when you call IRP_MJ_CREATE, so it is also vnop_create. Oh

and, IRP_MJ_CREATE can also create new Directories, so add vnop_mkdir to that list. Come on!

There is also a DeleteOnClose flag you can pass to IRP_MJ_CREATE, so it needs to call vnop_remove when the handle is closed. Of course you can!

It was about three months later, when I was wondering why I had to keep cleaning up empty directories after running the Windows tester tool ifstest.exe, that I discovered that DeleteOnClose can also be set on directories. So add vnop_rmdir to that list. Surprise!

So no, the Windows IRPs are not finer-grained calls; there seem to be about 10-20 real calls used for everything, and all the rest are weird, obscure(ish) calls, usually in specific areas like SCSI. So each IRP handler ends up being a large function testing all sorts of incoming flags, and branching out depending on the operation. Even a simple call like IRP_MJ_READ is also issued for paging files, so it has vnop_pagein included.
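
To make that concrete, here is a very rough sketch of the kind of branching IRP_MJ_CREATE needs, mapping the Windows flags back to the Unix vnops. The real ZFSin code does a great deal more; the flag and disposition constants, though, are the genuine Windows ones:

```c
#include <ntddk.h>

/* Sketch: how one IRP fans out into several vnops. */
NTSTATUS
dispatch_create(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
	PIO_STACK_LOCATION sp = IoGetCurrentIrpStackLocation(Irp);
	ULONG options = sp->Parameters.Create.Options;
	UCHAR disposition = (UCHAR)(options >> 24);	/* high byte */

	UNREFERENCED_PARAMETER(DeviceObject);

	if (options & FILE_DIRECTORY_FILE) {
		if (disposition == FILE_CREATE) {
			/* vnop_mkdir */
		} else {
			/* vnop_lookup, on a directory */
		}
	} else if (disposition == FILE_CREATE ||
	    disposition == FILE_OPEN_IF) {
		/* vnop_create (or vnop_lookup if it already exists) */
	} else {
		/* vnop_lookup */
	}

	if (options & FILE_DELETE_ON_CLOSE) {
		/* note it down: vnop_remove / vnop_rmdir at handle close */
	}

	Irp->IoStatus.Status = STATUS_SUCCESS;
	IoCompleteRequest(Irp, IO_NO_INCREMENT);
	return (STATUS_SUCCESS);
}
```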

It was not too surprising to come across TruncateOnClose after that. Not sure I see the point of that flag, but I am no longer fazed! Or something...

During the hackathon at the OpenZFS Summit 2017, I took a look at the zfs send and zfs recv features, to see how complicated it would be to add that support. It was quite a struggle under OS X, as Apple does not let us call so many of the kernel functions. So finding a way to do I/O on a file descriptor that could be either a file on disk, or a pipe to a command, was complicated. I spent quite a bit of time thinking about how to attack this problem in the kernel, and how much I would need to change. What was surprising is how easy the kernel part was: I just changed it to take a HANDLE, like the userland port work, and it just... worked. So I cleaned that patch up, played with the userland options, and eventually found that zfs send -v did not work. "-v" is just an option to tell zfs to print progress every second: the amount of data sent, the speed, and the ETA. Should be trivial?

I was led down a dark avenue trying to get async ioctls to work, which meant the copyin/copyout code needed to change (different stack, so permission denied), and the same for the file descriptor (actually, handle). But that did not fix it. It turns out that an open /dev/zfs handle can only do one ioctl at a time, so the thread that does the progress printing just needed to open /dev/zfs again for itself, and it all works. Delete all the new async ioctl code. Userland ended up being the hard part.
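
In sketch form, the fix was simply this: the progress thread gets its own handle. The device name and the ioctl named in the comment are hypothetical, as before.

```c
#include <windows.h>

/*
 * Sketch of the eventual fix: since one open /dev/zfs handle can only
 * service one ioctl at a time, the progress thread opens its own
 * rather than sharing the one busy with the long-running send ioctl.
 */
static DWORD WINAPI
send_progress_thread(LPVOID arg)
{
	UNREFERENCED_PARAMETER(arg);

	HANDLE dev = CreateFileA("\\\\.\\ZFS", GENERIC_READ | GENERIC_WRITE,
	    0, NULL, OPEN_EXISTING, 0, NULL);
	if (dev == INVALID_HANDLE_VALUE)
		return (1);

	for (;;) {
		Sleep(1000);	/* print progress every second */
		/* DeviceIoControl(dev, ZFS_IOC_SEND_PROGRESS, ...); */
	}
	/* NOTREACHED */
}
```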