Windows port
I actually had a good time challenging myself with this, so I've tried my utmost to keep this text free of the bitterness that the stereotype would demand :)
The first brick wall I hit was actually in the very first couple of weeks. Yes, there is a "Hello World" kernel (Windows Driver) example, which I tried to compile and "run". This was a surprisingly complex task: a lot of stale information led me down the path you would take if you were still running Windows XP. Just too much information exists. Eventually I figured out that the "current" best way is to deploy with Visual Studio to a remote VM, where VS will copy the compiled binary over and "load" it into the running kernel. When I created the first project file, I called it "Open ZFS on Windows". Each time I had to re-create the project in frustration (as nothing worked, not even rebooting! So much rebooting) I deleted one of the characters. In the end, it was "ZFSin" that finally made some progress. I feel I got close to giving up then, before I had even started.
At first, the porting consisted of changing over the SPL primitives, like atomics, mutexes, condvars, rwlocks, threads, taskqs and all that. It is pretty straightforward porting work, but at this point you never know if it will work, or be worth it.
Since OS X already runs with the Solaris/illumos memory manager, '''kmem''', it was easy to compile that for Windows as well. It is a page allocator with slab support and magazines, kmem_caches etc., done by Jeff Bonwick many moons ago. So we are already familiar with it, and it has great debugging features to find memory corruption, modify-after-free and so on. Hopefully Jeff isn't too unhappy with that.
The first real porting of a function was the Unix '''panic()''' call, i.e. things have gone so badly that we want to purposely terminate the kernel. It is used by the '''VERIFY''' macros throughout the ZFS sources.
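As a rough illustration of how a '''VERIFY'''-style macro funnels into '''panic()''', here is a minimal userland sketch. It is not the port's actual code: '''my_panic''', '''panic_hook''' and '''count_panics''' are hypothetical names invented for this example, and the hook exists only so the sketch can be exercised without killing the process. In a real Windows driver the final step would presumably be a bugcheck (for example via '''KeBugCheckEx'''), which is not shown here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hook: when set, it is called instead of aborting.
 * It exists only so this sketch can be exercised in userland. */
static void (*panic_hook)(const char *msg) = NULL;

/* Example hook that just counts how many times we "panicked". */
static int panic_count = 0;
static void count_panics(const char *msg)
{
    (void)msg;
    panic_count++;
}

/* Userland stand-in for the Unix panic() call. */
static void my_panic(const char *msg)
{
    if (panic_hook != NULL) {
        panic_hook(msg);
        return;
    }
    fprintf(stderr, "panic: %s\n", msg);
    abort();    /* a kernel port would stop the whole system here */
}

/* Unlike assert(), VERIFY is checked in every build type. */
#define VERIFY(cond)                                           \
    do {                                                       \
        if (!(cond))                                           \
            my_panic("VERIFY(" #cond ") failed in " __FILE__); \
    } while (0)
```

With '''panic_hook''' left as NULL, a failed '''VERIFY''' prints the failed expression and aborts; setting it to '''count_panics''' lets a test observe the failure instead.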
Just too much information.
Eventually, I got everything to compile (albeit with thousands of warnings; they are still there if you want something to do!).
Of course Windows does not have mount requests. Rats.
Looking around at other solutions, in particular Dokan and btrfs for Windows, the standard approach seems to be to create a new "virtual" disk, which you then attach your filesystem to. For ZFS I created new ioctls from userland, for mount and unmount. The way ZFS works is that userland controls what is mounted, where and when. So this code can all stay the same, making future merges easier.
The mounting problem was the second brickwall I came across, where I spent weeks trying to find a way to make it work. The first week, you try increasingly insane things. The second week, I spend asking on Stack Overflow or similar places, and by the third week, not having had any answers, I give up and properly learn how it works.
But finally, about 3 months after I started, I could do the '''actual porting''' work, i.e. changing the Unix vnops to Windows... whatever they are. This is the core part, the real porting phase.
So in Windows, they are IRPs (I/O Request Packets), identified by MaJor and MiNor numbers. For example '''IRP_MJ_CREATE'''.
Under Windows, when I counted up the IRP_MJ_ entries, I found there were more than 100! Wow, I thought, they must be ''even more'' tight, perhaps several calls to make up a single transaction! Super clean! Hah!
I started with '''IRP_MJ_CREATE''', which was surprising. I was looking for a '''vnop_lookup''', but I assumed (young past lundman) that '''IRP_MJ_CREATE''' was '''vnop_create'''. Really, the way to think about it is that it ''creates a handle'' to an object, which can either already exist or be created.
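To make the "creates a handle" idea concrete, here is a small userland model of how one create request might fan out into a lookup or a create. It is only a sketch under stated assumptions: the enum values mirror the Windows create dispositions ('''FILE_SUPERSEDE''' through '''FILE_OVERWRITE_IF'''), but '''plan_create''' and the '''DISP_'''/'''ACT_''' names are hypothetical and do not come from the port. Supersede is simplified here to "replace contents".

```c
#include <stdbool.h>

/* Values mirror the Windows create dispositions (FILE_SUPERSEDE .. FILE_OVERWRITE_IF). */
enum disposition {
    DISP_SUPERSEDE    = 0,
    DISP_OPEN         = 1,
    DISP_CREATE       = 2,
    DISP_OPEN_IF      = 3,
    DISP_OVERWRITE    = 4,
    DISP_OVERWRITE_IF = 5
};

/* What the filesystem would roughly have to do (hypothetical names). */
enum create_action {
    ACT_LOOKUP,         /* object exists: behaves like vnop_lookup */
    ACT_CREATE,         /* object missing: behaves like vnop_create */
    ACT_TRUNCATE,       /* object exists and its contents are replaced */
    ACT_FAIL_EXISTS,
    ACT_FAIL_NOT_FOUND
};

/* Decide the action from the requested disposition and whether the name exists. */
static enum create_action plan_create(enum disposition d, bool exists)
{
    switch (d) {
    case DISP_OPEN:
        return exists ? ACT_LOOKUP : ACT_FAIL_NOT_FOUND;
    case DISP_CREATE:
        return exists ? ACT_FAIL_EXISTS : ACT_CREATE;
    case DISP_OPEN_IF:
        return exists ? ACT_LOOKUP : ACT_CREATE;
    case DISP_OVERWRITE:
        return exists ? ACT_TRUNCATE : ACT_FAIL_NOT_FOUND;
    case DISP_OVERWRITE_IF:
    case DISP_SUPERSEDE:
        return exists ? ACT_TRUNCATE : ACT_CREATE;
    }
    return ACT_FAIL_NOT_FOUND;  /* unknown disposition */
}
```

For example, '''plan_create(DISP_OPEN_IF, false)''' yields '''ACT_CREATE''', while the same request on an existing name yields '''ACT_LOOKUP''': one entry point, two different vnops.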
It was about three months later, when I was wondering why I had to keep cleaning up empty directories after running the Windows tester tool '''ifstest.exe''', that I discovered that '''DeleteOnClose''' can also be set for directories! So add '''vnop_rmdir''' to that list. ''Surprise!''
Can you imagine Unix code like '''dh = opendir("dir", DeleteOnClose); closedir(dh);''' being the equivalent of '''rmdir("dir");'''?
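The delete-on-close behaviour described above can be modelled in a few lines. This is a hedged sketch, not the port's code: '''handle_model''', '''plan_close''' and the '''CLOSE_''' names are all hypothetical, and the two booleans stand in for whatever the real create request recorded about the handle.

```c
#include <stdbool.h>

/* What the close path has to do once the handle goes away (hypothetical names). */
enum close_action {
    CLOSE_ONLY,         /* plain close, nothing else to do */
    CLOSE_THEN_REMOVE,  /* file with DeleteOnClose: vnop_remove */
    CLOSE_THEN_RMDIR    /* directory with DeleteOnClose: vnop_rmdir */
};

/* Minimal model of the state remembered when the handle was created. */
struct handle_model {
    bool is_directory;
    bool delete_on_close;
};

/* The delete is deferred: it is decided at create time but acted on at close. */
static enum close_action plan_close(const struct handle_model *h)
{
    if (!h->delete_on_close)
        return CLOSE_ONLY;
    return h->is_directory ? CLOSE_THEN_RMDIR : CLOSE_THEN_REMOVE;
}
```

The point of the model is that the close handler, not a dedicated remove call, ends up choosing between '''vnop_remove''' and '''vnop_rmdir'''.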
So no, the Windows IRPs are not finer-grained calls. There seem to be about 10-20 ''real'' calls used for ''everything'', and all the rest are weird, obscure(ish) calls, usually in specific areas like SCSI. So each IRP handler ends up being a large function, testing all sorts of incoming flags and branching out depending on the operation. Even a simple call like '''IRP_MJ_READ''' is also used for paging files, so it has '''vnop_pagein''' included.
It was not too surprising to come across '''TruncateOnClose''' after that. I am not sure I see the point of that flag, but I am no longer fazed! Or something...
During the hackathon at the OpenZFS summit 2017, I took a look at the ''zfs send'' and ''zfs recv'' features, to see how complicated it would be to add that support. It was quite a struggle under OS X, as Apple do not let us call many of the kernel functions. Finding a way to do IO on a file descriptor that could be either a file on disk or a pipe to a command was complicated. With Windows I spent quite a bit of time thinking about how to attack this problem in the kernel, and how much I would need to change. What was surprising is how easy the kernel part was: I just changed it to take a HANDLE, like in the userland port work, and it just.. worked. So I cleaned that patch up, played with the userland options, and eventually found that ''zfs send -v'' did not work. "-v" is just an option to tell zfs to print progress every second: the amount of data sent, speed and ETA. Should be trivial?
I was led down a dark avenue trying to get async ioctls to work, which means the ''copyin/copyout'' code needs to change (different stack, so permission denied), and the same for the file descriptor (actually, handle). But that didn't fix it. It turns out that an open '''/dev/zfs''' can only do one ioctl at a time, so the thread that does the progress printing just needed to open '''/dev/zfs''' again for itself, and it all works. Delete all the new async ioctl code. Userland ended up being the hard part, and the kernel code trivial.
Deleting an entry can also be done by calling '''IRP_MJ_SET_INFORMATION''' with '''set_file_disposition''' with '''Delete''' set to TRUE. Again, the actual delete is delayed until close, '''IRP_MJ_CLOSE'''.
Unix '''vnop_readdir''' has 2 structs it can use, the legacy and the ''extended'' struct, which is a bit annoying, but the ZFS code already handles that case. On Windows, it turns out that there are at least 9(!) of these structs (so far): for short names, long names, or both, and then again with file IDs etc. To be fair, I've only seen 4 types used in the wild so far. You can also pass in a glob match pattern to it, which is a pain.
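To show why the glob pattern is a pain, here is a minimal wildcard matcher of the kind a directory listing would need to apply to every entry. It is a sketch under assumptions, not what the port does: '''glob_match''' is a hypothetical helper, it only understands '''*''' and '''?''', and it is case-sensitive. Real Windows name matching is more involved (the kernel offers '''FsRtlIsNameInExpression''', and there are extra DOS-style wildcards), none of which is modelled here.

```c
#include <stdbool.h>

/* Return true if 'name' matches 'pat', where '*' matches any run of
 * characters (including none) and '?' matches exactly one character. */
static bool glob_match(const char *pat, const char *name)
{
    if (*pat == '\0')
        return *name == '\0';

    if (*pat == '*') {
        /* Either '*' matches nothing, or it swallows one character
         * of the name and we try the same pattern again. */
        if (glob_match(pat + 1, name))
            return true;
        return *name != '\0' && glob_match(pat, name + 1);
    }

    if (*name == '\0')
        return false;

    if (*pat == '?' || *pat == *name)
        return glob_match(pat + 1, name + 1);

    return false;
}
```

A directory enumeration would call something like this for each entry and skip the ones that do not match, in addition to filling in whichever of the result structs the caller asked for.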
With Unix, the vnode has a v_data void * pointer, which is "yours" to do with as you please. I.e., it can point to whatever data you want, and in ZFS it points to a znode. Under Windows, a '''FileObject''' actually has 2 '''FsContext''' pointers, which I thought was rather generous. But it turns out to be a lie! With directory listings, you are expected to remember the index offset and the search glob pattern "yourself", in the filesystem, which is typically what '''FsContext2''' is used for. But also, if you want memory-mapped files ('''vnop_mmap''') to work (you do), you are expected to put a Windows-specific struct inside your struct somewhere, set in '''FsContext'''. So it feels like suddenly the v_data pointer isn't ''entirely mine'' as it would be under Unix.
But really, the mmap struct doesn't ''have'' to be in the '''FsContext'''. I could create some other storage for it, like a linked list, and search for it, but it is placed there in all the examples, and it is much easier to handle when it is time to release it.
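The layout described above can be sketched roughly as follows. Everything here is a stand-in for illustration: '''fake_fs_header''' is a placeholder for the Windows-specific struct (the real one comes from the WDK headers and is not reproduced), the '''_model''' types and the '''attach''' / '''file_object_to_znode''' helpers are hypothetical, and placing the header as the first member is an assumption of this sketch rather than something the text above states.

```c
#include <stddef.h>

/* Placeholder for the Windows-specific struct needed for mmap to work. */
struct fake_fs_header {
    int reserved[4];
};

/* Stand-in for the ZFS znode that v_data points to. */
struct znode_model {
    unsigned long long id;
};

/* The vnode carries the Windows header plus the Unix-style v_data pointer. */
struct vnode_model {
    struct fake_fs_header header; /* assumed first, so FsContext can be read as the header */
    void *v_data;                 /* "ours": points to the znode */
    int iocount;
};

/* Per-handle directory listing state, the typical use of FsContext2. */
struct dir_state_model {
    unsigned long long offset;    /* where the previous listing call stopped */
    const char *pattern;          /* glob pattern given on the first call */
};

/* Model of the two context pointers on a FileObject. */
struct file_object_model {
    void *FsContext;
    void *FsContext2;
};

static void attach(struct file_object_model *fo, struct vnode_model *vp,
                   struct dir_state_model *ds)
{
    fo->FsContext = vp;
    fo->FsContext2 = ds;
}

/* Walk FileObject -> vnode -> znode, as a filesystem entry point would. */
static struct znode_model *file_object_to_znode(const struct file_object_model *fo)
{
    struct vnode_model *vp = fo->FsContext;
    return vp != NULL ? vp->v_data : NULL;
}
```

The design trade-off is the one the text mentions: keeping the header inside the struct that '''FsContext''' points to means it is released together with that struct, instead of being tracked in a separate list.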
This was the third brickwall I came across. After implementing '''IRP_MJ_READ''' and '''IRP_MJ_WRITE''', I could do simple IO like cat/type of files and view a PNG image of the cat. But what didn't work was '''notepad.exe'''. The simplest of all editors! That turned out to be the mmap problem: since I had failed to include the struct above, no mmap worked, so notepad could not read/write data. Wordpad.exe was fine.
Technically, I probably do not need to create vnodes under the Windows port, but I have mirrored XNU's VFS layer, so I can ASSERT on the vnode iocount references and so on, as well as hold on to the mmap struct above. Eventually the Windows vnode layer should be fleshed out a bit more, so it has a max-vnodes setting to cache vnodes, and a thread that calls reclaim when needed, etc. Right now, everything is reclaimed on close, which is slow.