Real-time OS drivers and their scheduling
f0dder:
Okay, I've browsed a bit through "Inside Windows 2000" and tried to summarize just a little part of Thread Scheduling from Chapter 6.
A thread runs for an amount of time called a "quantum". A thread will always be interrupted at the end of its quantum, and NT will then check if there's a higher-priority thread that needs to be scheduled, or if the current thread's priority needs to be reduced (there are some dynamic priority levels in NT).
A thread isn't guaranteed to run for its entire quantum, though - it can even be pre-empted before its quantum starts. This is because higher-priority threads are always given preference.
NT schedules strictly at thread granularity; no consideration is given to what process the thread belongs to. So yes, an app with 10 threads could "starve" an app with 2 threads. That means you need to design responsibly: don't have a bunch of CPU-intensive threads on a single-CPU system. Having multiple threads isn't a problem as long as most of the threads are blocking for events, though, since those won't even be considered for scheduling.
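To put rough numbers on the "starving" point, here is a toy sketch. It assumes every thread has the same priority and is always ready to run, and it hands out quanta in plain round-robin order. It is not NT's actual scheduler (no priority boosts, no blocking), and the names `simulate`, `app_a` and `app_b` are made up for the illustration.

```python
from collections import Counter, deque

def simulate(threads_per_app, quanta):
    """Give out `quanta` time slices round-robin over all threads.

    Assumption: all threads are CPU-bound and share one priority level,
    so the scheduler only ever sees threads, never the owning process.
    """
    ready = deque((app, i) for app, n in threads_per_app.items() for i in range(n))
    cpu = Counter()
    for _ in range(quanta):
        app, tid = ready.popleft()   # next ready thread runs for one quantum
        cpu[app] += 1
        ready.append((app, tid))     # back to the end of the ready queue
    return dict(cpu)

shares = simulate({"app_a": 10, "app_b": 2}, quanta=1200)
print(shares)
```

With 12 equal threads each thread gets the same number of quanta, so the 10-thread app ends up with 10/12 of the CPU simply because it owns more threads.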
In most OSes, everything in "kernel mode" (which includes the drivers and the kernel/monitor) is mapped together such that execution can move from one place to another without the overhead of a protection switch. (Yes, that means a bad driver can crash the kernel.) Does Windows work the same way?
--- End quote ---
Yup, everything kernel-mode is basically lumped together in the high part of the address space (upper 2GB, unless a boot.ini switch is added to make the split 3GB for usermode and 1GB for kernelmode). So the kernel part, afaik, stays mapped in every process.
Well, you run out of MMU register pairs for each separately protected module that must be constantly mapped into memory. How many MMU mapping registers does the Pentium processor have?
--- End quote ---
x86 doesn't work that way :)
You have a register (CR3) that points to a page table (physical memory address). Each process has its own CR3 value. The page table is a multi-level hierarchy that maps linear->physical addresses, including some additional info like access rights (since pages are 4KB and must start at 4KB boundaries, there are 12 bits for per-page stuff). There are also some extensions that allow for 2MB and 4MB pages, but 4KB is by far the most common and useful granularity.
That was a very rough breakdown :)
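As a small illustration of the breakdown above, here is a sketch of how a 32-bit linear address is carved up with classic two-level paging and 4KB pages (no PAE, no large pages): 10 bits of page directory index, 10 bits of page table index, 12 bits of offset. The same low 12 bits are what a page table entry has free for flags. The function names and the example values are invented for the illustration; only the present, read/write and user/supervisor bits are decoded.

```python
def split_linear_address(addr):
    """Return (directory index, table index, offset) for a 32-bit address."""
    assert 0 <= addr < 2**32
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF

def decode_pte(entry):
    """Split a 32-bit page table entry into frame address and a few flags."""
    return {
        "frame": entry & 0xFFFFF000,     # upper 20 bits: 4KB-aligned physical frame
        "present": bool(entry & 0x1),    # bit 0
        "writable": bool(entry & 0x2),   # bit 1
        "user": bool(entry & 0x4),       # bit 2
    }

pde_index, pte_index, offset = split_linear_address(0x80123456)
print(hex(pde_index), hex(pte_index), hex(offset))
print(decode_pte(0x0ABCD007))
```

The example address sits in the upper 2GB (directory index 0x200 and up), which is where the kernel half of the address space starts with the default split.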
NT doesn't do "swapping", it does "paging" - i.e., it swaps individual pages in and out, instead of full processes.
superticker:
NT schedules strictly at thread granularity; no consideration is given to what process the thread belongs to. So yes, an app with 10 threads could "starve" an app with 2 threads. That means you need to design responsibly: don't have a bunch of CPU-intensive threads on a single-CPU system. Having multiple threads isn't a problem as long as most of the threads are blocking for events,...
-f0dder (November 12, 2006, 09:32 AM)
--- End quote ---
And I think I'm okay with that design because it simplifies the scheduler, which means it works faster. The only downside I see is if someone wrote some malware (like a packet sniffer) with lots of threads that would create a denial-of-service effect on your computer. But that would bring your attention to the malware, and the malware designer wouldn't want to do that.
In most OSes, everything in "kernel mode" (which includes the drivers and the kernel/monitor) is mapped together such that execution can move from one place to another without the overhead of a protection switch.... Does Windows work the same way?
--- End quote ---
Yup, everything kernel-mode is basically lumped together in the high part of the address space (upper 2GB, unless a boot.ini switch is added to make the split 3GB for usermode and 1GB for kernel mode).
-f0dder (November 12, 2006, 09:32 AM)
--- End quote ---
That brings up a new topic. Is it possible to have more than 4GB of memory in Windows and still use the 32-bit version of the OS? For example, could you map one 4GB (32-bit) block for just the kernel (not that you would really need/want to) and the other 4GB block for user mode, so that you have an 8GB machine running 32-bit Windows? I realize this design may not be ideal, but is it possible? (Go ahead and move this new topic to another thread if the answer requires some discussion. Maybe it won't.)
Well, you run out of MMU register pairs for each separately protected module that must be constantly mapped into memory. How many MMU mapping registers does the Pentium processor have?
--- End quote ---
x86 doesn't work that way :)
You have a register (CR3) that points to a page table (physical memory address). Each process has its own CR3 value. The page table is a multi-level hierarchy that maps linear->physical addresses, including some additional info like access rights (since pages are 4KB and must start at 4KB boundaries, there are 12 bits for per-page stuff).
-f0dder (November 12, 2006, 09:32 AM)
--- End quote ---
I'm trying to decide whether this is an acceptable solution as long as these page tables stay in L1 cache. My initial thinking is that if you had dedicated mapping registers (without any memory contention between processor, address decoder, and address mapper), you could have more parallel operations (instruction fetching & effective address computation). But, in truth, some pipelining and segmenting of the L1 cache could be used to avoid this potential conflict.
My only comment is, as you make your pipeline longer, you suffer more penalties (like pipeline refilling) on branch instructions. I do know the Pentium does look-ahead address computation on branch instructions, and maybe it needs to for this reason.
I guess I favor the dedicated register design for the MMU. It's cleaner and you don't have to worry about several subunits fighting over the same L1 cache for their parallel activities. You could set aside (segment) part of the L1 cache for address mapping info, but then you would have a messy form of the dedicated MMU register design.
NT doesn't do "swapping", it does "paging" - i.e., it swaps individual pages in and out, instead of full processes.
-f0dder (November 12, 2006, 09:32 AM)
--- End quote ---
There's a followup question on this at https://www.donationcoder.com/forum/index.php?topic=6142
f0dder:
That brings up a new topic. Is it possible to have more than 4GB of memory in Windows and still use the 32-bit version of the OS? For example, could you map one 4GB (32-bit) block for just the kernel (not that you would really need/want to) and the other 4GB block for user mode, so that you have an 8GB machine running 32-bit Windows? I realize this design may not be ideal, but is it possible? (Go ahead and move this new topic to another thread if the answer requires some discussion. Maybe it won't.)
--- End quote ---
You can have more than 4GB, yup, but iirc you need one of the server versions of Windows to use it. Also, traditionally each application will only be able to use around 2 gigs of address space (or 3 gigs with that boot.ini switch) - although "Address Windowing Extensions" (AWE) have been added so you can map "windows" to physical RAM.
Support for more than 4GB of RAM was already added with the Pentium Pro yeeeaaars ago, by the way :)
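The arithmetic behind those numbers, as a quick sketch. It assumes the 36-bit physical addressing that came with the Pentium Pro, and the variable names are just for illustration.

```python
GIB = 2**30

virtual_space = 2**32                 # what a 32-bit linear address can reach
default_user = 2 * GIB                # default split: 2GB user / 2GB kernel
switch_user = 3 * GIB                 # with the boot.ini switch: 3GB user / 1GB kernel
physical_36bit = 2**36                # assumption: 36 physical address bits

print(virtual_space // GIB)                   # address space per process, in GB
print((virtual_space - default_user) // GIB)  # kernel share by default
print((virtual_space - switch_user) // GIB)   # kernel share with the switch
print(physical_36bit // GIB)                  # addressable physical RAM, in GB
```

So the machine can hold more than 4GB of physical RAM, but any single process still only sees a 4GB linear address space, which is why something like AWE is needed to get at the rest.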
I'm trying to decide whether this is an acceptable solution as long as these page tables stay in L1 cache. My initial thinking is that if you had dedicated mapping registers (without any memory contention between processor, address decoder, and address mapper), you could have more parallel operations (instruction fetching & effective address computation). But, in truth, some pipelining and segmenting of the L1 cache could be used to avoid this potential conflict.
--- End quote ---
Well, it obviously works :). There's something called the TLB - Translation Lookaside Buffer. Basically some extra caching for page table entries. TLB flushes/misses are relatively expensive.
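To show why flushes hurt, here is a toy model of a TLB as a small least-recently-used cache in front of a page table lookup. Real TLBs are hardware structures with their own associativity and replacement policies; the `ToyTLB` class, its capacity and the fake page table are all assumptions made for the sketch.

```python
from collections import OrderedDict

class ToyTLB:
    """Tiny LRU cache of virtual page number -> physical frame."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1                       # miss: walk the page table
        frame = page_table[vpn]
        self.entries[vpn] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return frame

    def flush(self):
        """Drop all cached translations (e.g. when the address space changes)."""
        self.entries.clear()

page_table = {vpn: vpn + 100 for vpn in range(8)}   # made-up mapping
tlb = ToyTLB(capacity=4)
for vpn in [0, 1, 0, 1, 2, 0]:
    tlb.translate(vpn, page_table)
before = (tlb.hits, tlb.misses)
tlb.flush()
for vpn in [0, 1]:
    tlb.translate(vpn, page_table)
after = (tlb.hits, tlb.misses)
print(before, after)
```

Pages 0 and 1 were cached before the flush, yet both miss again afterwards, and each of those misses stands for a full page table walk on real hardware.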
I guess I favor the dedicated register design for the MMU. It's cleaner and you don't have to worry about several subunits fighting over the same L1 cache for their parallel activities. You could set aside (segment) part of the L1 cache for address mapping info, but then you would have a messy form of the dedicated MMU register design.
--- End quote ---
A bit cleaner and perhaps more efficient, but less flexible. MMU registers sound good for embedded devices, but IMHO aren't that great an idea for a generic operating system such as NT. Especially not if you take Terminal Services into account ;)