NT schedules strictly at thread granularity; no consideration is given to which process a thread belongs to. So yes, an app with 10 threads could "starve" an app with 2 threads. You need to design responsibly: don't run a bunch of CPU-intensive threads on a single-CPU system. Having multiple threads isn't a problem as long as most of them are blocking on events...
-f0dder
And I think I'm okay with that design, because it simplifies the scheduler, which makes it faster. The only downside I see is that someone could write malware (say, a packet sniffer) with lots of CPU-hungry threads, creating a denial-of-service effect on your computer. But that would draw your attention to the malware, and the malware author wouldn't want that.
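A back-of-the-envelope way to see the starvation point: with strict per-thread scheduling at equal priority, an app's CPU share is simply its runnable-thread count divided by all runnable threads. This is a toy model (it ignores priorities, boosts, and blocked threads), not how NT's scheduler is actually implemented:

```python
def cpu_share(app_threads, other_threads):
    """Toy model: every runnable thread gets an equal timeslice,
    so an app's share is its thread count over the total."""
    total = app_threads + other_threads
    return app_threads / total

# A 2-thread app competing with a 10-thread app gets only 1/6 of the CPU.
print(cpu_share(2, 10))   # ~0.167
```

Note that a thread blocked on an event isn't runnable at all, which is why "lots of threads" is harmless as long as most of them are waiting.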
In most OSes, everything in "kernel mode" (which includes the drivers and the kernel/monitor) is mapped together such that execution can move from one place to another without the overhead of a protection switch... Does Windows work the same way?
Yup, everything kernel-mode is basically lumped together in the high part of the address space (the upper 2GB, unless the boot.ini /3GB switch is added to make the split 3GB for user mode and 1GB for kernel mode).
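In other words, user/kernel classification of an address is just a comparison against the split boundary. A minimal sketch, assuming the default 2GB/2GB split (boundary at 0x80000000; the function name is made up):

```python
KERNEL_BASE = 0x8000_0000   # default 2GB/2GB split; /3GB would move this to 0xC000_0000

def is_kernel_address(addr, kernel_base=KERNEL_BASE):
    """True if a 32-bit linear address falls in the kernel half of the space."""
    return addr >= kernel_base

print(is_kernel_address(0x0040_0000))   # False: typical user-mode image base
print(is_kernel_address(0x8050_0000))   # True: kernel space
```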
-f0dder
That brings up a new topic. Is it possible to have more than 4GB of memory in Windows and still use the 32-bit version of the OS? For example, could you map one 4GB (32-bit) block just for the kernel (not that you would really need/want to) and the other 4GB block for user mode, so that you have an 8GB machine running 32-bit Windows? I realize this design may not be ideal, but is it possible? (Go ahead and move this new topic to another thread if the answer requires some discussion. Maybe it won't.)
Well, you'd run out of MMU register pairs, one pair for each separately protected module that must stay constantly mapped into memory. How many MMU mapping registers does the Pentium processor have?
x86 doesn't work that way
You have a register (CR3) that points to a page table (a physical memory address). Each process has its own CR3 value. The page table is a multi-level hierarchy that maps linear to physical addresses and includes some additional info such as access rights (since pages are 4KB and must start at 4KB boundaries, there are 12 bits left over for per-page stuff).
-f0dder
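To make the walk concrete, here is an illustrative sketch of the classic 32-bit two-level translation: the top 10 bits of the linear address index the page directory, the next 10 bits index a page table, and the low 12 bits are the offset within the 4KB page. Physical memory is faked with dicts; the entry layout (bit 0 = present, upper 20 bits = frame address) matches x86, but everything else is simplified:

```python
# Illustrative two-level x86 page walk (non-PAE, 4KB pages).
# "mem" fakes physical memory as {physical_base: {index: entry}}.

def translate(cr3, mem, linear):
    """Translate a 32-bit linear address to a physical address."""
    dir_idx   = (linear >> 22) & 0x3FF   # top 10 bits: page-directory index
    table_idx = (linear >> 12) & 0x3FF   # next 10 bits: page-table index
    offset    =  linear        & 0xFFF   # low 12 bits: offset into 4KB page

    pde = mem[cr3][dir_idx]              # page-directory entry
    if not (pde & 1):                    # bit 0 is the "present" flag
        raise MemoryError("page fault: directory entry not present")
    pte = mem[pde & ~0xFFF][table_idx]   # page-table entry
    if not (pte & 1):
        raise MemoryError("page fault: page not present")
    return (pte & ~0xFFF) | offset       # frame base + offset within page

# Tiny fake physical memory: CR3 -> directory at 0x1000, table at 0x2000.
mem = {
    0x1000: {0: 0x2000 | 1},             # PDE 0 -> page table at 0x2000, present
    0x2000: {5: 0x9000 | 1},             # PTE 5 -> frame at 0x9000, present
}
# Linear 0x00005ABC: directory index 0, table index 5, offset 0xABC.
print(hex(translate(0x1000, mem, 0x00005ABC)))   # 0x9abc
```

Switching processes is then just loading a different CR3, which swaps the entire mapping in one register write.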
I'm trying to decide whether this is an acceptable solution, as long as these page tables stay in L1 cache. My initial thinking is that if you had dedicated mapping registers (with no memory contention between the processor, address decoder, and address mapper), you could have more parallel operations (instruction fetching and effective-address computation). But, in truth, some pipelining and segmenting of the L1 cache could be used to avoid this potential conflict.
My only comment is that as you make your pipeline longer, you suffer bigger penalties (like pipeline refilling) on branch instructions. I do know the Pentium does look-ahead address computation on branch instructions, and maybe it needs to for this reason.
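The pipeline-depth trade-off can be put into rough numbers with the standard effective-CPI model: every mispredicted branch pays a refill penalty proportional to pipeline depth. All the figures below are made up for illustration:

```python
def effective_cpi(base_cpi, branch_freq, mispredict_rate, refill_cycles):
    """Average cycles per instruction including branch-misprediction stalls."""
    return base_cpi + branch_freq * mispredict_rate * refill_cycles

# Same workload (20% branches, 10% mispredicted); only the refill cost changes.
print(effective_cpi(1.0, 0.20, 0.10, 5))    # 1.1  (short pipeline)
print(effective_cpi(1.0, 0.20, 0.10, 20))   # 1.4  (deep pipeline)
```

This is why deeper pipelines lean so heavily on branch prediction: the refill term is the only one that grows with depth.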
I guess I favor the dedicated-register design for the MMU. It's cleaner, and you don't have to worry about several subunits fighting over the same L1 cache for their parallel activities. You could set aside (segment) part of the L1 cache for address-mapping info, but then you would have a messy form of the dedicated MMU register design.
NT doesn't do "swapping", it does "paging" - i.e., it swaps individual pages in and out, instead of full processes.
-f0dder
There's a followup question on this at
https://www.donation...index.php?topic=6142