I doubt the number of platters/spindles matters much, since you don't really have control over which files end up where on the disk anyway.
NCQ helps mitigate seek thrashing, but does not remove it - if you
can avoid excessive seeking, you should.
I honestly don't know how much drive cache memory matters; the benchmarks I've seen have been unable to show a difference between 16MB and 32MB of disk cache. I'm sure some amount is crucial, but at what point do we stop seeing gains? Certainly the benefit would be a lot larger if your operating system didn't do its own read/write caching. But since it does, what are the (quantifiable!) situations where the on-disk cache matters? I've been pondering this for a couple of weeks now.
If you have a system that does "insane caching" of writes (the kind only battery-backed devices usually dare), then multiple writing threads very likely aren't a problem: the hardware can take its time re-ordering the writes and doing as much sequential bursting as possible, avoiding seek thrashing and giving good speeds.
As for SMB/CIFS (two names, same thing): SMBv1 (the protocol used up to and including XP) shows its age and limitations. IIRC the problem is a mix of small request-packet size and an ACK required for each "packet", whereas SMBv2 (again, IIRC - I should read up on this!) increases the "packet" size substantially and allows for pipelining. The net effect is that SMBv2 should be able to utilize gigabit (and faster) connections a lot better, whereas SMBv1 - even when copying a single huge file - will usually cap out at around 30MB/s, even with various network tweaks. (I get ~32MB/s on my fileserver via SMB, vs. 55+MB/s via FTP, depending on source disk speed - and if I didn't have everything AES-256 encrypted, it might go even faster.)
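To put rough numbers on why per-request ACKs hurt (these figures are assumptions for illustration, not measurements): if every read is a strict request/response, throughput is capped at roughly request size divided by round-trip time. With a hypothetical 32KB request and ~1ms of effective round-trip latency, that works out to about 32MB/s - right around where SMBv1 tends to cap out - while pipelining several requests at once removes that ceiling entirely.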
As for I/O Completion Ports, the main rationale behind them is to combine them with async disk I/O. The principle is that you issue a load of async I/O requests (which the OS can hopefully "be smart" about - at least, that's the theory!), and instead of manually creating a bunch of threads, the completion routines get scheduled onto a thread pool. It's mostly useful for high-performance, many-connection internet servers (where you avoid resource starvation from thousands of threads, as well as some context-switch overhead), but I think it could be interesting for your file-copying scenario as well, if the OS is smart enough. At any rate, it would at least save you from the (relative) wastefulness of creating a lot of threads - most of them would spend most of their time blocking on I/O anyway, so a thread pool should serve just fine. See the sketch below for the basic shape of it.
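Here's a minimal Win32 sketch of the moving parts, just to show the shape of the API - a single async read of a hypothetical "input.dat", with error handling mostly trimmed. A real copier would keep many reads in flight and have several worker threads sitting on the port:

    // Minimal IOCP sketch (assumptions: "input.dat" exists; error
    // handling trimmed for brevity - a real copier needs much more).
    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Open the source file for overlapped (async) I/O.
        HANDLE file = CreateFileA("input.dat", GENERIC_READ, FILE_SHARE_READ,
                                  nullptr, OPEN_EXISTING,
                                  FILE_FLAG_OVERLAPPED, nullptr);
        if (file == INVALID_HANDLE_VALUE) return 1;

        // Create a completion port and associate the file handle with it.
        HANDLE port = CreateIoCompletionPort(file, nullptr,
                                             /*key*/ 0, /*concurrency*/ 0);
        if (!port) return 1;

        // Issue one async read; in a real copier you'd keep several in flight.
        static char buffer[64 * 1024];
        OVERLAPPED ov = {};   // Offset/OffsetHigh = 0: read from start of file.
        if (!ReadFile(file, buffer, sizeof buffer, nullptr, &ov)
            && GetLastError() != ERROR_IO_PENDING)
            return 1;

        // Block until the read completes. A pool of worker threads would
        // normally sit in this loop, each call handing out one completion.
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* done = nullptr;
        if (GetQueuedCompletionStatus(port, &bytes, &key, &done, INFINITE))
            printf("read %lu bytes\n", bytes);

        CloseHandle(port);
        CloseHandle(file);
        return 0;
    }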
However, that is all theory, and it might not hold up in practice - I've been meaning to toy with IOCP myself for a long time, but haven't gotten around to it... and while the theory behind it is indeed very nice, I dunno whether it'll work in practice for the file-copying example. And I don't have super-high-end gear to test it on, even if I wrote the code.