VIA and Intel USB together is a typical mix on a P4 system that has more controllers than the Intel chipset provides onboard. IIRC I had a mix like that on my own P4 systems.
IMHO a specialized copy program won't do terribly much for speed when simply copying one file at a time. The advantages of things like Overlapped I/O have more to do with application logic and making maximal use of threads. And if there are multiple simultaneous I/O operations, well, disk performance is going to hell anyway.

I did various tests of file reading methods a while ago, and didn't find any difference in completion speed between Overlapped I/O, normal ReadFile, ReadFile without buffering, and memory-mapped files. I did, however, find that memory-mapped file access took more CPU time while doing the copy, and that ReadFile without buffering took the least.
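
For anyone who wants to try this kind of comparison themselves, here is a rough sketch in Python. It is not the test code I used back then, just an illustration of measuring wall-clock time and CPU time separately for a few read methods. The function names and the 1 MiB chunk size are made up for the example. Note that buffering=0 only turns off Python's own buffer; it is not the same thing as Win32 FILE_FLAG_NO_BUFFERING, and Overlapped I/O is not covered at all.

```python
import mmap
import sys
import time

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def read_buffered(path):
    """Read the file in chunks through Python's default buffered reader."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_unbuffered(path):
    """Read with Python's user-space buffer disabled (the OS cache still applies)."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_mmap(path):
    """Map the file read-only and touch it chunk by chunk (file must not be empty)."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for off in range(0, len(m), CHUNK):
                total += len(m[off:off + CHUNK])
    return total


def measure(fn, path):
    """Return (bytes read, wall-clock seconds, CPU seconds) for one run of fn."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    n = fn(path)
    return n, time.perf_counter() - wall0, time.process_time() - cpu0


if __name__ == "__main__" and len(sys.argv) > 1:
    for fn in (read_buffered, read_unbuffered, read_mmap):
        n, wall, cpu = measure(fn, sys.argv[1])
        print(f"{fn.__name__}: {n} bytes, wall {wall:.3f}s, cpu {cpu:.3f}s")
```

Run it with a file path as the only argument. Keep in mind that after the first pass the file will most likely be sitting in the OS cache, so use a file larger than RAM (or a freshly booted machine) if you want the numbers to say anything about the disk.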