teracopy: copy your files faster

KD7LRJ:
Anyone want to try out yet another copy program? It is multi-threaded and was designed to improve copy speeds on high-end network storage devices. It's mostly for use with lots (millions!) of small files, but works in other situations as well. Not a big deal - just thought I'd share if anyone can use such a thing...

McTool - web page


* Multi-threaded
* Drag & drop
* Directory synchronization
* E-mail report when finished
* Save parameter files
* Run parameter files from command line
* Log actions to file
* Retry on error
* Wildcard (Regular Expression) file matching
* Pause/resume

f0dder:
Hmm, what's the point of doing it multi-threaded? Isn't this just going to thrash the disk drives with seek requests? I guess the filesystem cache might rescue you though, since you mention it's made mainly for small files. I wouldn't have thought multiple threads worked that well wrt. network copying, but perhaps it triggers multiple SMB sessions - in which case it is useful (I haven't made SMBv1 sessions go much faster than ~30MB/s, and SMBv1 is all you're going to get unless both machines run Vista or Server 2008).

Personally I'd experiment with fewer threads but include async I/O (possibly using I/O Completion Ports).

KD7LRJ:
Hmm, what's the point of doing it multi-threaded? Isn't this just going to thrash the disk drives with seek requests?
-f0dder (January 07, 2009, 10:21 PM)
--- End quote ---
On drives with a single spindle, you're right to suspect minimal or even negative improvement. Still, with relatively small files (perhaps even as large as, say, files in an MP3 collection), you can usually gain a performance advantage on single drives by using a small number of threads (2-5 has worked well for me in the past).

Some things that affect whether multi-threading will improve copy speeds:


* The sizes of the files.
* The availability of NCQ on the drives (both source and destination).
* The sizes of the internal memory caches on the drives.
* The number of spindles that make up the volumes (e.g. a RAID array).
* Other hardware caching of data on the drives (NAS controllers, etc.)
On high-end systems (like these) I often see 10 times the throughput using multiple threads (whether using McTool or multiple instances of RoboCopy) when copying smallish files from one system to another. I have seen throughput of over 90 MB/s in a single instance of the tool.
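
For anyone who wants to try the idea without installing anything, here is a minimal sketch of a small-thread-pool copy in Python. It is not how McTool works internally (that isn't described here); `copy_tree_threaded` and its `workers` parameter are made-up names for illustration, and the default of 4 is just a value from the 2-5 range mentioned above.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_threaded(src, dst, workers=4):
    """Copy every file under src into dst using a small pool of threads.

    Hypothetical helper for illustration only. Empty directories are
    not recreated, and errors simply propagate to the caller.
    """
    src, dst = Path(src), Path(dst)
    # Collect the work list up front so the pool always has files queued.
    files = [p for p in src.rglob("*") if p.is_file()]

    def copy_one(path):
        target = dst / path.relative_to(src)
        # exist_ok=True keeps this safe when two threads race on a folder.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copies data plus timestamps
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands files to idle threads and re-raises any copy error.
        return list(pool.map(copy_one, files))
```

Calling `copy_tree_threaded("D:/music", "//nas/backup/music", workers=3)` (paths are placeholders) and timing it against `workers=1` is a quick way to see whether a given drive or NAS benefits at all.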

I don't know enough about SMB, CIFS, or Completion Ports to talk intelligently about them, but in developing this application, I have run into enough information about them to think it would be worth learning more...

f0dder:
Dunno if the number of platters/spindles matters much, as you don't really have control over which files go where :)

NCQ helps mitigate seek thrashing, but does not remove it - if you can avoid excessive seeking, you should.

I don't know how much drive cache memory matters to be honest, the benchmarks I've seen have been unable to show a difference between 16MB and 32MB of disk cache. I'm sure that some amount is crucial, but when do we stop seeing gains? Certainly the benefit would be a lot larger if your operating system didn't do read/write caching. But since it does, what are the (quantifiable!) situations where the on-disk cache matters? I've been pondering this for a couple of weeks now :)

If you have a system that does "insane caching" of writes (the kind only battery-backed devices usually dare), then multiple writing threads very likely aren't a problem, and the hardware can take its time to re-order the writes and do as much sequential bursting as possible, avoiding seek thrashing, and giving good speeds.

As for SMB/CIFS (two names, same thing), SMBv1 (the protocol used up to and including XP) shows its age and limitations. Iirc the problem is a mix of request-packet size as well as ACKs for each "packet", whereas SMBv2 (again, iirc - I should read up on this!) increases the "packet" size substantially, as well as allowing pipelining. The net effect is that SMBv2 should be able to utilize gigabit (and faster) connections a lot better, whereas SMBv1 (even when copying a single huge file) will usually cap out at around 30MB/s or so, even with various network tweaks (I get ~32MB/s on my fileserver via SMB, vs. 55+MB/s via FTP, depending on source disk speed - and if I didn't have everything AES-256 encrypted, it might even go faster ;)).
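
To put rough numbers on the request-size/ACK argument: if only one request can be outstanding at a time, throughput is about request size divided by round-trip time, and pipelining multiplies that until the link itself becomes the limit. The sketch below is back-of-envelope only. The 64 KiB request size and 2 ms round trip are illustrative assumptions, not measured SMB figures, and `estimated_throughput_mb_s` is a made-up helper name.

```python
def estimated_throughput_mb_s(request_bytes, round_trip_s,
                              outstanding=1, link_mb_s=None):
    """Estimate throughput in MB/s (1 MB = 10**6 bytes).

    Simple model: each outstanding request moves request_bytes per
    round trip. All inputs are assumptions supplied by the caller.
    """
    rate = outstanding * request_bytes / round_trip_s / 1e6
    # The wire speed is a hard ceiling, if the caller supplies one.
    return rate if link_mb_s is None else min(rate, link_mb_s)


# One 64 KiB request in flight at a time, 2 ms per round trip (assumed).
one_at_a_time = estimated_throughput_mb_s(64 * 1024, 0.002)

# Same request size, 8 requests pipelined, capped at gigabit (~125 MB/s).
pipelined = estimated_throughput_mb_s(64 * 1024, 0.002,
                                      outstanding=8, link_mb_s=125)
```

With those assumed inputs the first case lands in the low 30s of MB/s, and the pipelined case hits the link cap instead, which is the general shape of the difference described above.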

As for I/O Completion Ports, the main rationale behind them is to combine with async disk I/O. The principle is that you can issue a load of async I/O requests (which the OS can hopefully "be smart" about, at least that's the theory!), and instead of manually creating a bunch of threads, the completion routines get scheduled to a thread pool. It's mostly useful for high-performance many-connections internet servers (where you can avoid resource starvation from threads, and avoid some context switch overhead as well), but I think it could be interesting for your file copying scenario as well, if the OS is smart enough. At any rate, at least it would save you from the (relative :)) wastefulness of creating a lot of threads - most of them will be spending a lot of time blocking on I/O, so a thread pool should serve just fine.

However, that is all theory, and it might not hold up in practice - I've been meaning to toy with IOCP myself for a long time, but haven't gotten around to it... and while the theory behind it is indeed very nice, I dunno if it'll work in practice for the file copying example. And I don't have super-high-end gear to test it on, even if I wrote the code :)
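
The general shape (queue up lots of requests, let a small pool deal with them as they finish) can at least be sketched in Python. To be clear, this is not IOCP: Python's standard library has no portable async file I/O, so the copies below are still blocking calls handed to a thread pool, and only the structure resembles the pattern. `copy_many`, `pool_threads` and `max_in_flight` are hypothetical names, and the destination folders are assumed to exist already.

```python
import asyncio
import shutil
from concurrent.futures import ThreadPoolExecutor


async def copy_many(pairs, pool_threads=4, max_in_flight=32):
    """Copy (src, dst) pairs with many requests queued and few threads.

    Illustration only: the real work is blocking shutil.copy2 calls run
    by a small pool, not true overlapped I/O.
    """
    loop = asyncio.get_running_loop()
    gate = asyncio.Semaphore(max_in_flight)  # bounds queued requests

    with ThreadPoolExecutor(max_workers=pool_threads) as pool:

        async def copy_one(src, dst):
            async with gate:
                # Hand the blocking copy to the pool; this coroutine
                # resumes when the copy has completed.
                await loop.run_in_executor(pool, shutil.copy2, src, dst)
                return dst

        # All requests are issued up front; results keep input order.
        return await asyncio.gather(*(copy_one(s, d) for s, d in pairs))
```

It would be run with something like `asyncio.run(copy_many(list_of_pairs))`. Whether a real IOCP version would beat a plain thread pool for file copying is exactly the open question above, and this sketch does not answer it.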

wreckedcarzz:
Just wanted to say that I read this topic a while back, and for the last year or so I've been using TeraCopy almost exclusively (except in situations where launching an external file copy app would take up more time than would be gained by it, and that means... Explorer). I finally got the portable version for my new 16GB flash drive (couldn't fit it on my tiny 1GB), and am using it everywhere. Gotta love it. :)
