
MD5Hash 2.9


MilesAhead:
Why MD5? :)
Why x64-only?

As for "not the fastest checksummer", how do you do your file I/O? in case you aren't using either, I'd suggest testing with both memory-mapped files (which are both over- and under-appreciated) as well as overlapped (async) I/O.
-f0dder (September 23, 2010, 01:58 PM)
--- End quote ---

MD5 because that's what I use.
x64 only because I already have FileCRC32, a 32-bit program that does (surprise) CRC32 and MD5 sums.

It's not the file I/O. To make it faster I'd need a thread-safe MD5 routine. That also complicates updating the UI for huge files and is more prone to crashing. The program uses about half a core on my system. Not everyone wants to max out all cores just to checksum downloads.

btw you code, so do you have anything for download?
edit: ok, I see the page now.

btw I already see a bunch of free checksum-type programs that run right from a shell extension (or at least seem to). Most seem designed to run at max speed. I see no reason to try to duplicate free programs with a design that doesn't interest me. I tried 2 or 3, and when I selected multiple files sized around a GB each, the ones I tried went away after a few seconds of furious processing. I'm sure there are some that work. But to me, for my usage, it's a background task. If the usage were for some other purpose, the design would be different.

I've used the same method from FileCRC32, adapted to this program. In years of use I don't recall it crapping out because the files were large. It may be a tortoise, but it gets there with the right results. :)


f0dder:
You say it's not the I/O, but the fact that you're only running at about half a core indicates that you are I/O bound, or are block-waiting on something else. Async I/O lets you do your disk reads and hash computation in parallel, and I believe MMF should allow the same (though you have less control, and take a slight CPU hit that should be almost unmeasurable on modern CPUs). You'll still ultimately be I/O bound, but since you can overlap CPU and I/O you can possibly shave off a bit of execution time.

I don't think MD5 itself can be parallelized across threads, that would kind of defeat the "every input bit should affect every output bit" goal of cryptographic hashes.
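The overlap f0dder describes doesn't require parallelizing MD5 itself. As a rough illustration (not anyone's actual code from this thread, and in Python rather than whatever the app is written in), a reader thread can keep a small queue of chunks full while the main thread hashes, so the disk read for chunk N+1 happens while chunk N is being digested:

```python
import hashlib
import queue
import threading

CHUNK = 1 << 20  # 1 MB reads, matching the buffer size mentioned later in the thread

def md5_overlapped(path):
    """Hash a file while the next chunk is read on a separate thread.

    A portable stand-in for overlapped/async I/O: the bounded queue acts
    as a double buffer, so reading and hashing proceed in parallel.
    """
    chunks = queue.Queue(maxsize=2)  # double buffering

    def reader():
        with open(path, "rb") as f:
            while True:
                data = f.read(CHUNK)
                chunks.put(data)
                if not data:  # empty read is the EOF sentinel
                    break

    t = threading.Thread(target=reader, daemon=True)
    t.start()

    h = hashlib.md5()
    while True:
        data = chunks.get()
        if not data:
            break
        h.update(data)
    t.join()
    return h.hexdigest()
```

The digest is identical to a plain sequential read; only the wall-clock time changes, and only when the disk is slow enough relative to the hashing that there is something to overlap.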

MilesAhead:
I'm familiar with asynchronous I/O. I was not thinking of parallelizing one MD5 computation across threads, but of having an instance of an MD5 calculator processing a file on each thread/core. But, as I say, as it is now I can use the machine while the program is processing files 8+ GB in size, confident it will complete correctly. Why would I want to sit and watch a bunch of MD5 sums being calculated? I'd rather, say, look for and download more files while the calculations are carried out.

In any case, for this application, the simpler design is likely better as the user is most likely not going to drop 512 files on the app to process in parallel.  More likely 4 or 5 files will be dropped on the program, then something else, like surfing the net will take place in the foreground.
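For reference, the "one independent MD5 calculator per thread" idea MilesAhead mentions (and decides against) would look something like this sketch, assuming Python's `hashlib` and a small thread pool; each worker owns its own hash object, so no thread safety inside MD5 is needed:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def md5_file(path, bufsize=1 << 20):
    # One independent MD5 instance per file; no state shared between threads.
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return path, h.hexdigest()

def md5_many(paths, workers=4):
    # Each worker hashes a whole file; the map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(md5_file, paths))
```

As f0dder notes next, on a single spinning disk this can actually be slower than hashing the files one at a time, since the read head seeks between files.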

f0dder:
Processing multiple files in parallel will only slow down the operation, since you'll introduce a fair amount of read/write head movement - unless of course the files to sum are on different physical drives, but how often does that use case happen? :P

The argument for async I/O isn't to try to get full core utilization; that isn't something you can achieve with a non-parallelizable algorithm, and I agree it's not something you want for this application anyway. But if you can get a bit closer to full utilization of one core by overlapping reads and computation, the system stays usable and the summing task still finishes faster.

I've actually been meaning to do some performance benchmarking for various reading methods with hashing in mind for a short while, but unfortunately haven't found the time to do so yet :)

MilesAhead:
I'd like to have a thread-safe, instance-safe MD5 class just in case I ever found a use for it. :)

But on the file I/O, I guess I would want to test-drive it before making assumptions. If I introduce multi-tasking elements, then I have to discriminate between tasks that are too trivial to bother with and those that need it. So now I have a single-threaded pipe for the trivial stuff and a multi-threaded pipe for the big jobs, all of which have to track UI updates so the user knows the 8 GB file isn't hung. For this case it's way too much work for little gain.

But I'm sure those guys who process video are doing async I/O, as I can see the HD honkin' while they're fixing video bits.

I should have elaborated in the first post that people who have used those super-fast multi-threaded hash grinders shouldn't expect that approach with this app. I wasn't really lamenting the lack of speed. Just by implementing a similar hash in 64-bit with the same drag & drop processing scheme, I got about a 52% speed increase over the old 32-bit version. I processed the same file back and forth a couple of times to take file caching out of the result.

edit: just for grins I processed a file a bit less than 8 GB from one external drive, then a file very close to the same size from a faster external drive. It made quite an improvement, so that supports your assertion. I tended to discount it because I tried increasing the file buffer from 1 MB to 4 MB earlier and didn't detect an improvement. Eyeballing it can be deceptive. :)

But this run indicates it is waiting for input.
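The "eyeballing it can be deceptive" point above is easy to fix with a small timing harness. The sketch below (a hypothetical stand-in, not the tool under discussion) times a chunked MD5 pass at different buffer sizes, running each size twice and keeping the second timing so the OS file cache is warm for both, much as the post re-processes the same file to cancel out caching:

```python
import hashlib
import time

def md5_timed(path, bufsize):
    # Hash with a given read-buffer size and report elapsed wall time.
    h = hashlib.md5()
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest(), time.perf_counter() - t0

def compare_buffers(path, sizes=(1 << 20, 4 << 20)):
    # First pass per size is a warm-up; the second is the measurement.
    results = {}
    for size in sizes:
        md5_timed(path, size)  # warm the cache
        digest, secs = md5_timed(path, size)
        results[size] = (digest, secs)
    return results
```

Whatever the timings turn out to be, the digests must agree across buffer sizes; if they don't, the timing comparison is measuring a bug.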
