So, just so I get this right... this tool can't recognise scaled-down versions? I've got some old messy HDs lying around that need backing up, but I know from memory that huge amounts of the images on them are unsorted and were scaled down as needed so other people could reliably view them. If I could easily go through it all and find the smaller-sized images to delete, that'd help with sorting.
If it can't do this already, is there any chance such similar-image functionality could be added?
-worstje
That's right: it will only detect exact duplicates. The new EXIF exclusion functionality allows matching of the same image with different EXIF tags.
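For anyone curious what "exact duplicates, ignoring EXIF" could look like, here is a minimal sketch. It is not the tool's actual code (the thread doesn't say how it is implemented). It assumes JPEG input and simply leaves APP1 segments, where EXIF normally lives, out of a SHA-256 digest. The names `digest_ignoring_exif` and `find_exact_duplicates` are made up for the example.

```python
import hashlib
import struct

def digest_ignoring_exif(data: bytes) -> str:
    """SHA-256 of a JPEG byte string with APP1 (EXIF/XMP) segments left out."""
    if data[:2] != b"\xff\xd8":              # no SOI marker: hash the bytes as-is
        return hashlib.sha256(data).hexdigest()
    h = hashlib.sha256()
    h.update(data[:2])
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:                # unexpected layout: stop parsing
            break
        marker = data[pos + 1]
        if marker == 0xDA:                   # SOS: compressed image data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        end = pos + 2 + length               # length counts its own two bytes
        if marker != 0xE1:                   # keep everything except APP1
            h.update(data[pos:end])
        pos = end
    h.update(data[pos:])                     # scan data and anything left over
    return h.hexdigest()

def find_exact_duplicates(files: dict) -> list:
    """Group file names whose EXIF-less digests are identical."""
    groups = {}
    for name, data in files.items():
        groups.setdefault(digest_ignoring_exif(data), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Two copies of a photo that differ only in their EXIF block get the same digest here. A re-saved or rescaled copy has different compressed data, so it will not match.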
Comparing scaled images would be neat, eh? I wonder if it could be done by scaling the in-memory image down to the smaller size... but there are other factors involved, like the jpg quality used... I can hear Renegade saying that it's out of scope
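A rough sketch of that scale-down-and-compare idea, purely as an illustration and not something the tool does. It assumes both images are already decoded into grayscale grids (lists of rows of 0-255 values). The names `box_downscale` and `looks_like_scaled_copy` and the tolerance of 10 are invented for the example. The tolerance is there to absorb small differences such as jpg quality, and a real value would need tuning.

```python
def box_downscale(pixels, new_w, new_h):
    """Shrink a grayscale grid by averaging the source block behind each target pixel."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(new_h):
        y0 = y * src_h // new_h
        y1 = max(y0 + 1, (y + 1) * src_h // new_h)
        row = []
        for x in range(new_w):
            x0 = x * src_w // new_w
            x1 = max(x0 + 1, (x + 1) * src_w // new_w)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def looks_like_scaled_copy(large, small, tolerance=10.0):
    """True if `small` roughly equals `large` shrunk to the same dimensions."""
    h, w = len(small), len(small[0])
    big_h, big_w = len(large), len(large[0])
    if big_h < h or big_w < w:
        return False
    # Aspect ratios must agree, allowing for rounding of the smaller size.
    if abs(big_w * h - big_h * w) > max(big_w, big_h):
        return False
    shrunk = box_downscale(large, w, h)
    total = sum(abs(a - b) for r1, r2 in zip(shrunk, small) for a, b in zip(r1, r2))
    return total / (w * h) <= tolerance      # mean absolute difference per pixel
```

Comparing every image against every other this way gets slow quickly on a big collection, which is one reason hash-style or feature-based approaches tend to come up instead.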
-Perry Mowbray
It's out of scope.
I'd put that in a "pro" version.
What I'd also include there though:
* Network storage (currently only local devices can be scanned)
* "Live" folder browser (currently does not refresh for changes in file system)
* SURF - Allows "fuzzy" detection for things like slightly different or possibly scaled images
* Database back end - For storing file path, hash and image metadata to speed up things & allow for better scanning
* Recursive folder searches - "Include subfolders"
* Other image format support - GIF, PNG, BMP, NEF, RAW, etc.
* Better data output - More than just file paths for duplicates with checkboxes
* Performance increases - Thread pooling and all that jazz.
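To give an idea of the database back end item, here is a minimal sketch using SQLite. Nothing like this exists in the program yet. The table layout and the names `open_index`, `record_image` and `duplicate_groups` are assumptions made up for the example.

```python
import sqlite3

def open_index(db_path=":memory:"):
    """Open (or create) a small index of file path, hash and image size."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " path TEXT PRIMARY KEY,"
        " digest TEXT NOT NULL,"
        " width INTEGER,"
        " height INTEGER)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_digest ON images(digest)")
    return conn

def record_image(conn, path, digest, width, height):
    """Insert or refresh the stored entry for one file."""
    conn.execute(
        "INSERT OR REPLACE INTO images VALUES (?, ?, ?, ?)",
        (path, digest, width, height),
    )

def duplicate_groups(conn):
    """Return lists of paths that share the same digest."""
    rows = conn.execute(
        "SELECT digest, path FROM images WHERE digest IN ("
        " SELECT digest FROM images GROUP BY digest HAVING COUNT(*) > 1)"
        " ORDER BY digest, path"
    )
    groups = {}
    for digest, path in rows:
        groups.setdefault(digest, []).append(path)
    return list(groups.values())
```

With hashes stored like this, a rescan would only need to hash files that are new or changed, and finding duplicates becomes a single query.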
What's in there right now is pretty much what most people need -- find "extra backups" and the like. It's simple, straightforward, and wasn't too much for me to get done by the deadline~!
Some of those I wanted to get in there even if I hid the functionality.
I'd actually spent most of my time doing research rather than actual programming. e.g. For the hashing, I spent probably close to 2 days just reading up on different image comparison and hashing methods. I'd also spent a good amount of time reading up on fuzzy matching methods like SURF and SIFT.
I suppose if the program were to gain any kind of popularity I'd go back and do a pro version.