Author Topic: NANY 2011 Release: Duplicate Photo Finder  (Read 28703 times)
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #25 on: January 02, 2011, 07:30:54 AM »

Perry helped point out that identical photos with different tags (e.g. EXIF) were not being identified. (Yeah... Naughty me... I skipped that test...)

v1.2 adds in hashing for pixel data and lists these 3 methods:

  • Simple - Compares file sizes - Fast, medium reliability
  • File Signature - Compares file hashes - Very Slow, high reliability (includes EXIF differences)
  • Photo Signature - Compares pixel hashes - Very Slow, high reliability (excludes EXIF differences)
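
The three methods can be sketched roughly as follows. This is a minimal Python illustration, not the tool's actual .NET code: the function names and the choice of SHA-256 are assumptions, and `photo_signature` assumes some image decoder has already produced the raw pixel bytes (decoding is outside this sketch).

```python
import hashlib
import os


def simple_signature(path):
    """Simple: the file size only. Fast, but unrelated files can share a size."""
    return os.path.getsize(path)


def file_signature(path, chunk_size=1 << 20):
    """File Signature: hash of every byte in the file, metadata included.

    Two copies of one photo with different EXIF tags get different signatures.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large photos are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def photo_signature(width, height, pixels):
    """Photo Signature: hash of the decoded pixel data only.

    `pixels` is assumed to be raw bytes from a decoder (hypothetical input).
    Metadata never reaches the hash, so EXIF differences are ignored.
    """
    digest = hashlib.sha256()
    # Include the dimensions so a 2x6 image never collides with a 6x2 one.
    digest.update(f"{width}x{height}".encode("ascii"))
    digest.update(pixels)
    return digest.hexdigest()
```

Both hashing methods have to read every byte of every file (and the photo variant also has to decode it), which is consistent with them being listed as very slow next to the size comparison.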

On a side note, I learned an interesting little tidbit - .NET Controls Constructed Off-Screen Display Black. Very odd.

Anyways, I hope the update there is useful for people. (Should have done it in the first place.)

Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
Perry Mowbray
N.A.N.Y. Organizer
Moderator
*****
Posts: 1,807



Thoughtful Scribbles

« Reply #26 on: January 02, 2011, 06:02:02 PM »

Quote from: Renegade
v1.2 adds in hashing for pixel data and lists these 3 methods:

  • Simple - Compares file sizes - Fast, medium reliability
  • File Signature - Compares file hashes - Very Slow, high reliability (includes EXIF differences)
  • Photo Signature - Compares pixel hashes - Very Slow, high reliability (excludes EXIF differences)


Perfect!!!

Logged

worstje
Honorary Member
**
Posts: 555



The Gent with the White Hat

« Reply #27 on: January 02, 2011, 06:38:18 PM »

So, just so I get this right... this tool can't recognise scaled down versions? I've got some old messy HDs lying around, needing backups, but I know from memory huge amounts of those lack sorting and have been scaled down as needed so other people could reliably view them. If I could easily go through it all and find the smaller-sized images to delete those, that'd help with sorting.

If it can't do this already, is there any chance such similar-image functionality could be added? smiley

I plead guilty to not testing the app since I don't actually have said old HD hooked up - it is just gathering dust at present. It is for one of those proverbial rainy days.
Logged
Perry Mowbray
N.A.N.Y. Organizer
Moderator
*****
Posts: 1,807



Thoughtful Scribbles

« Reply #28 on: January 02, 2011, 06:44:50 PM »

Quote from: worstje
So, just so I get this right... this tool can't recognise scaled down versions? I've got some old messy HDs lying around, needing backups, but I know from memory huge amounts of those lack sorting and have been scaled down as needed so other people could reliably view them. If I could easily go through it all and find the smaller-sized images to delete those, that'd help with sorting.

If it can't do this already, is there any chance such similar-image functionality could be added? smiley


That's right: it will only detect exact duplicates. The new EXIF exclusion functionality allows matching of the same image with different EXIF Tags.

Comparing scaled images would be neat, eh? I wonder if it could be done by scaling the in memory image down to the smaller size... but there's other factors involved, like the jpg quality used... I can hear Renegade saying that it's out of scope  Wink
Logged

Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #29 on: January 02, 2011, 08:40:04 PM »

Quote from: worstje
So, just so I get this right... this tool can't recognise scaled down versions? I've got some old messy HDs lying around, needing backups, but I know from memory huge amounts of those lack sorting and have been scaled down as needed so other people could reliably view them. If I could easily go through it all and find the smaller-sized images to delete those, that'd help with sorting.

If it can't do this already, is there any chance such similar-image functionality could be added? smiley


Quote from: Perry Mowbray
That's right: it will only detect exact duplicates. The new EXIF exclusion functionality allows matching of the same image with different EXIF Tags.

Comparing scaled images would be neat, eh? I wonder if it could be done by scaling the in memory image down to the smaller size... but there's other factors involved, like the jpg quality used... I can hear Renegade saying that it's out of scope  Wink

cheesy It's out of scope. smiley

I'd put that in a "pro" version.

What I'd also include there though:

* Network storage (currently only local devices can be scanned)
* "Live" folder browser (currently does not refresh for changes in file system)
* SURF - Allows "fuzzy" detection for things like slightly different or possibly scaled images
* Database back end - For storing file path, hash and image metadata to speed up things & allow for better scanning
* Recursive folder searches - "Include subfolders"
* Other image format support - GIF, PNG, BMP, NEF, RAW, etc.
* Better data output - More than just file paths for duplicates with checkboxes
* Performance increases - Thread pooling and all that jazz.

What's in there right now is pretty much what most people need -- find "extra backups" and the like. It's simple, straightforward, and wasn't too much for me to get done by the deadline~! cheesy

Some of those I wanted to get in there even if I hid the functionality.

I'd actually spent most of my time doing research rather than actual programming. e.g. for the hashing, I spent probably close to 2 days just reading on different image comparisons and hashing methods. I'd also spent a good amount of time reading on fuzzy matching methods like SURF and SIFT.

I suppose if the program were to gain any kind of popularity I'd go back and do a pro version.
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
JavaJones
Review 2.0 Designer
Charter Member
***
Posts: 2,537



« Reply #30 on: January 02, 2011, 09:20:11 PM »

I think there are already quite a few more sophisticated duplicate finders out there. I see the simplicity of this tool as one of its biggest benefits.

- Oshyan
Logged

The New Adventures of Oshyan Greene - A life in pictures...
Perry Mowbray
N.A.N.Y. Organizer
Moderator
*****
Posts: 1,807



Thoughtful Scribbles

« Reply #31 on: January 02, 2011, 09:29:01 PM »

Quote from: JavaJones
I think there are already quite a few more sophisticated duplicate finders out there. I see the simplicity of this tool as one of its biggest benefits.

Yes: I very much agree

Quote from: Renegade
I suppose if the program were to gain any kind of popularity I'd go back and do a pro version.

That would be nice too  Wink
« Last Edit: January 02, 2011, 10:10:34 PM by Perry Mowbray » Logged

Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #32 on: January 02, 2011, 09:29:58 PM »

Quote from: JavaJones
I think there are already quite a few more sophisticated duplicate finders out there. I see the simplicity of this tool as one of its biggest benefits.

- Oshyan

True. I don't really see much point in doing much more there. It addresses basic needs. Or it addresses mine anyways~! cheesy
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
mouser
First Author
Administrator
*****
Posts: 33,692



« Reply #33 on: January 07, 2011, 08:37:16 AM »

I watched an early test screencast that Wraith made and it went into some nice detail and examples that were cool. Wraith pointed out something that I thought was important: if you select the same folder for both panes and check all items, it is possible to delete BOTH copies of a photo by accident.

I have a really simple solution suggestion:

Right before actually deleting, check that the original file partner of the pair still exists -- if not, skip the delete.
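
A minimal sketch of that check, in Python rather than the tool's own .NET code. The function name and the `(original, duplicate)` pair layout are assumptions made for illustration, and the same-path guard is an extra precaution that is not part of the suggestion itself.

```python
import os
import tempfile


def delete_duplicates_safely(pairs):
    """Delete each duplicate only while its original partner still exists.

    `pairs` is a list of (original_path, duplicate_path) tuples (assumed layout).
    Returns the list of paths that were actually deleted.
    """
    deleted = []
    for original, duplicate in pairs:
        # Extra guard (an assumption): never treat a file as its own duplicate.
        if os.path.abspath(original) == os.path.abspath(duplicate):
            continue
        # The suggested check: if the partner is gone, this may be the last copy.
        if not os.path.isfile(original):
            continue
        if os.path.isfile(duplicate):
            os.remove(duplicate)
            deleted.append(duplicate)
    return deleted


if __name__ == "__main__":
    # Same folder on both sides, everything checked: each file appears once
    # as the "original" and once as the "duplicate".
    with tempfile.TemporaryDirectory() as folder:
        a = os.path.join(folder, "a.jpg")
        b = os.path.join(folder, "b.jpg")
        for path in (a, b):
            with open(path, "wb") as handle:
                handle.write(b"identical photo bytes")
        removed = delete_duplicates_safely([(a, b), (b, a)])
        print("deleted:", [os.path.basename(p) for p in removed])
        print("a.jpg still exists:", os.path.isfile(a))
```

Because the check runs immediately before each delete, the second pair is skipped once its partner has been removed, so one copy always survives.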
Logged
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #34 on: January 09, 2011, 04:58:55 AM »

Quote from: mouser
I watched an early test screencast that Wraith made and it went into some nice detail and examples that were cool. Wraith pointed out something that i thought was important.  He pointed out that if you select the same folder for both, and checked all items, it was possible to delete BOTH copies of a photo by accident.

I have a really simple solution suggestion:

Right before actually deleting, check that the original file partner of the pair still exists -- if not, skip the delete.

Went a different route. Tried a few things, but I think this works best.

The warning screen used to be this:



And it was possible to delete them all.

The problem that it was supposed to address was "deciding which was the original". It's a damned if you do, damned if you don't situation.

* Decide -- removes control from user
* Don't decide -- leaves room for user error

Behavior now is:

1) Warning/Notification (now waffling on this as to whether or not it is needed as it is purely informative now):



2) Use only the 1st image as the "original" -- all successive photos are considered duplicates. e.g. the following images (1.jpg to 8.jpg) are all identical:




Anyways, it is no longer possible to delete originals as described above.
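
The "first image is the original" rule can be sketched like this. It is a hypothetical Python illustration, not the program's code: the function name and the `(path, signature)` input are assumptions, with the signature standing in for whichever comparison method was selected.

```python
def split_originals_and_duplicates(signed_files):
    """Keep the first file seen for each signature; the rest are duplicates.

    `signed_files` is an iterable of (path, signature) pairs in scan order.
    Returns (originals, duplicates): a dict of signature -> first path, and
    a list of the paths that may be offered for deletion.
    """
    originals = {}
    duplicates = []
    for path, signature in signed_files:
        if signature in originals:
            duplicates.append(path)  # a later copy, safe to offer for deletion
        else:
            originals[signature] = path  # first occurrence is never offered
    return originals, duplicates


# Eight identical images, as in the example above.
files = [(f"{n}.jpg", "same-hash") for n in range(1, 9)]
originals, duplicates = split_originals_and_duplicates(files)
# originals  -> {'same-hash': '1.jpg'}
# duplicates -> ['2.jpg', '3.jpg', '4.jpg', '5.jpg', '6.jpg', '7.jpg', '8.jpg']
```

Since the first path for each signature never lands in the deletable list, checking every box cannot remove the last copy.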

Version updated to 1.3 (other speed optimizations added).

Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
Ath
Supporting Member
**
Posts: 2,263



« Reply #35 on: January 09, 2011, 05:19:10 AM »

Thanks Renegade, for this improvement thumbs up
I'm always extra careful, even with lots of backups, with all my photos, so I'm usually waiting for these kinds of improvements before I even install tools like these.
Logged

Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #36 on: January 09, 2011, 05:28:39 AM »

Quote from: Ath
Thanks Renegade, for this improvement thumbs up
I'm always extra careful, even with lots of backups, with all my photo's, so I'm usually waiting for this kind of improvements before I even install tools like these.

I hear you. I'm paranoid about things being deleted and regularly have too many backups of some things.

In the past I've always leaned in favor of giving control to the user, but I think those days are pretty much done. Decisions are "hard", so making the decision for the user just makes things "easy".
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
Ath
Supporting Member
**
Posts: 2,263



« Reply #37 on: January 09, 2011, 05:35:09 AM »

Murphy's law is still actively working as it always has. Cool
Logged

cranioscopical
Friend of the Site
Supporting Member
**
Posts: 4,195



« Reply #38 on: January 09, 2011, 05:40:22 AM »

Quote from: Renegade
Anyways, it is no longer possible to delete originals as described above.

 Thmbsup
Logged

Chris
mouser
First Author
Administrator
*****
Posts: 33,692



« Reply #39 on: January 09, 2011, 11:30:50 AM »

Yep, I think as long as you can no longer accidentally delete all copies, the particular solution isn't important.
Logged
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #40 on: January 14, 2011, 07:06:53 PM »

Well, I'm on to doing some threading stuff now to speed things up. (Got a few hours to burn today.)

On a funny note, my first attempt focused on threading for the simple comparison method, but it was simply too fast in parts and didn't work. In other words, time to do some significant refactoring. smiley

Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
nharding
Member
**
Posts: 36


« Reply #41 on: January 14, 2011, 09:47:46 PM »

When you check hashes for identical files, you can speed that up significantly by only calculating the hashes for files whose sizes are duplicated. So if you have 8 files:
a.jpg [102456 bytes]
b.jpg [232583 bytes]
c.jpg [104356 bytes]
d.jpg [232483 bytes]
e.jpg [102456 bytes]
f.jpg [232583 bytes]
g.jpg [38914 bytes]
h.jpg [89583 bytes]
then you only need to calculate hashes for the files that are 102456 or 232583 bytes long. This is what I do as part of my "check all archives for duplicates" feature in DCDisplay (only there, since I want to go via image data, I check if the hash matches, then I check the number of pages and the average resolution).
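
A minimal Python sketch of this size pre-filter, assuming SHA-256 as the file hash. The function names are made up for illustration and are not from DCDisplay or Duplicate Photo Finder.

```python
import hashlib
import os
from collections import defaultdict


def candidates_by_size(sizes):
    """Return the paths whose size is shared with at least one other file.

    `sizes` maps path -> size in bytes. Files with a unique size cannot be
    byte-for-byte duplicates, so they never need to be hashed.
    """
    by_size = defaultdict(list)
    for path, size in sizes.items():
        by_size[size].append(path)
    return sorted(p for group in by_size.values() if len(group) > 1 for p in group)


def file_hash(path, chunk_size=1 << 20):
    """SHA-256 of the whole file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(paths):
    """Group byte-identical files, hashing only the size candidates."""
    sizes = {path: os.path.getsize(path) for path in paths}
    by_hash = defaultdict(list)
    for path in candidates_by_size(sizes):
        by_hash[(sizes[path], file_hash(path))].append(path)
    return [group for group in by_hash.values() if len(group) > 1]


# The eight files from the example above:
example = {
    "a.jpg": 102456, "b.jpg": 232583, "c.jpg": 104356, "d.jpg": 232483,
    "e.jpg": 102456, "f.jpg": 232583, "g.jpg": 38914, "h.jpg": 89583,
}
print(candidates_by_size(example))
# → ['a.jpg', 'b.jpg', 'e.jpg', 'f.jpg']
```

Only four of the eight files get hashed here. The filter applies to whole-file hashes only; two files with identical pixels but different metadata can differ in size, so it does not help a pixel-data comparison.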

Neil Harding
Logged
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #42 on: January 14, 2011, 10:55:15 PM »

Quote from: nharding
When you do the check hashes for identical files, you can speed that up significantly by only calculation the hashes for duplicate file sizes. So if you have 8 files
a.jpg [102456 bytes] b.jpg [232583 bytes] c.jpg [104356 bytes] d.jpg [232483 bytes] e.jpg [102456 bytes] f.jpg [232583 bytes] g.jpg [38914 bytes] h.jpg [89583 bytes] then you only need to calculate hashes for the files that are 102456 or 232583 bytes long. This is what I do as part of my check all archives for duplicates in DCDisplay (only there, since I want to go via image data, I check if hash maps then I check number of pages, and average resolution)

Neil Harding


Thanks for the tip Neil.

At the moment, my logic for the 2 hashing methods is combined and needs refactoring. One checks for file exactness while the other compares image data exactness. File size doesn't help for image data exactness, so at the moment that won't get done. I need to refactor things to make it cleaner. Threading forces that, so once I get that done, your tip will be an excellent optimization~! smiley


Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
tomos
Charter Member
***
Posts: 8,694



« Reply #43 on: March 12, 2011, 03:36:12 PM »

I hope you won't be cursing me for these bug reports, Renegade  smiley

I was helping someone move their photos to a new computer today. Their older pc has a maze of duplicated folders and files - there seemed to be 10 copies of one batch of images in the one folder tellme
Everything was already copied onto an external drive.

So I installed your app on the new laptop:
Windows 7 64 bit; classic theme (I believe it's up to date but don't know if SP1 is installed)

I compared the pics on the external drive.

Some minor problems:

1) when I selected the folder in the lower pane, the selected folder in the upper pane was no longer readable - ie, text and highlighted background were both dark blue, if I then clicked on the folder in the upper half, it became readable - but the lower one was then unreadable.

2) when it was checking for duplicates, there was no indication it was working, until I actually clicked (anywhere) on the programme window - then the hour-glass showed.

3) eventually I had to compare the "my pictures" folder with itself. (I used the fast setting.) I'm not sure how many exactly, but there were over two thousand loose images in there and a few folders**. It showed the warning message, I clicked okay, but basically it seized up after that - when I clicked (anywhere) on the window it said no response (or whatever it says - "keine rückmeldung") in the titlebar. So eventually I closed it down - it did close normally, I didn't have to kill it. I tried a few times but it was definitely too much for it...


** I presume contents of subfolders are NOT looked into?
Logged

Tom
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #44 on: March 12, 2011, 07:06:31 PM »

Quote from: tomos
I hope you wont be cursing me for these bug reports Renegade  smiley

I was helping someone move their photos to a new computer today. Their older pc has a maze of duplicated folders and files - there seemed to be 10 copies of one batch of images in the one folder tellme
Everything was already copied onto an external drive.

So I installed your app on the new laptop:
Windows 7 64 bit; classic theme (I believe it's uptodate but dont know if SP1 installed)

I compared the pics on the external drive.

Some minor problems:

1) when I selected the folder in the lower pane, the selected folder in the upper pane was no longer readable - ie, text and highlighted background were both dark blue, if I then clicked on the folder in the upper half, it became readable - but the lower one was then unreadable.

2) when it was checking for duplicates, there was no indication it was working, until I actually clicked (anywhere) on the programme window - then the hour-glass showed.

3) eventually I had to compare the "my pictures" folder with itself. (I used the fast setting.) I'm not sure how many exactly, but there was over two thousand loose images in there and a few folders**. It showed the warning message, I clicked okay, but basically it seized up after that - when I clicked (anywhere) on the window it said no response (or whatever it says - "keine rückmeldung") in the titlebar. So eventually I closed it down - it did close normally, I didnt have to kill it. I tried a few times but it was definitely too much for it...


** I presume contents of subfolders are NOT looked into?

Cursing? Heck no! I'm glad you've told me~! smiley


Quote
1) when I selected the folder in the lower pane, the selected folder in the upper pane was no longer readable - ie, text and highlighted background were both dark blue, if I then clicked on the folder in the upper half, it became readable - but the lower one was then unreadable.

Is the computer using a theme? Or is the default color scheme changed in Windows?

Quote
2) when it was checking for duplicates, there was no indication it was working, until I actually clicked (anywhere) on the programme window - then the hour-glass showed.

That shouldn't happen. The progress bars at the bottom should start.

Quote
3) eventually I had to compare the "my pictures" folder with itself. (I used the fast setting.) I'm not sure how many exactly, but there was over two thousand loose images in there and a few folders**. It showed the warning message, I clicked okay, but basically it seized up after that - when I clicked (anywhere) on the window it said no response (or whatever it says - "keine rückmeldung") in the titlebar. So eventually I closed it down - it did close normally, I didnt have to kill it. I tried a few times but it was definitely too much for it...

Can you let me know the computer specs? I was testing on folders of about 1,000 photos that were 4~5 GB, and it worked fine.

Also, can you let me know the rough sizes of the pictures?

Offhand, I don't know what the problem is. There's nothing special going on, and nothing really tricky that could "mess up".

I think it might be the folder browser control... It's based on the stock listview. I probably should replace that with a custom control I have that's designed for very large data sets. Still... 2000 isn't "that" many...

I have some work to get done here, so I'll look into it this evening if I get done in time, or tomorrow, and get back to you.

Thanks for letting me know.
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
tomos
Charter Member
***
Posts: 8,694



« Reply #45 on: March 13, 2011, 08:37:07 AM »

1) Classic theme - it was modified but just the grey and the titlebar colours modified (selection colour definitely not modified)
Have you tried it with default classic? - I can test it again next weekend**

2) no, didn't see progress bars working at any stage

3) computer was laptop without a lot of memory (by Win.7 standards) - 2GB I think. I'll have to ask them & get back to you with more details. (Scratch that**)


** It was someone else's machine, and they wouldn't be able to tell me any of this info - I probably won't see them till next weekend. I can check out the theme then in more detail as well.

Logged

Tom
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #46 on: March 13, 2011, 09:33:35 AM »

I always try to stick to system colors, but need to check that. I'll check with classic as well though.

I didn't get to it today though.

The lack of progress bars is very odd though... I'll pursue that avenue.
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker
rjbull
Charter Member
***
Posts: 2,788

« Reply #47 on: November 14, 2011, 04:14:52 PM »

In case you don't already know, it looks like there's a similar program with the same name - Duplicate Photo Finder.  It looks to be US$49.90 at full price, but they make the price very hard to find (bottom of the Upgrade page).
Logged
Curt
Supporting Member
**
Posts: 6,349

« Reply #48 on: June 18, 2012, 10:06:24 AM »

Quote from: rjbull
In case you don't already know, it looks like there's a similar program with the same name - Duplicate Photo Finder.  It looks to be US$49.90 at full price, but they make the price very hard to find (bottom of the Upgrade page).

Right now it is merely $10 (http://www.duplicatephotofinder.com/uninstall.html) for 1 year of free upgrades (if any!), and $20 for 2 years. I tested it and did not purchase a key, because the included "similarity finder" is too imaginative...

Logged
Renegade
Charter Member
***
Posts: 11,803



Tell me something you don't know...

« Reply #49 on: June 18, 2012, 02:57:38 PM »

Quote from: rjbull
In case you don't already know, it looks like there's a similar program with the same name - Duplicate Photo Finder.  It looks to be US$49.90 at full price, but they make the price very hard to find (bottom of the Upgrade page).

Quote from: Curt
-right now it is merely $10: http://www.duplicatephotofinder.com/uninstall.html for 1 year's free upgrade (if any!), and $20 for 2 years. I tested it, and did not purchase a key! because the included "similarity finder" is too imaginative...

I looked into that functionality before, and it's very difficult to get done right. At the end of the day, computers fail the Turing test. Sad
Logged

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker