Topic: NANY 2011 Release: Duplicate Photo Finder (Read 85241 times)

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #25 on: January 02, 2011, 07:30 AM »
Perry helped point out that identical photos with different tags (e.g. EXIF) were not being identified. (Yeah... Naughty me... I skipped that test...)

v1.2 adds in hashing for pixel data and lists these 3 methods:

  • Simple - Compares file sizes - Fast, medium reliability
  • File Signature - Compares file hashes - Very slow, high reliability (includes EXIF differences)
  • Photo Signature - Compares pixel hashes - Very slow, high reliability (excludes EXIF differences)
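
The three methods above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code (which the thread does not show): the function names are hypothetical, and `photo_signature` swaps in a simpler technique than true pixel hashing. It hashes a JPEG byte stream with its metadata segments (APPn, which carry EXIF, and COM) skipped, so no image decoder is needed. It assumes a well-formed JPEG with no unusual markers before the scan data.

```python
import hashlib
import struct


def simple_signature(data: bytes) -> int:
    """'Simple' method: the file size only. Fast, but sizes can collide."""
    return len(data)


def file_signature(data: bytes) -> str:
    """'File Signature' method: hash of every byte, metadata included."""
    return hashlib.sha256(data).hexdigest()


def photo_signature(data: bytes) -> str:
    """Stand-in for 'Photo Signature': hash a JPEG stream, skipping metadata.

    Assumption: segments before the scan all carry a 2-byte length field.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    digest = hashlib.sha256()
    pos = 2  # skip the start-of-image marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("malformed segment marker")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDA:
            # Start of scan: everything from here on is compressed image data.
            digest.update(data[pos:])
            break
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length  # marker bytes plus the segment (length included)
    return digest.hexdigest()
```

Two copies of a photo that differ only in their EXIF block get different file signatures but the same photo signature under this sketch.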

On a side note, I learned an interesting little tidbit - .NET Controls Constructed Off-Screen Display Black. Very odd.

Anyways, I hope the update there is useful for people. (Should have done it in the first place.)

Slow Down Music - Where I commit thought crimes...

Freedom is the right to be wrong, not the right to do wrong. - John Diefenbaker

Perry Mowbray

  • N.A.N.Y. Organizer
  • Moderator
  • Joined in 2005
  • *****
  • Posts: 1,817
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #26 on: January 02, 2011, 06:02 PM »
Perfect!!!

Duplicate Photo Finder.png

worstje

  • Honorary Member
  • Joined in 2009
  • **
  • Posts: 588
  • The Gent with the White Hat
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #27 on: January 02, 2011, 06:38 PM »
So, just so I get this right... this tool can't recognise scaled down versions? I've got some old messy HDs lying around, needing backups, but I know from memory huge amounts of those lack sorting and have been scaled down as needed so other people could reliably view them. If I could easily go through it all and find the smaller-sized images to delete those, that'd help with sorting.

If it can't do this already, is there any chance such similar-image functionality could be added? :)

I plead guilty to not testing the app since I don't actually have said old HD hooked up - it is just gathering dust at present. It is for one of those proverbial rainy days.

Perry Mowbray

  • N.A.N.Y. Organizer
  • Moderator
  • Joined in 2005
  • *****
  • Posts: 1,817
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #28 on: January 02, 2011, 06:44 PM »
That's right: it will only detect exact duplicates. The new EXIF exclusion functionality allows matching of the same image with different EXIF Tags.

Comparing scaled images would be neat, eh? I wonder if it could be done by scaling the in-memory image down to the smaller size... but there are other factors involved, like the JPEG quality used... I can hear Renegade saying that it's out of scope ;)
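
One common way to approach the scaling idea is a perceptual hash instead of an exact one. The sketch below is an "average hash": shrink both images to the same tiny grid, then compare which cells are brighter than the mean. It is not something the tool does, and it is not the SURF approach mentioned later in the thread. It assumes each image has already been decoded into a grid of grayscale values (a list of rows) at least 8 pixels on each side; all names are hypothetical.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale grid to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Return an integer with one bit per cell: 1 if the cell is above the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

A small Hamming distance suggests a scaled or recompressed copy; the threshold is a tuning choice, and JPEG quality differences are exactly the kind of factor that makes a fixed threshold unreliable.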

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #29 on: January 02, 2011, 08:40 PM »
:D It's out of scope. :)

I'd put that in a "pro" version.

What I'd also include there though:

* Network storage (currently only local devices can be scanned)
* "Live" folder browser (currently does not refresh for changes in file system)
* SURF - Allows "fuzzy" detection for things like slightly different or possibly scaled images
* Database back end - For storing file path, hash and image metadata to speed up things & allow for better scanning
* Recursive folder searches - "Include subfolders"
* Other image format support - GIF, PNG, BMP, NEF, RAW, etc.
* Better data output - More than just file paths for duplicates with checkboxes
* Performance increases - Thread pooling and all that jazz.
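
As an illustration of the "Database back end" item, a hash cache could look something like the sketch below. The table layout and function names are assumptions made for this example, not the tool's design. It uses Python's built-in `sqlite3` and treats a file as unchanged when its path, size, and modification time all match the stored row.

```python
import sqlite3


def open_cache(db_path=":memory:"):
    """Open (or create) a cache of path -> hash, keyed on size and mtime."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS photos ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime REAL, hash TEXT)"
    )
    return conn


def cached_hash(conn, path, size, mtime, compute):
    """Return the stored hash if the file looks unchanged, else recompute it."""
    row = conn.execute(
        "SELECT hash FROM photos WHERE path = ? AND size = ? AND mtime = ?",
        (path, size, mtime),
    ).fetchone()
    if row:
        return row[0]
    digest = compute(path)  # the slow part, e.g. hashing the whole file
    conn.execute(
        "INSERT OR REPLACE INTO photos (path, size, mtime, hash)"
        " VALUES (?, ?, ?, ?)",
        (path, size, mtime, digest),
    )
    conn.commit()
    return digest
```

On a second scan of the same folder, only new or modified files would need hashing again.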

What's in there right now is pretty much what most people need -- find "extra backups" and the like. It's simple, straightforward, and wasn't too much for me to get done by the deadline~! :D

Some of those I wanted to get in there even if I hid the functionality.

I'd actually spent most of my time doing research rather than actual programming. e.g. For the hashing, I spent probably close to 2 days just reading on different image comparisons and hashing methods. I'd also spent a good amount of time reading on fuzzy logic methods like SURF and SIFT.

I suppose if the program were to gain any kind of popularity I'd go back and do a pro version.

JavaJones

  • Review 2.0 Designer
  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 2,739
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #30 on: January 02, 2011, 09:20 PM »
I think there are already quite a few more sophisticated duplicate finders out there. I see the simplicity of this tool as one of its biggest benefits.

- Oshyan

Perry Mowbray

  • N.A.N.Y. Organizer
  • Moderator
  • Joined in 2005
  • *****
  • Posts: 1,817
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #31 on: January 02, 2011, 09:29 PM »
I think there are already quite a few more sophisticated duplicate finders out there. I see the simplicity of this tool as one of its biggest benefits.

Yes: I very much agree

I suppose if the program were to gain any kind of popularity I'd go back and do a pro version.

That would be nice too  ;)
« Last Edit: January 02, 2011, 10:10 PM by Perry Mowbray »

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #32 on: January 02, 2011, 09:29 PM »
I think there are already quite a few more sophisticated duplicate finders out there. I see the simplicity of this tool as one of its biggest benefits.

- Oshyan

True. I don't really see much point in doing much more there. It addresses basic needs. Or it addresses mine anyways~! :D

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 40,914
Re: NANY 2011 Release: Duplicate Photo Deleter
« Reply #33 on: January 07, 2011, 08:37 AM »
I watched an early test screencast that Wraith made and it went into some nice detail and examples that were cool. Wraith pointed out something that I thought was important: if you select the same folder for both panes and check all items, it is possible to delete BOTH copies of a photo by accident.

I have a really simple solution suggestion:

Right before actually deleting, check that the original file partner of the pair still exists -- if not, skip the delete.
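
A minimal sketch of that check, assuming the scan produces `(original, duplicate)` path pairs (the pair structure and the function name are made up for illustration):

```python
import os


def delete_duplicates(pairs):
    """Delete each duplicate only while its original partner still exists.

    pairs: iterable of (original_path, duplicate_path).
    Returns the list of paths that were actually deleted.
    """
    deleted = []
    for original, duplicate in pairs:
        if os.path.abspath(original) == os.path.abspath(duplicate):
            continue  # same file listed against itself
        if not os.path.exists(original):
            continue  # partner already gone: skip so one copy survives
        if not os.path.exists(duplicate):
            continue  # nothing left to delete
        os.remove(duplicate)
        deleted.append(duplicate)
    return deleted
```

When the same folder is chosen on both sides, a pair tends to show up twice, once as (a, b) and once as (b, a). After b is deleted for the first pair, the second pair is skipped because its "original" b no longer exists, so a is kept.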

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #34 on: January 09, 2011, 04:58 AM »
Went a different route. Tried a few things, but I think this works best.

The warning screen used to be this:

Screenshot - 2011-01-09 , 9_40_48 PM.png

And it was possible to delete them all.

The problem that screen was supposed to address was "deciding which was the original". It's a damned if you do, damned if you don't situation.

* Decide -- removes control from user
* Don't decide -- leaves room for user error

Behavior now is:

1) Warning/Notification (now waffling on this as to whether or not it is needed as it is purely informative now):

Screenshot - 2011-01-09 , 9_52_57 PM.png

2) Use only the 1st image as the "original" -- all successive photos are considered duplicates. e.g. The following images (1.jpg to 8.jpg) are all identical:

Screenshot - 2011-01-09 , 9_53_26 PM.png


Anyways, it is no longer possible to delete originals as described above.
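
The "first image is the original" rule described above can be sketched like this. It is an illustration only: the input is assumed to be `(path, signature)` tuples in scan order, and the names are hypothetical.

```python
def split_originals(files):
    """Keep the first file seen for each signature; the rest are duplicates.

    files: iterable of (path, signature) in scan order.
    Returns (originals, duplicates) as two lists of paths.
    """
    originals = {}  # signature -> first path seen with that signature
    duplicates = []
    for path, signature in files:
        if signature in originals:
            duplicates.append(path)
        else:
            originals[signature] = path
    return list(originals.values()), duplicates
```

With eight identical files 1.jpg to 8.jpg, only 1.jpg ends up in the originals list, so even checking every duplicate for deletion leaves one copy behind.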

Version updated to 1.3 (other speed optimizations added).


Ath

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 3,629
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #35 on: January 09, 2011, 05:19 AM »
Thanks Renegade, for this improvement :up:
I'm always extra careful, even with lots of backups, with all my photos, so I'm usually waiting for this kind of improvement before I even install tools like these.

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #36 on: January 09, 2011, 05:28 AM »
I hear you. I'm paranoid about things being deleted and regularly have too many backups of some things.

In the past I've always leaned in favor of giving control to the user, but I think those days are pretty much done. Decisions are "hard", so making the decision for the user makes things "easy".

Ath

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 3,629
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #37 on: January 09, 2011, 05:35 AM »
Murphy's law is still actively working as it always has. 8)

cranioscopical

  • Friend of the Site
  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 4,776
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #38 on: January 09, 2011, 05:40 AM »
Anyways, it is no longer possible to delete originals as described above.
-Renegade

 :Thmbsup:

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 40,914
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #39 on: January 09, 2011, 11:30 AM »
Yep, I think as long as you can no longer accidentally delete all copies, the particular solution isn't important.

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #40 on: January 14, 2011, 07:06 PM »
Well, I'm on to doing some threading stuff now to speed things up. (Got a few hours to burn today.)

On a funny note, my first attempt focused on threading for the simple comparison method, but it was simply too fast in parts and didn't work. In other words, time to do some significant refactoring. :)


nharding

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 36
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #41 on: January 14, 2011, 09:47 PM »
When you check hashes for identical files, you can speed that up significantly by only calculating the hashes for duplicate file sizes. So if you have 8 files
a.jpg [102456 bytes] b.jpg [232583 bytes] c.jpg [104356 bytes] d.jpg [232483 bytes] e.jpg [102456 bytes] f.jpg [232583 bytes] g.jpg [38914 bytes] h.jpg [89583 bytes]
then you only need to calculate hashes for the files that are 102456 or 232583 bytes long. This is what I do as part of my check of all archives for duplicates in DCDisplay (only there, since I want to go via image data, I check if the hash matches, then I check the number of pages and the average resolution).
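
Using the eight files from this example, the size pre-filter might be sketched as follows. The function name and the dictionary input are assumptions for illustration; only the groups it returns would go on to the slow hashing step.

```python
from collections import defaultdict


def hash_candidates(sizes):
    """Group file names by size and keep only sizes shared by 2+ files.

    sizes: dict mapping file name -> size in bytes.
    Returns a list of groups (lists of names) that still need hashing.
    """
    by_size = defaultdict(list)
    for name, size in sizes.items():
        by_size[size].append(name)
    return [names for names in by_size.values() if len(names) > 1]


sizes = {
    "a.jpg": 102456, "b.jpg": 232583, "c.jpg": 104356, "d.jpg": 232483,
    "e.jpg": 102456, "f.jpg": 232583, "g.jpg": 38914, "h.jpg": 89583,
}
candidates = hash_candidates(sizes)
# candidates == [["a.jpg", "e.jpg"], ["b.jpg", "f.jpg"]]
```

Four of the eight files are ruled out without reading their contents. A shared size only makes two files candidates; the hash still has to confirm the match.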

Neil Harding

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #42 on: January 14, 2011, 10:55 PM »
Thanks for the tip Neil.

At the moment, the logic for the 2 hashing methods is combined and needs refactoring. One checks for file exactness while the other compares image data exactness. File size doesn't help for image data exactness, so at the moment that won't get done. I need to refactor things to make it cleaner. Threading forces that, so once I get that done, your tip will be an excellent optimization~! :)



tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,964
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #43 on: March 12, 2011, 03:36 PM »
I hope you won't be cursing me for these bug reports Renegade :)

I was helping someone move their photos to a new computer today. Their older pc has a maze of duplicated folders and files - there seemed to be 10 copies of one batch of images in the one folder :tellme:
Everything was already copied onto an external drive.

So I installed your app on the new laptop:
Windows 7 64-bit; classic theme (I believe it's up to date but don't know if SP1 is installed)

I compared the pics on the external drive.

Some minor problems:

1) When I selected the folder in the lower pane, the selected folder in the upper pane was no longer readable, i.e. text and highlighted background were both dark blue. If I then clicked on the folder in the upper half, it became readable, but the lower one was then unreadable.

2) When it was checking for duplicates, there was no indication it was working until I actually clicked (anywhere) on the programme window; then the hourglass showed.

3) Eventually I had to compare the "my pictures" folder with itself. (I used the fast setting.) I'm not sure how many exactly, but there were over two thousand loose images in there and a few folders**. It showed the warning message, I clicked okay, but basically it seized up after that: when I clicked (anywhere) on the window it said "not responding" (or whatever it says; the German title bar read "keine Rückmeldung"). So eventually I closed it down. It did close normally, I didn't have to kill it. I tried a few times but it was definitely too much for it...


** I presume contents of subfolders are NOT looked into?
Tom

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #44 on: March 12, 2011, 07:06 PM »
Cursing? Heck no! I'm glad you've told me~! :)


1) When I selected the folder in the lower pane, the selected folder in the upper pane was no longer readable, i.e. text and highlighted background were both dark blue. If I then clicked on the folder in the upper half, it became readable, but the lower one was then unreadable.

Is the computer using a theme? Or is the default color scheme changed in Windows?

2) When it was checking for duplicates, there was no indication it was working until I actually clicked (anywhere) on the programme window; then the hourglass showed.

That shouldn't happen. The progress bars at the bottom should start.

3) Eventually I had to compare the "my pictures" folder with itself. (I used the fast setting.) I'm not sure how many exactly, but there were over two thousand loose images in there and a few folders**. It showed the warning message, I clicked okay, but basically it seized up after that: when I clicked (anywhere) on the window it said "not responding" (or whatever it says; the German title bar read "keine Rückmeldung"). So eventually I closed it down. It did close normally, I didn't have to kill it. I tried a few times but it was definitely too much for it...

Can you let me know the computer specs? I was testing on folders of about 1,000 photos that were 4~5 GB, and it worked fine.

Also, can you let me know the rough sizes of the pictures?

Offhand, I don't know what the problem is. There's nothing special going on, and nothing really tricky that could "mess up".

I think it might be the folder browser control... It's based on the stock listview. I probably should replace that with a custom control I have that's designed for very large data sets. Still... 2000 isn't "that" many...

I have some work to get done here, so I'll look into it this evening if I get done in time, or tomorrow, and get back to you.

Thanks for letting me know.

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,964
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #45 on: March 13, 2011, 08:37 AM »
1) Classic theme - it was modified, but just the grey and the titlebar colours (selection colour definitely not modified).
Have you tried it with default classic? - I can test it again next weekend**

2) No, I didn't see the progress bars working at any stage.

3) The computer was a laptop without a lot of memory (by Win 7 standards) - 2GB I think. I'll have to ask them & get back to you with more details. (Scratch that**)


** It was someone else's machine, and they wouldn't be able to tell me any of this info - I probably won't see them till next weekend. I can check out the theme then in more detail as well.

Tom

Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #46 on: March 13, 2011, 09:33 AM »
I always try to stick to system colors, but need to check that. I'll check with classic as well though.

I didn't get to it today though.

The lack of progress bars is very odd though... I'll pursue that avenue.

rjbull

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 3,205
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #47 on: November 14, 2011, 04:14 PM »
In case you don't already know, it looks like there's a similar program with the same name - Duplicate Photo Finder.  It looks to be US$49.90 at full price, but they make the price very hard to find (bottom of the Upgrade page).

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #48 on: June 18, 2012, 10:06 AM »
-right now it is merely $10: http://www.duplicate...r.com/uninstall.html for 1 year's free upgrade (if any!), and $20 for 2 years. I tested it, and did not purchase a key! because the included "similarity finder" is too imaginative...


Renegade

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 13,291
  • Tell me something you don't know...
Re: NANY 2011 Release: Duplicate Photo Finder
« Reply #49 on: June 18, 2012, 02:57 PM »
I looked into that functionality before, and it's very difficult to get done right. At the end of the day, computers fail the Turing test. :(