Topic: Find and remove duplicate files

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Find and remove duplicate files
« on: April 07, 2008, 08:52 AM »
PilotMan software has released Clone Tools, a new Windows program that helps you find, remove and organize both duplicated files and folders with similar or identical content.

When working with images, MP3s or other documents, making backups, or just accessing the internet, it is easy to accumulate a lot of files that are in fact clones: they have exactly the same contents. It is nearly impossible to find these files manually. Clone Tools helps you find and remove these duplicate files.

Finding the duplicate files is often not enough. Recognizing which files to keep and which to remove is the hard part. Clone Tools helps you take a step back and explore and compare the folders that contain the duplicate files. This gives a better overview than other programs that just list the duplicate files, and makes it easier to decide which files to save and which to remove. The program also has automatic functions that remove all duplicates from a folder or merge two folders, leaving only one folder with no duplicates in it.

Clone Tools Offers a New Perspective
Clone Tools has a unique way of viewing and exploring duplicate files. Traditional duplicate file finders present their results focused on the files themselves, which makes it hard to get an overview of the file structures. Clone Tools takes a different approach by focusing on the folders instead, so you can immediately see which folders have similar contents. The program lets you explore and work with the contents of these folders in a side-by-side browser. This browser displays all files and subfolders of two folders next to each other, giving you an immediate overview of the files in each folder and showing which files have duplicates in the other folder. Viewing the files in the context of the other files in the folder gives you better control and makes it easier to make informed decisions about what to do with the files and folders that contain duplicates: you can delete them, move them or simply leave them.
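The folder-centric view described above can be illustrated with a minimal Python sketch. It is not Clone Tools' code: the function name and the input format (a list of groups of identical file paths) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations
from pathlib import PurePath


def shared_folder_counts(duplicate_groups):
    """Count, for each pair of folders, how many duplicate groups they share.

    duplicate_groups is a hypothetical input: an iterable of lists, each list
    holding the paths of files already known to have identical content.
    """
    counts = Counter()
    for group in duplicate_groups:
        # Folders that hold at least one copy from this group.
        folders = sorted({str(PurePath(path).parent) for path in group})
        for folder_a, folder_b in combinations(folders, 2):
            counts[(folder_a, folder_b)] += 1
    return counts
```

Sorting the resulting pairs by count puts the folders with the most shared content first, which is roughly the overview a file-by-file list does not give.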



Find and Remove Duplicate Files Safely
When using a program to find and remove files, it is very important that it is safe to use; you do not want to accidentally delete important files. Clone Tools uses a byte-by-byte scan of the files to determine whether they are duplicates. This makes sure that the files are 100% identical. Files that are not duplicates cannot be deleted by Clone Tools, so you can never accidentally remove the last copy of a file. Deleted files are placed in the recycle bin; if you change your mind, you can recover them.
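A byte-by-byte check of the kind mentioned above could look roughly like the following Python sketch. The function name and chunk size are assumptions, and nothing here is taken from Clone Tools itself.

```python
def files_identical(path_a, path_b, chunk_size=64 * 1024):
    """Return True only if both files contain exactly the same bytes."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk (or different length)
            if not chunk_a:
                return True  # both files reached the end together
```

Python's standard library offers the same check as filecmp.cmp(path_a, path_b, shallow=False).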

Further information and a free trial download are available at http://www.pilotman.com

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
Re: Find and remove duplicate files
« Reply #1 on: April 07, 2008, 09:16 AM »
<<
Clone Tools uses a byte by byte scan on the files to determine if they are duplicates.
>>
Does this mean you keep doing byte-by-byte compare of one file to the "suspected clones", or are you doing the smart thing and comparing MD5/SHA/... hashes?
- carpe noctem

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Re: Find and remove duplicate files
« Reply #2 on: April 07, 2008, 09:40 AM »
The clone scan is performed in several steps. First there is a preliminary scan based on hash values, and only the suspected duplicate files that pass this scan are compared in a complete byte-by-byte scan. Also, if there is more than one copy of a file, let's say there are three identical files A, B and C, then A is compared to B and then B to C, but A is not compared to C, since we already know that it is identical to B and thus also to C.

I would say that the scan algorithm is very efficient and also 100% secure as all clones are verified in a byte-by-byte scan.
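The two-stage scan described in this reply can be sketched in Python as below. This is an illustration under assumptions, not the program's implementation: the thread does not say which hash is used (SHA-256 is assumed here), and instead of the A-B, B-C chain the sketch compares each file against one representative of every confirmed group, which relies on the same transitivity and needs the same number of comparisons when all candidates are identical.

```python
import filecmp
import hashlib
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verified_clone_groups(paths):
    """Return lists of paths confirmed identical by a byte-by-byte check."""
    # Stage 1: preliminary scan, group suspected duplicates by hash.
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[sha256_of(path)].append(path)

    # Stage 2: verify suspects byte by byte.
    groups = []
    for candidates in by_hash.values():
        if len(candidates) < 2:
            continue
        classes = []  # each class holds files already proven identical
        for path in candidates:
            for members in classes:
                # One comparison with a representative is enough:
                # equal to it means equal to every member of the class.
                if filecmp.cmp(members[0], path, shallow=False):
                    members.append(path)
                    break
            else:
                classes.append([path])
        groups.extend(members for members in classes if len(members) > 1)
    return groups
```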

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
Re: Find and remove duplicate files
« Reply #3 on: April 07, 2008, 09:47 AM »
It seems a bit wasteful doing byte-by-byte checking if you're using a secure hash - after all, the cryptographic hashes are designed so a single bit difference should yield a large hash difference... but I guess byte-by-byte is an OK option for the paranoid people ;)

I assume you start off by only considering files with the same size, not blindly calculating hashes for every file? :)
- carpe noctem

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Re: Find and remove duplicate files
« Reply #4 on: April 07, 2008, 10:07 AM »
Yes, Clone Tools only compares files that have the same size.

A complete byte by byte scan is the only 100% secure method. Even if it is very unlikely that a good hash algorithm would produce the same hash for two different files it would not be 100% accurate but rather 99.999999999% or so.
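For a sense of scale: if a hash behaves like a uniformly random function, the chance that any two different files among n share a hash value is at most n(n-1)/2 divided by 2^bits (a union bound over all pairs). The sketch below assumes a 128-bit hash and accidental rather than deliberately crafted collisions; the thread does not say which hash Clone Tools uses, and the function name is made up for illustration.

```python
def collision_probability_upper_bound(num_files, hash_bits):
    """Union bound on the chance that any two distinct files share a hash.

    Assumes the hash output is uniformly random over 2**hash_bits values.
    """
    pairs = num_files * (num_files - 1) // 2
    return pairs / 2 ** hash_bits


# One million distinct files and a 128-bit hash.
bound = collision_probability_upper_bound(10**6, 128)
```

Under these assumptions the bound for a million files is on the order of 1e-27: extremely small, but not zero, which is the gap a byte-by-byte pass closes.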

Another feature of Clone Tools is that it is possible to start working with the scan result before the complete byte-by-byte scan is ready, while the scan keeps going in the background (with the smallest files being compared first). This means that you can start cleaning your hard drive without having to wait for Clone Tools to compare your DVD files etc. It is, however, not possible to delete any files that have not been verified by the byte-by-byte scan.
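The size pre-filter and the smallest-first ordering mentioned in this reply might be sketched as follows. This is a hypothetical Python illustration; the function name is an assumption and the background scheduling itself is left out.

```python
import os
from collections import defaultdict


def candidate_groups_smallest_first(paths):
    """Group files by size and return the groups worth comparing.

    Files with a unique size cannot have a duplicate, so they are dropped
    before any hashing or byte-by-byte work. The remaining groups are
    ordered by size so that small files get verified first.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)
    return [
        (size, group)
        for size, group in sorted(by_size.items())
        if len(group) > 1
    ]
```

A verifier working through this list in order confirms many small clones early, while the large video files wait until the end.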

Dormouse

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,952
Re: Find and remove duplicate files
« Reply #5 on: April 07, 2008, 10:17 AM »
I (very) vaguely realise that I might benefit from a bit of this sort of cleaning up.

But I'm not sure what to look at when making a choice between Clone Tools and other options. If we take Double Killer Pro, for example, which has been well recommended here, what advantages and disadvantages does Clone Tools have in comparison?

Quite important as the base price of Double Killer Pro is half that of Clone Tools and it has also recently been available here with a 30% discount.

mediaguycouk

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 247
Re: Find and remove duplicate files
« Reply #6 on: April 07, 2008, 10:19 AM »
This does look pretty useful to be honest. At work we have a media server that I'm sure people pass around video files on, but instead of keeping a copy in a shared space they will have a 2GB copy of Top Gear (or something equally stupid) in each of their filestores.
Learning C# - Graham Robinson

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Re: Find and remove duplicate files
« Reply #7 on: April 07, 2008, 10:37 AM »
The largest benefit of removing duplicate files is that your hard drive gets more organized; another is that it saves space. Many people (me included) are not always careful enough when working with photos and other files, and it is easy to end up with a total mess. A good first step when trying to solve this mess is to ensure that there are no duplicate files.

The killer feature of Clone Tools is that it works with folders containing duplicate files rather than the duplicate files themselves. The folders are then compared and deduped in a side-by-side browser that works much like the similar browsers seen in synchronization programs. To me the problem with other duplicate file finders is that when I scan, for example, my "My Documents" folder I end up with a list of 10,000+ clone files, and it is very time consuming to try to decide for each file whether I want to keep it or not. With Clone Tools this work is much easier since the clones are presented in the side-by-side browser, which gives a great overview of where the clones are located. Also there are tools to quickly merge the content of one folder into another or clean all shared duplicate content between two folders.

I know that many other programs have tried to solve the problem of working with many duplicates by automatic functions that, for example, only save the most recently edited copy of each duplicate, but to me it seems kind of risky to use such operations without my control on a huge list of files.

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • *****
  • Posts: 40,896
Re: Find and remove duplicate files
« Reply #8 on: April 07, 2008, 10:39 AM »
<<
The killer feature of Clone Tools is that it works with folders containing duplicate files rather than the duplicate files themselves. The folders are then compared and deduped in a side-by-side browser that works much like the similar browsers seen in synchronization programs.
>>

That's actually very clever and I can see how that would be useful.

suleika

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 117
Re: Find and remove duplicate files
« Reply #9 on: April 07, 2008, 12:12 PM »

<<
To me the problem with other duplicate file finders is that when I scan, for example, my "My Documents" folder I end up with a list of 10,000+ clone files, and it is very time consuming to try to decide for each file whether I want to keep it or not. With Clone Tools this work is much easier since the clones are presented in the side-by-side browser, which gives a great overview of where the clones are located. Also there are tools to quickly merge the content of one folder into another or clean all shared duplicate content between two folders.
>>

This would make all the difference to me.  I've used duplicate finders from time to time and been frustrated by that very problem.

mwb1100

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,645
Re: Find and remove duplicate files
« Reply #10 on: April 07, 2008, 02:18 PM »
<<
Quite important as the base price of Double Killer Pro is half that of Clone Tools and it has also recently been available here with a 30% discount.
>>

Something to point out that is important for me (and I'm sure others on this forum):  Double Killer Pro is licensed per-computer.  Clone Tools is licensed per-user, so I can legally use it to clean up all my machines with a single license.

The Double Killer Pro per-machine license restriction was a deal killer for me.  Also, Double Killer Pro is the same price as Clone Tools if you use it for non-personal (i.e., business or gov't) use.

Still, I hope PilotMan takes the hint about a DC discount...

Darwin

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 6,984
Re: Find and remove duplicate files
« Reply #11 on: April 07, 2008, 05:15 PM »
I, too, am intrigued by the folder-to-folder comparison - I have a hard time deciding what files to keep and what to delete, and make my decision by referencing the files' locations on my hard drive... This would save some work!

Looks interesting  :Thmbsup:

f0dder

  • Charter Honorary Member
  • Joined in 2005
  • ***
  • Posts: 9,153
  • [Well, THAT escalated quickly!]
Re: Find and remove duplicate files
« Reply #12 on: April 07, 2008, 06:28 PM »
Thanks for answering my questions :) - I still think byte-by-byte is (a bit too) paranoid and should be left as a togglable option, but I value your safety concerns - very nice that you can't delete the last copy of a file etc.
- carpe noctem

kartal

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 1,529
Re: Find and remove duplicate files
« Reply #13 on: April 07, 2008, 08:02 PM »
What are the limitations of the trial version?

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Re: Find and remove duplicate files
« Reply #14 on: April 08, 2008, 02:40 AM »
In the trial version you are only allowed to delete 100 files per scan.

I must say that I am impressed by the response in this forum.  :Thmbsup: We are also always interested in user input so if anyone decides to test the program then we would be very happy if you told us about your experiences.

mediaguycouk

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 247
Re: Find and remove duplicate files
« Reply #15 on: April 08, 2008, 03:11 AM »
It seems good but the byte by byte scan is painfully slow on our wav files on the media server. I was hoping for a bit more speed when you said it takes an educated guess when showing you the results and does the rest in the background.

I'm sure if you are scanning for word documents and 3MB mp3 files it is a lot better.
Learning C# - Graham Robinson

mediaguycouk

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 247
Re: Find and remove duplicate files
« Reply #16 on: April 08, 2008, 03:38 AM »
Something that I definitely think is missing is a line that says 'If you deleted all cloned files you would save ### space'. The trial scan I did was really just for statistical purposes (so I can tell staff off) but I can see the number of clones

Folder size: 96GB
Files: 6184
Clones: 4191

but I can't see how much space 6184-4191 is.
Learning C# - Graham Robinson
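The figure asked for in this post is simple to compute once the clone groups are known: keep one copy per group and add up the size of the rest. A minimal Python sketch, where the input format (file size plus number of identical copies per group) is a hypothetical one and not something Clone Tools exports:

```python
def reclaimable_bytes(clone_groups):
    """Bytes freed if every clone group is reduced to a single copy.

    clone_groups is a hypothetical list of (file_size_in_bytes, copies)
    tuples, one tuple per group of identical files.
    """
    return sum(size * (copies - 1) for size, copies in clone_groups)


# Three copies of a 2 GB video and two copies of a 1000-byte file.
saved = reclaimable_bytes([(2_000_000_000, 3), (1_000, 2)])
```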

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Re: Find and remove duplicate files
« Reply #17 on: April 08, 2008, 03:48 AM »
As you guess the speed problem probably comes from the large file size combined with the fact that it is a media server (on a network?).
Our criteria for which files the program will scan in the background also involve the file creation date. So when the files have different creation dates, they will be scanned before we present the result. The reason is that our tests have shown that the miss ratio of our guess was too high when we only included the checksum scan without regard to the file creation date. Many files have the same size and almost the same data, for example image files with only small differences or Word documents where only one spelling error ("adn" to "and", for example) has been corrected.

Including the size of all duplicate files is a good idea. We will try to add it in the next version.

nosh

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,441
Re: Find and remove duplicate files
« Reply #18 on: April 08, 2008, 09:59 AM »
I tried this program because it takes a unique approach and I have some issues with it.

 - Filters don't function 100%
From the help file "The clones list to the right of the result list shows all folders that shares content (clone files or folders containing shared clones) with the folder selected in the result list."
I've filtered out *.ini and yet when I select a folder in the result list that has only 6 clones, I see a whole bunch of folders (totaling to over 100 clones) in the clones list - I'm assuming this is because of all the desktop.ini dupes which haven't been filtered. The filtered files also show in the bottom pane.

 - I selected show only clone files but the right hand bottom pane still shows all files, this is a nightmare situation if you have hundreds or thousands of files in that folder.

 - The bottom pane columns do not sort as one would expect them to when the column headers are clicked.

I didn't find the folder-based approach particularly useful either, just the opposite, in fact. After getting really frustrated with it I fired up Easy Duplicate Finder, a freeware alternative - and it did the scan in a fraction of the time and gave me the results in a really intuitive format - the column sorting works - I can sort by folder (it shows paths) and click "select all dupes in this folder", which is a way more functional folder-based approach.

I'm sorry for being so harsh but IMO, Clone Tools is confusing, frustrating and, at $40 in a category where there are really decent freeware alternatives available, over-priced. Maybe I slipped up somewhere and could not fully exploit the functionality but I did spend time reading the tutorial, I'm sure you won't hesitate in setting me right if that's the case.  ;)
« Last Edit: April 08, 2008, 10:01 AM by nosh »

Kristian_CE

  • Participant
  • Joined in 2008
  • *
  • Posts: 7
Re: Find and remove duplicate files
« Reply #19 on: April 08, 2008, 11:19 AM »
Even if it feels better to read positive reviews than negative ones, the latter are perhaps more valuable, as they enable us to improve our software.

<<
 - Filters don't function 100%
From the help file "The clones list to the right of the result list shows all folders that shares content (clone files or folders containing shared clones) with the folder selected in the result list."
I've filtered out *.ini and yet when I select a folder in the result list that has only 6 clones, I see a whole bunch of folders (totaling to over 100 clones) in the clones list - I'm assuming this is because of all the desktop.ini dupes which haven't been filtered. The filtered files also show in the bottom pane.
>>

I am not sure about what the problem with the filter might be but I do believe that it is not a bug but rather that you do not like the behaviour of the filter. The idea is that the filter only excludes folders and files from the result list. So when you exclude *.ini from your result you will not see any folders containing only *.ini duplicates and no other duplicates.  If you have a folder with both *.ini duplicates and other duplicates you will see it and when you select it you will also see all related folders including those that are related through *.ini duplicates.

The reason is that we think it is misleading to show the folder as having fewer related folders and less content than it really has. We also thought that it might be confusing if we had excluded file types from the twin browser, as it might seem odd that folders that look like they have the exact same content in reality did not, because they had differences in files that were not visible due to the filter.

I can however see your problem too. We will try to get some more user input and see whether they think this is a good approach or if we should exclude the folders from the "Clones List" and the twin browser as well.
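The filter behaviour explained in this reply (a folder disappears from the result list only when all of its duplicates match an excluded pattern) can be sketched in Python as below. The function names and the input format are assumptions for illustration, and matching is made case-insensitive on purpose.

```python
from fnmatch import fnmatchcase


def is_excluded(filename, patterns):
    """True if the file name matches any excluded wildcard pattern."""
    name = filename.lower()
    return any(fnmatchcase(name, pattern.lower()) for pattern in patterns)


def folders_in_result_list(duplicates_by_folder, excluded_patterns):
    """Keep a folder if at least one of its duplicates is not filtered out.

    duplicates_by_folder is a hypothetical dict mapping a folder path to the
    names of the duplicate files found in it.
    """
    return [
        folder
        for folder, filenames in duplicates_by_folder.items()
        if any(not is_excluded(name, excluded_patterns) for name in filenames)
    ]
```

With "*.ini" excluded, a folder holding only desktop.ini clones is hidden, while a folder holding desktop.ini plus a duplicated photo stays visible, and its related folders would still include those linked only through the .ini files.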

<<
 - I selected show only clone files but the right hand bottom pane still shows all files, this is a nightmare situation if you have hundreds or thousands of files in that folder.
>>

To me it seems like it works but if not I am interested in solving this problem.

<<
 - The bottom pane columns do not sort as one would expect them to when the column headers are clicked.
>>

Nope there is no sorting of the twin browser. It is on the wish list for a future release.

<<
I didn't find the folder-based approach particularly useful either, just the opposite, in fact. After getting really frustrated with it I fired up Easy Duplicate Finder, a freeware alternative - and it did the scan in a fraction of the time and gave me the results in a really intuitive format - the column sorting works - I can sort by folder (it shows paths) and click "select all dupes in this folder", which is a way more functional folder-based approach.

I'm sorry for being so harsh but IMO, Clone Tools is confusing, frustrating and, at $40 in a category where there are really decent freeware alternatives available, over-priced. Maybe I slipped up somewhere and could not fully exploit the functionality but I did spend time reading the tutorial, I'm sure you won't hesitate in setting me right if that's the case.  ;)
>>

Whether you like the folder-based approach or not is perhaps a matter of taste and also depends to some extent on what kind of duplicate problem you have, i.e. only some duplicated files here and there or entire duplicated folder structures. Obviously a program like Clone Tools requires a bit more of a learning effort, since working with folders is more complicated than working with files only. Operating an excavator takes more time to learn than operating a spade, but once you get the hang of it, it enables you to dig faster (depending on the size of the hole). :)

The corresponding operation to "select all dupes in this folder" would be the "Delete all shared clones" button, although it is not exactly the same since it only deletes the content shared with the folder selected in the result list. The fastest way to delete clones is to locate the "master folder" that you want to keep and select that one in the result list (you can use the funny-looking button with a folder and a pair of glasses for that). Then either delete all shared clones in the other folders with shared content or merge them with the master folder using the buttons above the "Clones List". If you want to keep a folder that still contains duplicate files, right-click it and select "Ignore this file/folder in the result list" so that you don't have to see it in the result list any more.
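As a rough illustration of the "master folder" workflow above, the following Python sketch lists which files would go when every clone shared with the master folder is removed from the other folders. It reflects one reading of the description, not the program's actual logic; the function name and input format are assumptions, only files directly inside the master folder are considered, and nothing is deleted.

```python
from pathlib import PurePath


def shared_clones_to_delete(clone_groups, master_folder):
    """List clones outside the master folder that also exist inside it.

    clone_groups is a hypothetical list of lists of paths with identical
    content. Files in the master folder are never returned, so one copy of
    each shared file always survives.
    """
    master = PurePath(master_folder)
    to_delete = []
    for group in clone_groups:
        if not any(PurePath(path).parent == master for path in group):
            continue  # not shared with the master folder, leave untouched
        to_delete.extend(
            path for path in group if PurePath(path).parent != master
        )
    return to_delete
```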

Regarding the scan speed, my guess is that the scan with Easy Duplicate Finder was much faster since you had already scanned the folder with Clone Tools and thus put the content in the RAM cache. Try them the other way around.

Today we have added a video tutorial to our site that might be of help.

Thank you for your input and taking time to test our program.

nosh

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 1,441
Re: Find and remove duplicate files
« Reply #20 on: April 08, 2008, 11:46 AM »
Kristian, I admire your positive attitude to my somewhat harsh take on things.  :up:

I suggest you give the UI elements as many useful popup tips/hints as you can in subsequent releases; if done right it could make the program a lot friendlier for people evaluating it. Advanced users could turn these off.

Good luck with your software. :)   
« Last Edit: April 08, 2008, 11:49 AM by nosh »

Cleaner007

  • Guest
Re: Find and remove duplicate files
« Reply #21 on: July 06, 2008, 10:54 AM »
There is very good program on removal of duplicate files named Clone Remover. The program removes all picture and musical clones.

jgpaiva

  • Global Moderator
  • Joined in 2006
  • *****
  • Posts: 4,727
Re: Find and remove duplicate files
« Reply #22 on: October 08, 2008, 05:04 AM »
Cleaner007 was spamming and has been banned, see more in this thread.

Davidtheo

  • Participant
  • Joined in 2008
  • *
  • Posts: 119
Re: Find and remove duplicate files
« Reply #23 on: October 08, 2008, 09:27 PM »
How does your program do with files that are not in English? I have a lot of files that have file names in Chinese and are written in Chinese.

(Edited wording)

kartal

  • Supporting Member
  • Joined in 2008
  • **
  • Posts: 1,529
Re: Find and remove duplicate files
« Reply #24 on: October 31, 2008, 09:45 PM »
I am looking for a program that can remove duplicate shortcuts. Any ideas?