Another day, another Wild Goose Chase.
Today I was asked to create the above from a folder containing about 50,000 files. Many of those files are probably duplicates, but I cannot simply delete them even when I find them, because they are the resource pool for a DB. I just need to find out how many of them are actually in there twice, or even three or four times, and create a list showing the paths to all of them.
Someone else (TG!), not me, is going to have to figure out which ones to keep and which not.
But I do have to create this list showing the full path to every file that exists in more than one location but is really a duplicate.
In the best of worlds, this would be done using a hash of the document and not just the filename, because it is entirely possible, even likely, that the same file was entered by two different people in different places. That file would still be a duplicate, meaning the BB links should point to a single location for both entries instead of having the file in there twice.
I have been testing a few duplicate finders, but most of them are a bit overzealous and want to help get rid of the problem. In this one case I only need to create a file showing the path listings for all the duplicates but Do Nothing. Just make the list.
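In case no ready-made tool fits, a short script can produce exactly that list and nothing else. This is just a minimal sketch of the hash-by-content idea described above, not a polished tool: it walks a folder, hashes every file with SHA-256, and reports only the hashes that appear at more than one path (the function names and layout here are my own, not from any particular duplicate finder):

```python
import hashlib
import os
import sys
from collections import defaultdict

def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root):
    """Map each content hash to the list of paths sharing that content,
    keeping only hashes that occur at more than one path."""
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[hash_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Print each duplicate group: the hash, then the full paths under it.
    for digest, paths in sorted(find_duplicates(sys.argv[1]).items()):
        print(digest)
        for p in sorted(paths):
            print("    " + p)
```

Redirecting the output to a file gives the list and touches nothing; on 50,000 files it just takes however long the disk takes to read them all once.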
I wondered if anyone had suggestions for a particular tool to use for that. Hopefully something you have used before for a similar task?