DonationCoder.com Software > Coding Snacks
The Best Way to handle finding and removing Duplicate Files
IainB:
NB: I have just modified the images in my last comment above, to make it clear that file size and contents (checksum) are being used for duplicate checking in the example given. I made the images a bit smaller (so no scrolling needed) and added some comments/arrows to them.
questorfla:
Well, up to 4GB in size you could use VBinDiff but HxD will handle files up to 8EB ... if you feel the need :D
Possibly a good start would be to generate hashes on each file, (SHA-1, etc), then you need only compare those with matching hashes - you could probably whip up a command file to do it ;)
EDIT: The inbuilt Windows fc command can apparently also handle files >4GB.
Do all these files reside in the same directory?
Maybe I'll look at a simple, (or not), command file.
-4wd (October 26, 2014, 12:14 AM)
--- End quote ---
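For reference, the hash-first idea in the quote above could look something like this in Python. It is only a rough sketch: the function names `file_sha1` and `find_duplicates_by_hash` are made up for illustration, and it pre-filters by file size before hashing, which the quoted post does not mention.

```python
import hashlib
import os
from collections import defaultdict


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates_by_hash(root):
    """Group files under root by size, then by SHA-1 within each size group."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_sha1(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Each returned group is a list of paths with identical size and SHA-1. A byte-for-byte compare of the files in a group would still be the final check before deleting anything.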
I used to have a duplicate remover for photos that would have worked. It shows you all the various files and where they are, along with dates, sizes, etc. Then you go to each group of duplicates and pick which one(s) to keep. I used this back in the Windows XP days, so I do not remember the name, but I am sure there are plenty out there like it. The preference would be to move all but the ones I pick to another folder, maintaining the directory structure of where they were removed from.
Seems like it was a pretty fast program (for that point in time)
I can see I need to review your proffered code as well, 4WD :) It might be all I need.
questorfla:
Thanks IainB. ;) I am looking at your choice as well.
This is a HUGE folder with multiple subdirectories and levels. So it isn't the size of the file that matters so much during the scanning phase.
And to be honest, if it had the ability to be selective about the compare fields, I would first want to look for files of the same name, date, and size.
This would at least allow me to clean out the "dead-wood". After that, I can get more selective on subsequent scans.
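A first pass on name, date, and size only could be sketched like this in Python. Several things here are assumptions, not anything from the thread: "date" is taken to mean last-modified time rounded down to whole seconds, names are compared case-insensitively as Windows does, and `find_candidates` is a made-up name.

```python
import os
from collections import defaultdict


def find_candidates(root):
    """Group files under root by (lower-cased name, size, whole-second mtime)."""
    groups = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            # Assumption: "date" means last-modified time, truncated to seconds.
            key = (name.lower(), info.st_size, int(info.st_mtime))
            groups[key].append(path)
    # Keep only keys shared by two or more files.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

No file contents are read, so this stays quick even on a large tree. The groups it returns are only candidates for the more selective scans afterwards.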
I need to get it down to a reasonable size to be used with File Locator Pro. Excellent program, but it does not build any kind of index, so every subsequent run takes a lot of time. Indexing is a feature that they will be adding in the next version.
Stoic Joker:
If they ever do let you upgrade that server, Server 2012 has a built-in deduplication feature that produced very promising results (like a 60% size reduction) in the tests I ran.
This approach can be helpful when dealing with users that chronically like to squirrel away common files to their own little stash ... and then freak out when it can't be found. Otherwise, 6 months after you spend all that time cleaning up their mess, they'll just recreate it and run you out of space again.
4wd:
If you want to just match on name, date (which one?), and size, I can modify the command file easily enough. I'll look at it in a few hours.
Probably also get it to spit out a command file to do the moving of matching files.
Not far different from the original.
EDIT: New version, old one is still above.
Compares Size, Date, Name, and optionally compares binaries.
Generates a logfile that includes the commands to copy or move the duplicates to another directory using Robocopy, (duplicates folder tree). Files from the 2nd chosen directory are the sacrificial victims.
--- Code: Text ---
@if (@CodeSection == @Batch) @then
@echo off
color 1a
setlocal EnableExtensions
setlocal DisableDelayedExpansion

echo Select PRIMARY folder
for /F "delims=" %%S in ('CScript //nologo //E:JScript "%~F0"') do (
  set srce=%%S
)
if "%srce%"=="" goto :End2
echo Selected folder: "%srce%"
echo Select folder to COMPARE
for /F "delims=" %%D in ('CScript //nologo //E:JScript "%~F0"') do (
  set dest=%%D
)
if "%dest%"=="" goto :End2
echo Selected folder: "%dest%"
if "%srce%"=="%dest%" goto :End1
echo.
echo The following prompt asks you for a folder to move duplicates to.
echo.
echo ---- HOWEVER, THIS COMMAND FILE DOES NOT MOVE THEM ----
echo.
echo The path is written to the logfile so it can be run separately if
echo required.
echo.
echo Select folder to MOVE duplicates to
for /F "delims=" %%S in ('CScript //nologo //E:JScript "%~F0"') do (
  set mdest=%%S
)
if "%mdest%"=="" goto :End2
echo Selected folder: "%mdest%"
set /P como=Copy or Move Duplicates (Set in logfile) [ENTER = Copy]:
echo.
echo If you just want to match on Size, Date, and Name just hit
echo ENTER at the next prompt.
echo If you want to do a binary compare also, enter any character.
echo.
set /P cbin=Do binary compare [ENTER = No]:

:GetExt
echo.
set /P ext=Please enter extension [eg. jpg, * = ALL] ENTER to exit:
if not defined ext goto :End2
set /a totfiles=0
set /a matchfiles=0
rem Quote the script folder so a path containing spaces is passed as one argument.
call :SetLogfile "%~dp0"
echo @echo off >"%logfile%"
echo echo %date% %time% >>"%logfile%"
for /r "%srce%" %%a in (*.%ext%) do (call :CheckSize "%dest%" %%~za "%%~fa")

:End
echo Total files: %totfiles%
echo Matching files: %matchfiles%
if %matchfiles% equ 0 (echo Matching files: 0 >>"%logfile%")
set totfiles=
set matchfiles=
set ext=
if exist "%logfile%" call :ViewLog
goto :GetExt

:End1
color 1c
echo **** SOURCE and DESTINATION are the same! ****

:End2
set srce=
set dest=
pause
rem exit

:SetLogfile
set "logfile=%~1FCompare-%time:~0,2%%time:~3,2%%time:~6,2%.cmd"
goto :EOF

:ViewLog
set /p view=View logfile [y,n]
if "%view%"=="y" (start notepad.exe "%logfile%")
set view=
goto :EOF

:CheckSize
set /a totfiles+=1
for /r %1 %%b in (*.%ext%) do (
  if %2==%%~zb (
    echo.
    echo Comparing: "%~3" "%%~b"
    echo Sizes: Match
    call :CheckDate "%~3" "%%~b"
  )
)
goto :EOF

:CheckDate
if "%~t1" equ "%~t2" (
  echo Dates: Match
  call :CheckName "%~1" "%~2"
)
goto :EOF

:CheckName
if "%~nx1" equ "%~nx2" (
  echo Names: Match
  if not defined cbin (
    call :Matching "%~1" "%~2"
  ) else (
    call :CheckBin "%~1" "%~2"
  )
)
goto :EOF

:CheckBin
.\cmp.exe -s "%~1" "%~2"
rem "if errorlevel 0" is true for every exit code of 0 or higher, so it would
rem report a match even when the files differ. Test for exit code 0 only.
if not errorlevel 1 (
  echo Binaries: Match
  call :Matching "%~1" "%~2"
)
goto :EOF

:Matching
set /a matchfiles+=1
echo echo "%~1" matches "%~2" >>"%logfile%"
set tdir=%~dp2
set tdir=%tdir:~0,-1%
set tdir2=%tdir:~2%
if not defined como (
  echo robocopy "%tdir%" "%mdest%%tdir2%" "%~nx2" >>"%logfile%"
) else (
  echo robocopy "%tdir%" "%mdest%%tdir2%" "%~nx2" /MOV >>"%logfile%"
)
echo.
goto :EOF

endlocal
rem End of Batch section
@end

// JScript section
// Creates a dialog box that enables the user to select a folder and display it.
var title = "Select a folder", rootFolder = 0x11;
var shl = new ActiveXObject("Shell.Application");
var folder = shl.BrowseForFolder(0, title, 0, rootFolder);
WScript.Stdout.WriteLine(folder ? folder.self.path : "");
Sample output with Move option.
--- Code: Text ---
@echo off
echo Wed 29/10/2014 12:53:49.64
echo "R:\test\Root\SimpleBackup.cmd" matches "D:\Root\SimpleBackup.cmd"
robocopy "D:\Root" "D:\test\Root" "SimpleBackup.cmd" /MOV
echo "R:\test\Root\fred h\1234\Hunters & Collecters - Holy Grail.mp3" matches "D:\Root\fred h\1234\Hunters & Collecters - Holy Grail.mp3"
robocopy "D:\Root\fred h\1234" "D:\test\Root\fred h\1234" "Hunters & Collecters - Holy Grail.mp3" /MOV
echo "R:\test\Root\fred h\34\cache\Don McLean - American Pie.mp3" matches "D:\Root\fred h\34\cache\Don McLean - American Pie.mp3"
robocopy "D:\Root\fred h\34\cache" "D:\test\Root\fred h\34\cache" "Don McLean - American Pie.mp3" /MOV
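To show how the robocopy lines in the log keep the folder tree, here is the same path arithmetic as a small Python sketch. The function name `robocopy_line` and its parameters are invented for illustration. It only builds the command text, in the same way the command file only writes it to the log, and it assumes paths on a lettered drive such as `D:\...`.

```python
from pathlib import PureWindowsPath


def robocopy_line(duplicate, move_root, move=False):
    """Build a robocopy command that recreates the duplicate's folder tree under move_root."""
    dup = PureWindowsPath(duplicate)
    # Drop the drive part ("D:\") and keep the remaining folders, which is
    # what the command file does with %tdir:~2%.
    relative_dir = PureWindowsPath(*dup.parent.parts[1:])
    target_dir = PureWindowsPath(move_root) / relative_dir
    line = f'robocopy "{dup.parent}" "{target_dir}" "{dup.name}"'
    # /MOV makes robocopy delete the source file after copying it.
    return line + " /MOV" if move else line


print(robocopy_line(r"D:\Root\SimpleBackup.cmd", r"D:\test", move=True))
```

With `D:\test` as the duplicates folder, this prints the first robocopy line of the sample output above.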
NOTE:
If you want another free program, I heartily recommend Duplicate File Finder by Rashid Hoda - it has the best layout for a dupe checker I have ever seen, all the options are right in front of you without having to screw around in menus.
It hasn't been updated in years, but the only problem I've found with it is when you have a rather large number of files to check, e.g. 40k+.