
DonationCoder.com Software > Coding Snacks

The Best Way to handle finding and removing Duplicate Files


IainB:
NB: I have just modified the images in my last comment above, to make it clear that file size and contents (checksum) are being used for duplicate checking in the example given. I made the images a bit smaller (so no scrolling needed) and added some comments/arrows to them.

questorfla:
Well, up to 4GB in size you could use VBinDiff but HxD will handle files up to 8EB ... if you feel the need  :D

Possibly a good start would be to generate hashes on each file, (SHA-1, etc), then you need only compare those with matching hashes - you could probably whip up a command file to do it  ;)

EDIT: The inbuilt Windows fc command can apparently also handle files >4GB.

Do all these files reside in the same directory?

Maybe I'll look at a simple, (or not), command file.
-4wd (October 26, 2014, 12:14 AM)
--- End quote ---
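
The hash-first idea in the quote above can be sketched roughly as follows. This is only an illustration in Python rather than a command file; the function names `file_sha1` and `find_duplicates` are made up for the example, and grouping by size before hashing is an added assumption to avoid hashing files that cannot possibly match.

```python
import hashlib
import os
from collections import defaultdict


def file_sha1(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by size, then by SHA-1; return groups of 2+ paths."""
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_sha1(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Only files whose hashes match would then need a byte-for-byte compare with a tool such as fc.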

I used to have a duplicate remover for photos that would have worked.  It shows you all the various files and where they are, along with dates, sizes, etc.  Then you go to each group of duplicates and pick which one(s) to keep.  I used this back in the Windows XP days, so I do not remember the name, but I am sure there are plenty out there like it.  The preference would be to move all but the ones I pick to another folder, maintaining the directory structure of where they were removed from.
Seems like it was a pretty fast program (for that point in time)

I can see I need to review your proffered code as well, 4WD  :)  It might be all I need.

questorfla:
Thanks IainB.     ;) I am looking at your choice as well.
This is a HUGE folder with multiple subdirectories and levels. So it isn't the size of the file that matters so much during the scanning phase.
And to be honest, if it had the ability to be selective about the compare fields, I would first want to look for files of the same name, date, and size.
This would at least allow me to clean out the "dead-wood".  After that, I can get more selective on subsequent scans.
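
A first pass on name, date, and size alone could look something like the sketch below. It is written in Python purely as an illustration; the function name `group_by_name_size_date` is hypothetical, and two choices are assumptions of mine: "date" is taken to mean the last-modified time truncated to whole seconds, and names are compared case-insensitively as Windows does.

```python
import os
from collections import defaultdict


def group_by_name_size_date(root):
    """Return lists of paths under root sharing file name, size, and modified time."""
    groups = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            info = os.stat(path)
            # Assumptions: case-insensitive names, modified time in whole seconds.
            key = (name.lower(), info.st_size, int(info.st_mtime))
            groups[key].append(path)
    # Keep only keys that were seen more than once, i.e. candidate duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

No file contents are read here, so this pass stays cheap even on a very large tree; a checksum or binary compare can be reserved for a later, more selective scan.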
I need to get it down to a reasonable size to be used with File Locator Pro.  Excellent program, but it does not build any kind of index on each run.  This uses a lot of time on each subsequent run, and indexing is a feature that they will be adding in the next version.

Stoic Joker:
If they ever do let you upgrade that server, Server 2012 has a built-in deduplication feature that produced very promising results (like a 60% size reduction) in the tests I ran.

This approach can be helpful when dealing with users that chronically like to squirrel away common files to their own little stash ... And then freak out when it can't be found. Otherwise 6 months after you spend all that time cleaning up their mess they'll just recreate it and run you out of space again.

4wd:
If you want to just match on name, date (which one?), and size, I can modify the command file easily enough.  I'll look at it in a few hours.

Probably also get it to spit out a command file to do the moving of matching files.

Not far different from the original.

EDIT: New version, old one is still above.

Compares Size, Date, Name, and optionally compares binaries.
Generates a logfile that includes the commands to copy or move the duplicates to another directory using Robocopy, (duplicates folder tree).  Files from the 2nd chosen directory are the sacrificial victims.


--- Code: Text ---
@if (@CodeSection == @Batch) @then

    @echo off
    color 1a
    setlocal EnableExtensions
    setlocal DisableDelayedExpansion

    echo Select PRIMARY folder
    for /F "delims=" %%S in ('CScript //nologo //E:JScript "%~F0"') do (
        set srce=%%S
    )
    if "%srce%"=="" goto :End2
    echo Selected folder: "%srce%"

    echo Select folder to COMPARE
    for /F "delims=" %%D in ('CScript //nologo //E:JScript "%~F0"') do (
        set dest=%%D
    )
    if "%dest%"=="" goto :End2
    echo Selected folder: "%dest%"

    if "%srce%"=="%dest%" goto :End1

    echo.
    echo The following prompt asks you for a folder to move duplicates to.
    echo.
    echo ---- HOWEVER, THIS COMMAND FILE DOES NOT MOVE THEM ----
    echo.
    echo The path is written to the logfile so it can be run separately if
    echo required.
    echo.
    echo Select folder to MOVE duplicates to
    for /F "delims=" %%S in ('CScript //nologo //E:JScript "%~F0"') do (
        set mdest=%%S
    )
    if "%mdest%"=="" goto :End2
    echo Selected folder: "%mdest%"

    set /P como=Copy or Move Duplicates (Set in logfile) [ENTER = Copy]: 

    echo.
    echo If you just want to match on Size, Date, and Name just hit
    echo ENTER at the next prompt.
    echo If you want to do a binary compare also, enter any character.
    echo.
    set /P cbin=Do binary compare [ENTER = No]: 

    :GetExt
    echo.
    set /P ext=Please enter extension [eg. jpg, * = ALL] ENTER to exit: 
    if not defined ext goto :End2
    set /a totfiles=0
    set /a matchfiles=0
    rem Quoted so that a script path containing spaces is passed as one argument
    call :SetLogfile "%~dp0"

    echo @echo off >"%logfile%"
    echo echo %date%  %time% >>"%logfile%"

    rem For every file in the primary tree, look for same-sized files in the compare tree
    for /r "%srce%" %%a in (*.%ext%) do (call :CheckSize "%dest%" %%~za "%%~fa")

    :End
    echo Total files:    %totfiles%
    echo Matching files: %matchfiles%
    if %matchfiles% equ 0 (echo Matching files: 0 >>"%logfile%")
    set totfiles=
    set matchfiles=
    set ext=
    if exist "%logfile%" call :ViewLog
    goto :GetExt

    :End1
    color 1c
    echo **** SOURCE and DESTINATION are the same! ****

    :End2
    set srce=
    set dest=
    pause
    rem exit

    :SetLogfile
    set "logfile=%~1FCompare-%time:~0,2%%time:~3,2%%time:~6,2%.cmd"
    goto :EOF

    :ViewLog
    set /p view=View logfile [y,n]
    if "%view%"=="y" (start notepad.exe "%logfile%")
    set view=
    goto :EOF

    :CheckSize
    set /a totfiles+=1
    for /r %1 %%b in (*.%ext%) do (
        if %2==%%~zb (
            echo.
            echo Comparing: "%~3" "%%~b"
            echo Sizes: Match
            call :CheckDate "%~3" "%%~b"
        )
    )
    goto :EOF

    :CheckDate
    if "%~t1" equ "%~t2" (
        echo Dates: Match
        call :CheckName "%~1" "%~2"
    )
    goto :EOF

    :CheckName
    if "%~nx1" equ "%~nx2" (
        echo Names: Match
        if not defined cbin (
            call :Matching "%~1" "%~2"
        ) else (
            call :CheckBin "%~1" "%~2"
        )
    )
    goto :EOF

    :CheckBin
    rem cmp.exe returns 0 only for identical files. "if errorlevel 0" is true
    rem for every exit code, so test for "not 1 or higher" instead.
    .\cmp.exe -s "%~1" "%~2"
    if not errorlevel 1 (
        echo Binaries: Match
        call :Matching "%~1" "%~2"
    )
    goto :EOF

    :Matching
    set /a matchfiles+=1
    echo echo "%~1" matches "%~2" >>"%logfile%"
    rem tdir = folder of the duplicate without its trailing backslash,
    rem tdir2 = the same folder without the drive letter and colon
    set tdir=%~dp2
    set tdir=%tdir:~0,-1%
    set tdir2=%tdir:~2%
    if not defined como (
        echo robocopy "%tdir%" "%mdest%%tdir2%" "%~nx2" >>"%logfile%"
    ) else (
        echo robocopy "%tdir%" "%mdest%%tdir2%" "%~nx2" /MOV >>"%logfile%"
    )
    echo.
    goto :EOF

    endlocal

    rem End of Batch section
@end

// JScript section
// Creates a dialog box that enables the user to select a folder and display it.
var title = "Select a folder", rootFolder = 0x11;
var shl = new ActiveXObject("Shell.Application");
var folder = shl.BrowseForFolder(0, title, 0, rootFolder);
WScript.Stdout.WriteLine(folder ? folder.self.path : "");

Sample output with Move option.

--- Code: Text ---
@echo off
echo Wed 29/10/2014  12:53:49.64
echo "R:\test\Root\SimpleBackup.cmd" matches "D:\Root\SimpleBackup.cmd"
robocopy "D:\Root" "D:\test\Root" "SimpleBackup.cmd" /MOV
echo "R:\test\Root\fred h\1234\Hunters & Collecters - Holy Grail.mp3" matches "D:\Root\fred h\1234\Hunters & Collecters - Holy Grail.mp3"
robocopy "D:\Root\fred h\1234" "D:\test\Root\fred h\1234" "Hunters & Collecters - Holy Grail.mp3" /MOV
echo "R:\test\Root\fred h\34\cache\Don McLean - American Pie.mp3" matches "D:\Root\fred h\34\cache\Don McLean - American Pie.mp3"
robocopy "D:\Root\fred h\34\cache" "D:\test\Root\fred h\34\cache" "Don McLean - American Pie.mp3" /MOV
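
For anyone wondering how the target folders in the logfile are derived: the duplicate's folder is taken without its drive letter and re-rooted under the chosen "move to" folder, so the tree is duplicated there. Below is a rough Python sketch of that path arithmetic only; `robocopy_line` is a hypothetical helper, not part of the command file, and it does not run robocopy.

```python
from pathlib import PureWindowsPath


def robocopy_line(duplicate, move_root, move=False):
    """Build the robocopy command for one duplicate, mirroring its folder tree."""
    dup = PureWindowsPath(duplicate)
    source_dir = dup.parent
    # Drop the drive ("D:\") so the remaining folders can be re-rooted under move_root.
    relative = source_dir.relative_to(source_dir.anchor)
    target_dir = PureWindowsPath(move_root) / relative
    line = f'robocopy "{source_dir}" "{target_dir}" "{dup.name}"'
    # /MOV tells robocopy to delete the source file after copying it.
    return line + " /MOV" if move else line


print(robocopy_line(r"D:\Root\fred h\34\cache\Don McLean - American Pie.mp3",
                    r"D:\test", move=True))
```

With a "move to" folder of D:\test, this reproduces the last robocopy line of the sample output.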

NOTE:
If you want another free program, I heartily recommend Duplicate File Finder by Rashid Hoda - it has the best layout for a dupe checker I have ever seen; all the options are right in front of you without having to screw around in menus.

Hasn't been updated in years but the only problem I've found with it is if you have a rather large number of files to check, eg. 40k+
