Author Topic: DONE: Are any of the files missing???  (Read 8042 times)

nkormanik

  • Participant
  • Joined in 2010
  • Posts: 299
DONE: Are any of the files missing???
« on: June 25, 2012, 11:47:53 AM »

Perhaps not a request for a coding snack, but possibly.

There are existing programs which probably suffice.

But I'd like advice on which you would use, given the following circumstance:

Given:

--- One complete set of 'dummy' files, held within subdirectories of a primary folder:

c:\dummy set of images\subset 1\...
c:\dummy set of images\subset 2\...
c:\dummy set of images\subset 3\...
c:\dummy set of images\subset 4\...
etc.

--- One set of actual files, but not complete..., certain ones are missing:

c:\real set of images\subset 1\... (various ones missing)
c:\real set of images\subset 2\... (various ones missing)
c:\real set of images\subset 3\... (various ones missing)
c:\real set of images\subset 4\... (various ones missing)
etc.


--- All files in "Group Dummy" have unique file names.
--- "Group Real" file names correspond to Group Dummy names.
--- If all is perfect, there should be TWO of each file in the combined lot.
--- If a file is missing (from "Group Real"), there will only be ONE of that file in the combined lot.


Task:  Find out which files are missing in "Group Real."


Thanks.

Nicholas Kormanik
nkormanik@gmail.com


rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 2,856
Re: DONE: Are any of the files missing???
« Reply #1 on: June 25, 2012, 03:12:24 PM »
er, can't you just DIR /B in dummy and real folders, and compare them with a diff program?

Carol Haynes

  • Waffles for England (patent pending)
  • Global Moderator
  • Joined in 2005
  • Posts: 7,969
Re: DONE: Are any of the files missing???
« Reply #2 on: June 25, 2012, 03:21:34 PM »
How about Beyond Compare - shows you not only files in one folder and not the other but also files that have contents that don't match. See: http://www.scootersoftware.com/

I haven't tried it but this page may be worth a look too: http://www.techsuppo...mparison-utility.htm

nkormanik

Re: DONE: Are any of the files missing???
« Reply #3 on: June 25, 2012, 03:35:17 PM »

rjbull, DIR /B /S is good.  That can get us a list of files of both folders.  Then strip away path.  Once the two lists are in hand, what diff program would you use?

Carol, remember the files are buried within subdirectories.  Most such programs as you mention want all files to be one level deep, no further.


Carol Haynes

Re: DONE: Are any of the files missing???
« Reply #4 on: June 25, 2012, 04:52:15 PM »
Beyond Compare works its way through trees (and even inside archives) very well. Ok it isn't free but it is very good - and you can use it from Explorer with a right click.

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 3,973
Re: DONE: Are any of the files missing???
« Reply #5 on: June 25, 2012, 07:03:16 PM »
Using a bit of Google-fu:

This batch file will get you a list of files sans paths, (rename output file before running a second time):

Code: Text
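  @echo off
  REM listfile.bat - list every file name (without path) under folder %1 into sorted.txt
  REM Quoting "%~1" lets the folder name contain spaces.
  for /r "%~1" %%g in (*) do >>list.txt echo %%~nxg
  sort list.txt /O sorted.txt
  REM Delete the unsorted list, otherwise the next run appends to it.
  del list.txt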

eg. listfile.bat c:\ will give you a file called sorted.txt containing a list of every file on C: drive
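
For anyone without a batch prompt handy, the same listing step can be sketched in Python. This is only an illustration of the idea in the batch file above, not part of it; the function name list_file_names and the demo folder and file names are made up.

```python
import os
import tempfile


def list_file_names(root):
    """Return a sorted list of bare file names (paths stripped) under root."""
    names = []
    # os.walk visits root and every subfolder below it.
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return sorted(names)


# Demo on a throwaway tree; the folder and file names are hypothetical.
with tempfile.TemporaryDirectory() as demo_root:
    for sub, name in [("subset 1", "img_b.jpg"), ("subset 2", "img_a.jpg")]:
        os.makedirs(os.path.join(demo_root, sub), exist_ok=True)
        with open(os.path.join(demo_root, sub, name), "w"):
            pass
    print(list_file_names(demo_root))
```

Running it once per tree gives two lists that can be compared with any of the tools mentioned in this thread. Note that sorted() orders by code point, so upper and lower case sort differently from the Windows sort command.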

WinMerge, (free), will let you compare two text files for differences.

BTW, WinMerge will also compare folders but my way added a bit of scripting  :P

I suppose a simple GUI frontend could be made using Auto(It|HK) to automate all the steps, ie. pick two folders, hit Go.
« Last Edit: June 25, 2012, 09:31:27 PM by 4wd, Reason: Add the delete line else it will just add to existing file. »

nkormanik

Re: DONE: Are any of the files missing???
« Reply #6 on: June 25, 2012, 11:22:15 PM »

4wd, I, for one, would appreciate the GUI front end.

But even if you don't get the chance, thanks for the script above.


4wd

Re: DONE: Are any of the files missing???
« Reply #7 on: June 26, 2012, 06:22:20 AM »
TCBOO - output will open in your default text editor, where you can view or save it.

TCBOO = There Can Be Only One   :) (In honour of TaoPhoenix)

[Screenshot: 2012-07-02_22-04-17.jpg]

Only files that exist once will be listed.

The interface is self-explanatory, tool tips for everything of interest.

Caveats:
  • It can, in theory, only handle about 16 million files but I still wouldn't like to wait for the output.  Technical reason: Array limits.
  • It uses strings for the comparison; maximum string length is ~2 Giga-characters, which is a lot.  If you are comparing a lot of really deeply nested files this limit may be exceeded, but it would really have to be a lot.
  • Source code is messy and uncommented but hey, that's the way I like it.  Technical reason: It keeps my brain active.

Update (starting at v0.4 which was a fairly large rewrite):
  • Comparison speed has increased due to skwire pointing out a rather simpler way of doing things - thanks skwire!
  • You can optionally exclude the folder tree, (relative to initial path), from the comparison
  • Filters and Folders are written to an ini file, (same name as executable), when you exit
  • It works for network shares, (ie. really hacky mod that stopped it is no more)

Update (v0.5)
  • Uses a much faster sort routine.

Differences from previous version:
  • It now only removes from the output files that occur in multiples of 2.  I had over-engineered the previous versions to remove any file that occurred 2 or more times, as skwire kindly pointed out  :-[
    eg. If a file exists 3 times, it will appear once in the output.  If it exists 4 times, it won't appear in the output. (Dependent on the Different Tree setting)
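
The counting rule described above can be sketched in a few lines of Python. This is not TCBOO's source, just a guess at the behaviour from the description: a name survives only if it occurs an odd number of times in the combined list. The function name odd_count_names and the sample file names are hypothetical.

```python
from collections import Counter


def odd_count_names(names):
    """Return the names that occur an odd number of times, each listed once."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n % 2 == 1)


# a.jpg occurs 2 times, b.jpg once, c.jpg 3 times, d.jpg 4 times.
combined = ["a.jpg"] * 2 + ["b.jpg"] + ["c.jpg"] * 3 + ["d.jpg"] * 4
print(odd_count_names(combined))  # ['b.jpg', 'c.jpg']
```

With one complete set and one incomplete set, every name normally occurs twice, so only the names missing from one side are left.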

How much faster is it than the previous version?
On my computer, (3.3GHz x6):
Previous version: 54548 filenames in 281 seconds, (HDD uncached read)
Version 0.4:        54550 filenames in 5 seconds, (HDD cached read)
Version 0.4:        253350 filenames in 63 seconds, (SSD uncached read)
Version 0.4:        253350 filenames in 28 seconds, (SSD cached read)

As of v0.5, most of the time is spent getting the file lists.
« Last Edit: July 24, 2012, 02:03:57 AM by 4wd »

nkormanik

Re: DONE: Are any of the files missing???
« Reply #8 on: June 26, 2012, 11:59:34 AM »

Thanks for working on TIOOO, 4wd.

Would be a plus for the program to give some idea it's still chugging along, especially on large comparison jobs.  In present test the progress bar has already gone from left to right, but job appears incomplete.  I'll keep letting it run for... a week.

If job completes successfully, will a text file list of the one-only files appear in the TIOOO directory?

Thanks much!

=====


Okay, list popped up in my default editor.  Worked like a charm!


« Last Edit: June 26, 2012, 12:45:31 PM by nkormanik »

rjbull

Re: DONE: Are any of the files missing???
« Reply #9 on: June 26, 2012, 03:19:23 PM »
Quote from: nkormanik
rjbull, DIR /B /S is good.  That can get us a list of files of both folders.  Then strip away path.  Once the two lists are in hand, what diff program would you use?

I realise you're sorted now, but for the record, I'd most probably use the GNU port of Unix diff; see DiffUtils for Windows.

However, it now occurs to me that in your case, another Unix utility might be better, comm, e.g. the one contained in GNU utilities for Win32:
Usage: comm [OPTION]... LEFT_FILE RIGHT_FILE
Compare sorted files LEFT_FILE and RIGHT_FILE line by line.

  -1              suppress lines unique to left file
  -2              suppress lines unique to right file
  -3              suppress lines unique to both files
      --help      display this help and exit
      --version   output version information and exit

Report bugs to <bug-textutils@gnu.org>.

If that appeals, you might also like a possibly more friendly alternative, File Intersection (fintrsct):
Quote
This program takes two text input files. It finds all the lines that are the same, and writes those out into a text file (defaults to common). Then it finds the lines that are unique to the first file and writes those out (defaults to unique1), and finds the lines that are unique to the second file (defaults to writing out to unique2). [...] original purpose was comparing system files like autoexec.bat between different systems in order to troubleshoot. My purpose was to help me back stuff up: I'd have a list of files (say, just for the sake of argument, a bunch of music files) that were on a CD, and a list of files that were in a folder, and this would help me figure out which were already backed up.
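
The three-way split that the File Intersection description talks about can be sketched with Python sets. This is not fintrsct itself; the function name split_lines and the sample file names are invented for illustration, and only the result keys (common, unique1, unique2) borrow the default output names quoted above.

```python
def split_lines(left_lines, right_lines):
    """Split two lists of lines into common, left-only and right-only lines."""
    left, right = set(left_lines), set(right_lines)
    return {
        "common": sorted(left & right),   # lines present in both lists
        "unique1": sorted(left - right),  # lines only in the first list
        "unique2": sorted(right - left),  # lines only in the second list
    }


dummy = ["img1.jpg", "img2.jpg", "img3.jpg"]  # hypothetical complete set
real = ["img1.jpg", "img3.jpg"]               # hypothetical incomplete set
result = split_lines(dummy, real)
print(result["unique1"])  # ['img2.jpg']
```

For the task in this thread, unique1 would be the files missing from the real set, provided the dummy list is passed first.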

nkormanik

Re: DONE: Are any of the files missing???
« Reply #10 on: June 26, 2012, 03:37:34 PM »

Really useful additional tools, rjbull.  Thanks for following up.


4wd

Re: DONE: Are any of the files missing???
« Reply #11 on: June 26, 2012, 07:48:33 PM »
Quote from: nkormanik
Would be a plus for the program to give some idea it's still chugging along, especially on large comparison jobs.  In present test the progress bar has already gone from left to right, but job appears incomplete.  I'll keep letting it run for... a week.

If job completes successfully, will a text file list of the one-only files appear in the TIOOO directory?

It outputs a text file to your system temp directory, (currently WTFlist.txt because I forgot to change the output filename :) ).

Quote
Okay, list popped up in my default editor.  Worked like a charm!

So that was about an hour for how many files in total, (folder 1 + 2) ?

Maybe I can speed it up.  BTW, any chance you can ZeroZip the two folders and make them available for testing with?

Quote from: rjbull
However, it now occurs to me that in your case, another Unix utility might be better, comm, e.g. the one contained in GNU utilities for Win32:
Usage: comm [OPTION]... LEFT_FILE RIGHT_FILE
Compare sorted files LEFT_FILE and RIGHT_FILE line by line.

  -1              suppress lines unique to left file
  -2              suppress lines unique to right file
  -3              suppress lines unique to both files
      --help      display this help and exit
      --version   output version information and exit

I think that's the opposite of what we're trying to do - we want to suppress lines that are common to both files.

However, the File Intersection program looks pretty well spot on.  :Thmbsup:
« Last Edit: June 26, 2012, 08:48:27 PM by 4wd »

rjbull

Re: DONE: Are any of the files missing???
« Reply #12 on: June 27, 2012, 04:16:20 PM »
Quote from: 4wd
I think that's the opposite of what we're trying to do - we want to suppress lines that are common to both files.

It does do what I think you want, but as ever with GNU/FSF, it is inscrutable.  Don't overlook that you can combine arguments.  Consider two files:

1.txt        2.txt   
-----        -----   
Ash          Ash     
Holly        Beech   
Oak          Holly   
Rowan        Rowan   
Whitebeam    Whitebeam

Then:

c:\Zdir>c:\dos\Utils\comm.exe -3 1.txt 2.txt
        Beech
Oak

(when there are two output streams, they are normally separated by a tab)
Combining two arguments:

c:\Zdir>c:\dos\Utils\comm.exe -13 1.txt 2.txt
Beech

Just don't combine all of arguments 1, 2 and 3, or you won't learn much  :)
Quote from: 4wd
However, the File Intersection program looks pretty well spot on.  :Thmbsup:

It's good, but as I recall, it insists on writing all three files each time.  That means that in batch processes you have to remember to delete the unwanted files as well as the wanted ones afterwards.  Still, it seems more intuitive than comm.
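
To make the comm example in this reply easier to follow, here is a rough Python imitation of how comm merges two sorted lists into three columns and then hides the suppressed ones. It is a sketch of the behaviour described here, not GNU's code; the function name comm and its suppress parameter (a string of column digits such as "3" or "13") are made up for the illustration.

```python
def comm(left, right, suppress=""):
    """Merge two sorted lists comm-style and return the output lines."""
    out = []

    def emit(column, text):
        if str(column) in suppress:
            return
        # One leading tab for every unsuppressed column to the left of this one.
        indent = sum(1 for c in range(1, column) if str(c) not in suppress)
        out.append("\t" * indent + text)

    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            emit(3, left[i])  # line present in both files
            i += 1
            j += 1
        elif left[i] < right[j]:
            emit(1, left[i])  # line only in the left file
            i += 1
        else:
            emit(2, right[j])  # line only in the right file
            j += 1
    for text in left[i:]:
        emit(1, text)
    for text in right[j:]:
        emit(2, text)
    return out


one = ["Ash", "Holly", "Oak", "Rowan", "Whitebeam"]
two = ["Ash", "Beech", "Holly", "Rowan", "Whitebeam"]
print(comm(one, two, "3"))   # ['\tBeech', 'Oak']
print(comm(one, two, "13"))  # ['Beech']
```

Both inputs must already be sorted the same way, which is why the file lists are sorted before comparing.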

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • Posts: 4,381
Re: DONE: Are any of the files missing???
« Reply #13 on: June 27, 2012, 05:05:14 PM »

Quote from: 4wd
Oh yeah, TIOOO = There Is Only One Of   :)


Are we going to forgo the chance to call it TCBOO aka Highlander's There Can Be Only One?

4wd

Re: DONE: Are any of the files missing???
« Reply #14 on: June 27, 2012, 08:13:42 PM »
Quote from: rjbull
It does do what I think you want, but as ever with GNU/FSF, is inscrutable.  Don't overlook that you can combine arguments.  Consider two files:

1.txt        2.txt    
-----        -----    
Ash          Ash      
Holly        Beech    
Oak          Holly    
Rowan        Rowan    
Whitebeam    Whitebeam

Then:

c:\Zdir>c:\dos\Utils\comm.exe -3 1.txt 2.txt
        Beech
Oak


Oh that is surely inscrutable, pretty much goes against my knowledge of the word suppress, ie. suppress = remove;hide;subdue       ;D


Quote
However, the File Intersection program looks pretty well spot on.  :Thmbsup:
It's good, but as I recall it, insists on writing all three files each time.  That means that in batch processes, you have to remember to delete the unwanted as well as wanted ones afterwards.  But, it seems more intuitive than comm.

But at least you would be able to associate an output file with an input list which would get you the path, (well, at least the initial folder path pertaining to each list).

I have an idea on how to get path output with filename but it involves modifying the sub-routine that returns the file list, (someone else's), into outputting a two dimensional array rather than one.
Speeding up the processing, I've realised, is easy (read: I'm stupid), but still wouldn't be as fast as the CLI C based alternatives I'd say.

Quote from: TaoPhoenix
Are we going to forgo the chance to call it TCBOO aka Highlander's There Can Be Only One?

I did think of that but I didn't want to sully the name of a semi-decent movie  ;D

Quote from: nkormanik
In present test the progress bar has already gone from left to right, but job appears incomplete.

Mea culpa....I didn't realise the progressbar is virtually useless with large numbers so I've removed it and just added a statusbar that spits out the numbers as it progresses.

Also made each folder individually recurseable, (Am?), and will add a button to stop the process, (or change the Go button to Stop).

Update :up: thatta way
« Last Edit: June 27, 2012, 09:16:41 PM by 4wd »

4wd

Re: DONE: Are any of the files missing???
« Reply #15 on: June 30, 2012, 02:44:25 AM »
Update up there:
  • Full filename and path are listed in sorted output.
  • Time taken display.
  • Added Exclusion filter.

I just realised I can possibly increase the comparison speed considerably.

"I'll be back."
« Last Edit: June 30, 2012, 02:57:19 AM by 4wd »

4wd

Re: DONE: Are any of the files missing???
« Reply #16 on: June 30, 2012, 07:50:11 AM »
I'm back!

Update:
  • It got faster :)

It's gone from this
  • 14003 comparisons
  • 291 seconds
  • 12% CPU

to this:
  • 54548 comparisons
  • 281 seconds
  • 1% CPU

It even found the eight files there was only one of  :D

I think I'm going to stop while I'm ahead.

It's up there.

rjbull

Re: DONE: Are any of the files missing???
« Reply #17 on: July 01, 2012, 03:30:04 PM »
Quote from: 4wd
Oh that is surely inscrutable, pretty much goes against my knowledge of the word suppress, ie. suppress = remove;hide;subdue

Maybe I should reboot my explanation of how I think comm works  :)

In comparing two files, you have three cases: lines in File 1 that aren't in File 2; lines in File 2 that aren't in File 1; and lines that are common to both files, i.e. cross-file duplicates.  File Intersection takes the sort of approach that end-users might expect, and sends each data stream to a separate file, sensibly named.  comm sends all three data streams to STDOUT, simultaneously, in three parallel columns separated by tabs.  You can catch its output in a pager like MORE, or redirect it into a text file and examine it with an editor, but it's very hard to read, even if you make tab characters visible, the more so as data lines are usually different lengths.  The switches clarify output.  They suppress one or more data streams, so you end up with only the information you want.  So, the use of the word "suppress" is correct, and the concept is logical, but it's the logic of a coffeed-up supergeek in a 3 AM coding session.

4wd

Re: DONE: Are any of the files missing???
« Reply #18 on: July 02, 2012, 07:48:25 AM »
Quote from: rjbull
So, the use of the word "suppress" is correct, and the concept is logical, but it's the logic of a coffeed-up supergeek in a 3 AM coding session.



Update up there.

4wd

Re: DONE: Are any of the files missing???
« Reply #19 on: July 23, 2012, 07:25:02 AM »
UPDATE (v0.5) here:
  • Using a much faster sort routine.

@nkormanik:  Using the file tree you sent me, it was taking a ridiculously long time to sort the list, (I gave up waiting), so it may have failed because of something in the old sort routine.

New version goes through ~368000 files in about 10 seconds on my computer.
« Last Edit: July 23, 2012, 07:40:00 AM by 4wd »

skwire

  • Global Moderator
  • Joined in 2005
  • Posts: 4,393
Re: DONE: Are any of the files missing???
« Reply #20 on: July 23, 2012, 11:28:41 PM »
I'm going to go ahead and mark this request as DONE, if that's cool with you guys.   ;)

nkormanik

Re: DONE: Are any of the files missing???
« Reply #21 on: July 23, 2012, 11:34:46 PM »

And DONE quite well.  Thanks.