Author Topic: DONE: Are any of the files missing???  (Read 6621 times)
nkormanik
Participant
*
Posts: 283

« on: June 25, 2012, 11:47:53 AM »


Perhaps not a request for a coding snack, but possibly.

There are existing programs which probably suffice.

But I'd like advice on which you would use, given the following circumstance:

Given:

--- One complete set of 'dummy' files, held within subdirectories of a primary folder:

c:\dummy set of images\subset 1\...
c:\dummy set of images\subset 2\...
c:\dummy set of images\subset 3\...
c:\dummy set of images\subset 4\...
etc.

--- One set of actual files, but not complete; certain ones are missing:

c:\real set of images\subset 1\... (various ones missing)
c:\real set of images\subset 2\... (various ones missing)
c:\real set of images\subset 3\... (various ones missing)
c:\real set of images\subset 4\... (various ones missing)
etc.


--- All files in "Group Dummy" have unique file names.
--- "Group Real" file names correspond to Group Dummy names.
--- If all is perfect, there should be TWO of each file in the combined lot.
--- If a file is missing (from "Group Real"), there will only be ONE of that file in the combined lot.


Task:  Find out which files are missing in "Group Real."


Thanks.

Nicholas Kormanik
nkormanik@gmail.com

Logged
rjbull
Charter Member
***
Posts: 2,737

« Reply #1 on: June 25, 2012, 03:12:24 PM »

er, can't you just DIR /B in dummy and real folders, and compare them with a diff program?
Logged
Carol Haynes
Waffles for England (patent pending)
Global Moderator
*****
Posts: 7,952



« Reply #2 on: June 25, 2012, 03:21:34 PM »

How about Beyond Compare - shows you not only files in one folder and not the other but also files that have contents that don't match. See: http://www.scootersoftware.com/

I haven't tried it but this page may be worth a look too: http://www.techsupportale...le-comparison-utility.htm
Logged

nkormanik
Participant
*
Posts: 283

« Reply #3 on: June 25, 2012, 03:35:17 PM »


rjbull, DIR /B /S is good.  That can get us a list of files of both folders.  Then strip away path.  Once the two lists are in hand, what diff program would you use?
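
A minimal sketch of that "strip the path, then compare the lists" step in Python, assuming the output of DIR /B /S /A-D for each folder has already been read into a list of lines (/A-D keeps directory names out of the listing). The function names are made up for illustration, and lower-casing the names is an assumption based on Windows file names being case-insensitive.

```python
import ntpath


def names_from_listing(lines):
    """Strip the Windows path from each listing line, keeping only the file name."""
    # ntpath splits on backslashes regardless of the platform running the script.
    return {ntpath.basename(line.strip()).lower() for line in lines if line.strip()}


def missing_names(dummy_lines, real_lines):
    """Return the sorted names found in the dummy listing but not in the real one."""
    return sorted(names_from_listing(dummy_lines) - names_from_listing(real_lines))
```

Calling missing_names() with the two listings returns the names that have no counterpart in "Group Real", which is the list the task asks for.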

Carol, remember the files are buried within subdirectories.  Most such programs as you mention want all files to be one level deep, no further.

Logged
Carol Haynes
Waffles for England (patent pending)
Global Moderator
*****
Posts: 7,952



« Reply #4 on: June 25, 2012, 04:52:15 PM »

Beyond Compare works its way through trees (and even inside archives) very well. Ok it isn't free but it is very good - and you can use it from Explorer with a right click.
Logged

4wd
Supporting Member
**
Posts: 3,265



« Reply #5 on: June 25, 2012, 07:03:16 PM »

Using a bit of Google-fu:

This batch file will get you a list of files sans paths, (rename output file before running a second time):

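  @echo off
  REM listfile.bat - writes the name (no path) of every file under folder %1 to sorted.txt
  REM Quote the folder argument if it contains spaces
  REM Start clean so a leftover list.txt from an aborted run is not appended to
  if exist list.txt del list.txt
  for /r %1 %%g in (*) do >>list.txt echo %%~nxg
  sort list.txt /O sorted.txt
  del list.txt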

eg. listfile.bat c:\ will give you a file called sorted.txt containing a list of every file on C: drive
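
For comparison, a rough Python sketch of the same idea, not a replacement for the batch file. The function names are hypothetical, and Python's sorted() orders by character code, so the order can differ from what the Windows sort command produces.

```python
import os


def list_file_names(root):
    """Return a sorted list of every file name (without its path) found under root."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return sorted(names)


def write_sorted_list(root, out_path="sorted.txt"):
    """Write one file name per line to out_path, overwriting any earlier run."""
    with open(out_path, "w", encoding="utf-8") as out:
        for name in list_file_names(root):
            out.write(name + "\n")
```

Because the output file is opened in "w" mode, there is no need to rename or delete it between runs, but pointing two runs at the same out_path will still overwrite the first list.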

WinMerge, (free), will let you compare two text files for differences.

BTW, WinMerge will also compare folders but my way added a bit of scripting  tongue

I suppose a simple GUI frontend could be made using Auto(It|HK) to automate all the steps, ie. pick two folders, hit Go.
« Last Edit: June 25, 2012, 09:31:27 PM by 4wd; Reason: Add the delete line else it will just add to existing file. » Logged

Four wheel drive: Helping you get stuck faster, harder, further from help...........and it's no different on this forum Evil
nkormanik
Participant
*
Posts: 283

« Reply #6 on: June 25, 2012, 11:22:15 PM »


4wd, I, for one, would appreciate the GUI front end.

But even if you don't get the chance, thanks for the script above.

Logged
4wd
Supporting Member
**
Posts: 3,265



« Reply #7 on: June 26, 2012, 06:22:20 AM »

TCBOO - output will open in your default text editor, where you can view or save it.

TCBOO = There Can Be Only One   smiley (In honour of TaoPhoenix)



Only files that exist once will be listed.

The interface is self-explanatory, tool tips for everything of interest.

Caveats:
  • It can, in theory, only handle about 16 million files but I still wouldn't like to wait for the output.  Technical reason: Array limits.
  • It uses strings for the comparison, maximum string length is ~2 Giga-characters - which is a lot.  However, if you are comparing a lot of really deeply nested files, then this limit may be exceeded but it would really have to be a lot.
  • Source code is messy and uncommented but hey, that's the way I like it.  Technical reason: It keeps my brain active.

Update (starting at v0.4 which was a fairly large rewrite):
  • Comparison speed has increased due to skwire pointing out a rather simpler way of doing things - thanks skwire!
  • You can optionally exclude the folder tree, (relative to initial path), from the comparison
  • Filters and Folders are written to an ini file, (same name as executable), when you exit
  • It works for network shares, (ie. really hacky mod that stopped it is no more)

Update (v0.5)
  • Uses a much faster sort routine.

Differences from previous version:
  • It only looks for and removes from the output, files that occur in multiples of 2 - I had over-engineered the previous versions to remove any file that occurred 2 or more times, as skwire kindly pointed out  embarassed
    eg. So if a file exists 3 times, it will appear once in the output.  If it exists 4 times, it won't appear in the output. (Dependent on Different Tree setting)
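
The "multiples of 2" rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not TCBOO's actual source, and the function name is made up.

```python
from collections import Counter


def odd_occurrences(names):
    """Return each name that occurs an odd number of times, listed once and sorted."""
    counts = Counter(names)
    # Pairs cancel out: 2 or 4 copies leave nothing, 1 or 3 copies leave one entry.
    return sorted(name for name, count in counts.items() if count % 2 == 1)
```

With one complete dummy set and one incomplete real set, every name that appears once is a file missing from "Group Real".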

How much faster is it than the previous version?
On my computer, (3.3GHz x6):
Previous version:   54548 filenames in 281 seconds, (HDD uncached read)
Version 0.4:        54550 filenames in 5 seconds, (HDD cached read)
Version 0.4:        253350 filenames in 63 seconds, (SSD uncached read)
Version 0.4:        253350 filenames in 28 seconds, (SSD cached read)

As of v0.5, most of the time is spent getting the file lists.

* TCBOO.7z (345.3 KB - downloaded 523 times.)
« Last Edit: July 24, 2012, 02:03:57 AM by 4wd » Logged

nkormanik
Participant
*
Posts: 283

« Reply #8 on: June 26, 2012, 11:59:34 AM »


Thanks for working on TIOOO, 4wd.

Would be a plus for the program to give some idea it's still chugging along, especially on large comparison jobs.  In present test the progress bar has already gone from left to right, but job appears incomplete.  I'll keep letting it run for... a week.

If job completes successfully, will a text file list of the one-only files appear in the TIOOO directory?

Thanks much!

=====


Okay, list popped up in my default editor.  Worked like a charm!


« Last Edit: June 26, 2012, 12:45:31 PM by nkormanik » Logged
rjbull
Charter Member
***
Posts: 2,737

« Reply #9 on: June 26, 2012, 03:19:23 PM »

Quote
rjbull, DIR /B /S is good.  That can get us a list of files of both folders.  Then strip away path.  Once the two lists are in hand, what diff program would you use?

I realise you're sorted now, but for the record, I'd most probably use the GNU port of Unix diff, see DiffUtils for Windows.

However, it now occurs to me that in your case, another Unix utility might be better, comm, e.g. the one contained in GNU utilities for Win32:
Usage: comm [OPTION]... LEFT_FILE RIGHT_FILE
Compare sorted files LEFT_FILE and RIGHT_FILE line by line.

  -1              suppress lines unique to left file
  -2              suppress lines unique to right file
  -3              suppress lines unique to both files
      --help      display this help and exit
      --version   output version information and exit

Report bugs to <bug-textutils@gnu.org>.

If that appeals, you might also like a possibly more friendly alternative, File Intersection (fintrsct):
Quote
This program takes two text input files. It finds all the lines that are the same, and writes those out into a text file (defaults to common). Then it finds the lines that are unique to the first file and writes those out (defaults to unique1), and finds the lines that are unique to the second file (defaults to writing out to unique2). [...] original purpose was comparing system files like autoexec.bat between different systems in order to troubleshoot. My purpose was to help me back stuff up: I'd have a list of files (say, just for the sake of argument, a bunch of music files) that were on a CD, and a list of files that were in a folder, and this would help me figure out which were already backed up.
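
The three-way split described in that quote can be sketched with Python sets. This is an illustration under assumptions, not how fintrsct itself is implemented: the function name is made up, duplicate lines collapse into one, and the results are sorted, which the real program may not do.

```python
def split_lines(first, second):
    """Split two lists of lines into (common, unique to first, unique to second)."""
    first_set, second_set = set(first), set(second)
    common = sorted(first_set & second_set)
    unique1 = sorted(first_set - second_set)
    unique2 = sorted(second_set - first_set)
    return common, unique1, unique2
```

Fed the dummy list as the first input and the real list as the second, unique1 holds the names missing from "Group Real".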
Logged
nkormanik
Participant
*
Posts: 283

« Reply #10 on: June 26, 2012, 03:37:34 PM »


Really useful additional tools, rjbull.  Thanks for following up.

Logged
4wd
Supporting Member
**
Posts: 3,265



« Reply #11 on: June 26, 2012, 07:48:33 PM »

Quote
Would be a plus for the program to give some idea it's still chugging along, especially on large comparison jobs.  In present test the progress bar has already gone from left to right, but job appears incomplete.  I'll keep letting it run for... a week.

If job completes successfully, will a text file list of the one-only files appear in the TIOOO directory?

It outputs a text file to your system temp directory, (currently WTFlist.txt because I forgot to change the output filename smiley ).

Quote
Okay, list popped up in my default editor.  Worked like a charm!

So that was about an hour for how many files in total, (folder 1 + 2) ?

Maybe I can speed it up.  BTW, any chance you can ZeroZip the two folders and make them available for testing with?

Quote
However, it now occurs to me that in your case, another Unix utility might be better, comm, e.g. the one contained in GNU utilities for Win32:
Usage: comm [OPTION]... LEFT_FILE RIGHT_FILE
Compare sorted files LEFT_FILE and RIGHT_FILE line by line.

  -1              suppress lines unique to left file
  -2              suppress lines unique to right file
  -3              suppress lines unique to both files
      --help      display this help and exit
      --version   output version information and exit

I think that's the opposite of what we're trying to do - we want to suppress lines that are common to both files.

However, the File Intersection program looks pretty well spot on.  Thmbsup
« Last Edit: June 26, 2012, 08:48:27 PM by 4wd » Logged

rjbull
Charter Member
***
Posts: 2,737

« Reply #12 on: June 27, 2012, 04:16:20 PM »

Quote
However, it now occurs to me that in your case, another Unix utility might be better, comm, e.g. the one contained in GNU utilities for Win32:
Usage: comm [OPTION]... LEFT_FILE RIGHT_FILE
Compare sorted files LEFT_FILE and RIGHT_FILE line by line.

  -1              suppress lines unique to left file
  -2              suppress lines unique to right file
  -3              suppress lines unique to both files
      --help      display this help and exit
      --version   output version information and exit

I think that's the opposite of what we're trying to do - we want to suppress lines that are common to both files.

It does do what I think you want, but as ever with GNU/FSF, is inscrutable.  Don't overlook that you can combine arguments.  Consider two files:

1.txt        2.txt   
-----        -----   
Ash          Ash     
Holly        Beech   
Oak          Holly   
Rowan        Rowan   
Whitebeam    Whitebeam

Then:

c:\Zdir>c:\dos\Utils\comm.exe -3 1.txt 2.txt
        Beech
Oak

(when there are two output streams, they are normally separated by a tab)
Combining two arguments:

c:\Zdir>c:\dos\Utils\comm.exe -13 1.txt 2.txt
Beech

Just don't combine all of arguments 1, 2 and 3, or you won't learn much  smiley
Quote
However, the File Intersection program looks pretty well spot on.  Thmbsup

It's good, but as I recall it, insists on writing all three files each time.  That means that in batch processes, you have to remember to delete the unwanted as well as wanted ones afterwards.  But, it seems more intuitive than comm.
Logged
TaoPhoenix
Supporting Member
**
Posts: 3,471



0 - 60 ... then back to 0 again!

« Reply #13 on: June 27, 2012, 05:05:14 PM »


Quote
Oh yeah, TIOOO = There Is Only One Of   smiley


Are we going to forgo the chance to call it TCBOO aka Highlander's There Can Be Only One?
Logged
4wd
Supporting Member
**
Posts: 3,265



« Reply #14 on: June 27, 2012, 08:13:42 PM »

Quote
It does do what I think you want, but as ever with GNU/FSF, is inscrutable.  Don't overlook that you can combine arguments.  Consider two files:

1.txt        2.txt    
-----        -----    
Ash          Ash      
Holly        Beech    
Oak          Holly    
Rowan        Rowan    
Whitebeam    Whitebeam

Then:

c:\Zdir>c:\dos\Utils\comm.exe -3 1.txt 2.txt
        Beech
Oak


Oh that is surely inscrutable, pretty much goes against my knowledge of the word suppress, ie. suppress = remove;hide;subdue       Grin


Quote
However, the File Intersection program looks pretty well spot on.  Thmbsup
It's good, but as I recall it, insists on writing all three files each time.  That means that in batch processes, you have to remember to delete the unwanted as well as wanted ones afterwards.  But, it seems more intuitive than comm.

But at least you would be able to associate an output file with an input list which would get you the path, (well, at least the initial folder path pertaining to each list).

I have an idea on how to get path output with filename but it involves modifying the sub-routine that returns the file list, (someone else's), into outputting a two dimensional array rather than one.
Speeding up the processing, I've realised, is easy (read: I'm stupid), but still wouldn't be as fast as the CLI C based alternatives I'd say.
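
One way to keep the path alongside each file name, sketched in Python rather than in the tool's own language. The names below are hypothetical, and it uses a dictionary of name to full paths instead of the two-dimensional array mentioned above. Lower-casing the key is an assumption based on Windows file names being case-insensitive.

```python
import os
from collections import defaultdict


def paths_by_name(*roots):
    """Map each lower-cased file name to the full paths where it was found."""
    index = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                index[name.lower()].append(os.path.join(dirpath, name))
    return index


def singletons(*roots):
    """Return the full paths of files whose name occurs exactly once across all roots."""
    index = paths_by_name(*roots)
    return sorted(paths[0] for paths in index.values() if len(paths) == 1)
```

Because the full path is stored, the output shows which folder tree each unmatched file came from.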

Quote
Are we going to forgo the chance to call it TCBOO aka Highlander's There Can Be Only One?

I did think of that but I didn't want to sully the name of a semi-decent movie  Grin

Quote
In present test the progress bar has already gone from left to right, but job appears incomplete.

Mea culpa....I didn't realise the progressbar is virtually useless with large numbers so I've removed it and just added a statusbar that spits out the numbers as it progresses.

Also made each folder individually recurseable, (Am?), and will add a button to stop the process, (or change the Go button to Stop).

Update is up thatta way, in Reply #7.
« Last Edit: June 27, 2012, 09:16:41 PM by 4wd » Logged

4wd
Supporting Member
**
Posts: 3,265



« Reply #15 on: June 30, 2012, 02:44:25 AM »

Update up there:
  • Full filename and path are listed in sorted output.
  • Time taken display.
  • Added Exclusion filter.

I just realised I can possibly increase the comparison speed considerably.

"I'll be back."
« Last Edit: June 30, 2012, 02:57:19 AM by 4wd » Logged

4wd
Supporting Member
**
Posts: 3,265



« Reply #16 on: June 30, 2012, 07:50:11 AM »

I'm back!

Update:
  • It got faster smiley

It's gone from this
  • 14003 comparisons
  • 291 seconds
  • 12% CPU

to this:
  • 54548 comparisons
  • 281 seconds
  • 1% CPU

It even found the eight files there was only one of  cheesy

I think I'm going to stop while I'm ahead.

It's up there.
Logged

rjbull
Charter Member
***
Posts: 2,737

« Reply #17 on: July 01, 2012, 03:30:04 PM »

Quote
Oh that is surely inscrutable, pretty much goes against my knowledge of the word suppress, ie. suppress = remove;hide;subdue

Maybe I should reboot my explanation of how I think comm works  smiley

In comparing two files, you have three cases: lines in File 1 that aren't in File 2; lines in File 2 that aren't in File 1; and lines that are common to both files, i.e. cross-file duplicates.  File Intersection takes the sort of approach that end-users might expect, and sends each data stream to a separate file, sensibly named.  comm sends all three data streams to STDOUT, simultaneously, in three parallel columns separated by tabs.  You can catch its output in a pager like MORE, or redirect it into a text file and examine it with an editor, but it's very hard to read, even if you make tab characters visible, the more so as data lines are usually different lengths.  The switches clarify output.  They suppress one or more data streams, so you end up with only the information you want.  So, the use of the word "suppress" is correct, and the concept is logical, but it's the logic of a coffeed-up supergeek in a 3 AM coding session.
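
The three-column behaviour described above can be imitated in a short Python sketch. It is an approximation for illustration, not GNU comm's source: the function name and the suppress parameter are made up, and both inputs are assumed to be already sorted.

```python
def comm(left, right, suppress=()):
    """Merge two sorted lists into comm-style lines, skipping suppressed columns."""
    out = []
    i = j = 0
    while i < len(left) or j < len(right):
        if j >= len(right) or (i < len(left) and left[i] < right[j]):
            column, line = 1, left[i]   # only in the left file
            i += 1
        elif i >= len(left) or right[j] < left[i]:
            column, line = 2, right[j]  # only in the right file
            j += 1
        else:
            column, line = 3, left[i]   # present in both files
            i += 1
            j += 1
        if column in suppress:
            continue
        # One leading tab for each visible column to the left of this one.
        indent = sum(1 for c in range(1, column) if c not in suppress)
        out.append("\t" * indent + line)
    return out
```

With the tree lists from Reply #12, comm(left, right, suppress={3}) gives a tab-indented Beech followed by Oak, and suppress={1, 3} leaves only Beech. For the original task, suppress={2, 3} with the dummy list on the left would leave the names missing from "Group Real".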
Logged
4wd
Supporting Member
**
Posts: 3,265



« Reply #18 on: July 02, 2012, 07:48:25 AM »

Quote
So, the use of the word "suppress" is correct, and the concept is logical, but it's the logic of a coffeed-up supergeek in a 3 AM coding session.



Update up there.
Logged

4wd
Supporting Member
**
Posts: 3,265



« Reply #19 on: July 23, 2012, 07:25:02 AM »

UPDATE (v0.5) here:
  • Using a much faster sort routine.

@nkormanik:  Using the file tree you sent me, it was taking a ridiculously long time to sort the list, (I gave up waiting), so it may have failed because of something in the old sort routine.

New version goes through ~368000 files in about 10 seconds on my computer.
« Last Edit: July 23, 2012, 07:40:00 AM by 4wd » Logged

skwire
Moderator
*****
Posts: 4,019



Another Coding Snack request? Om nom nom...

« Reply #20 on: July 23, 2012, 11:28:41 PM »

I'm going to go ahead and mark this request as DONE, if that's cool with you guys.   Wink
Logged

nkormanik
Participant
*
Posts: 283

« Reply #21 on: July 23, 2012, 11:34:46 PM »


And DONE quite well.  Thanks.

Logged