Looking for Software with this feature
Shades:
I'm wondering why file size matters, because wouldn't the date be off by even a few seconds if it's two different copies of a file? Even for a high-speed automated "log.txt" or something similar that is updated and then aggressively backed up, do any of the options above change context if it doesn't need to know the file size (or maybe the checksum, because, for example, someone opens a text file and then, say, MS Word adds a line break, so it's now different)?
_______________________
-TaoPhoenix (July 03, 2015, 02:39 PM)
--- End quote ---
The OP refers to file "name, extension and size", but file size is generally an unreliable/imprecise basis for file comparison, whereas content (checksum) is pretty definitive as a data analysis tool.
You seem to have conflated "time" with "size", and yes, "time" is also an imprecise basis for file comparison - mainly because of the different and inconsistent time-stamping standards applied by different file operations and file management tools.
-IainB (July 03, 2015, 09:21 PM)
--- End quote ---
IainB is right about generating a checksum for both files and comparing the two to find out whether the files are the same.
Unfortunately, it looks like xplorer2 uses CRC to generate these checksum values. The advantage of CRC checksums is that they are fast to generate. The disadvantage is that they are not always unique: two different files can end up with the same CRC.
So these were replaced with MD5 hash values (which take a bit more time to generate), but nowadays MD5 can also be tricked. The best option for now is to generate SHA-based hash values of the files to identify them. But again, these take even longer to generate.
The method IainB suggests is the best one you can apply to determine whether files are unique or not. CRC is better than nothing for this purpose, but not by much. SHA is much better, but consumes a lot of computational resources, so if your system doesn't have much of that (readily) available... expect long waits.
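To make the CRC versus SHA trade-off concrete, here is a rough Python sketch that computes both for a file, reading it in chunks so large files do not have to fit in memory. The function names (`crc32_of`, `sha256_of`, `same_content`) and the 1 MiB chunk size are made up for illustration, and SHA-256 is just one assumed choice of "SHA-based" hash; this is not how xplorer2 or any other tool mentioned here does it.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative choice


def crc32_of(path):
    """Return the CRC-32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            # Feed the running CRC back in so chunks are chained together.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def sha256_of(path):
    """Return the SHA-256 digest of a file as a hex string."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Treat two files as identical when their SHA-256 digests match."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A CRC-32 value has only 32 bits, which is why unrelated files can share one, whereas a SHA-256 digest has 256 bits.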
MilesAhead:
Especially if files are large, say a > 1GB video as an example, doing a hash of A and then a hash of B to see if A = B could get very slow. A better method may be to process them side by side:
--- ---;;pseudocode
hashA = hashB = initial_hash;
do
{
    // read the next chunk of each file before hashing it
    fillbuffer(fileA);
    fillbuffer(fileB);
    hashA = hash(fileAbuffer, hashA);
    hashB = hash(fileBbuffer, hashB);
    if (hashA != hashB)
        return false;   // first difference found, stop here
} until (hash_done(fileA) || hash_done(fileB));
// final compare here: both files must also have ended at the same point
IOW, cut off the comparison as soon as there is a difference.
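For what it's worth, here is a minimal runnable Python sketch of the same early-exit idea. Note one swap: it compares the raw chunks directly instead of running hashes, on the assumption that both files are readable at the same time, in which case a byte comparison gives the same early exit without any hashing cost. The name `files_identical` and the chunk size are hypothetical.

```python
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative choice


def files_identical(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Compare two files chunk by chunk, stopping at the first difference."""
    # Files of different sizes can never be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # cut off as soon as there is a difference
            if not chunk_a:
                return True  # both files ended together with no difference
```

Python's standard library offers `filecmp.cmp(a, b, shallow=False)` for a similar content comparison. Hashing still makes sense when the two files cannot be read side by side, for example when only a stored checksum of one of them is available.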
ednja:
"'Size' could be a potentially unreliable comparison, so I would recommend using 'Content' instead."
Yes I was aware of this and was wondering if anyone would notice it and mention it.
It would be better to check the content, and there are duplicate-removal programs that do compare content using checksums. I only mentioned size because that would at least be better than my moving the files manually. After posting the question, I thought I should have included the content. When I'm moving files, most of them are the same size and I overwrite them, but I'm often moving too fast, and when I get to two files of different size I overshoot and overwrite (so I lose one). Also, I've sometimes manually compared the content of two files of equal size and found that one file is corrupted and won't open, even though it has the same size. So for sure, it would be better to compare the content. In fact, as far as I'm concerned, this is what would make the program valuable.
In duplicate-removal programs, the program does a scan and, once done, lists all the duplicates. Then, to remove them, you have to manually select which ones to remove (or you can let the program select, but I haven't found one yet that can be trusted with this). If there aren't many duplicates on the hard drive, manually selecting the ones to remove isn't a big deal. But when there are hundreds or even thousands of duplicates, this can take many hours of work. An example of such large numbers of duplicates is when people use hard drive backup programs and the program keeps copying the same files over and over into different folders, mixing them up until the hard drive is full and the user doesn't know what to do. I don't use backup software, but I know people who do, and I've been asked to sort it out. I did this on my brother's computer and it took me many hours.
I would modify the filters as follows:
Filter #1: If the file being moved has the same name, extension, size and content as a file already existing in the target folder, then overwrite the existing file.
Filter #2: If the file being moved has the same name and extension as a file already existing in the target folder, but the two files are different in size or have different content or both, then move the file, but keep both files.
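The two filters above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: `move_with_filters` and `unique_name` are made-up names, and the "name (1).ext" scheme for keeping both files is an invented convention, since the filters do not say how the second copy should be named.

```python
import filecmp
import os
import shutil


def unique_name(target_dir, filename):
    """Return an unused name such as 'report (1).txt' (hypothetical scheme)."""
    stem, ext = os.path.splitext(filename)
    counter = 1
    while True:
        candidate = f"{stem} ({counter}){ext}"
        if not os.path.exists(os.path.join(target_dir, candidate)):
            return candidate
        counter += 1


def move_with_filters(source_path, target_dir):
    """Move one file into target_dir and return the path it ends up at."""
    filename = os.path.basename(source_path)
    target_path = os.path.join(target_dir, filename)

    if os.path.exists(target_path):
        same_size = os.path.getsize(source_path) == os.path.getsize(target_path)
        # shallow=False makes filecmp compare the actual file contents.
        same_content = same_size and filecmp.cmp(
            source_path, target_path, shallow=False
        )
        if same_content:
            # Filter #1: same name, extension, size and content -> overwrite.
            os.remove(target_path)
        else:
            # Filter #2: same name and extension, but different size or
            # content -> keep both by giving the incoming file a new name.
            target_path = os.path.join(
                target_dir, unique_name(target_dir, filename)
            )

    return shutil.move(source_path, target_path)
```

Usage would be something like `move_with_filters("C:/incoming/photo.jpg", "D:/sorted")`, with both paths being placeholders. The size check is technically redundant once the content is compared, but it is cheap and lets differently sized files skip the content read altogether.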
ednja:
I found this software called NiceCopier today. It's the closest so far to what I'm looking for, but still doesn't have enough automation. Plus, it doesn't compare the contents of the files. It's the only one so far that compares the size.
https://sourceforge.net/projects/nicecopier/
IainB:
@ednja: This is your description of your revised filters, with the Opening Post filters inserted below each in the quotes, for comparison. I have highlighted the difference in the newer Filter description:
... I would modify the filters as follows:
Filter #1: If the file being moved has the same name, extension, size and content as a file already existing in the target folder, then overwrite the existing file.
Whereas the Opening Post says:
Filter #1: If the file being moved has the same name, extension and size as a file already existing in the target folder, then overwrite the existing file.
______________________________
--- End quote ---
Filter #2: If the file being moved has the same name and extension as a file already existing in the target folder, but the two files are different in size or have different content or both, then move the file, but keep both files.
Whereas the Opening Post says:
Filter #2: If the file being moved has the same name and extension as a file already existing in the target folder, but the two files are different in size, then move the file, but keep both files.
______________________________
--- End quote ---
-ednja (July 06, 2015, 03:37 PM)
--- End quote ---
Sorry, but I think at this point I must be missing something as I do not understand:
(a) Why you need to "overwrite the existing file." with the file in Source in Filter #1. It is a superfluous/redundant step. You need only to leave the Source file as-is (or delete it if it is not wanted), and leave the target file untouched.
(b) Why you persist in including the use of file size as a basis of comparison at all, when it would seem to be irrelevant (QED). It confuses the issue unnecessarily.