ATTENTION: You are viewing a page formatted for mobile devices; to view the full web page, click HERE.

Main Area and Open Discussion > General Software Discussion

The problem with text compare tools - similar, in database compare tools

(1/5) > >>

evamaria:
Here on DC, there is an old "comparitive" review of Beyond Compare:

http://www.donationcoder.com/Reviews/Archive/CompareTools/index.html

It states, there is one prob with BC, which is lack of in-place editing. Now, the review being from 2005, this prob has been resolved long ago.

But there is another prob with almost all of these text compare tools, which is they are unable to compare "displaced text parts", or whatever you would call them.

"All" these tools just compare line by line, and the respective line content, and are thus unable to "see" that a bunch of 10 or 50 lines is UNCHANGED, whenever that bunch of lines has been displaced elsewhere in the second text.

For every programmer who displaces routines within his global set of programming lines, this makes these ordinary tools almost unusable, and existing tools which lack such functionality, unfortunately are NOT amended in such a way (and the developers of BC (of which I have a license) are very friendly but don't do the necessary development to their otherwise fine program either), but there should be SOME tools at least that DO such a thing.


Now let me explain. I know that for discovering that a single LINE has been displaced elsewhere, first, such a tool would need a much more complicated algorithm, since without being really sophisticated, it would process lots of false "hits": It goes without saying that in normal texts, here and there, but in programming, LOTS of lines would be identical, but without being displaced, it's just that the same lines occur, again and again, within many, DIFFERENT, contexts. So, these respective contexts would have to be analyzed, too, by such a tool.

Also, the same problem COULD occur with "PARAGRAPHS": Since in programming, a line is also a paragraph, for such a tool, checking for "paragraph" first, would not be helpful in order to avoid such "false hits". On the other hand, most paragraphs in normal texts would be "enclosed" by blank lines, whilst in programming, these lines would be paragraphs, but normally NOT "enclosed" by blank lines, so the algorithm could check for "REAL" paragraphs vs. paragraphs that are just separate lines, and then try for finding just these "real paragraphs" elsewhere, whenever in place they would be expected, in text 2, they are missing. So this would be ONE OPTION for such a tool.

A SECOND OPTION for such a tool (and this could be realized even within the same tool, "by option") would be to look after a special character or any other "divider character combination" or such, i.e. it would not even to check for displaced "real paragraphs", but only for displaced "paragraph groups" / "text entities" or such, meaning, when text enclosed in these "divider codes" is there in text 1, but missing in text 2. Such "divider codes" could be TWO blank lines, or a really special character that does not occur BUT for separating your "programming entities" within your big file (e.g. the Japanese Yen character, or anything you want), and it would be easy to put such a very special character into your programming text whenever needed, since any programming language has got a special character for "comment line", and you would only put such lines between your sub-routines, in the form

CommentLineCharacter and then SeparateEntityBeginCode


Also, such a tool, with such functionality, would be a relief for anybody doing his work within an outliner, since outliners "invite" to re-arrange all your stuff again and again, in order to have, in the end and ideally, all your stuff within "meaning ful context", in the same way some people don't so much multiply separate paper files, but put them, whenever possible, into lever files, in order to group them. (It goes without saying here that this is a good thing for "reference material", but not really advisable for separate customer files or such.)

Now, most of these outliners have got an EXPORT function, to plain text at least, and for a text compare tool, this is the format in which such "outliner files" could be read and be compared, and it immediately becomes evident what the above-mentioned functionality would be able to to here:

Any outliner item that you just would have displaced, would be checked by such a tool, and then discarded as identical, which in fact it is, and this tool would only show then items in which you would have done ADDITIONS or CHANGES, or NEW items, or (in the other pane, respectively,) DELETED items, but it would NOT show all these, perhaps numerous, items that are unchanged, but just displaced to another position within your tree.


So, here as with programming bits above, the question is, which TEXT compare tools are able to to such a more sophisticated comparison, without all those "false hits" showing up in less elaborate tools,

AND, there also should perhaps be some DATABASE compare tools that are able to do such a comparison, by database "RECORD CONTENT" comparison. Here, the very first problem is that most databse compare tools do NOT EVEN compare content, but only structure, and those that are (possibly) able to compare content, are "just for MySQL", so the question is, are they able to compare "records" in just text format, and with the "record begin code" of your choice (of course, it would be possible to use the special character the db compare tools then "needs", or to replace the "commentlinecharacter plus recordbegincharer" to the special "recordbegincharacter" the db compare tool then needs in order to properly function.

But there is also the question if these db compare tools, comparing content, are able to then compare the content of any record to the content of any record, or if they, too, as in the usual text compare tools, just compare content of record 1 in "text" 1 to content of record 1 in "text" 2, then record 2 to record 2, and so on, which would be devoid of any sense. (Of course, there is an additional prob with db compare tools, price: some of them are 1,000 dollars or even more, so I'm looking out for such a tool, in case, that's not as expensive as that.)



Hence my questions:

- any insight into text compare tools, with respect to these details?
- or any insight into database compare tools, ditto?

evamaria:
"ExamDiff Pro" seems to have a "moved blocks" function, but is 35 dollars per year. That's not so expensive after all if it functions well, but I totally hate subscription schemes...

Ath:
For text comparison I always use the shareware Compare It! by Grigsoft
Development has stalled the last couple of years, but for me it's a highly valued tool, and I would buy it again if I had to.

There is an option 'Find moved sections' in the 'Options/Options/Comparison/Advanced' options page that needs to be enabled for finding moved stuff like you described. And setting 'Use adaptive comparison' on the same page could also help.
I usually advise to enable 'Options/Lines view' from the menu (it's disabled by default), and the same for 'Options/Options/Editor' checkbox "Scroll left pane horizontally with mouse wheel".

Disclaimer:
I wrote a translation for my native language for it (and also for Synchronize It!), but have no other bonds with the product(s) or company.

evamaria:
There's a thread asking a similar question here:

http://stackoverflow.com/questions/96051/which-file-comparison-tool-can-handle-block-movement-and-multiple-revisions

"Which file comparison tool can handle block movement and multiple revisions?"

unfortunately closed by an overzealous mod, but then, many of the answers are false hits, and 21 "up" votes for the mention of a program that precisely doesn't offer the wanted feature, well... but in any closed thread, false info will stay forever, instead of being corrected...!

As for the similar problem in databases, there's this thread:

http://stackoverflow.com/questions/225772/compare-two-mysql-databases :

"Compare two MySQL databases"

which also has been closed by an overzealous mod, for "being not constructive", and here, I'm a little bit speechless, though, since as we all can see, it's a highly constructive question, and many answers seem to be more than helpful, so it would have been a good thing to add more of such.


Thank you, Ath, will try! There also seem to be some more of this kind now, will report if there are positive results.

Shades:
Apparently ExamDiff Pro can handle moved blocks since May 2009 (version 4.5 and up), see here.

I don't know which version (of ExamDiff) you were using/trying, but I cannot stand by when one of my favorites get shot down  :P

Navigation

[0] Message Index

[#] Next page

Go to full version