1
General Software Discussion / The problem with text compare tools - similar, in database compare tools
« on: July 14, 2013, 05:26 AM »
Here on DC, there is an old "comparitive" review of Beyond Compare:
https://www.donation...pareTools/index.html
It states, there is one prob with BC, which is lack of in-place editing. Now, the review being from 2005, this prob has been resolved long ago.
But there is another prob with almost all of these text compare tools, which is they are unable to compare "displaced text parts", or whatever you would call them.
"All" these tools just compare line by line, and the respective line content, and are thus unable to "see" that a bunch of 10 or 50 lines is UNCHANGED, whenever that bunch of lines has been displaced elsewhere in the second text.
For every programmer who displaces routines within his global set of programming lines, this makes these ordinary tools almost unusable, and existing tools which lack such functionality, unfortunately are NOT amended in such a way (and the developers of BC (of which I have a license) are very friendly but don't do the necessary development to their otherwise fine program either), but there should be SOME tools at least that DO such a thing.
Now let me explain. I know that for discovering that a single LINE has been displaced elsewhere, first, such a tool would need a much more complicated algorithm, since without being really sophisticated, it would process lots of false "hits": It goes without saying that in normal texts, here and there, but in programming, LOTS of lines would be identical, but without being displaced, it's just that the same lines occur, again and again, within many, DIFFERENT, contexts. So, these respective contexts would have to be analyzed, too, by such a tool.
Also, the same problem COULD occur with "PARAGRAPHS": Since in programming, a line is also a paragraph, for such a tool, checking for "paragraph" first, would not be helpful in order to avoid such "false hits". On the other hand, most paragraphs in normal texts would be "enclosed" by blank lines, whilst in programming, these lines would be paragraphs, but normally NOT "enclosed" by blank lines, so the algorithm could check for "REAL" paragraphs vs. paragraphs that are just separate lines, and then try for finding just these "real paragraphs" elsewhere, whenever in place they would be expected, in text 2, they are missing. So this would be ONE OPTION for such a tool.
A SECOND OPTION for such a tool (and this could be realized even within the same tool, "by option") would be to look after a special character or any other "divider character combination" or such, i.e. it would not even to check for displaced "real paragraphs", but only for displaced "paragraph groups" / "text entities" or such, meaning, when text enclosed in these "divider codes" is there in text 1, but missing in text 2. Such "divider codes" could be TWO blank lines, or a really special character that does not occur BUT for separating your "programming entities" within your big file (e.g. the Japanese Yen character, or anything you want), and it would be easy to put such a very special character into your programming text whenever needed, since any programming language has got a special character for "comment line", and you would only put such lines between your sub-routines, in the form
CommentLineCharacter and then SeparateEntityBeginCode
Also, such a tool, with such functionality, would be a relief for anybody doing his work within an outliner, since outliners "invite" to re-arrange all your stuff again and again, in order to have, in the end and ideally, all your stuff within "meaning ful context", in the same way some people don't so much multiply separate paper files, but put them, whenever possible, into lever files, in order to group them. (It goes without saying here that this is a good thing for "reference material", but not really advisable for separate customer files or such.)
Now, most of these outliners have got an EXPORT function, to plain text at least, and for a text compare tool, this is the format in which such "outliner files" could be read and be compared, and it immediately becomes evident what the above-mentioned functionality would be able to to here:
Any outliner item that you just would have displaced, would be checked by such a tool, and then discarded as identical, which in fact it is, and this tool would only show then items in which you would have done ADDITIONS or CHANGES, or NEW items, or (in the other pane, respectively,) DELETED items, but it would NOT show all these, perhaps numerous, items that are unchanged, but just displaced to another position within your tree.
So, here as with programming bits above, the question is, which TEXT compare tools are able to to such a more sophisticated comparison, without all those "false hits" showing up in less elaborate tools,
AND, there also should perhaps be some DATABASE compare tools that are able to do such a comparison, by database "RECORD CONTENT" comparison. Here, the very first problem is that most databse compare tools do NOT EVEN compare content, but only structure, and those that are (possibly) able to compare content, are "just for MySQL", so the question is, are they able to compare "records" in just text format, and with the "record begin code" of your choice (of course, it would be possible to use the special character the db compare tools then "needs", or to replace the "commentlinecharacter plus recordbegincharer" to the special "recordbegincharacter" the db compare tool then needs in order to properly function.
But there is also the question if these db compare tools, comparing content, are able to then compare the content of any record to the content of any record, or if they, too, as in the usual text compare tools, just compare content of record 1 in "text" 1 to content of record 1 in "text" 2, then record 2 to record 2, and so on, which would be devoid of any sense. (Of course, there is an additional prob with db compare tools, price: some of them are 1,000 dollars or even more, so I'm looking out for such a tool, in case, that's not as expensive as that.)
Hence my questions:
- any insight into text compare tools, with respect to these details?
- or any insight into database compare tools, ditto?
https://www.donation...pareTools/index.html
It states, there is one prob with BC, which is lack of in-place editing. Now, the review being from 2005, this prob has been resolved long ago.
But there is another prob with almost all of these text compare tools, which is they are unable to compare "displaced text parts", or whatever you would call them.
"All" these tools just compare line by line, and the respective line content, and are thus unable to "see" that a bunch of 10 or 50 lines is UNCHANGED, whenever that bunch of lines has been displaced elsewhere in the second text.
For every programmer who displaces routines within his global set of programming lines, this makes these ordinary tools almost unusable, and existing tools which lack such functionality, unfortunately are NOT amended in such a way (and the developers of BC (of which I have a license) are very friendly but don't do the necessary development to their otherwise fine program either), but there should be SOME tools at least that DO such a thing.
Now let me explain. I know that for discovering that a single LINE has been displaced elsewhere, first, such a tool would need a much more complicated algorithm, since without being really sophisticated, it would process lots of false "hits": It goes without saying that in normal texts, here and there, but in programming, LOTS of lines would be identical, but without being displaced, it's just that the same lines occur, again and again, within many, DIFFERENT, contexts. So, these respective contexts would have to be analyzed, too, by such a tool.
Also, the same problem COULD occur with "PARAGRAPHS": Since in programming, a line is also a paragraph, for such a tool, checking for "paragraph" first, would not be helpful in order to avoid such "false hits". On the other hand, most paragraphs in normal texts would be "enclosed" by blank lines, whilst in programming, these lines would be paragraphs, but normally NOT "enclosed" by blank lines, so the algorithm could check for "REAL" paragraphs vs. paragraphs that are just separate lines, and then try for finding just these "real paragraphs" elsewhere, whenever in place they would be expected, in text 2, they are missing. So this would be ONE OPTION for such a tool.
A SECOND OPTION for such a tool (and this could be realized even within the same tool, "by option") would be to look after a special character or any other "divider character combination" or such, i.e. it would not even to check for displaced "real paragraphs", but only for displaced "paragraph groups" / "text entities" or such, meaning, when text enclosed in these "divider codes" is there in text 1, but missing in text 2. Such "divider codes" could be TWO blank lines, or a really special character that does not occur BUT for separating your "programming entities" within your big file (e.g. the Japanese Yen character, or anything you want), and it would be easy to put such a very special character into your programming text whenever needed, since any programming language has got a special character for "comment line", and you would only put such lines between your sub-routines, in the form
CommentLineCharacter and then SeparateEntityBeginCode
Also, such a tool, with such functionality, would be a relief for anybody doing his work within an outliner, since outliners "invite" to re-arrange all your stuff again and again, in order to have, in the end and ideally, all your stuff within "meaning ful context", in the same way some people don't so much multiply separate paper files, but put them, whenever possible, into lever files, in order to group them. (It goes without saying here that this is a good thing for "reference material", but not really advisable for separate customer files or such.)
Now, most of these outliners have got an EXPORT function, to plain text at least, and for a text compare tool, this is the format in which such "outliner files" could be read and be compared, and it immediately becomes evident what the above-mentioned functionality would be able to to here:
Any outliner item that you just would have displaced, would be checked by such a tool, and then discarded as identical, which in fact it is, and this tool would only show then items in which you would have done ADDITIONS or CHANGES, or NEW items, or (in the other pane, respectively,) DELETED items, but it would NOT show all these, perhaps numerous, items that are unchanged, but just displaced to another position within your tree.
So, here as with programming bits above, the question is, which TEXT compare tools are able to to such a more sophisticated comparison, without all those "false hits" showing up in less elaborate tools,
AND, there also should perhaps be some DATABASE compare tools that are able to do such a comparison, by database "RECORD CONTENT" comparison. Here, the very first problem is that most databse compare tools do NOT EVEN compare content, but only structure, and those that are (possibly) able to compare content, are "just for MySQL", so the question is, are they able to compare "records" in just text format, and with the "record begin code" of your choice (of course, it would be possible to use the special character the db compare tools then "needs", or to replace the "commentlinecharacter plus recordbegincharer" to the special "recordbegincharacter" the db compare tool then needs in order to properly function.
But there is also the question if these db compare tools, comparing content, are able to then compare the content of any record to the content of any record, or if they, too, as in the usual text compare tools, just compare content of record 1 in "text" 1 to content of record 1 in "text" 2, then record 2 to record 2, and so on, which would be devoid of any sense. (Of course, there is an additional prob with db compare tools, price: some of them are 1,000 dollars or even more, so I'm looking out for such a tool, in case, that's not as expensive as that.)
Hence my questions:
- any insight into text compare tools, with respect to these details?
- or any insight into database compare tools, ditto?