Beyond Compare v4 Open Beta

xtabber:
Beyond Compare 4 was released on September 1, 2014.

A list of changes from BC3 can be found here.

A list of features that also indicates the differences between Standard and Pro editions can be found here.

anandcoral:
Hi peter.s

I am curious about the points you mentioned:
1. BC is unable to treat new lines as "no change"
2. BC is unable to "detect" changes in paragraph order.

I am a developer and I could develop a text file compare program. I just cannot see what logic I should use to satisfy point 1.

If I treat \n as null, the whole text file will become one line!

For point 2, how far should I read to determine that the paragraph order has changed?

Well, user requirements do vary, and we programmers try to satisfy them; that's why they pay us. But all of us programmers are bound by the limits of programming logic.

Here I do not think BC has failed. Please point me to a program which does it; I would like to check it.

Regards,

Anand

tomos:

* 2. BC is unable to "detect" changes in paragraph order.[..]
For point.2, how far should I read to get that para order has changed
-anandcoral (September 06, 2014, 02:38 AM)
--- End quote ---

This seems like a reasonable wish to me: simply indicating somehow that the block of text is elsewhere in the other file would be very helpful.
(Note: I don't use BC, not yet anyway.)

(I'd imagine that this could be problematic when comparing code, though, or files where stuff gets repeated/duplicated a lot.)

peter.s:
we agree that lines are "paragraphs", i.e. separated by a line feed or by other characters or character combinations; the differ should identify these, and/or the user should be able to define them via an option; it's understood that we are not talking about displayed screen "lines"

regular comparison:
text A vs. text B
compare line A1 to line B1, and so on
if beyond line Bx there are y non-corresponding lines, compare line Ax+1 with every line Bx+y+1+z within a reasonable scope, or perhaps switch to B after some tries and try to find lines in A corresponding to lines in B
i.e. I did not really delve into regular comparison
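For readers who want something concrete: in most differs, the "regular comparison" step is a longest-common-subsequence style alignment rather than the forward scan described above. A minimal sketch, using Python's standard difflib.SequenceMatcher as a stand-in (an assumption for illustration, not how BC works internally), which reports which line ranges of A and B are equal, deleted, inserted or replaced:

```python
import difflib


def regular_compare(a_lines, b_lines):
    """Align two lists of lines; return (tag, (a_start, a_end), (b_start, b_end)).

    difflib.SequenceMatcher stands in for the "regular comparison" step;
    tag is one of 'equal', 'replace', 'delete', 'insert'.
    """
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    return [
        (tag, (i1, i2), (j1, j2))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]
```

With A = alpha, beta, gamma, delta and B = alpha, gamma, delta, epsilon, the result marks "beta" as deleted and "epsilon" as inserted; everything that is not "equal" is the material that the further passes discussed below would work on.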

problem 1 above, by option, do this additional comparison, interwoven with the regular comparison:
compare Ax with Bx, but also with the compound of (Bx and Bx+1), the line break in B (^n, or ^n^r) being considered as null, a space, a space plus a special character plus a space, or any other "replacement" string the user wants the differ to consider "equal", just as with the regular comparison presets for text WITHIN a line (cf. the various and otherwise very satisfying options for in-line comparison in BC)
if, by this "joined lines" comparison, Bx and Bx+1 are considered equal to Ax,
then next don't compare Ax+1 with Bx+1, but with Bx+2, and, according to our optional rule here, also with the compound of (Bx+2 and Bx+3), etc.
do the same when comparing Bx to Ax and to (Ax plus Ax+1); the option would be simpler to program if the user had to decide on which "side" the possible "1 line split up into 2 lines" could occur, but it's evident that it would be more "elegant" for the differ to detect such splits on both sides (I do not treat the additional problem brought up by 3-way comparisons)
maintain additional buffers in order to store these intermediate results, and run the regular comparison again from these
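Problem 1 can be sketched as a pass over the non-matching parts of a regular diff. This is a minimal Python sketch, assuming difflib for the regular step and a hypothetical JOIN_SEPARATORS tuple standing in for the user's "replacement" strings; unlike the proposal, it only inspects hunks the regular diff has already marked as changed instead of being interwoven into the alignment itself:

```python
import difflib

# Hypothetical "replacement" strings the user declares equal to a line
# break (null and a single space here); a real tool would make this configurable.
JOIN_SEPARATORS = ("", " ")


def joins_to(line, first, second, separators=JOIN_SEPARATORS):
    """True if first + separator + second equals line for some separator."""
    return any(first + sep + second == line for sep in separators)


def find_splits(a_lines, b_lines):
    """Detect single lines that were split into two lines on the other side.

    Only 'replace' hunks of a regular diff are examined, since a split line
    normally has no exact partner. Returns (direction, a_index, b_index).
    """
    splits = []
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        i, j = i1, j1
        while i < i2 and j < j2:
            if j + 1 < j2 and joins_to(a_lines[i], b_lines[j], b_lines[j + 1]):
                # A[i] became B[j] + B[j+1]: next compare A[i+1] with B[j+2]
                splits.append(("A->B", i, j))
                i, j = i + 1, j + 2
            elif i + 1 < i2 and joins_to(b_lines[j], a_lines[i], a_lines[i + 1]):
                # B[j] became A[i] + A[i+1]: the same check on the other side
                splits.append(("B->A", i, j))
                i, j = i + 2, j + 1
            else:
                # an ordinary changed line; step past it on both sides
                i, j = i + 1, j + 1
    return splits
```

For A = x, "The quick brown fox", y and B = x, "The quick", "brown fox", y, the sketch reports one A line split into two B lines; swapping the arguments reports the same split in the other direction.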

problem 2 above, by option
first, the underlying idea:
do the regular comparison
identify (i.e. put into an additional buffer A2) all lines in A for which the differ doesn't find corresponding lines in B by the regular comparison (which could include comparison according to problem 1 and other options)
compare these lines in A2, one by one, with every line in an additional buffer B2, which contains every line of B for which the differ has NOT identified a corresponding line in A by the above "rich regular" comparison
if A2x is present in B2, replace the "not found" line in the original buffer B (the one at the position corresponding to that line in the original buffer A) by a line "REL" for "relocated" or such (and mark the line in A2 as "resolved")
likewise for the "not found" line in A at the position corresponding to the matching line in B (the line contained in B2): put "REL" into A's otherwise empty line there, which does not correspond to B's line (but which corresponds, as seen, to some OTHER line within A)
(it's evident that you will need ID numbers for every original line in A and B, which follow them into the additional buffers, since line numbers in the intermediate buffers will not correspond to the original line numbers in A and B)
the remaining "unresolved" lines in buffer A2 correspond to those lines in A which have no corresponding lines in B, and ditto for the orphan lines in B2, i.e. lines in B without corresponding lines in A
it's understood that the differ, by the above routine, will consider every FIRST occurrence of a relocated line as the relocation, with possible further copies of that line being considered additional copies; in most real-life scenarios (especially textbooks, and programming with shuffling around of text passages or code bits), we could all live with such an assumption (see below)
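The A2/B2 buffer idea above maps fairly directly onto code. A minimal sketch, again assuming difflib for the regular comparison: the original indices serve as the "ID numbers", and the first unresolved identical item in B2 wins, as proposed. The function names are illustrative only:

```python
import difflib


def unmatched(a_items, b_items):
    """Original indices (the "ID numbers") of items a regular diff leaves unpaired."""
    a_left, b_left = [], []
    matcher = difflib.SequenceMatcher(a=a_items, b=b_items, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            a_left.extend(range(i1, i2))  # buffer A2
            b_left.extend(range(j1, j2))  # buffer B2
    return a_left, b_left


def find_relocations(a_items, b_items):
    """Pair each unmatched A item with the first identical unmatched B item.

    Returns (relocated, orphans_a, orphans_b); relocated holds
    (a_index, b_index) pairs that a display could mark "REL" on both sides.
    Later copies of an already-paired item stay orphans.
    """
    a_left, remaining_b = unmatched(a_items, b_items)
    relocated, orphans_a = [], []
    for ai in a_left:
        match = next((bj for bj in remaining_b if b_items[bj] == a_items[ai]), None)
        if match is None:
            orphans_a.append(ai)
        else:
            relocated.append((ai, match))
            remaining_b.remove(match)  # mark as "resolved"
    return relocated, orphans_a, remaining_b
```

Since the functions only compare items for equality, the same code would work unchanged on paragraph tuples instead of single lines.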

and now for the real-life solution of problem 2, since identifying displaced lines is devoid of sense in 99 p.c. of possible scenarios (you will notice that in my post above, I did NOT list "line 1/2/3" but "para 1/2/3"):
in fact, authors/programmers do not displace single lines, but compounds of lines, i.e. paragraph sequences, or, if you prefer, "real" paragraphs, comprising perhaps 3 or 15 "paragraphs" (technically, lines), and even some blank lines for grouping within them
this means that in order to resolve problem 2 in a realistic, i.e. useful, way, we must allow the user to identify "real paragraphs", e.g. by 2 blank lines (whilst 1 blank line would be treated as a regular blank line within such a "compound paragraph"), or by ^n^r (in this scenario, the author or programmer would "hold together" "real" paragraphs by just typing ^n, and even ^n^n for blank lines, within them, but "close" them with ^n^r); similar means of differentiating "real" paragraphs from "just new lines within a paragraph" would be conceivable
then, as a complication of my explanation above, this "identify relocated paras" routine would of course not do the above-explained work on lines; first of all, it would unite all lines in A and in B, respectively, into "real para" compounds, and it would then do the above-described work on those paras instead (cf. 1 blank line in A vs. 20 blank lines in B at the same position, which does not cause any problem for traditional differs)
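The "real paragraph" idea amounts to changing the unit of comparison. A minimal sketch, assuming the two-blank-lines convention from the post (the ^n^r variant would need the raw text rather than a list of lines): lines are grouped into tuples, which can then go through the same kind of sequence diff used for lines. The names and the min_blank_run parameter are hypothetical:

```python
import difflib


def real_paragraphs(lines, min_blank_run=2):
    """Group lines into "real paragraphs".

    A run of at least min_blank_run blank lines closes a paragraph; shorter
    blank runs stay inside the current paragraph. Each paragraph is a tuple
    of lines, so it is hashable and can be diffed like a single line.
    """
    paras, current, blanks = [], [], 0
    for line in lines:
        if line.strip() == "":
            blanks += 1
            continue
        if blanks >= min_blank_run and current:
            paras.append(tuple(current))  # close the previous paragraph
            current = []
        elif blanks and current:
            current.extend([""] * blanks)  # keep short blank runs inside
        blanks = 0
        current.append(line)
    if current:
        paras.append(tuple(current))
    return paras


def paragraph_opcodes(a_lines, b_lines):
    """Run a regular diff on paragraphs instead of lines."""
    matcher = difflib.SequenceMatcher(
        a=real_paragraphs(a_lines), b=real_paragraphs(b_lines), autojunk=False
    )
    return matcher.get_opcodes()
```

With this grouping, two and four blank lines between the same two paragraphs both count as a single separator, so the paragraphs compare as equal.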

the only remaining weakness in this (albeit already quite satisfactory) real-life solution would be the non-identification of relocated "real" paras in which you have additionally made some minor changes: in such cases, the above routine would only identify two non-correspondences, one in A and one in B, without telling you, for either the A para or the B one (which, remember, has been relocated), that there is in fact a corresponding piece of code or text elsewhere, just slightly changed/amended
therefore, we spice up the above routine with a sub-routine which checks, for these paras in A2, all paras in B2 for "slight variation", similarly to a picture differ, and which, according to the user's preferences, would allow for some changed, added, replaced or deleted single lines if the relocated code/text was otherwise clearly "identifiable" as a variant/amendment*
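The "slight variation" check can be approximated with a similarity score between paragraphs. The sketch below assumes difflib's ratio() over lines as the measure and a hypothetical threshold of 0.6; a picture-differ style comparison or a user-defined tolerance would be equally valid choices:

```python
import difflib


def best_variant(para, candidates, threshold=0.6):
    """Return (index, ratio) of the candidate paragraph most similar to para.

    Paragraphs are tuples of lines; similarity is difflib's ratio over lines,
    so a few changed, added or deleted lines still give a high score.
    Returns None if no candidate reaches threshold (a hypothetical default).
    """
    best = None
    for idx, cand in enumerate(candidates):
        ratio = difflib.SequenceMatcher(a=para, b=cand, autojunk=False).ratio()
        if ratio >= threshold and (best is None or ratio > best[1]):
            best = (idx, ratio)
    return best
```

A three-line paragraph with one changed line scores 2/3 against its variant, so it would be reported as a relocated-and-amended block rather than as two unrelated orphans.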
and finally, it's not really practical (especially in the case of a less-than-100-p.c. match) to just have "REL" lines on both sides for these single occurrences at the given location in A resp. in B: by option, it should be possible to replicate the "text/code over there" HERE instead, i.e. relocated "real paras" would be cloned into the B display at the position of their A occurrence, even though they have been relocated in B, and another clone would be shown in A at the position corresponding to their new position in B; these replications would of course be shown in a predetermined color

*= the same could apply, by option, to NOT relocated but slightly changed text/code: here, too, an optional sub-routine could identify the whole "real para" and then display it in full, on both sides, in a predefined color, instead of the usual display, which only flags the changed lines within those text or code blocks, without their context; an even more useful variant of this would be a dedicated command/toggle "show full blocks on both sides" whenever the mouse cursor is, on either side, within a line identified as "not identical"

in a deluxe version, you could even implement a similar routine which would both show indications of other (identical or near-identical) duplicates and show ALL of them, i.e. all their occurrences, on one or the other or both sides, as an alternative to the regular display

it's all about multiplying intermediate/supporting buffers and concordance tables (arrays) for lines and blocks

EDIT:

you could also determine the length of a block, and whether it is (to be considered and treated as) a block at all, by "following" the first relocated line of a possible block and then checking, "over there", how much "meat" it comprises: you identify the first relocated occurrence of its first line, then compare "backwards", i.e. check whether the lines following it there (at the new location in B) are identical/similar to what follows the original line in A (but not at the location in B corresponding to that location in A); this would be much more elegant, especially if you allow for possible new content interwoven into the relocated block, i.e. do NOT stop the comparison after just a few (new) lines in the B version of the block, but continue searching for content identical/similar to what follows in A, even (reasonably) further down in B
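Following a relocated block from its first line, while tolerating a few inserted lines, could look like the sketch below. It assumes exact line equality (the "similar" case would swap in a fuzzy test) and a hypothetical max_gap parameter for how many new lines may be skipped between two matches:

```python
def follow_block(a_lines, b_lines, i, j, max_gap=3):
    """Grow a relocated block whose first line is A[i] == B[j].

    Each following A line is looked for in B within max_gap lines after the
    previous match, so a few new lines inserted into the moved block do not
    end it. max_gap is a hypothetical tolerance, not a value from any tool.
    Returns the (a_index, b_index) pairs that make up the block.
    """
    if a_lines[i] != b_lines[j]:
        raise ValueError("block must start on a matching line")
    pairs = [(i, j)]
    last_b = j
    for ai in range(i + 1, len(a_lines)):
        # candidates: right after the last match, plus up to max_gap skipped lines
        window = range(last_b + 1, min(len(b_lines), last_b + 2 + max_gap))
        hit = next((bj for bj in window if b_lines[bj] == a_lines[ai]), None)
        if hit is None:
            break  # A's next line has no nearby partner: the block ends here
        pairs.append((ai, hit))
        last_b = hit
    return pairs
```

For A = s1, s2, s3, other and B = zz, s1, new, s2, s3, tail, starting at A[0]/B[1] the block grows to s1, s2, s3 despite the inserted "new" line; with max_gap=0 it stops after s1.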

at this time, e.g. in BC, "identical vs. non-identical" is either/or, i.e. you can only define SPECIFIC inconsistencies to be regarded as "equal"; it's evident that you could introduce a "fuzzy" approach ("similar in spite of up to x differences of this kind AND/OR y differences of that other kind"), either in general or specifically within (possible) blocks
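A "fuzzy" equality of this kind could count differences by category. The sketch assumes two illustrative categories (whitespace-only hunks and all other hunks of a character-level difflib comparison) and hypothetical limits; which kinds of differences to count, and how, would be up to the tool or the user:

```python
import difflib


def fuzzy_equal(left, right, max_space_diffs=2, max_other_diffs=1):
    """Treat two lines as "similar" despite a bounded number of differences.

    Differences are the non-equal hunks of a character-level diff, split into
    two hypothetical kinds: whitespace-only hunks and all other hunks.
    """
    space_diffs = other_diffs = 0
    matcher = difflib.SequenceMatcher(a=left, b=right, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = left[i1:i2] + right[j1:j2]
        if changed.strip() == "":
            space_diffs += 1
        else:
            other_diffs += 1
    return space_diffs <= max_space_diffs and other_diffs <= max_other_diffs
```

With these limits, "x = 1" vs. "x  =  1" (two whitespace hunks) and "x = 1" vs. "x = 2" (one other hunk) count as similar, while "x = 1" vs. "y = 2" (two other hunks) does not.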

peter.s:
Whilst I'm just smiling about the innumerable mistakes hard-core posters here make (it's not by posting 11,000 posts, 10,950 of which are just narcissistic rubbish, that you accelerate "progress" in this world; in general, "opinions" are worthless anywhere if not backed up by (valid) arguments, but go and explain this to 90 p.c. or more of this world's population; and I would never intervene in third parties' affairs but for substantial fees), I'm unable to leave loose ends of my own behind: it's rare that I'm entirely wrong, but even omissions of alternative aspects really torture me, as if I had deliberately lied. Hence my urge to put some of these things into perspective.

As we have seen above, the "programmer" who asked me for hints on HOW to improve text comparison presumably just wanted me to "prove" my alleged incompetence by showing that I had been asking for routines I would not be able to code myself; well, the Germans have an expression for that: "you'll have to get up earlier". Nice try, anyway: I read that post, had you spontaneously participate in my thinking about it (I call this "public thinking", or "thinking in public" if you prefer), and within some two and a half hours had come up with valid advice; no "thank you" whatsoever, despite the previous pretense of needing such advice in order to design the adequate sw. I have always said it: there is a blatant lack of sw designers today, even 30 (or is it 35?) years after the "invention" of the so-called pc, whilst coding is for anyone, it seems, with myriads of tools and full-grown applications lacking the strict minimum of sw design proving my assertion every day.

This being said, I would like to develop two aspects of BC and of compare tools in general.

I

First of all, Beyond Compare (BC) quite successfully hides its real strong feature, which is file-and-folder compare. In fact, these last weeks, I delved a little into its help file (still for version 3, cf. above), and since then I have extensively used its folder/file compare (ffc) routines.

I have always said it, here and elsewhere: its ffc display is outstanding (and remember, I said I had trialled every synch tool up to about 200 bucks). One day, I spoke of their failure to develop BC into a grown-up synch tool, and even in their (kind) answer, the fact was left out that even BC-of-today/yesterday (v. 3) is as good as ANY dedicated synch tool out there, except for Syncovery (cf. my thread on that tool), with BC having the superior display, far superior to anything else I've ever seen in that competition (and even far superior to Vice Versa's, which is a joy to use in its own right).

a)

I really became interested in BC's ffc capabilities when I needed a useful ftp tool; I don't build my sites with "Wordpress", as everyone else (incl. prof. developers, it seems) appears to do now, but within a 2-pane outliner, with heavy scripting (in AHK, of course, cf. my musings about that scripting tool on this site): some coding within the outliner, then export to html, then an extensive script runs on the compound html file in order to create both the respective site content elements (which are necessarily different for every page of your site... even if only the bolded entries in otherwise identical trees/lists) and multiple simili-clones*, both of pages and of whole substructures/chapters.

* = Of interest here: outliners which allow for "live clones" (e.g. Ultra Recall; MyInfo, on the other hand, does not update cloned parent items, which would exclude it from any such "automated cloning in my sites" idea anyway... and no, RightNote did not do anything about their ridiculous referring to thru items in their items' history...) surprisingly do not have a conceptual advantage here, since, in light of the above (= different site content elements for each individual page of the site), their "total (!) cloning paradigm" is counterproductive, to say the least; in other words, whether your outliner supports cloning or not, you will have to find other means in order to replicate both individual pages and entire substructures in other parts of your site(s).

Now I urgently invite you to read the "Folder Compare" chapter of BC's help file thoroughly, especially the page named "Folder Sync", and then try to use BC as an ftp tool: you'll be delighted to a degree that you will congratulate yourself on having bought BC in the first place*, even if you have discarded it as your text compare tool in the meantime (cf. my posts above): BC** is a first-rate ftp tool***, as you will quickly discover.

*= Its trial version's 30 days are non-consecutive, so you will have plenty of time before deciding whether to buy.

**= "BC" means "BC Pro" in my musings; I never bothered with the standard version, so I cannot say if/to what degree it will do the job, too

***= There are some dedicated ftp tools, but "reviews" out there being as bad as they are, I was not able to decide which one(s) can synch correctly between your local folder and your site-side folder without trialling them one by one, which I did not do; on the other hand, the usual file managers are quite underwhelming, and if some of them offer both "folder synch" and "ftp", that does not necessarily mean they offer them combined, too.

b)

Also, some other, quite hidden and, in special cases, tremendously (and what do I say: spectacularly!) useful ffc capabilities of BC lie in the option "Ignore folder structure"; you will see similar functionality in X2's "flatten out subfolders" function (or whatever they call it), but here in BC you get it within a real synch environment, which brings outstanding results in special cases where no other means applies anymore. (Just imagine your backup image is faulty and you try to save what you can, or some chaos you have created by working on files on two different devices at the same time...) In this context, don't overlook the right-click command(s) "Copy to..." (or "Move to...", of course).

Of course, for traditional synch jobs, an excellent tool like Syncovery arguably remains preferable, since it's able to

In summary, BC is, very unfortunately, a substandard text comparer, but you will never regret your 50 bucks / soon 100 euros (incl. VAT) if you make thorough use of it for ffc and ftp.

II

As for differs detecting moved "lines", just some useful links (in no particular order); as some poster with almost 11,000 mostly useless posts said, you're expected to look them up yourself and make up your mind about them without my guiding hand, or, in other words, I'm too lazy to elaborate on them here and today:

http://en.wikipedia.org/wiki/Diff-Text
http://www.diff-text.com/ (online)
http://stackoverflow.com/questions/96051/which-file-comparison-tool-can-handle-block-movement-and-multiple-revisions
http://superuser.com/questions/184969/how-to-ignore-moved-lines-in-a-diff
http://www.grigsoft.com/wincmp3.htm
https://en.wikipedia.org/wiki/User:Cacycle/diff

http://www.scootersoftware.com/vbulletin/showthread.php?2259-Moved-lines-and-Alignment-on-File-comparsion and
http://www.scootersoftware.com/support.php?zz=kb_externalconversion (just compare; this is outrageous, especially in light of my post above: "lines" vs. blocks)
http://stackoverflow.com/questions/10066129/is-there-a-diff-like-algorithm-that-handles-moving-block-of-lines

http://www.semanticdesigns.com/Products/SmartDifferencer/ (a very interesting approach, commercially, and otherwise quite revolting: they differentiate what I said above for every possible coding language (I would not call them "programming languages" anymore, programming being the compound of sw design AND sw coding), and then sell the sliced-up sub-routines to their customers, instead of providing those alternatives as options/settings, even in combination, and, especially, instead of doing some smart thinking about non-coding texts (of all kinds, btw: legal, textbook, or even, why not, stage plays or screenplays, let alone user-specific settings, independently of and in addition to some such standard text TYPE "format"))

http://blog.bartdemeyer.be/2013/04/new-merge-tool-semanticmerge-tested-on-svn/ and finally (April 4th, 1978!):
http://dl.acm.org/citation.cfm?doid=359460.359467 :
http://documents.scribd.com/docs/10ro9oowpo1h81pgh1as.pdf : enjoy!

EDIT: I'm sorry I left out one important link from this list: Walter F. Tichy 1983 (!):
http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1377&context=cstech

P.S. The intellectual level of this site has really gone down these last weeks; I very much hope my reading experience here will improve by the remaining posters striving to improve their argumentation... and, if they don't have any, by their refraining from posting to begin with. Thank you so very much.
