I often use text comparison tools such as WinMerge to compare two versions of a document, for example from different sources.
The problem is that the documents often come from different OCR processes and differ only slightly, mainly because of OCR errors. As a result, even when the two original documents are identical in content, WinMerge often fails to recognize this and wrongly marks the entire document as different.
So my question is: does anyone know of a text comparison tool with more robust block similarity detection* than, for example, WinMerge?
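To illustrate the kind of tolerance I mean, here is a minimal sketch using Python's standard `difflib` module. The function name `fuzzy_line_matches`, the 0.8 threshold, and the sample lines are my own assumptions for illustration; this is not how WinMerge or any other tool works internally, just the behavior I am hoping for.

```python
from difflib import SequenceMatcher


def fuzzy_line_matches(left, right, threshold=0.8):
    """Pair each line in `left` with its most similar line in `right`.

    A pair is kept only if the similarity ratio reaches `threshold`
    (an assumed value; it would need tuning for real OCR output).
    """
    pairs = []
    for i, a in enumerate(left):
        best_j, best_ratio = None, 0.0
        for j, b in enumerate(right):
            ratio = SequenceMatcher(None, a, b).ratio()
            if ratio > best_ratio:
                best_j, best_ratio = j, ratio
        if best_ratio >= threshold:
            pairs.append((i, best_j, best_ratio))
    return pairs


# Made-up sample: same content, different OCR errors, lines in a different order.
scan_a = [
    "The quick brown fox jumps over the lazy dog.",
    "Second paragraph of the document.",
]
scan_b = [
    "Second paragraph of the docurnent.",
    "The qu1ck brown fox jumps ovcr the lazy dog.",
]

for i, j, ratio in fuzzy_line_matches(scan_a, scan_b):
    print(f"A[{i}] ~ B[{j}] (similarity {ratio:.2f})")
```

A tool that did something like this at block level would report the two scans as nearly identical with moved content, instead of flagging everything as different. The brute-force comparison here is quadratic in the number of lines, so it is only meant to show the idea.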
Thanks for any hints.
*) Examples of similarity/moved block detection:
(also not good with OCR'ed files)
(could not find a non-commercial version for Windows)