Recently I've been trying out the tesseract OCR option (both via gimagereader and via the command line), with mixed but (IMHO) tolerably good results, at least for English text.
In my usage, I notice occasional recognition results such as:
It seems to me that in some of these cases there is little point in accepting the result as-is (e.g. "vv" seems to be seldom used). I'm about to go through a page describing tesseract's configuration parameters
in the hope of turning up something applicable, but does anyone have relevant tesseract experience to share?
I'm using tesseract 3.02.02.
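
In case it helps frame the question: one fallback I can imagine, independent of anything tesseract itself offers, is a post-processing pass over the recognized text. The following is only a rough sketch under my own assumptions. The word list (KNOWN_WORDS), the confusion table (CONFUSIONS), and the function names are all made up for illustration and are not part of tesseract. It replaces "vv" with "w" only when the token is not already a known word and the substitution produces one.

```python
import re

# Hypothetical stand-in word list; a real one might be loaded from a
# system dictionary file instead.
KNOWN_WORDS = {"with", "view", "now", "savvy", "window"}

# Hypothetical confusion pairs: (what the OCR produced, what was likely meant).
CONFUSIONS = [("vv", "w")]


def fix_token(token, known_words=KNOWN_WORDS):
    """Return a corrected token if one substitution yields a known word."""
    lower = token.lower()
    if lower in known_words:
        # Already a real word (e.g. "savvy"), so leave it alone.
        return token
    for wrong, right in CONFUSIONS:
        if wrong in lower:
            candidate = lower.replace(wrong, right)
            if candidate in known_words:
                # Keep a leading capital; other casing is not preserved.
                return candidate.capitalize() if token[0].isupper() else candidate
    # No confident correction available: keep the OCR output unchanged.
    return token


def fix_text(text):
    """Apply fix_token to every run of ASCII letters in the text."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_token(m.group(0)), text)


if __name__ == "__main__":
    print(fix_text("a vvindow vvith a vievv"))
```

The dictionary check is there so that legitimate words containing "vv" are not mangled, and anything the sketch cannot confidently correct is passed through untouched.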