Topic: *NIX; tesseract OCR experiences

ewemoa · **on:** December 01, 2013, 10:07 PM

Recently I've been trying out tesseract OCR (both via gimagereader and via the command line) with mixed, but IMHO tolerably good, results, at least for English text.

In my usage, I occasionally get recognition results such as:

"fi" recognized as "ﬁ" (U+FB01; see: http://www.fileforma.../char/fb01/index.htm)
"fl" recognized as "ﬂ" (U+FB02; see: http://www.fileforma.../char/fb02/index.htm)
lowercase "w" recognized as "W"
lowercase "v" recognized as "V"
lowercase "w" recognized as "vv"
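One low-risk way to deal with the ligature cases is to post-process tesseract's text output rather than the recognition itself. The sketch below is an assumption-laden illustration (the helper name `expand_ligatures` and the script name are hypothetical, not part of tesseract): it maps the Unicode ligature code points U+FB00 to U+FB06 (skipping U+FB05, the long-s ligature) back to plain letters. Unicode NFKC normalization (`unicodedata.normalize("NFKC", text)`) would also expand these, but it rewrites other compatibility characters as well (superscript "²" becomes "2", for example), so a targeted table is used here instead.

```python
import sys

# Hypothetical post-processing table: Latin ligature code points that OCR
# output may contain, mapped back to the plain letter sequences they replace.
LIGATURES = str.maketrans({
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
    "\ufb06": "st",   # LATIN SMALL LIGATURE ST
})


def expand_ligatures(text: str) -> str:
    """Replace ligature code points with the letters they stand for,
    leaving every other character untouched."""
    return text.translate(LIGATURES)


if __name__ == "__main__":
    # Act as a filter: read OCR text on stdin, write the cleaned text to stdout.
    sys.stdout.write(expand_ligatures(sys.stdin.read()))
```

Assuming the script is saved as `fix_ligatures.py`, it can sit after tesseract in a pipeline, e.g. `tesseract page.png out` followed by `python3 fix_ligatures.py < out.txt > out-fixed.txt`.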

It seems to me that for some of these cases there is little point in accepting the results as-is (e.g. "vv" seems to occur only rarely in English). I'm about to go through a page describing tesseract's configuration parameters in the hope of turning up something applicable; in the meantime, does anyone have relevant tesseract experience to share?
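For the "vv" and stray-capital cases, a blind search-and-replace would damage real words such as "savvy" or "revved", so a dictionary check is one way to decide when to intervene. The following is a minimal sketch of such a heuristic, not a tesseract feature: it assumes a lowercase word list is available, and `KNOWN_WORDS`, `fix_word` and `fix_text` are hypothetical placeholders. A change is kept only when the repaired form is a known word; anything ambiguous (such as a capital "W" at the start of a sentence) is left alone.

```python
import re
from typing import AbstractSet

# Hypothetical stand-in for a real word list (e.g. one loaded from a
# dictionary file, one lowercase word per line).
KNOWN_WORDS = frozenset({"we", "will", "now", "have", "view", "savvy", "revved", "show"})

WORD_RE = re.compile(r"[A-Za-z]+")
# An uppercase W or V directly after a lowercase letter, as in "noW" or "haVe".
STRAY_CAP_RE = re.compile(r"(?<=[a-z])[WV]")


def fix_word(word: str, known: AbstractSet[str]) -> str:
    """Heuristically repair one word; keep a change only if the result is known."""
    if word.lower() in known and not STRAY_CAP_RE.search(word):
        return word  # already a plausible word, e.g. "savvy"
    # Lowercase any W/V that follows a lowercase letter.
    candidate = STRAY_CAP_RE.sub(lambda m: m.group().lower(), word)
    # Try reading "vv" as a misrecognized "w".
    if candidate.lower() not in known and "vv" in candidate:
        merged = candidate.replace("vv", "w")
        if merged.lower() in known:
            candidate = merged
    return candidate if candidate.lower() in known else word


def fix_text(text: str, known: AbstractSet[str] = KNOWN_WORDS) -> str:
    """Repair each alphabetic run, leaving spacing and punctuation intact."""
    return WORD_RE.sub(lambda m: fix_word(m.group(), known), text)


if __name__ == "__main__":
    print(fix_text("vve vvill noW shovv the savvy vievv"))
    # prints: we will now show the savvy view
```

With a realistic word list the trade-off shifts: more genuine corrections are found, but so are more accidental "valid" merges, so it may be worth logging each change for review rather than applying it silently.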

I'm using tesseract 3.02.02.
