Screenshot tool with built-in OCR?

Steven Avery:
Hi Folks,

 First, the product mentioned by rjbull. 

"load the PDF into Foxit PDF Editor, increasing text size if necessary, and using freeware JOCR to capture and recognize the text.  This works surprisingly well, but JOCR is dependent on Microsoft Office components, which I'd rather avoid if possible."

Rjbull used this after pre-processing through Foxit PDF Editor, which is one of many methods.  I wonder whether a PDF editor is the best way to increase text size, since this is a bit hard to automate.  I presume you could get the same results with an image editing tool, but that Foxit was easy to use?  However, the editor is a $100 product, so beyond the trial it is pricey.
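
On the question of automating the enlargement step with an image tool instead of a PDF editor, here is a minimal sketch of the idea in plain Python.  It is only an illustration: the function name `upscale` and the list-of-rows image format are my own assumptions, and a real image editor would use smoother interpolation than this nearest-neighbour copy.  Whether enlarging actually improves recognition is still the open question.

```python
def upscale(pixels, factor):
    """Enlarge a grayscale image by an integer factor.

    `pixels` is a list of rows, each row a list of ints (0-255).
    Nearest-neighbour sampling: every source pixel becomes a
    factor x factor block in the result.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for row in pixels:
        # Repeat each pixel horizontally...
        wide = [p for p in row for _ in range(factor)]
        # ...then repeat the widened row vertically (fresh copy each time).
        out.extend(list(wide) for _ in range(factor))
    return out


if __name__ == "__main__":
    tiny = [[0, 255],
            [255, 0]]
    for row in upscale(tiny, 2):
        print(row)
```

In a batch job this step would run on each captured image before it is handed to the OCR program, which is the part that is awkward to do by hand in a PDF editor.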

JOCR - Freeware (aka GOCR) - Open Source (2006)
Requires Microsoft Office component - "Microsoft Office Document Imaging" (MODI)
http://jocr.sourceforge.net/
EverRex Software - JOCR Freeware
http://home.megapass.co.kr/~woosjung/Product_JOCR.html
"JOCR requires Microsoft Office 2003 or higher version. If JCOR does not work, please manually install "Micorosoft Office Document Imaging" (MODI) that is included in the setup file of Microsoft Office. You can find MODI under "Office Tools" of the setup file."

=========================================

The next article uses MODI more directly through the Microsoft components, and it stirred tremendous interest for more than two years!  It starts with a screenshot to the clipboard using Cropper.  (Is there any improvement if the capture is sent to a file as a .gif, .jpg, or even a .bmp?  Dunno.)

==

Free OCR software? You may already have it...- Jon Galloway
http://weblogs.asp.net/jgalloway/archive/2006/10/01/Free-OCR-software_3F00_-You-may-already-have-it_2E002E002E00_.aspx

==
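
For anyone who wants to script MODI rather than click through it, here is a rough sketch in Python.  This is not the code from the article; it is my own illustration under several assumptions: the COM ProgID `MODI.Document`, the `Create`, `OCR` and `Close` methods, the `Images` / `Layout.Text` properties and the English language value 9 are how I understand the MODI object model, and they should be checked against your own Office install.  It also needs Windows, the third-party pywin32 package, and an Office version that includes MODI.  The `make_document` parameter is a hypothetical hook so the logic can be tried with a stand-in object on a machine without Office.

```python
# Assumed value of MODI's miLANG_ENGLISH constant; verify before relying on it.
MI_LANG_ENGLISH = 9


def ocr_with_modi(image_path, make_document=None):
    """Return the recognised text of every page in image_path (e.g. TIFF or MDI).

    make_document is a zero-argument factory returning a MODI.Document-like
    object.  By default the object is created through COM.
    """
    if make_document is None:
        import win32com.client  # pywin32, imported lazily (Windows only)

        def make_document():
            return win32com.client.Dispatch("MODI.Document")

    doc = make_document()
    doc.Create(image_path)  # load the image file into the document
    try:
        # Arguments: language, auto-orient the image, straighten the image.
        doc.OCR(MI_LANG_ENGLISH, True, True)
        pages = []
        for i in range(doc.Images.Count):
            pages.append(doc.Images.Item(i).Layout.Text)
        return "\n".join(pages)
    finally:
        doc.Close(False)  # close without saving changes
```

With Office installed, the call would look like `print(ocr_with_modi(r"C:\temp\capture.tif"))`, where the path is just a placeholder for a saved screenshot.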

Note that OneNote is mentioned a number of times in the comments: one points out that it is only so-so at low resolution, and others seem to indicate that the OneNote component is the MODI component, perhaps.  There were many comments about which versions of Office have, or do not have, this component.

These were some of the more interesting comments in general.

"This is not free though, deceptive. The price for Office was in your computer. Many ... come without it." ...

"So much quicker than text bridge."

"One important point that you did not mention for people scanning files instead of pasting them is the .mdi extension.  Use this for ease of transfer of text to MS applications. Unfortunately, either way you export to Word, the accuracy is still not that great ... if you need to edit a document while keeping the document intact, then you'll have to buy a retail version. "

 "It doesn't do well on columns of numbers. "

"Microsoft Office Document Scanning located right underneath the Microsoft O D Imaging. It does it all. It uses your scanner and automactically runs the OCR...then it pasted it directly into the Microsoft O D Imaging. Then I clicked on the Export to Word toolbar button as you suggested and it worked! "

===================================================

PDF to TXT

Here are the comments that relate to PDF-to-.txt conversion.  Generally speaking, I'm not sure there is much utility in doing a screen capture of a PDF (does increasing the size help that much?); however, these comments relate to the issues in general.

"doing an image capture from the screen of an enlarged image of the PDF, converting that to TIF and using the Office OCR should work just fine.  However, be advised that if the PDF has such security settings, then working around them may be a violation of the owners copyright."

"I was surprised how well this worked on a pdf of a scanned document someone sent me.  Exported from Acrobat as a tiff and ran it through MODI into Word.  Only one spelling error.  If only it could keep formatting..."

"if you want to OCR and entire multipage PDF file, open in Adobe Reader and print using the printer "Microsoft Office Document Image Writer"; this will save a .mdi file and automatically open MS Office Document Imaging, from where you can OCR the entire doc."

Shalom,
Steven

Steven Avery:
Hi Folks,

From the Jon Galloway comments, these two are worth noting.

"Nokia N95. Just take a picture with the phone camera (better at max pixel resolution), upload to your PC .jpg's and use OCR software to convert pictures to the text. I use a good old FineReader. The quality almost do not differ from the desktop scanner and takes less time to scan."

"Please check out TopOCR, at www.topocr.com .  This is free OCR for your digital camera, it works quite well, and lets you use the camera on your cellphone as a scanner.  You can save to PDF or other formats, even text to speech and MP3."

Digital Camera OCR - Freeware
http://www.topocr.com/topocr.html

Shalom,
Steven

tinjaw:
Hmm, just thinking out loud. Nothing to contribute unfortunately.  :-[

You would think that the best-of-breed OCR engine would be an open sourced implementation of something from some university. I am sure something of that nature would be of great interest and value to academic institutions.

Steven Avery:
Hi Folks,

Textractor (freeware) - Resplendence
Monitor and capture all text any program writes to the screen
http://www.resplendence.com/textractor

Added to above list.  
Not OCR: it reads the text that programs write to the screen through the Windows API.

Shalom,
Steven

rjbull:
@Steven Avery:

Amazing list, thanks, though only some of them are of interest to me, since I need to read PDFs.  The ABBYY ScreenShot Reader looks like a good deal if you only want a (possibly better) equivalent of JOCR.  Thanks to Darwin, I reported Nuance's success at reading a complex page here on DC.

--- Quote from: Steven Avery ---
Rjbull used this after a pre-processing through Fox-it PDF editor, and that is one of many methods.  I wonder if a .pdf editor is the best method for increasing text size, this is a bit hard to automate, I presume you could get the same results with an image editing tool but that Fox-it was easy to use ?  However the editor is a $100 product so for beyond trial that is pricey.

--- End quote ---

Foxit PDF Editor: I have a licence for Foxit PDF Editor already, for other reasons.  I used it for this job because it renders PDFs at least as well as the other PDF utilities I tried (Adobe 9 Standard, PDF-XChange Viewer and Able2Extract Pro), and is by far the fastest to load.

In an unexpected (by me) new development, the latest version (4.25) of IrfanView has an OCR plugin, as noted here on DC.
