Author Topic: Screenshot tool with built-in OCR?  (Read 39350 times)

rjbull

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 3,205
Screenshot tool with built-in OCR?
« on: June 11, 2009, 05:51 AM »
Please, is there a good screenshot tool with built-in OCR?

I have to extract editable text from documents, including PDFs that are scanned-in image files.  PDFs seem to be a problem because they handle fonts differently than Windows.  Some screenshot tools, like the recent free version of SnagIt, can't do it.  I could save an image and use Evernote to OCR the text, though I haven't tried that.  What I do at present is load the PDF into Foxit PDF Editor, increasing text size if necessary, and use freeware JOCR to capture and recognize the text.  This works surprisingly well, but JOCR is dependent on Microsoft Office components, which I'd rather avoid if possible.  I also have Able2Extract which isn't, erm, as able as its name, especially with small fonts.  [For comparison with Able2Extract, I've asked Darwin about Nuance elsewhere on DC].

[Edit]
And lived to discover I was wrong.  Able2Extract is able to do the job, see my confessional message   :-[
[/Edit]

So, please, any recommendations for anything "better?"

Thanks...
« Last Edit: June 18, 2009, 08:26 AM by rjbull »

justice

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,898
Re: Screenshot tool with built-in OCR?
« Reply #1 on: June 11, 2009, 06:33 AM »
Microsoft OneNote or Evernote should do the job?  They both come with their own screenshot tool.
« Last Edit: June 11, 2009, 07:17 AM by justice »

rjbull

Re: Screenshot tool with built-in OCR?
« Reply #2 on: June 11, 2009, 09:26 AM »
Hmmm...  Don't have OneNote, but hadn't thought of using Evernote like that.  Thanks!  :)

Saw a comment in the Snagit forums that WordScale by TMA Software does a good job in this area.  Haven't heard of TMA before.  They're too coy to put the price on their Web site, so I'll have to wait for an e-mail reply to find out the current rate.  But I really should try Evernote   :-[

Steven Avery

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,038
Screen capture - OCR roundup (text capture)
« Reply #3 on: June 11, 2009, 12:41 PM »
Hi Folks,

Here is a summary of the software to consider for screen text capture (so far), with small notes.  There may be a few here that are worthwhile for precision and/or speed; mileage in usage tends to vary.  Some that I include, including the Tesseract stuff, may state that they are designed for higher-resolution or less cluttered input such as scanned documents.  Other stuff is geared to the document/office/scan environment.

So I will put the Tesseract and the well-liked MODI stuff in separate posts; the MODI post includes JOCR and an excellent article by Jon Galloway.  OneNote is included below, however, since it can be bought separately from Office .. I wonder if it uses the exact same MODI component.  Also I am not including .pdf to .txt converters, of which there are many.

No personal recommendations yet, however :

The Abbyy product for $10 looks new and may be a keeper.  

At least the info may save a spot of time searching.  Included is some office document stuff, since there is overlap and your company or clients may be using this or that.

================================

WEB OCR

Evernote - multi-platform
http://www.evernote.com/
This usage is often discussed on the net; I have not checked the strength of the screen capture results.

OCR Terminal
http://www.ocrterminal.com/
"OCR Terminal is a free online Optical Character Recognition service that allows you to convert scanned images and PDFs into editable and text searchable documents. It accurately preserves formatting and layout of documents."

======================================================

COMMERCIAL AND SHAREWARE

Abbyy ScreenShot Reader - $10 - (2008 new product) - 30 day return
http://www.abbyyusa.com/shop/SSR.htm
"you can use Screenshot Reader to select and copy pieces of text from images, Flash files, PDFs, and other image-based files, then convert them into true text which you can then edit or insert into other documents."
ABBYY FineReader 9.0 Express Edition - $50 - trial 15 days
http://www.abbyyusa.com/frexpress/
Abbyy Scan to Office - $50 --> replaced by FineReader Express per Tommy in sales
http://www.abbyyusa.com/shop/STO.htm
USA  (408) 457-9777 Milpitas, CA and international offices, main office Russia, no forums
(also Finereader in Corp section.)

Kleptomania - $30 StructuRise (also Textract SDK 2.9  for text-to-program) - Moscow
(40 day trial - nice screen pic demo)
http://www.structurise.com/kleptomania/
Good website, no forum, phone # (prefers email)

Capture Text - $30 - Dmitry Sokolovskiy  - Seattle WA (206) 888-6807 and email
http://www.capturetext.com/

Screen OCR - $30  - 21-day trial - (206) 338-5863 - Seattle, WA and email
http://www.screenocr.com/

Aqua Deskperience - $15 - Bucharest, Romania
http://www.deskperie...skperience/aqua.html
Snapfiles - "30 day trial. Some features disabled."

TextBridge Pro 11 - $80 (Nuance - also do OmniPage) - Burlington, MA (781) 565-5000
http://www.nuance.com/textbridge/
Wiki, KnowledgeBase, etc.(30 day refund only - no trial)

OCR Tools - Desktop Application - Freeware
http://www.ocrtools....m/fi/prdOCRFree.aspx St Paul, MN
"OCR Desktop Application is a desktop utility that generates ASCII text from images such as a bitmap or image file. OCR Desktop is free, the registered version turns off popups and advertising."

Onenote - Microsoft - pricey $100 - unless with MS Office Suite - perhaps uses MODI component
http://office.micros...onenote/default.aspx
"paste or otherwise import an image containing text into a OneNote page, right click on it and choose "Copy Text from Picture"... then simply paste the copied text wherever you'd like it. Of course, it's not perfect OCR for low resolution." (note from Jon Galloway blog comments)

SimpleOCR - freeware - support is FAQ
And ScanStore Knoxville, TN - support phone # with order + web chat
http://www.simpleocr.com/
"May I use SimpleOCR to process screen captures? Sure, but SimpleOCR usually returns poor results with screen captures." -- also have command line and SDK kit.

Yet their store has fascinating reviews of the higher-end products - below.

====================================================

Screen Capture Programs with some Text Capture
(These are generally not OCR based, Windows API, or may have limited OCR functions.)

HyperSnap 6  - $35
http://www.hyperionics.com/
Tech Support Message Board
http://www.hyperionics.com/

Snag-It - TechSmith - $50
http://www.techsmith...m/screen-capture.asp
Forums
http://forums.techsmith.com/

Screen Hunter Pro - $30 - Wisdom-Soft
http://www.wisdom-so...cts/screenhunter.htm

WinCapture Pro - $40
http://www.wincapture.com/en/

======================================================

Special Placement

Cropper - Freeware
http://blogs.geekdoj...rticles/Cropper.aspx
(not sure of functionality, used by some as a tool starting point with open source)

JiveQ - Freeware - TMA Software - Encinitas, CA
http://www.tmasoftware.com/
http://www.pcadvisor...470&itemId=72962
"What I found most annoying, though, is that it's not too hard to use the WebShot page-capture feature, but I never could figure out how to use the text capture."
Programmer library tool below.

==============================================

Windows Controls & API Text Capture

SysExporter - Nirsoft - Freeware
http://www.nirsoft.net/utils/sysexp.html


Textractor - Resplendence -Freeware
http://www.resplendence.com/textractor
Monitor and capture all text any program writes to the screen

================================

Programmer Library

TextGRAB SDK - $30
http://www.renovatio...dk/textgrab-sdk.html
TextGRAB SDK is a library that allows screen text capture in Windows applications.
(Windows API, not OCR)

OCR Tools - (basic tool above)
http://www.ocrtools..../prdOCRStandard.aspx
Programmer Library  $600 -
OCR.Net Component is a .Net component that can be integrated into your application to generate ASCII text from a bitmap or an image file such as a TIF, GIF, BMP, or JPG file. The OCR .

WordScale Text Capture SDK
http://www.tmasoftware.com/
JiveQ above

Aqua Deskperience (program above)
Text Capture X - library for programmers
http://www.deskperie...er/textcapturex.html

================================================

Geared to Corporate and Enterprise - thinking from scanning and docs more than screenshot

ScanStore
http://www.simpleocr...R_Software_Guide.asp
"We have tested the latest versions of FineReader, OmniPage, ReadIRIS, TextBridge and SimpleOCR and have determined that ABBYY FineReader 9 is the best overall value for business users, while ReadIRIS is the best Omnipage Review for under $150."

==============

Nuance - OmniPage 17 - Standard $150 - Professional $500 -
http://www.nuance.co...roducts/omnipage.asp
(Your biz clients may be using this already.)
OmniPage Capture Software Developers Kit - Capture SDK
           - PaperPort 11 Standard $100  - Professional $200
http://www.nuance.co...oducts/paperport.asp

Iris - ReadIris 12 Pro - $120
http://www.irislink....---OCR-Software.aspx

Abbyy - FineReader - $400 for Pro - company info above and less $ softwares
http://finereader.abbyy.com/  

=========================================

Screen Capture Tools: 40+ Free Tools and Techniques
http://www.hongkiat....ools-and-techniques/

Category: OCR Software [Optical Character Recognition Software]
http://www.scanguru....age/links.php?cat.16

=========================================

Planned shortly, the MODI and Tesseract and Digital Camera/Phone notes sections.

Shalom,
Steven
« Last Edit: June 12, 2009, 10:44 AM by Steven Avery »

Steven Avery

Screenshot - Tesseract (open source) OCR tools
« Reply #4 on: June 12, 2009, 07:13 AM »
Hi Folks,

These Tesseract products are interesting, although it is not likely that any are really high-class (good quality image and easy to use) for Screen Capture --> Text today.

THE BASICS

Tesseract OCR engine - Freeware Open Source
http://code.google.com/p/tesseract-ocr/
Forum (active)
http://groups.google...group/tesseract-ocr/

TESSERACT - WEB OCR

WeOCR server
http://asv.aso.ecei....oku.ac.jp/tesseract/
"This server cannot handle page images. The input image should be a TEXT BLOCK image containing some texts only, with clear background, without any dirt around the texts. "

PRODUCTS AND PROJECTS

FreeOCR
http://freeocr.co.uk/
FreeOCR is a scan & OCR program including the Tesseract free ocr engine also known as a Tesseract GUI. It includes a Windows installer and it is very simple to use and supports multi-page tiff's, fax documents as well as most image types including compressed Tiff's which the Tesseract engine on its own cannot read. It now has Twain scanning. ... Please note that the Tesseract OCR engine requires images at a resolution of 200 dpi or greater and as such it is not suited for reading PC screen shots which are only about 72dpi although we have made some enhancements in V2.3 which will produce better accuracy from low quality image sources.
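
The 200 dpi versus 72 dpi gap quoted above suggests enlarging a screenshot before handing it to the engine. Here is a minimal sketch of the arithmetic only, assuming the two dpi figures from the FreeOCR note; the function names are made up for illustration, and the actual resampling would be left to whatever image tool you prefer. Enlarging does not add detail, so results will vary.

```python
import math


def upscale_factor(source_dpi=72, target_dpi=200):
    """Smallest whole-number enlargement that lifts source_dpi to at least target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    # Never shrink: an image already at or above the target keeps a factor of 1.
    return max(1, math.ceil(target_dpi / source_dpi))


def scaled_size(width, height, factor):
    """Pixel dimensions of the screenshot after enlarging by a whole-number factor."""
    return (width * factor, height * factor)


if __name__ == "__main__":
    factor = upscale_factor()  # 72 dpi screen capture, 200 dpi target
    print(factor, scaled_size(1024, 768, factor))
```

With the quoted figures the factor comes out as 3, which matches rjbull's habit of increasing the text size before capturing.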

gscan2pdf - A GUI to produce a multipage PDF or DjVu from a scan. -
freeware open source - Tesseract
http://gscan2pdf.sourceforge.net/

Tessnet2 a .NET 2.0 Open Source OCR assembly using Tesseract engine
http://www.pixel-tec...m/freeware/tessnet2/

Shalom,
Steven Avery
« Last Edit: June 12, 2009, 07:46 AM by Steven Avery »

Steven Avery

Screenshot - MODI - (Microsoft) interface methods
« Reply #5 on: June 12, 2009, 07:41 AM »
Hi Folks,

 First, the product mentioned by rjbull. 

"load the PDF into Foxit PDF Editor, increasing text size if necessary, and using freeware JOCR to capture and recognize the text.  This works surprisingly well, but JOCR is dependent on Microsoft Office components, which I'd rather avoid if possible."

Rjbull used this after pre-processing through Foxit PDF editor, and that is one of many methods.  I wonder if a .pdf editor is the best method for increasing text size, this is a bit hard to automate, I presume you could get the same results with an image editing tool but that Foxit was easy to use ?  However the editor is a $100 product so for beyond trial that is pricey.

JOCR - Freeware (aka GOCR) - Open Source (2006)
Requires Microsoft Office component - "Microsoft Office Document Imaging" (MODI)
http://jocr.sourceforge.net/
EverRex Software - JOCR Freeware
http://home.megapass...ng/Product_JOCR.html
"JOCR requires Microsoft Office 2003 or higher version. If JOCR does not work, please manually install "Microsoft Office Document Imaging" (MODI) that is included in the setup file of Microsoft Office. You can find MODI under "Office Tools" of the setup file."

=========================================

This next article used MODI more directly through the Microsoft components and stirred tremendous interest, over more than 2 years !  This started with a screenshot to the clipboard using cropper.  (Any improvement if it is sent to a file as a .gif or .jpg or even a .bmp ? .. Dunno.)

==

Free OCR software? You may already have it...- Jon Galloway
http://weblogs.asp.n...t_2E002E002E00_.aspx

==

Note that OneNote is mentioned a number of times in the comments, one pointing out that it is only so-so at low res, others seeming to indicate that the OneNote component is the MODI component .. perhaps.  There were many comments about which versions of Office have, or do not have, this component.

These were some of the more interesting comments in general.

"This is not free though, deceptive. The price for Office was in your computer. Many ... come without it." ...

"So much quicker than text bridge."

"One important point that you did not mention for people scanning files instead of pasting them is the .mdi extension.  Use this for ease of transfer of text to MS applications. Unfortunately, either way you export to Word, the accuracy is still not that great ... if you need to edit a document while keeping the document intact, then you'll have to buy a retail version. "

 "It doesn't do well on columns of numbers. "

"Microsoft Office Document Scanning located right underneath the Microsoft O D Imaging. It does it all. It uses your scanner and automatically runs the OCR...then it pasted it directly into the Microsoft O D Imaging. Then I clicked on the Export to Word toolbar button as you suggested and it worked! "

===================================================

PDF to TXT

  Here are the comments that relate to PDF-to-.txt.  Generally speaking, I'm not sure there is much utility in doing screen capture through PDF (does increasing size help that much ?), however these relate to the issues in general.

"doing an image capture from the screen of an enlarged image of the PDF, converting that to TIF and using the Office OCR should work just fine.  However, be advised that if the PDF has such security settings, then working around them may be a violation of the owners copyright."

"I was surprised how well this worked on a pdf of a scanned document someone sent me.  Exported from Acrobat as a tiff and ran it through MODI into Word.  Only one spelling error.  If only it could keep formatting..."

"if you want to OCR an entire multipage PDF file, open in Adobe Reader and print using the printer "Microsoft Office Document Image Writer"; this will save a .mdi file and automatically open MS Office Document Imaging, from where you can OCR the entire doc."
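
For anyone who would rather avoid the Office dependency, the "export a TIFF, then OCR it" route in these comments can be sketched with the Tesseract command-line program from the earlier post standing in for MODI. This is only a sketch under assumptions: it assumes a `tesseract` executable is on the PATH and that the page has already been exported as an image; the function names and the `page.tif` file name are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Command line of the form: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Returns None when no tesseract executable is found on the PATH.
    """
    if shutil.which("tesseract") is None:
        return None
    image = Path(image_path)
    output_base = image.with_suffix("")  # page.tif -> page
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    # Tesseract appends .txt to the output base it was given.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # "page.tif" is a placeholder for a page exported from the PDF.
    print(build_tesseract_command("page.tif", "page"))
```

As with the MODI route, formatting is not preserved; you only get plain text out.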

Shalom,
Steven

Steven Avery

Screenshot - digital cameras and cell-phones
« Reply #6 on: June 12, 2009, 07:46 AM »
Hi Folks,

  From the Jon Galloway comments, these two notes are worthy of note.

"Nokia N95. Just take a picture with the phone camera (better at max pixel resolution), upload to your PC .jpg's and use OCR software to convert pictures to the text. I use a good old FineReader. The quality almost do not differ from the desktop scanner and takes less time to scan."

"Please check out TopOCR, at www.topocr.com .  This is free OCR for your digital camera, it works quite well, and lets you use the camera on your cellphone as a scanner.  You can save to PDF or other formats, even text to speech and MP3."

Digital Camera OCR - Freeware
http://www.topocr.com/topocr.html

Shalom,
Steven

tinjaw

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 1,927
Re: Screenshot tool with built-in OCR?
« Reply #7 on: June 12, 2009, 09:56 AM »
Hmm, just thinking out loud. Nothing to contribute unfortunately.  :-[

You would think that the best-of-breed OCR engine would be an open sourced implementation of something from some university. I am sure something of that nature would be of great interest and value to academic institutions.

Steven Avery

Screenshot - Textractor freeware from Resplendence
« Reply #8 on: June 12, 2009, 10:42 AM »
Hi Folks,

Textractor (freeware) - Resplendence
Monitor and capture all text any program writes to the screen
http://www.resplendence.com/textractor

Added to above list.  
Not OCR, reads the text that works through the Windows API.
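
To illustrate the "Windows API, not OCR" distinction: the text is asked for directly instead of being recognized from pixels. The sketch below is far simpler than what Textractor does (it only reads the title text of the foreground window through user32, and is not how Textractor itself works), and the function name is made up for illustration. It returns None on anything other than Windows.

```python
import ctypes
import sys


def foreground_window_title():
    """Return the title text of the foreground window, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    from ctypes import wintypes

    user32 = ctypes.WinDLL("user32", use_last_error=True)
    # Declare signatures so window handles are not truncated on 64-bit Windows.
    user32.GetForegroundWindow.restype = wintypes.HWND
    user32.GetWindowTextLengthW.argtypes = [wintypes.HWND]
    user32.GetWindowTextLengthW.restype = ctypes.c_int
    user32.GetWindowTextW.argtypes = [wintypes.HWND, wintypes.LPWSTR, ctypes.c_int]
    user32.GetWindowTextW.restype = ctypes.c_int

    hwnd = user32.GetForegroundWindow()
    if not hwnd:
        return ""
    length = user32.GetWindowTextLengthW(hwnd)
    buffer = ctypes.create_unicode_buffer(length + 1)
    user32.GetWindowTextW(hwnd, buffer, length + 1)
    return buffer.value


if __name__ == "__main__":
    print(foreground_window_title())
```

This kind of approach gets exact characters where it works at all, but it cannot see text that exists only as an image, which is the scanned-PDF case rjbull started with.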

Shalom,
Steven

rjbull

Re: Screenshot tool with built-in OCR?
« Reply #9 on: June 18, 2009, 05:19 AM »
@Steven Avery:

Amazing list, thanks, though only some of them are of interest owing to needing to read PDFs.  The ABBYY ScreenShot Reader looks a good deal if you only want a (possibly better) equivalent of JOCR.  Thanks to Darwin, I reported Nuance's success at reading a complex page here on DC

"pre-processing through Foxit PDF editor, and that is one of many methods.  I wonder if a .pdf editor is the best method for increasing text size, this is a bit hard to automate, I presume you could get the same results with an image editing tool but that Foxit was easy to use ?  However the editor is a $100 product so for beyond trial that is pricey."
-Steven Avery

Foxit PDF Editor: I have a licence for Foxit PDF Editor already, for other reasons.  I used it for this job because it renders PDFs at least as well as the other PDF utilities I tried (Adobe 9 Standard, PDF-XChange Viewer and Able2Extract Pro), and is by far the fastest to load.

In an unexpected (by me) new development, the latest version (4.25) of IrfanView has an OCR plugin, as noted here on DC.


Steven Avery

Abbyy Screenshot Reader - $10 superb OCR
« Reply #10 on: December 21, 2009, 02:54 PM »
Hi Folks,

A little follow-up.

Abbyy Screenshot Reader is a $10 gem.
 
Very simple .. click, then make the box (or the whole page) and click again.
The text in the section goes to clipboard, ready to paste.

I used it the other day on a number of paragraphs in Google Books (the ones
in Limited Preview that do not have a text mode) for a project and it worked
very well.

This is ONLY a text generator to clipboard, it is not screenshot --> printer.

Correction, it can send an image as well to the clipboard, so that might have
uses too.

They added a 15 day trial, if it meets your needs, you will know in 15 minutes.

Shalom,
Steven
« Last Edit: March 19, 2010, 07:30 PM by Steven Avery »

rjbull

Re: Abbyy Screenshot Reader - $10 superb OCR
« Reply #11 on: January 15, 2010, 05:16 PM »
Abbyy Screenshot Reader is a $10 gem.

I tried installing it the other day.  It was very insistent on starting with Windows.  I run WinPatrol Plus, and Scottie kept barking at it because it wouldn't take No for an answer.  I see no reason whatsoever why it should start with Windows; I expect to load it as necessary.  I got annoyed, and concerned that it was behaving like malware.  I uninstalled it without even trying it out.

Was that too hasty?  Does anyone have good long-term experience with it?

cranioscopical

  • Friend of the Site
  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 4,776
Re: Screenshot tool with built-in OCR?
« Reply #12 on: January 15, 2010, 09:34 PM »
It worked well for me, FWIW

rjbull

Re: Screenshot tool with built-in OCR?
« Reply #13 on: January 16, 2010, 04:48 PM »
It worked well for me, FWIW
-cranioscopical (January 15, 2010, 09:34 PM)

Following your reply, I re-downloaded and re-installed it.  This time I noticed and unchecked the box that said "Start with Windows?"
 :-[  :wallbash:

<sigh>

Thanks   :-[

cranioscopical

Re: Screenshot tool with built-in OCR?
« Reply #14 on: January 16, 2010, 06:32 PM »
This time I noticed and unchecked the box that said "Start with Windows?"
-rjbull

 ;D Almost all of us do stuff like that, far too often in my own case.

Steven Avery

Re: Screenshot tool with built-in OCR?
« Reply #15 on: March 19, 2010, 07:31 PM »

Note on Abbyy Screenshot Reader.

For some reason they use your Remote Procedure Call server, with their own Abbyy licensing service. (Ok, they want to make sure your $10 program is licensed, but this seems to be an unusual method.)

Something, not sure what yet, has been disabling my Abbyy licensing service, so the program gives me an informative message that the service is not available and does not start.

ABByy Screenshot Reader will be terminated.
ABByy licensing service is unavailable: The RPC service is unavailable.
Please contact your system administrator to solve this problem.

So far I have gotten around this simply with an uninstall and reinstall, cumbersome.  Why is the service disabled by something .. dunno yet.  (Zemana and WinPatrol are two possibles).
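
Before going the uninstall/reinstall route, it may be quicker to check whether the services are actually running. A minimal sketch, assuming the standard Windows `sc query` command and its usual English-language output: `RpcSs` is the name of the Windows RPC service, but the name of the Abbyy licensing service is not given in the error message, so it would have to be looked up in the Services panel. The function names and the sample text are made up for illustration.

```python
import re
import subprocess
import sys


def parse_service_state(sc_output):
    """Pull the state word (RUNNING, STOPPED, ...) out of `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_service_state(service_name):
    """Ask Windows for a service's state; returns None when not on Windows."""
    if sys.platform != "win32":
        return None
    result = subprocess.run(
        ["sc", "query", service_name], capture_output=True, text=True
    )
    return parse_service_state(result.stdout)


# Illustrative sample of what `sc query RpcSs` typically prints.
SAMPLE = """
SERVICE_NAME: RpcSs
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
"""

if __name__ == "__main__":
    print(parse_service_state(SAMPLE))
    print(query_service_state("RpcSs"))
```

If RpcSs reports RUNNING but the licensing service reports STOPPED, that would point at something disabling the Abbyy service specifically and not at RPC itself.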

A bit unusual.

Shalom,
Steven Avery

rjbull

Re: Screenshot tool with built-in OCR?
« Reply #16 on: March 20, 2010, 07:08 AM »
Why is the service disabled by something .. dunno yet.  (Zemana and WinPatrol are two possibles).
-Steven Avery (March 19, 2010, 07:31 PM)

I run WinPatrolPlus, and it hasn't stopped ABBYY Screenshot Reader.