Screenshot tool with built-in OCR?

rjbull:
Please, is there a good screenshot tool with built-in OCR?

I have to extract editable text from documents, including PDFs that are scanned-in image files.  PDFs seem to be a problem because they handle fonts differently than Windows does.  Some screenshot tools, like the recent free version of SnagIt, can't do it.  I could save an image and use Evernote to OCR the text, though I haven't tried that.  What I do at present is load the PDF into Foxit PDF Editor, increase the text size if necessary, and use the freeware JOCR to capture and recognize the text.  This works surprisingly well, but JOCR is dependent on Microsoft Office components, which I'd rather avoid if possible.  I also have Able2Extract, which isn't, erm, as able as its name, especially with small fonts.  [For comparison with Able2Extract, I've asked Darwin about Nuance elsewhere on DC].

[Edit]
And lived to discover I was wrong.  Able2Extract is able to do the job, see my confessional message   :-[
[/Edit]

So, please, any recommendations for anything "better?"

Thanks...

justice:
Microsoft OneNote or Evernote should do the job? They both come with their own screenshot tool.

rjbull:
Hmmm...  Don't have OneNote, but hadn't thought of using Evernote like that.  Thanks!  :)

Saw a comment in the Snagit forums that WordScale by TMA Software does a good job in this area.  Haven't heard of TMA before.  They're too coy to put the price on their Web site, so I'll have to wait for an e-mail reply to find out the current rate.  But I really should try Evernote   :-[

Steven Avery:
Hi Folks,

Here is a summary of the software to consider for screen text capture (so far), with short notes.  A few here may be worthwhile for precision and/or speed, though mileage tends to vary.  Some that I include, including the Tesseract stuff, may state that they are designed for higher-resolution or less cluttered input such as scanned documents.  Other stuff is geared to the document/office/scan environment.

So I will put the Tesseract and the well-liked MODI stuff separately; the MODI section includes JOCR and an excellent article by Jon Galloway.  However, OneNote is included below, since it can be had separately from Office. I wonder if it uses the exact same MODI component. Also, I am not including .pdf to .txt converters, of which there are many.

No personal recommendations yet, however:

The Abbyy product for $10 looks new and may be a keeper.  

At least the info may save a spot of time searching.  Included is some office document stuff, since there is overlap and your company or clients may be using this or that.

================================

WEB OCR

Evernote - multi-platform
http://www.evernote.com/
This usage is often discussed on the net; I have not checked the strength of its screen capture results.

OCR Terminal
http://www.ocrterminal.com/
"OCR Terminal is a free online Optical Character Recognition service that allows you to convert scanned images and PDFs into editable and text searchable documents. It accurately preserves formatting and layout of documents."

======================================================

COMMERCIAL AND SHAREWARE

Abbyy ScreenShot Reader - $10 - (2008 new product) - 30 day return
http://www.abbyyusa.com/shop/SSR.htm
"you can use Screenshot Reader to select and copy pieces of text from images, Flash files, PDFs, and other image-based files, then convert them into true text which you can then edit or insert into other documents."
ABBYY FineReader 9.0 Express Edition - $50 - trial 15 days
http://www.abbyyusa.com/frexpress/
Abbyy Scan to Office - $50 --> replaced by FineReader Express per Tommy in sales
http://www.abbyyusa.com/shop/STO.htm
USA  (408) 457-9777 Milpitas, CA and international offices, main office Russia, no forums
(also Finereader in Corp section.)

Kleptomania - $30 StructuRise (also Textract SDK 2.9  for text-to-program) - Moscow
(40 day trial - nice screen pic demo)
http://www.structurise.com/kleptomania/
Good website, no forum, phone # (prefers email)

Capture Text - $30 - Dmitry Sokolovskiy  - Seattle WA (206) 888-6807 and email
http://www.capturetext.com/

Screen OCR - $30  - 21-day trial - (206) 338-5863 - Seattle, WA and email
http://www.screenocr.com/

Aqua Deskperience - $15 - Bucharest, Romania
http://www.deskperience.com/aqua-deskperience/aqua.html
Snapfiles - "30 day trial. Some features disabled."

TextBridge Pro 11 - $80 (Nuance - also do OmniPage) - Burlington, MA (781) 565-5000
http://www.nuance.com/textbridge/
Wiki, KnowledgeBase, etc.(30 day refund only - no trial)

OCR Tools - Desktop Application - Freeware
http://www.ocrtools.com/fi/prdOCRFree.aspx St Paul, MN
"OCR Desktop Application is a desktop utility that generates ASCII text from images such as a bitmap or image file. OCR Desktop is free, the registered version turns off popups and advertising."

Onenote - Microsoft - pricey $100 - unless with MS Office Suite - perhaps uses MODI component
http://office.microsoft.com/en-us/onenote/default.aspx
"paste or otherwise import an image containg text into a OneNote page, right click on it and choose "Copy Text from Picture"... then simply paste the copied text wherever you'd like it. Of course, it's not perfect OCR for low resolution." (note from Jon Galloway blog comments)

SimpleOCR - freeware - support is FAQ
And ScanStore Knoxville, TN - support phone # with order + web chat
http://www.simpleocr.com/
"May I use SimpleOCR to process screen captures? Sure, but SimpleOCR usually returns poor results with screen captures." -- also have command line and SDK kit.

Yet their store has fascinating reviews of the higher-end products - below.

====================================================

Screen Capture Programs with some Text Capture
(These are generally not OCR-based; they use the Windows API, or may have limited OCR functions.)

HyperSnap 6  - $35
http://www.hyperionics.com/
Tech Support Message Board
http://www.hyperionics.com/

Snag-It - TechSmith - $50
http://www.techsmith.com/screen-capture.asp
Forums
http://forums.techsmith.com/

Screen Hunter Pro - $30 - Wisdom-Soft
http://www.wisdom-soft.com/products/screenhunter.htm

WinCapture Pro - $40
http://www.wincapture.com/en/

======================================================

Special Placement

Cropper - Freeware
http://blogs.geekdojo.net/brian/articles/Cropper.aspx
(not sure of functionality, used by some as a tool starting point with open source)

JiveQ - Freeware - TMA Software - Encinitas, CA
http://www.tmasoftware.com/
http://www.pcadvisor.co.uk/downloads/index.cfm?categoryid=1470&itemId=72962
"What I found most annoying, though, is that it's not too hard to use the WebShot page-capture feature, but I never could figure out how to use the text capture."
Programmer library tool below.

==============================================

Windows Controls & API Text Capture

SysExporter - Nirsoft - Freeware
http://www.nirsoft.net/utils/sysexp.html


Textractor - Resplendence - Freeware
http://www.resplendence.com/textractor
Monitor and capture all text any program writes to the screen

================================

Programmer Library

TextGRAB SDK - $30
http://www.renovation-software.com/text-grab-sdk/textgrab-sdk.html
TextGRAB SDK is a library that allows screen text capture in Windows applications.
(Windows API, not OCR)

OCR Tools - (basic tool above)
http://www.ocrtools.com/fi/prdOCRStandard.aspx
Programmer Library  $600 -
OCR.Net Component is a .Net component that can be integrated into your application to generate ASCII text from a bitmap or an image file such as a TIF, GIF, BMP, or JPG file. The OCR .

WordScale Text Capture SDK
http://www.tmasoftware.com/
JiveQ above

Aqua Deskperience (program above)
Text Capture X - library for programmers
http://www.deskperience.com/screen-scraper/textcapturex.html

================================================

Geared to Corporate and Enterprise - oriented toward scanning and documents more than screenshots

ScanStore
http://www.simpleocr.com/OCR_Software_Guide.asp
"We have tested the latest versions of FineReader, OmniPage, ReadIRIS, TextBridge and SimpleOCR and have determined that ABBYY FineReader 9 is the best overall value for business users, while ReadIRIS is the best Omnipage Review for under $150."

==============

Nuance - OmniPage 17 - Standard $150 - Professional $500 -
http://www.nuance.com/imaging/products/omnipage.asp
(Your biz clients may be using this already.)
OmniPage Capture Software Developers Kit - Capture SDK
           - PaperPort 11 Standard $100  - Professional $200
http://www.nuance.com/imaging/products/paperport.asp

Iris - ReadIris 12 Pro - $120
http://www.irislink.com/c2-1584-189/Readiris-12---OCR-Software.aspx

Abbyy - FineReader - $400 for Pro - company info and less expensive software above
http://finereader.abbyy.com/  

=========================================

Screen Capture Tools: 40+ Free Tools and Techniques
http://www.hongkiat.com/blog/screen-capture-tools-40-free-tools-and-techniques/

Category: OCR Software [Optical Character Recognition Software]
http://www.scanguru.com/e107_plugins/links_page/links.php?cat.16

=========================================

Planned shortly: the MODI, Tesseract, and Digital Camera/Phone notes sections.

Shalom,
Steven

Steven Avery:
Hi Folks,

These Tesseract products are interesting, although it is not likely that any are really high-class (good quality image and easy to use) for Screen Capture --> Text today.

THE BASICS

Tesseract OCR engine - Freeware Open Source
http://code.google.com/p/tesseract-ocr/
Forum (active)
http://groups.google.com/group/tesseract-ocr/

TESSERACT - WEB OCR

WeOCR server
http://asv.aso.ecei.tohoku.ac.jp/tesseract/
"This server cannot handle page images. The input image should be a TEXT BLOCK image containing some texts only, with clear background, without any dirt around the texts. "

PRODUCTS AND PROJECTS

FreeOCR
http://freeocr.co.uk/
FreeOCR is a scan & OCR program including the Tesseract free OCR engine, also known as a Tesseract GUI. It includes a Windows installer, is very simple to use, and supports multi-page TIFFs, fax documents, and most image types, including compressed TIFFs, which the Tesseract engine on its own cannot read. It now has TWAIN scanning. ... Please note that the Tesseract OCR engine requires images at a resolution of 200 dpi or greater and as such it is not suited for reading PC screen shots, which are only about 72 dpi, although we have made some enhancements in V2.3 which will produce better accuracy from low quality image sources.
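
To put the 200 dpi vs. 72 dpi gap in numbers, here is a rough Python sketch of how much a screenshot would have to be enlarged before it reaches the resolution Tesseract is said to want. The function name upscaled_size is made up for illustration, the 72 and 200 dpi defaults are simply the figures quoted above, and whether enlarging a screenshot actually improves recognition is an assumption, not something tested here.

```python
import math


def upscaled_size(width, height, source_dpi=72.0, target_dpi=200.0):
    """Return the (width, height) in pixels an image needs so that its
    effective resolution is at least target_dpi.

    upscaled_size is a hypothetical helper, not part of Tesseract or FreeOCR.
    The defaults (72 dpi screen, 200 dpi minimum) are the figures quoted in
    the FreeOCR note above.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("dpi values must be positive")
    if target_dpi <= source_dpi:
        # Already at or above the target resolution: leave the size alone.
        return width, height
    # Round up so the result never falls just short of the target.
    new_width = math.ceil(width * target_dpi / source_dpi)
    new_height = math.ceil(height * target_dpi / source_dpi)
    return new_width, new_height


if __name__ == "__main__":
    # An 800 x 600 screen capture at the nominal 72 dpi.
    print(upscaled_size(800, 600))
```

So a capture would need to be enlarged by a factor of about 2.8 (200 / 72) in each direction in an image editor before handing it to a Tesseract front end.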

gscan2pdf - A GUI to produce a multipage PDF or DjVu from a scan. -
freeware open source - Tesseract
http://gscan2pdf.sourceforge.net/

Tessnet2 a .NET 2.0 Open Source OCR assembly using Tesseract engine
http://www.pixel-technology.com/freeware/tessnet2/

Shalom,
Steven Avery
