Author Topic: Feature request: automatic OCR of captured images.  (Read 2238 times)

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 6,048
  • Slartibartfarst
Feature request: automatic OCR of captured images.
« on: November 24, 2014, 01:42:44 AM »
@mouser: Could you consider this please?
Based on this: Inside Microsoft OCR Libraries.

- I would really like to see if CHS could accommodate this:
...Perform OCR on any text in images as they are clipped ...
(i.e., similar to OneNote.)

- so that CHS would be able to do this with the captured images - i.e., just like with ordinary text capture clips:
...Look at this:
I have set up a child group in the CHS "tree" called "Auto-Tags". ...

Ideally, the OCR'd text would be attached to the image in the database, say in the CHS "Clip Text" part of the clip, so that it would be searchable and copyable within CHS.
Or - just thinking aloud - such images could be saved as .JPG files with the OCR'd text stored as Alternative Text(?) or in (say) the Caption field in the IPTC section of the file. The idea would be to let things like Windows Search and image management tools (e.g., Picasa) pick up the OCR'd text as well, though I am unsure whether Windows Search could do that without some kind of iFilter (e.g., as is required to index/search for text in .TIFF files).
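
To make the "attach the OCR'd text to the clip" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `clips` table, its `clip_text` column and the function names are all hypothetical and are not the real CHS database layout, and the OCR engine is passed in as a plain callable so that any engine could be plugged in.

```python
import sqlite3
from typing import Callable, List

# Hypothetical schema: one row per captured image, with the recognised
# text stored beside it. This is NOT the real CHS database layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS clips (
    id        INTEGER PRIMARY KEY,
    image     BLOB NOT NULL,
    clip_text TEXT NOT NULL DEFAULT ''
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the illustrative clip store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_image_clip(conn: sqlite3.Connection, image: bytes,
                   ocr: Callable[[bytes], str]) -> int:
    """Store an image clip together with whatever text `ocr` finds in it."""
    text = ocr(image)  # `ocr` is any function mapping image bytes to text
    cur = conn.execute(
        "INSERT INTO clips (image, clip_text) VALUES (?, ?)", (image, text)
    )
    conn.commit()
    return cur.lastrowid


def search_clips(conn: sqlite3.Connection, phrase: str) -> List[int]:
    """Return ids of clips whose stored text contains `phrase`.

    Matching ignores case for ASCII letters only (SQLite's lower()).
    """
    rows = conn.execute(
        "SELECT id FROM clips "
        "WHERE instr(lower(clip_text), lower(?)) > 0 ORDER BY id",
        (phrase,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Stand-in "OCR" that just decodes bytes; a real engine would go here.
    fake_ocr = lambda data: data.decode("ascii", errors="ignore")
    store = open_store()
    add_image_clip(store, b"Invoice total 42", fake_ocr)
    add_image_clip(store, b"holiday photo", fake_ocr)
    print(search_clips(store, "invoice"))
```

Keeping the recognised text in an ordinary text column means the existing text search could cover image clips without needing to know anything about OCR.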

cranioscopical

  • Friend of the Site
  • Supporting Member
  • Joined in 2006
  • Posts: 4,358
Re: Feature request: automatic OCR of captured images.
« Reply #1 on: November 24, 2014, 08:33:37 AM »
^ +1 (thumbs up)

kunkel321

  • Supporting Member
  • Joined in 2009
  • Posts: 464
Re: Feature request: automatic OCR of captured images.
« Reply #2 on: November 24, 2014, 02:43:07 PM »
Hell yes, +1.  The only reason I keep SnagIt installed is for the "Text Grab" feature in v10. 
If CHS had OCR capabilities, it would blow that out of the water!

IainB

Re: Feature request: automatic OCR of captured images.
« Reply #3 on: January 04, 2015, 10:41:18 PM »
Thought I'd cross-post this relevant item from: Re: free ABBYY Screenshot Reader.
Quote
This relates to the above discussion and some separate discussions:
As a result of pursuing the idea of getting OCR data out of any text-containing images in my CHS database (per this request here: Feature request: automatic OCR of captured images.), I "Ducked" (DuckDuckGo) for things relevant to the subject, and happened upon this interesting post:
Quote
FREE OCR software: a survey of desktop and online tools - freewaregenius.com
Jun 18, 2013 By Priit 35 Comments
...
16. ABBYY Screenshot Reader
ABBYY Screenshot Reader is a screen capture software that can do screenshot OCR on the fly. Excellent recognition quality, amazing number of 160+ input languages can be selected, also multiple languages at a time. It can nicely handle data tables. ABBYY Screenshot Reader is reviewed here.
...
[Image: OCR - ABBYY Screen Clip Pros and Cons (data in image).jpg]

Out of interest, I downloaded and installed the free ABBYY OCR clipping tool (it is now v9.0.0.1331) and then ran a comparison between it and OneNote's OCR clipping tool using an image containing a table.
The result? Very interesting. A hands-down win by the ABBYY tool:

[Image: OCR - comparison ABBYY Screen Clip v OneNote (table data in image).jpg]
« Last Edit: January 06, 2015, 07:33:11 PM by IainB, Reason: Replaced the 2nd image with a better one. »

kunkel321

Re: Feature request: automatic OCR of captured images.
« Reply #4 on: January 06, 2015, 11:39:59 AM »
I've used ABBYY lots of times. It's decent. I finally got myself a copy of Acrobat XI, which has built-in OCR. Otherwise I'd still be using the ABBYY one. FYI, the ABBYY people have a product called PDF Transformer. Every once in a while it appears as a freebie on the internets. It has the same engine as the Screenshot Reader, I'm sure.

IainB

Re: Feature request: automatic OCR of captured images.
« Reply #5 on: January 06, 2015, 07:05:27 PM »
Yes, the ABBYY software seems really rather good at what it does.
As described in EPSON Perfection V330 Photo Scanner + ABBYY and ArcSoft software, I first came across it in the bundled software that came with that scanner.
The last time I had Acrobat was version 7; I don't use it now, and currently get .PDF OCR processing via free software - see PDF-XChange Viewer ($FREE version) - Mini-Review.

My thought with the ABBYY Screenshot Reader was that it might be worth exploring whether it could be incorporated into the CHS process somehow, to meet the requirement for automatic OCR of captured images (those captured by CHS). This could be done (say) upon the capture of each individual image, or perhaps as a post-capture batch process. I had effectively been doing the latter - albeit manually - using OneNote, but the OCR capability of ABBYY Screenshot Reader seems to be superior to OneNote's.
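
The "post-capture batch process" option could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the `clips` table, the `ocr_done` flag and the function names are hypothetical rather than CHS internals, and the OCR engine (ABBYY, OneNote or anything else) is represented only by a callable that turns image bytes into text.

```python
import sqlite3
from typing import Callable, Iterable


def make_demo_store(images: Iterable[bytes]) -> sqlite3.Connection:
    """Build an in-memory store of image clips that have not been OCR'd yet."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clips ("
        " id INTEGER PRIMARY KEY,"
        " image BLOB NOT NULL,"
        " clip_text TEXT NOT NULL DEFAULT '',"
        " ocr_done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO clips (image) VALUES (?)", [(img,) for img in images]
    )
    conn.commit()
    return conn


def ocr_pending_clips(conn: sqlite3.Connection,
                      ocr: Callable[[bytes], str]) -> int:
    """Run `ocr` over every clip not yet processed; return how many were done.

    A clip is flagged as done even when no text is found, so that
    text-free images are not re-scanned on every pass. If `ocr` raises,
    the clip is left pending and will be retried on the next pass.
    """
    pending = conn.execute(
        "SELECT id, image FROM clips WHERE ocr_done = 0 ORDER BY id"
    ).fetchall()
    processed = 0
    for clip_id, image in pending:
        try:
            text = ocr(image)
        except Exception:
            continue  # leave this clip for a later pass
        conn.execute(
            "UPDATE clips SET clip_text = ?, ocr_done = 1 WHERE id = ?",
            (text, clip_id),
        )
        processed += 1
    conn.commit()
    return processed


if __name__ == "__main__":
    # Stand-in "OCR" that just decodes bytes; a real engine would go here.
    fake_ocr = lambda data: data.decode("ascii", errors="ignore")
    store = make_demo_store([b"table of figures", b"", b"meeting notes"])
    print(ocr_pending_clips(store, fake_ocr))
```

The same `ocr_pending_clips` pass would also cover the per-capture option: running it straight after each capture simply means there is only ever one pending clip.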

Added note: By the way, this is not to forget the very relevant point that any images in .TIF/.TIFF format can be automatically OCR'd for text and indexed/searched by Windows Desktop Search, if you have the .TIFF iFilter installed. In my view, for client-based databases, this in itself could be a good reason for duplicating text-bearing images into .TIFF format.
Similarly, I gather that any/most images - i.e., not just those in .TIFF formats - which are stored in the Evernote "cloud" are OCR'd and indexed for searching, and .PDF imaged documents stored in Google Drive can be OCR'd and the text searched/extracted.
« Last Edit: January 06, 2015, 07:29:42 PM by IainB »