OCR - comparisons of different software/capability

IainB:
The quality of any given OCR result is likely to depend on the quality of the scanned image fed into the OCR engine. The better OCR software will generally do some pre-processing of the image before the actual OCR pass.
The comparison of OCR results in the opening post of this thread showed some errors in the output.

After reading about ScanTailor in the DC Forum and elsewhere, I downloaded and installed ScanTailor v0.9.11.1 (2012-02-27) 64-bit. (Note that this is not the same as ScanTailor Enhanced, which @Nod5 refers to in one of his posts.)

I had a sample image: a scan of two pages from a book. I straightened it up and tried to improve the B/W contrast using Picasa, but it still OCRed in OneNote with quite a lot of errors that would be tedious to correct manually. I wanted to see whether ScanTailor could help by pre-processing the image to enable a more accurate OCR result.
So, take a look at this:
This is the input image (2 pages in one image): [image]

This is what ScanTailor saw had been input: [image]

After splitting the pages up and doing its work (including adjusting the DPI, deskewing, despeckling, etc.), this is what ScanTailor saw had been output: [image]

This is the output (2 separate pages) from ScanTailor showing the OCR result (with some errors highlighted) from OneNote: [image]

Pretty impressive!    :Thmbsup:
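
For anyone curious what a step like despeckling involves, here is a minimal Python sketch. It is not ScanTailor's actual algorithm (I don't know how ScanTailor implements it); it just assumes a binary image held as a list of rows of 0 (white) and 1 (black), and erases any 4-connected group of black pixels smaller than a chosen size. The function name `despeckle` and the `min_size` parameter are made up for illustration.

```python
from collections import deque


def despeckle(image, min_size=3):
    """Return a copy of a binary image with small black specks removed.

    image: list of rows, each a list of 0 (white) or 1 (black).
    Any 4-connected group of black pixels with fewer than min_size
    pixels is treated as a speck and turned white.
    Illustrative only; not ScanTailor's implementation.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]          # leave the input untouched
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill to collect one connected component of black pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Erase the component if it is too small to be part of a glyph.
            if len(component) < min_size:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result
```

On a real scan the threshold would need tuning: too high and it starts eating full stops and the dots on the letter i, which is presumably why tools offer several despeckle strengths.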

References:

* how to crop pdf
* QuickPicZone - tool to quickly make Scan Tailor picture zone
* N.A.N.Y 2013 Submission - BookCrop
* ScanTailor_multi_core
* Re: Review/Tips: "Scanning - VueScan and Associates" Pt.I: Intro & Bookscanning

Curt:
Last refreshed at 1:09 pm   ACE   DdShbOdrd   Next   refresh   at   1:14   pm
To display information as per your requirement on this dashboard please contact Coral Softwares
pos-20120225-SD BATCH DATA 230212
[NEW8]
Agent: No | Batch Yes | Location: No | VAT Yes
Datapath f:\Ace5\ACE5NEW8\
Cash balance as on date is 410 00
Todays Sale amount is 0 00
Todays Purchase amount is 0 00
This is NOT the final version.
Please check this version on proper data and
in all required combination.
Please inform for any anamolies to be fixed
and final version delivered
Debug mode enabled for XPPDEBUG LOG
□   Whats   New   inversion   10.0.019 Dated 18 Oct 2012 (13:13:11)
* Error in Transport Master, fixed.
See menu Help \ Whafs Newforfull list.
f:\Ace5\ACE5NEW8\IMAGE\user_Super/i sor jpg not fou nd-ABBYY Screenshot Reader about anandcoral's picture
--- End quote ---

-------------------
-------------------

Actually, I came here hoping to find a mention of today's Bits du Jour offer: ABBYY PDF Transformer+

Does anyone here know if the ABBYY OCR part is the same in both PDF Transformer+ and Screenshot Reader?  :tellme:
There is quite a difference in their prices...

IainB:
I don't know for sure, but I would expect the OCR engine to be a common core process throughout the ABBYY range of software that involves image OCR scanning.
What my post above shows is that pre-processing the scanned image before it is fed into the OCR engine can make quite a bit of difference to the quality/accuracy of the OCR output.
ABBYY Screenshot Reader clearly does some pretty smart stuff - e.g., it will scan for and output columnar information if you tell it that is what you want.

Nod5:
IainB: ScanTailor and ScanTailor Enhanced produce the same output in many cases. Enhanced has some extra settings and a few tweaks useful for command line processing. ScanTailor is great, and preprocessing with it helps a lot for OCR, though development has been on hold for quite some time now. There is a new maintainer but no feature updates yet. The small helper tools I've made aim to automate or speed up some steps. If you only want black and white text output and already have well-cropped inputs (or precrop them with BookCrop or SoloCrop), then you don't need to open the ScanTailor GUI.

Edit: for ScanTailor_multi_core, see this later version, which supports more cores.
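
As a rough illustration of skipping the GUI, here is a small Python sketch that only builds a command line for `scantailor-cli` without running it. Treat the details as assumptions: the helper name `build_scantailor_command` is made up, and the flag names (`--layout`, `--color-mode`, `--despeckle`, `--output-dpi`) are written from memory, so check them against `scantailor-cli --help` for your build before relying on them.

```python
def build_scantailor_command(image_paths, output_dir, output_dpi=600, despeckle="normal"):
    """Build an argument list for a black-and-white scantailor-cli run.

    The flag names below are assumptions to verify against
    `scantailor-cli --help`; they may differ between builds.
    image_paths: list of already well-cropped single-page images.
    """
    return [
        "scantailor-cli",
        "--layout=1",                     # assumed: one page per input image
        "--color-mode=black_and_white",   # assumed: text-only B/W output
        f"--despeckle={despeckle}",       # assumed: off/cautious/normal/aggressive
        f"--output-dpi={output_dpi}",     # assumed: resolution of the output files
        *image_paths,
        output_dir,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the flags have been confirmed. Building the command as a list rather than a single string avoids quoting problems with paths that contain spaces.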

wraith808:
I don't know for sure, but I would expect the OCR engine to be a common core process throughout the ABBYY range of software that involves image OCR scanning.
What my post above shows is that pre-processing the scanned image before it is fed into the OCR engine can make quite a bit of difference to the quality/accuracy of the OCR output.
ABBYY Screenshot Reader clearly does some pretty smart stuff - e.g., it will scan for and output columnar information if you tell it that is what you want.
-IainB (May 28, 2015, 12:48 AM)
--- End quote ---

It is, and it isn't.  This is from someone who just did an implementation of the ABBYY FineReader Engine in a custom app.  For low-volume OCR, you won't really notice the difference.  But as you go higher, there are several capabilities, settings, and languages that are left out of the lower levels.  Even at the level where we are, with the ability to use 4 cores to scan, there is still a higher level that we won't get to until we scan 1,000,000 pages: the ability to use 8 cores.

So they build it with the ability to nickel-and-dime you for features, and believe me, they do nickel-and-dime you.  They're so parasitic that we were very close to just scrapping the whole contract, but when they saw that, they gave in.
