Author Topic: make scan PDF into text searchable PDF  (Read 3967 times)

Steven Avery

  • Supporting Member
  • Joined in 2006
  • Posts: 1,038
make scan PDF into text searchable PDF
« on: September 19, 2022, 05:24 PM »
Hi Forum!

Trying to do a 20 megabyte scanned book.

Once I did a book with PDF2GO, but the online tools all seem to have limits or glitch out on memory or something.

Willing to buy Shareware if need be.
Wondershare PDFElement will not do that feature in shareware mode and the cost is about $100 depending on license.
Willing to do it but only if there are not good alternatives.

My Acrobat Reader is great for reading PDFs but does not have that feature.
Same with my Soda PDF Desktop.

Your thoughts?

Thanks!




IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
Re: make scan PDF into text searchable PDF
« Reply #1 on: September 21, 2022, 08:56 AM »
Hello Steven Avery.
OCRing (Optical Character Recognition) a 20 MB image-scanned book shouldn't present any problems. There is plenty of technology to do it for $FREE without needing to be scammed by Adobe and others.

I've been very interested in MICR, digital imaging and OCR technologies for years, as they provide an essential primary automated text data capture functionality.

In response to your above query, I'd suggest you check out these for starters:
* PDF-XChange Viewer ($FREE version) - Mini-Review
I did that review riding on the shoulders of various other erstwhile denizens of DCF who had preceded me.

* OCR - comparisons of different software/capability

* Qiqqa - Reference Management System - Mini-Review - a brilliant library management tool, it indexes existing .PDF OCRed documents, and scans and OCRs existing .PDF imaged documents and then indexes them. You can read your library of .PDF documents in Qiqqa.

Hope that helps or is of use.
I probably should update those reviews/notes, because the technology will have improved and what may have been perceived as shortcomings or niggles then will probably have been cleared away by now...

Could you please add to the knowledge base here (in this forum) by noting and cross-referencing whatever you discover whilst trying to meet your PDF OCR and text capture needs?

Thanks.
« Last Edit: September 22, 2022, 05:49 AM by IainB »

kunkel321

  • Supporting Member
  • Joined in 2009
  • Posts: 597
Re: make scan PDF into text searchable PDF
« Reply #2 on: September 22, 2022, 08:57 AM »
My experience with OCRing scanned stuff is that, if there are tables and you want to convert those into Word tables or Excel sheets, then you almost have to have something that uses the ABBYY OCR engine. Unfortunately, to convert entire PDFs, you need the expensive subscriptionware (which I'm not willing to pay for). I do have the $10 ABBYY Screenshot Reader. It only does one page at a time though.

I use PDF-XChange Editor. I think it might use Tesseract technology. It's pretty good too. It messes up tables, but otherwise is good.
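
For anyone who would rather drive Tesseract directly, here is a minimal sketch. It assumes the open-source OCRmyPDF command-line tool (not mentioned elsewhere in this thread) and Tesseract are both installed. The function names and file names are made up for illustration, and I have not tried it on a 20 MB book.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(scanned_pdf, searchable_pdf, language="eng"):
    """Return the OCRmyPDF command line as a list of arguments."""
    return [
        "ocrmypdf",
        "--language", language,  # Tesseract language pack to use
        "--skip-text",           # leave pages that already have a text layer alone
        "--deskew",              # straighten crooked page scans before OCR
        str(Path(scanned_pdf)),
        str(Path(searchable_pdf)),
    ]


def make_searchable(scanned_pdf, searchable_pdf, language="eng"):
    """Run OCRmyPDF if it is on PATH; return True when it exits cleanly."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf not found; install OCRmyPDF and Tesseract first")
        return False
    command = build_ocr_command(scanned_pdf, searchable_pdf, language)
    return subprocess.run(command).returncode == 0
```

Calling make_searchable("scanned_book.pdf", "searchable_book.pdf") should write a copy of the scan with a hidden text layer added, leaving the original file untouched. Tables will probably still come out rough, as with any Tesseract-based tool.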

Steven Avery

Re: make scan PDF into text searchable PDF
« Reply #3 on: November 26, 2022, 09:30 PM »
Thanks! Good answers. I have used PDF-XChange Viewer, but not on my current computer and not for OCR.

It will be my first choice.

Most of my books are already OCRed and searchable, but I am ready to use the tips above.

Steven Avery

Re: make scan PDF into text searchable PDF
« Reply #4 on: November 27, 2022, 08:23 AM »
I think PDF-XChange Viewer was replaced by Editor and Editor Plus, although the Viewer still seems to be downloadable.
https://www.tracker-.../download?fileid=445

"The PDF-XChange Viewer has been replaced by the all NEW PDF-XChange Editor which extends the power of the Viewer PRO with many new features, headlining, Direct Content Editing of text based PDF files (Not PDFs created from images or scans).  A PDF-XChange Editor License will directly license the Viewer as well as the included PDF-XChange Lite virtual PDF printer."

Editor has a free version, but it does have restrictions.

Not sure yet what is best.

brahman

  • Supporting Member
  • Joined in 2007
  • Posts: 239
Re: make scan PDF into text searchable PDF
« Reply #5 on: November 27, 2022, 12:42 PM »
VueScan can scan a book and convert it into a searchable PDF at the same time, with no extra step.

Furthermore, VueScan lets you use a timer to start each new scan, so you don't have to press a scan button or click the mouse for every page. VueScan will also most likely scan significantly faster than other third-party software.

VueScan's PDF capabilities have expanded significantly in recent updates, though they were very good to begin with.

Here is a review of an old version. Keep in mind that the review is 12 years old and VueScan has improved since then, as development has moved forward consistently.

https://www.donation...ex.php?topic=23658.0

If you buy, you need the Professional version for OCR capabilities. Unless you only want to use it for one month, choose the one-time payment over the subscription model.

A discount may be available if you do a Google search. The code OCT25 may still work for a 25% discount, but I'm not sure. If not, please search for "VueScan discount".

BTW: Correct, PDF-XChange Viewer has been replaced by Editor, and the Viewer hasn't been updated for a long time. Editor is still free, though the free version may be hidden. I would use the free portable version at www.PortableApps.com.
Regards, Brahman