Topic: extracting info from pdf (Read 44411 times)

kalos

  • Member
  • Joined in 2006
  • Posts: 1,823
extracting info from pdf
« on: September 12, 2010, 12:58 PM »
Hello,

I am starting to work with PDF files and I would like to extract a table and save it as a graphics file or as an MS Office table (not as Excel, because it has symbols, lines, etc.; it is not only numerical data).

What is the best way to do this? I will incorporate it in a PowerPoint or MS Office document.

I need it to be in the original quality.

I would also prefer not to use a crop tool, because I need the optimum margins, etc. Is there a way to extract the table with its default boundaries?

thanks
« Last Edit: September 12, 2010, 01:01 PM by kalos »

ha14

  • Participant
  • Joined in 2010
  • Posts: 276
« Reply #1 on: September 12, 2010, 01:50 PM »

cmpm

  • Charter Member
  • Joined in 2006
  • Posts: 2,026
« Reply #2 on: September 13, 2010, 07:25 AM »
Nitro's new free reader might do it.

http://www.nitroreader.com/

It claims the ability to extract certain things.

Curt

  • Supporting Member
  • Joined in 2006
  • Posts: 7,566
« Reply #3 on: September 13, 2010, 12:42 PM »
Thanks for telling us about this new reader, cmpm. Nitro's free reader is surprisingly good, especially at the price! However, it will not load quite as fast as the advert makes you imagine, and as far as I know, it can "merely" extract text and images, not 'tables'.

ha14

« Reply #4 on: September 13, 2010, 01:44 PM »
Try FreeOCR:
http://www.freeocr.net/

FreeOCR is free software that can recover text from an image of printed text, from a scanned sheet, and even from a PDF.
This tutorial is in French, but it is illustrated and easy to follow:
http://www.pcastuces...utique/ocr/page5.htm

daddydave

  • Supporting Member
  • Joined in 2008
  • Posts: 867
« Reply #5 on: September 13, 2010, 02:52 PM »
I need it to be in the original quality

IMO this requirement will push you to spend money (and even then, still no guarantee it will be in the original quality). PDF Converter 7 (the $49.99 version) appears to have the ability to "Convert PDF and XPS documents into all Microsoft Office formats in a click."  I have never used it, but I know some people swear by the Pro version of it.

Unfortunately, I am not seeing a demo version on the site  :huh:

And of course Adobe Acrobat (the full version) can save to Office formats as well, but it costs around $273 minimum in the U.S.
« Last Edit: September 13, 2010, 02:58 PM by daddydave »

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
« Reply #6 on: September 13, 2010, 03:03 PM »
PDF Converter 7

This is one of the Nuance ones.  I have Able2Extract.  Darwin kindly did a test on his Nuance Pro (more expensive) on the complicated front page of a World patent, for me to compare.  Nuance did a slightly better job, but there wasn't a lot in it.  Able2Extract in fact uses technology licensed from Nuance.

daddydave

« Reply #7 on: September 13, 2010, 03:11 PM »
PDF Converter 7

This is one of the Nuance ones.  I have Able2Extract.  Darwin kindly did a test on his Nuance Pro (more expensive) on the complicated front page of a World patent, for me to compare.  Nuance did a slightly better job, but there wasn't a lot in it.  Able2Extract in fact uses technology licensed from Nuance.

Sweet. The test was with the free version of Able2Extract, I take it?

cyberdiva

  • Supporting Member
  • Joined in 2006
  • Posts: 1,041
« Reply #8 on: September 13, 2010, 03:45 PM »
Sweet. The test was with the free version of Able2Extract, I take it?
I didn't see any free version of Able2Extract when I went to their website.  Or was your statement meant tongue-in-cheek?

daddydave

« Reply #9 on: September 13, 2010, 03:53 PM »
Sweet. The test was with the free version of Able2Extract, I take it?
I didn't see any free version of Able2Extract when I went to their website.  Or was your statement meant tongue-in-cheek?

It wasn't meant as tongue-in-cheek but now it appears I was wrong. It looks like they have a $99.95 version and a $129.95 version, and you can also buy a 30 day demo for $34.95.  :lol: (Seriously!)

Curt

« Reply #10 on: September 13, 2010, 06:33 PM »
I need it to be in the original quality

IMO this requirement will push you to spend money (and even then, still no guarantee it will be in the original quality). PDF Converter 7 (the $49.99 version) appears to have the ability to "Convert PDF and XPS documents into all Microsoft Office formats in a click."  I have never used it, but I know some people swear by the Pro version of it.

-I believe the $100 PRO version is needed if .XPS must be included.

Nuance, compare features, (each screenshot approx 960 pixels wide)
Edited: not a good width without a widescreen! The original PDF document may suit your monitor better! [Pictures removed.]

Edit #2:
Actually, I imagine the $150 Enterprise version is needed, for the jobs kalos described -
Nuance is a converter, not an editor.
« Last Edit: September 13, 2010, 07:08 PM by Curt »

daddydave

« Reply #11 on: September 13, 2010, 07:45 PM »
I believe the $100 PRO version is needed if .XPS must be included.

Then why does the web site say otherwise? I quoted directly from the description of the $49.99 product. At any rate, the original poster said nothing about XPS.
« Last Edit: September 13, 2010, 07:46 PM by daddydave »

kalos

« Reply #12 on: September 14, 2010, 10:43 AM »
is there a free PDF editor/converter that works well? many converters fail to convert properly to doc, even Acrobat
if not, any paid one that is really good?

thanks

ha14

« Reply #13 on: September 14, 2010, 01:30 PM »
Try Universal Document Converter: http://www.print-driver.com/howto/

rjbull

« Reply #14 on: September 14, 2010, 02:52 PM »
The test was with the free version of Able2Extract, I take it?

No, painfully paid for.  Oh, and the reason why I chose Able2Extract over Nuance at the time - pre-test, as it happened - was because Nuance directed me to their UK shop, where they translate $US into £UK one-to-one.  I might have missed a slightly better program, but they missed my business through greed and taking UK residents for fools.

daddydave

« Reply #15 on: September 14, 2010, 03:09 PM »
Oh, and the reason why I chose Able2Extract over Nuance at the time - pre-test, as it happened - was because Nuance directed me to their UK shop, where they translate $US into £UK one-to-one.  I might have missed a slightly better program, but they missed my business through greed and taking UK residents for fools.

Ah...got it. Yes, I did wonder about that, thanks for clarifying.

rjbull

« Reply #16 on: September 14, 2010, 03:15 PM »
is there a free PDF editor/converter that works well? many converters fail to convert properly to doc, even Acrobat
if not, any paid one that is really good?

Like others, I doubt you will find a free tool to do what you want, especially with tables.  I've used XPDF for extracting text from PDFs.  It will also extract images and (I think) do a few other things.  You might try the free online file conversion service Zamzar, which I found quite good when I tried it.  There's at least one more similar service, Media-Convert.
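A minimal sketch of driving the Xpdf command-line tools from a script, for anyone who wants to try the text and image extraction mentioned above. It assumes the Xpdf utilities `pdftotext` and `pdfimages` are installed and on PATH; the function names, the file name `report.pdf`, and the output folder are hypothetical. It does not reconstruct tables, it only dumps layout-preserved text and embedded images.

```python
import shutil
import subprocess
from pathlib import Path


def xpdf_commands(pdf_path, out_dir):
    """Build command lines for Xpdf's pdftotext and pdfimages tools."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    text_file = out / (pdf.stem + ".txt")
    image_root = out / pdf.stem  # pdfimages adds a number and extension itself
    return [
        # -layout tries to keep the physical page layout, which helps with tables
        ["pdftotext", "-layout", str(pdf), str(text_file)],
        # -j saves DCT (JPEG) images as .jpg instead of converting them
        ["pdfimages", "-j", str(pdf), str(image_root)],
    ]


def run_xpdf(pdf_path, out_dir):
    """Run both commands, reporting any tool that is not installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    for cmd in xpdf_commands(pdf_path, out_dir):
        if shutil.which(cmd[0]) is None:
            results[cmd[0]] = "not installed"
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[cmd[0]] = "ok" if proc.returncode == 0 else proc.stderr.strip()
    return results


# Show the commands that would be run for a hypothetical file.
for command in xpdf_commands("report.pdf", "extracted"):
    print(" ".join(command))
```

Calling `run_xpdf("report.pdf", "extracted")` would then execute the two commands. Embedded images come out at their stored resolution, but a table drawn with text and lines is usually not an embedded image, so this may not help with the original request.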

daddydave

« Reply #17 on: September 14, 2010, 03:24 PM »
I should add that I have experimented with freeware tools to convert PDFs to HTML (a potential intermediate format for Word), but the results were abysmal, so I don't really recommend that.

Perry Mowbray

  • N.A.N.Y. Organizer
  • Charter Member
  • Joined in 2005
  • Posts: 1,817
« Reply #18 on: September 14, 2010, 10:05 PM »
I have, at times, used OCR Terminal (an online OCR service) extensively, and as with most (I think probably all) other software, the results can be a little flaky and definitely need editing before use.

One job I used it for was converting data printouts into Excel spreadsheets (which had to go via Word), and apart from mixing up some of the combined cells, it did pretty well. It still needed an edit, as the original document was not perfect quality and was in a small font size... but it saved quite a bit of time in the end.
« Last Edit: September 14, 2010, 10:08 PM by Perry Mowbray »
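For the printout-to-spreadsheet job described above, here is a rough sketch of turning OCR'd text into CSV that Excel can open. It assumes the OCR output keeps columns separated by a tab or by at least two spaces, which is not guaranteed; the function names and sample data are made up for illustration. Combined (merged) cells are not handled, which matches the kind of manual clean-up mentioned in the post.

```python
import csv
import io
import re


def ocr_text_to_rows(text):
    """Split each non-empty line into cells at a tab or a run of 2+ spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append(re.split(r"\s{2,}|\t", line))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical OCR output from a small data printout.
sample = """
Item        Qty    Unit price
Widget A    12     3.50
Widget B     4     10.00
"""

rows = ocr_text_to_rows(sample)
print(rows_to_csv(rows))
```

Single spaces inside a cell ("Unit price", "Widget A") survive because only wider gaps count as column breaks; a printout with tightly packed columns would need a different rule.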

StuR

  • Charter Member
  • Joined in 2005
  • Posts: 4
« Reply #19 on: September 17, 2010, 08:26 AM »
kalos,

I've had very good luck a number of times with http://www.pdftoword.com/default.aspx . Upload your pdf to their site, they convert it and return by email. Occasionally within a few minutes, always within 24 hours. And free!

Just two days ago at work I had them convert a 9-pg pdf with tables, highlighting, and a screenshot: the .doc version they returned was perfect.

Curt

« Reply #20 on: September 17, 2010, 10:27 AM »
-your first post after FIVE years' membership?!
Wow, you're not a man of too many words. Respect!  :Thmbsup:


... http://www.pdftoword.com/default.aspx ...
... the .doc version they returned was perfect.

-as it also would be if you use their desktop program: Nitro PDF Professional :up:

kalos

« Reply #21 on: September 17, 2010, 10:33 AM »
-your first post after FIVE years' membership?!
Wow, you're not a man of too many words. Respect!  :Thmbsup:


It's an honour that you posted in my thread.

StuR

« Reply #22 on: September 17, 2010, 10:38 AM »
Curt,

Yeah, I've been mostly a lurker here. I imagine there are a bunch of DC people like me: active and interested computer user both at home and at work, but not a coder, not a serious computer hobbyist, not too much interested in mucking about with hardware or modifying software or the like.

But I follow DC enthusiastically, and it's given me FARR and Screenshot Captor (which I use and talk up regularly) and a bunch of other interesting things.
Mouser is a god.

So I've held back not through reticence, but because most active posters have levels of knowledge and skill well above mine. When I feel I have something useful to add, I will.

(and now two posts in one day. It's a Trend!)


steveorg

  • Participant
  • Joined in 2007
  • Posts: 24
« Reply #23 on: September 17, 2010, 04:19 PM »
...extract a table and save it as a graphics file...

...I will incorporate it in a PowerPoint or MS Office document...

...I need it to be in the original quality...

...not to use a crop tool, because I need the optimum margins, etc
-kalos

I'm focusing on the graphics approach that you mentioned because that just seems like the path of least resistance. Use Screenshot Captor to grab the image and your favorite graphics editor to adjust the margins to your liking.
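If the margins are the sticking point, the trimming step can in principle be automated instead of dragged by hand. The sketch below is only an illustration of the idea, not a feature of Screenshot Captor: it finds the bounding box of everything that is not background in a grayscale image and then adds a uniform margin. The image is a plain list of rows so that the example runs without any imaging library; the function names and the tolerance value are assumptions.

```python
def content_box(pixels, background=255, tolerance=10):
    """Return (left, top, right, bottom) of the non-background area, or None.

    `pixels` is a list of rows of grayscale values (0-255). Right and bottom
    are exclusive, as in most image libraries.
    """
    rows = [y for y, row in enumerate(pixels)
            if any(abs(v - background) > tolerance for v in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0]))
            if any(abs(row[x] - background) > tolerance for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def pad_box(box, margin, width, height):
    """Grow the box by a uniform margin, clamped to the image bounds."""
    left, top, right, bottom = box
    return (max(0, left - margin), max(0, top - margin),
            min(width, right + margin), min(height, bottom + margin))


# A 6x5 white "screenshot" with a dark 2x2 block standing in for the table.
image = [[255] * 6 for _ in range(5)]
for y in (2, 3):
    for x in (3, 4):
        image[y][x] = 0

box = content_box(image)
print(box)                    # (3, 2, 5, 4)
print(pad_box(box, 1, 6, 5))  # (2, 1, 6, 5)
```

With a real screenshot, the same box could be passed to the crop function of whatever graphics library or editor is in use, so every table gets the same margin regardless of how the capture rectangle was drawn.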

kalos

« Reply #24 on: September 20, 2010, 12:57 PM »
Disadvantages of Screenshot Captor (and any other screenshot tool):

1) It does not capture the graphic at its original/default resolution. The captured resolution changes with the zoom level of the PDF, so the result may not be optimal: too zoomed out and the image may be distorted, too zoomed in and it may become unusable if you want it bigger.

2) It does not capture the graphic with its original/default borders, since you drag the rectangle yourself, and this may result in a badly proportioned image.

Look how nicely this PDF editor recognizes and selects the image (see the blue borders):
205017.png
But it cannot copy and paste it into MS Paint! It cannot extract it!
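The zoom dependence in point 1 can be made concrete with a little arithmetic. PDF page coordinates are measured in points (1/72 inch by default), so a region of the page has a fixed physical size, and the number of pixels it produces depends only on the resolution at which it is rendered. The sketch below assumes a 96 dpi screen at 100% zoom and a hypothetical table of 360 x 180 points; both figures are illustrative, not taken from the PDF in question.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def pixels_at_dpi(width_pt, height_pt, dpi):
    """Pixel size of a region given in PDF points when rendered at `dpi`."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def pixels_at_zoom(width_pt, height_pt, zoom_percent, screen_dpi=96):
    """Pixel size of the same region in a screenshot at a given viewer zoom."""
    return pixels_at_dpi(width_pt, height_pt, screen_dpi * zoom_percent / 100)


# Hypothetical table occupying 360 x 180 points (5 x 2.5 inches) on the page.
for zoom in (50, 100, 200):
    print(f"screenshot at {zoom}% zoom:", pixels_at_zoom(360, 180, zoom))
print("fixed 300 dpi render:", pixels_at_dpi(360, 180, 300))
```

Under these assumptions a screenshot yields anything from 240 x 120 to 960 x 480 pixels depending on zoom, while rendering the page at a fixed 300 dpi always gives 1500 x 750. A tool that renders or exports at a chosen dpi and crops by page coordinates would therefore avoid both disadvantages listed above.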