Topic: extracting info from pdf  (Read 44402 times)

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,959
Re: extracting info from pdf
« Reply #25 on: September 20, 2010, 04:16 PM »
This may depend on the file, but with Adobe Reader I just selected an image in a PDF (drag to select the area around it, then "Copy Image" in the context menu). I was able to paste the image into Evernote and MS Paint.

I don't think even the PDF reader will give you the option to show images at their original resolution. I suspect there could even be images with different resolutions within the one file. So I think you can't expect Screenshot Captor (or any other tool) to do that, considering it's the PDF reader that has the file open and displayed.
Tom
« Last Edit: September 20, 2010, 04:18 PM by tomos »

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #26 on: September 20, 2010, 04:46 PM »
Quote from: tomos
This may depend on the file, but with Adobe Reader I just selected an image in a PDF (drag to select the area around it, then "Copy Image" in the context menu). I was able to paste the image into Evernote and MS Paint.

This is a screenshot capture tool, with the above-mentioned disadvantages.

steveorg

  • Participant
  • Joined in 2007
  • *
  • Posts: 24
Re: extracting info from pdf
« Reply #27 on: September 20, 2010, 04:48 PM »
Quote from: tomos
I don't think even the PDF reader will give you the option to show images at their original resolution. I suspect there could even be images with different resolutions within the one file. So I think you can't expect Screenshot Captor (or any other tool) to do that, considering it's the PDF reader that has the file open and displayed.

I was going to make a similar point, but wanted to test it first. I'm far from an expert, but it has been my understanding (partly from experience) that a pdf rarely has enough data to extract components that are as detailed as the original source. On the contrary, the more efficient the pdf creation program, the smaller the file size. The pdf program should provide the least amount of data that is required to create the desired appearance.

For a bitmapped graphic, what you see is probably the best you'll get. I guess in theory vector (scalable) graphics are more flexible, but you may need appropriate software. Fonts may also scale under the right circumstances.

This is the document version of "You can't go home again." :P


tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,959
Re: extracting info from pdf
« Reply #28 on: September 20, 2010, 05:32 PM »
Quote from: tomos
This may depend on the file, but with Adobe Reader I just selected an image in a PDF (drag to select the area around it, then "Copy Image" in the context menu). I was able to paste the image into Evernote and MS Paint.

Quote from: kalos
This is a screenshot capture tool, with the above-mentioned disadvantages.

ah okay, (you said 'pdf editor' in your post, above the screenshot)

My point stands though: if you want to get the best quality image, copy it out of the pdf reader. You cannot expect the screenshot app to manipulate the pdf reader to give the best possible display - especially if the pdf reader itself cannot even do this.

on the other hand, as steveorg says, most pdf creators are focused on making the file smaller so the original image quality in the pdf might not be so good anyways...
Tom

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #29 on: September 21, 2010, 10:09 AM »
Quote from: tomos
My point stands though: if you want to get the best quality image, copy it out of the pdf reader. You cannot expect the screenshot app to manipulate the pdf reader to give the best possible display - especially if the pdf reader itself cannot even do this.

Let's say OK.
Now let's be a bit practical:
what is the best zoom level at which to copy in order to get the graphics in the best quality? With the PDF zoomed to 150%? 200%? 300%?
At 400% the graphics start to pixelate.
At 70% the graphics are too small.
...

Quote from: tomos
on the other hand, as steveorg says, most pdf creators are focused on making the file smaller so the original image quality in the pdf might not be so good anyways...

This doesn't matter. I just want the best image quality available in the PDF file, not the image quality of the original graphics file.

cmpm

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 2,026
Re: extracting info from pdf
« Reply #30 on: September 21, 2010, 10:52 AM »
If you could give an example pdf to do this operation, perhaps we all could experiment with the various tools each of us has.

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #31 on: September 21, 2010, 11:06 AM »
Quote from: kalos
Now let's be a bit practical:
what is the best zoom level at which to copy in order to get the graphics in the best quality? With the PDF zoomed to 150%? 200%? 300%?

-neither.

1) Extract the pictures, if they actually are pictures and not just PDF generated background.
2) If they aren't genuine pictures, I would make a simple screenshot at 100%.


[attached image: PdfEDITOR.gif]

[attached image: PdfTo.gif]
« Last Edit: September 21, 2010, 11:10 AM by Curt »

TomD101

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 48
Re: extracting info from pdf
« Reply #32 on: September 22, 2010, 04:49 AM »
Hello all,

When I needed a PDF-to-Office converter, I found, after some testing, the program SolidConverterPDF from www.soliddocuments.com.

I have no idea how they do it, but the results are simply stunning. Of course, not everything is possible, but I converted manuals for devices like TV sets and DVD recorders, scientific books, and whatnot. Just incredible.

The trial lets you convert 10 percent of the original document (max. 10 pages) and adds a watermark.

I think this is OK for testing. Prices start at $80 for a single-user license.
The support is great, very personal and really able to solve problems.

Give it a try and no, I am not connected to them.

Thomas
Berlin, Germany
The more things stay, the more they change the sane.

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #33 on: September 22, 2010, 07:29 AM »
-hello Thomas, and welcome back (again again!) :-)


Solid Doc's "WYSIWYG Content Extraction" really is quite impressive (it sure made me consider a license for yet another program to be used once or twice a year... haha), and it may actually be what the starter of the thread asked for. But another problem is that the same person seems to want everything for nothing, so I expect even the mere mention of the price was a turn-off!

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #34 on: September 22, 2010, 12:18 PM »
I don't have a problem with the price, if a program can do what I want.

Viewing an A4 PDF at 100% zoom doesn't fill the whole 15" screen, and taking a screenshot at that zoom results in a small, low-resolution picture. It doesn't capture all the detail that the graphic contains.
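
As a rough worked example of why a 100% screenshot comes out small, the sketch below converts a page size into screenshot pixels. It assumes the viewer draws 100% zoom at the screen's nominal DPI (96 here, which is an assumption; viewers and displays differ). `page_pixels` is a hypothetical helper for illustration, not part of any PDF tool.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, zoom=1.0, screen_dpi=96.0):
    """Return (width, height) in screen pixels for a page shown at `zoom`.

    Assumes 100% zoom means one inch of paper covers `screen_dpi` pixels.
    """
    width_px = width_mm / MM_PER_INCH * screen_dpi * zoom
    height_px = height_mm / MM_PER_INCH * screen_dpi * zoom
    return round(width_px), round(height_px)


if __name__ == "__main__":
    # A4 is 210 mm x 297 mm.
    print("A4 at 100%:", page_pixels(210, 297, zoom=1.0))
    print("A4 at 300%:", page_pixels(210, 297, zoom=3.0))
```

Under those assumptions a whole A4 page at 100% is under 800 pixels wide, so a single figure on that page only gets a fraction of that.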

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,959
Re: extracting info from pdf
« Reply #35 on: September 22, 2010, 02:03 PM »
Quote from: kalos
I don't have a problem with the price, if a program can do what I want.

Viewing an A4 PDF at 100% zoom doesn't fill the whole 15" screen, and taking a screenshot at that zoom results in a small, low-resolution picture. It doesn't capture all the detail that the graphic contains.

I don't think you can say what the best zoom is. I personally make PDFs with 300-400 dpi images, and you will find many PDFs with 72 dpi images.
I'd go as large as possible before the screenshot, unless you think the image looks better smaller, which would probably rarely happen.
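
To put rough numbers on this, here is a minimal sketch of the zoom at which one pixel of an embedded image lands on one screen pixel. It assumes the viewer shows 100% at the screen's nominal DPI (96 is an assumed default) and that you somehow know the image's pixel width and its placed width on the page; both function names are hypothetical.

```python
def effective_ppi(pixel_width, placed_width_pt):
    """Pixels per inch of an image placed `placed_width_pt` points wide.

    A PDF point is 1/72 inch.
    """
    return pixel_width / (placed_width_pt / 72.0)


def native_zoom(image_ppi, screen_dpi=96.0):
    """Zoom factor (1.0 == 100%) where image pixels map 1:1 to screen pixels."""
    return image_ppi / screen_dpi


if __name__ == "__main__":
    # A 1500-pixel-wide image placed 5 inches (360 pt) wide is 300 ppi.
    ppi = effective_ppi(1500, 360)
    print(f"{ppi:.0f} ppi image -> zoom {native_zoom(ppi):.1%}")
    print(f"72 ppi image -> zoom {native_zoom(72):.1%}")
```

Zooming past that factor only enlarges existing pixels, which would fit the pixelation kalos saw at 400%. Since images in one file can have different resolutions, there is no single best zoom for a whole document.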

If you've no problem with price I'd try TomD101's suggestion, because I think you can (hopefully) do a lot better than going the screenshot route, especially if you will be doing this often. Then if something occasionally doesn't work you could grab a screenshot and insert it into the converted file.
Tom

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #36 on: September 22, 2010, 02:54 PM »
Quote from: kalos
I don't have a problem with price, if a program can do what I want

-that is good.
I am sure I'm not the only one looking forward to reading what you think of Solid Documents' software.


Good luck on your way to your post number 300  :)
Can DonationCoder's forum do what you want?

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #37 on: October 03, 2010, 05:49 AM »
OK, I tested Solid PDF Tools (it is the most complete software from that company).
Where actually is the WYSIWYG extractor???
So far I see just what other PDF editors have.

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #38 on: October 03, 2010, 12:30 PM »
anyone???
I am in a hurry!!

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #39 on: October 05, 2010, 09:19 AM »
??????

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #40 on: October 05, 2010, 10:35 AM »
Quote from: cmpm
If you could give an example pdf to do this operation, perhaps we all could experiment with the various tools each of us has.
-plus of course a much more precise description of what the job is.

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #41 on: October 07, 2010, 12:11 PM »
But I already mentioned this.

It is about extracting a photo, diagram, index, etc. from a PDF file, but not by taking a screenshot, which is not precise (since the result varies with the zoom value).

You told me that Solid PDF Tools offers this: to automatically recognize/select a table, graphic, etc. (all PDF editors do this) and to extract/save it as an image file (no PDF editor does this; they only do it if you take a screenshot).

There is no way to work properly with PDF files. I wonder why they created such a format; it is very frustrating.

TO SUM UP:
I just need to be able to extract graphics, but to do so properly, which means:

1) at the optimum resolution (which means best possible quality, without distortion from too much zoom and without loss of quality from too little zoom)
2) with the optimum borders (which means optimally proportioned and not missing any area of the graphic, even if that area is empty)

Also, I would like to be able to extract tables, diagrams, etc. in a format where I can easily replace their text without damaging the format, structure, etc. of the graph, diagram, or table, but I bet this is too much for the PDF format.
« Last Edit: October 07, 2010, 01:18 PM by kalos »

rjbull

  • Charter Member
  • Joined in 2005
  • ***
  • Posts: 3,199
Re: extracting info from pdf
« Reply #42 on: October 07, 2010, 03:24 PM »
Here is part of the manual for pdfimages, part of the XPDF suite:

------------------------------------------------------------------------------
pdfimages(1)                                                      pdfimages(1)



NAME
       pdfimages  -  Portable  Document  Format (PDF) image extractor (version
       3.02)

SYNOPSIS
       pdfimages [options] PDF-file image-root

DESCRIPTION
       Pdfimages saves images from a Portable Document Format  (PDF)  file  as
       Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.

       Pdfimages  reads  the  PDF file, scans one or more pages, PDF-file, and
       writes one PPM, PBM, or JPEG file for each  image,  image-root-nnn.xxx,
       where  nnn  is  the image number and xxx is the image type (.ppm, .pbm,
       .jpg).

       NB: pdfimages extracts the raw image data from the  PDF  file,  without
       performing  any  additional  transforms.  Any rotation, clipping, color
       inversion, etc. done by the PDF content stream is ignored.
------------------------------------------------------------------------------
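
A minimal sketch of calling pdfimages from a script, following the synopsis above. The `-j` (save DCT images as JPEG), `-f` and `-l` (first and last page) options are pdfimages options; the function names and the file names in the usage lines are hypothetical, and the sketch assumes pdfimages is installed and on your PATH.

```python
import shutil
import subprocess


def pdfimages_command(pdf_path, image_root, first_page=None, last_page=None,
                      keep_jpeg=True):
    """Build the argument list: pdfimages [options] PDF-file image-root."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        cmd.append("-j")  # write DCT-encoded images as .jpg instead of .ppm
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, image_root]
    return cmd


def extract_images(pdf_path, image_root, **options):
    """Run pdfimages; raises if the tool is missing or exits with an error."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found on PATH")
    subprocess.run(pdfimages_command(pdf_path, image_root, **options), check=True)


if __name__ == "__main__":
    # Only prints the command; call extract_images(...) to actually run it.
    print(" ".join(pdfimages_command("manual.pdf", "img", first_page=2, last_page=3)))
```

Because pdfimages writes the raw embedded image data, the result does not depend on any zoom level. It only helps for genuine embedded images, though, not for figures drawn from vector parts.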

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #43 on: October 07, 2010, 05:26 PM »
Only because of your request, I have now tested Solid PDF Tools, and I must say that I cannot help thinking you may not yet have fully understood how to use the program. It can do all you asked for. If you still have the program installed, please watch the online tutorials and read the manual. Remember that the program will not edit picture, Excel or Word files; it will only create them. (Look for a new folder!)

http://www.soliddocu...ect=CreatePDFtoExcel etcetera.

My Nitro PDF PRO OCR will also do what you ask for. My AnyBizSoft 5-in-1 PDF, as well.

2010-10-08_002549-1031.gif
« Last Edit: October 08, 2010, 03:06 AM by Curt »

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #44 on: October 08, 2010, 11:04 AM »
it was because of you that I tested Solid PDF Tools

wait, what procedure do you follow in Nitro PDF?

1)
Click EDIT, then click on the graphics you want to copy in the pdf file, then right click COPY, then paste in MS Paint?
If so, it doesn't always work. To be honest, it doesn't work with most graphics, maybe because the graphics are 'protected'.

2)
Click "Snapshot" then drag to select an area then paste in MS Paint?
this way ALL the above mentioned problems occur (not optimum resolution, not optimum borders)

I am curious to read how you do this.

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #45 on: October 08, 2010, 12:50 PM »
no "copy" or "save as" or ..., but "extract"!

Extract tables, extract images, extract this and that:

createselective_1.png



kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #46 on: October 08, 2010, 01:28 PM »
Oh, I have already tested this!!!

It does nothing with this PDF; it doesn't extract anything!!!

http://ifile.it/78znxpc

Curt

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 7,566
Re: extracting info from pdf
« Reply #47 on: October 08, 2010, 02:29 PM »
your test file does not contain any tables or pictures at all, so there is nothing to extract, except text. It MAY have been Excel tables when it was created, not now, but it was more likely made in Word and Emax Draw, or similar. The big figure is made of many small parts; each column is a figure, each letter is a figure, etcetera. I am sorry for you, but there is no way you can extract all this as a unit.

kalos

  • Member
  • Joined in 2006
  • **
  • Posts: 1,823
Re: extracting info from pdf
« Reply #48 on: October 08, 2010, 03:24 PM »
I know, that's why I need a screenshot-like tool that will take advantage of pdf editor's abilities to mark the appropriate borders and recognize the exact area to be copied

i am also in search of a way to estimate the optimum resolution before taking the snapshot

tomos

  • Charter Member
  • Joined in 2006
  • ***
  • Posts: 11,959
Re: extracting info from pdf
« Reply #49 on: October 08, 2010, 04:07 PM »
Quote from: kalos
i am also in search of a way to estimate the optimum resolution before taking the snapshot

The optimum resolution for a screenshot would simply be the screen resolution [I presume that's what screenshot tools choose(?)]. If you want to take a screenshot of a vector image/graph/etc (as in your sample pdf) you could enlarge the image as much as you can before taking the screenshot. That's your best quality there.
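
One way to "enlarge" without being limited by the screen is to render the region with a command-line renderer instead of a viewer. The sketch below builds a command for pdftoppm as shipped with Poppler; that tool is not mentioned in this thread, so treat its use here as an assumption, along with the 300 dpi default and the example box. The box is given in PDF points measured from the top-left of the page, and the function names are hypothetical.

```python
def points_to_pixels(points, dpi):
    """Convert PDF points (1/72 inch) to pixels at the given resolution."""
    return round(points * dpi / 72.0)


def pdftoppm_region_command(pdf_path, out_root, page, box_pt, dpi=300):
    """Build a pdftoppm command that renders one cropped region as PNG.

    box_pt is (x, y, width, height) in points from the page's top-left.
    pdftoppm expects the crop values (-x, -y, -W, -H) in pixels.
    """
    x, y, w, h = (points_to_pixels(v, dpi) for v in box_pt)
    return [
        "pdftoppm", "-png", "-r", str(dpi),
        "-f", str(page), "-l", str(page),
        "-x", str(x), "-y", str(y), "-W", str(w), "-H", str(h),
        pdf_path, out_root,
    ]


if __name__ == "__main__":
    # A 5 x 3 inch figure whose corner is 1 inch from the left, 2 from the top.
    print(" ".join(pdftoppm_region_command("sample.pdf", "figure", 1,
                                           (72, 144, 360, 216))))
```

For vector content a higher dpi simply gives more pixels. The borders still have to be chosen by you, since a figure built from many small parts has no single object boundary to detect.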

It's different with pixel images (JPG, PNG, GIF, etc.), as I mentioned before (you're better off extracting them if possible).
Tom