Author Topic: extracting info from pdf  (Read 16924 times)
tomos
Charter Member
***
Posts: 8,612



« Reply #25 on: September 20, 2010, 04:16:18 PM »

This may depend on the file, but with Adobe Reader I just selected an image in a PDF (drag to select the area around it, then "Copy Image" in the context menu). I was able to paste the image into Evernote and MS Paint.

I don't think even the PDF reader will give you the option to show images at their original resolution. I suspect there could even be images with different resolutions within the one file. So I think you can't expect Screenshot Captor (or any other tool) to do that (I mean, considering it's the PDF reader that has the file open/displayed).
« Last Edit: September 20, 2010, 04:18:04 PM by tomos » Logged

Tom
kalos
Member
**
Posts: 1,071

« Reply #26 on: September 20, 2010, 04:46:42 PM »

Quote from: tomos
This may depend on the file - but with adobe reader I just selected an image in a pdf (via drag + select the area around it, then "copy image" in the context menu). I was able to paste the image in Evernote and MS Paint.

That is a screenshot capture tool, with the disadvantages mentioned above.
Logged
steveorg
Participant
*
Posts: 21


« Reply #27 on: September 20, 2010, 04:48:02 PM »

Quote from: tomos
I dont think even the pdf reader will give you the option to show images at their original resolution - I suspect there could even be images with different resolutions within the one file. So I think you cant expect Screenshot captor (or other) to do that (I mean considering it's the pdf reader has the file open/displayed)

I was going to make a similar point, but wanted to test it first. I'm far from an expert, but it has been my understanding (partly from experience) that a PDF rarely has enough data to extract components that are as detailed as the original source. On the contrary, the more efficient the PDF creation program, the smaller the file size. The PDF program should provide the least amount of data that is required to create the desired appearance.

For a bitmapped graphic, what you see is probably the best you'll get. I guess in theory, scalable (vector) graphics are more flexible, but you may need appropriate software. Fonts may also scale under the right circumstances.

This is the document version of "You can't go home again." :P

Logged
tomos
Charter Member
***
Posts: 8,612



« Reply #28 on: September 20, 2010, 05:32:03 PM »

Quote from: tomos
This may depend on the file - but with adobe reader I just selected an image in a pdf (via drag + select the area around it, then "copy image" in the context menu). I was able to paste the image in Evernote and MS Paint.

Quote from: kalos
this is screenshot capture tool, with the above mentioned disadvantages

Ah okay (you said 'pdf editor' in your post, above the screenshot).

My point stands though: if you want to get the best quality image, copy it out of the PDF reader. You cannot expect the screenshot app to manipulate the PDF reader to give the best possible display, especially if the PDF reader itself cannot even do this.

On the other hand, as steveorg says, most PDF creators are focused on making the file smaller, so the original image quality in the PDF might not be so good anyway...
Logged

Tom
kalos
Member
**
Posts: 1,071

« Reply #29 on: September 21, 2010, 10:09:29 AM »

Quote from: tomos
My point stands though: if you want to get the best quality image, copy it out of the pdf reader. You cannot expect the screenshot app to manipulate the pdf reader to give the best possible display - especially if the pdf reader itself cannot even do this.

Let's say OK.
Now let's be a bit practical.
What is the best zoom level to copy at in order to get the graphics in the best quality? When the PDF file is zoomed to 150%? 200%? 300%?
At 400% the graphics start to pixelate.
At 70% the graphics are too small.
...

Quote from: tomos
on the other hand, as steveorg says, most pdf creators are focused on making the file smaller so the original image quality in the pdf might not be so good anyways...

This doesn't matter. I just want the best image quality available in the PDF file, not the image quality of the initial graphics file.
Logged
cmpm
Charter Member
***
Posts: 2,025

« Reply #30 on: September 21, 2010, 10:52:09 AM »

If you could give an example pdf to do this operation, perhaps we all could experiment with the various tools each of us has.
Logged
Curt
Supporting Member
**
Posts: 6,338

« Reply #31 on: September 21, 2010, 11:06:03 AM »

Quote from: kalos
now let's be a bit practical
what is the best time to copy in order to have the graphics in best quality? when pdf file is zoomed at 150% ? at 200 % ? at 300 % ?

-neither.

1) Extract the pictures, if they actually are pictures and not just PDF generated background.
2) If they aren't genuine pictures, I would make a simple screenshot at 100%.
« Last Edit: September 21, 2010, 11:10:28 AM by Curt » Logged
TomD101
Supporting Member
**
Posts: 41


« Reply #32 on: September 22, 2010, 04:49:35 AM »

Hello all,

When I was in need of a PDF->Office converter, I found, after some testing, the program SolidConverterPDF from www.soliddocuments.com.

I have no idea how they do it, but the results are simply stunning. Of course, not everything is possible, but I converted manuals for devices like TV sets and DVD recorders, scientific books and whatnot. Just incredible.

The trial lets you convert 10 percent of the original document (max. 10 pages) and adds a watermark.

I think this is OK for testing. Prices start at $80 for a single-user license.
The support is great, very personal and really able to solve problems.

Give it a try and no, I am not connected to them.

Thomas
Berlin, Germany
Logged

The more things stay, the more they change the sane.
Curt
Supporting Member
**
Posts: 6,338

« Reply #33 on: September 22, 2010, 07:29:26 AM »

-hello Thomas, and welcome back (again again!) :-)


Solid Doc's "WYSIWYG Content Extraction" really is quite impressive (it sure made me consider a license for yet another program to be used once or twice a year... haha), and it may actually be what the starter of the thread asked for. But another problem is that the same person seems to want everything for nothing, so I expect even the mere mention of the price was a turn-off!
Logged
kalos
Member
**
Posts: 1,071

« Reply #34 on: September 22, 2010, 12:18:08 PM »

I don't have a problem with price, if a program can do what I want.

Zooming an A4 PDF to 100% does not fill the whole 15" screen, and taking a screenshot at that zoom results in a small, low-resolution image. It doesn't maximize the information that the graphics file can contain.
Logged
tomos
Charter Member
***
Posts: 8,612



« Reply #35 on: September 22, 2010, 02:03:15 PM »

Quote from: kalos
I don't have a problem with price, if a program can do what I want

zooming an A4 PDF at 100% makes the PDF not to fill the whole 15" screen
and then taking a screenshot at that zoom, results in a small low resolution photo, it doesnt maximize the info that the graphics file can contain

I don't think you can say what the best zoom is. I personally make PDFs with 300-400 dpi images, and you will get many PDFs with 72 dpi images.
I'd go as large as possible before the screenshot, unless you think the images look better smaller, which would probably rarely happen.

If you've no problem with price I'd try TomD101's suggestion, because I think you can (hopefully) do a lot better than going the screenshot route, especially if you will be doing this often. Then if something occasionally doesn't work you could grab a screenshot and insert it into the converted file.
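
To put rough numbers on the zoom question: a raster image is shown pixel-for-pixel when the viewer's zoom equals the image's effective resolution divided by the screen resolution the viewer assumes. Here is a minimal Python sketch, assuming the viewer maps 100% zoom to 72 points per inch on a 96 dpi display (an assumption; viewers and displays differ). The function name and the sample sizes are made up for illustration.

```python
def native_zoom_percent(image_px_width, placed_width_pt, screen_dpi=96.0):
    """Zoom (in percent) at which one image pixel maps to one screen pixel.

    image_px_width  -- width of the embedded image in pixels
    placed_width_pt -- width the image occupies on the page, in PDF points
    screen_dpi      -- assumed screen resolution (hypothetical default)
    """
    placed_width_in = placed_width_pt / 72.0            # 72 points per inch
    effective_ppi = image_px_width / placed_width_in    # pixels per inch on the page
    return 100.0 * effective_ppi / screen_dpi

# A 1500 px wide image placed 5 inches (360 pt) wide is a 300 ppi image.
print(native_zoom_percent(1500, 360))  # 312.5
# A 360 px wide image placed 5 inches wide is a 72 ppi image.
print(native_zoom_percent(360, 360))   # 75.0
```

Below that zoom a screenshot throws pixels away, and above it the viewer only interpolates. Since each image in a file can have its own resolution, there is no single best zoom for a whole document, which is another reason to extract the embedded image where possible.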
Logged

Tom
Curt
Supporting Member
**
Posts: 6,338

« Reply #36 on: September 22, 2010, 02:54:43 PM »

Quote from: kalos
I don't have a problem with price, if a program can do what I want

-that is good.
I am sure more people than me are looking forward to reading what you think of Solid Doc.


Good luck on your way to your post number 300  :)
Can DonationCoder's forum do what you want?
Logged
kalos
Member
**
Posts: 1,071

« Reply #37 on: October 03, 2010, 05:49:05 AM »

OK, I tested Solid PDF Tools (it is the most complete software from that company).
Where actually is the WYSIWYG extractor???
So far I see just what other PDF editors have.
Logged
kalos
Member
**
Posts: 1,071

« Reply #38 on: October 03, 2010, 12:30:11 PM »

anyone???
I am in a hurry!!
Logged
kalos
Member
**
Posts: 1,071

« Reply #39 on: October 05, 2010, 09:19:34 AM »

??????
Logged
Curt
Supporting Member
**
Posts: 6,338

« Reply #40 on: October 05, 2010, 10:35:15 AM »

Quote from: cmpm
If you could give an example pdf to do this operation, perhaps we all could experiment with the various tools each of us has.

-plus of course a much more precise description of what the job is.
Logged
kalos
Member
**
Posts: 1,071

« Reply #41 on: October 07, 2010, 12:11:08 PM »

But I already mentioned this.

It is about extracting a photo, diagram, index, etc. from a PDF file, but not by taking a screenshot, which is not precise (since the result varies with the zoom value).

You told me that Solid PDF Tools offers this: to automatically recognize/select a table, graphic, etc. (all PDF editors do this) and to extract/save it as an image file (no PDF editor does this; they only do it if you take a screenshot).

There is no way to work properly with PDF files. I wonder why they created such a format; it is very frustrating.

TO SUM UP:
I just need to be able to extract graphics, but to do so properly, which means:

1) at the optimum resolution (which means the best possible quality, without distortion from too much zoom and without loss of quality from too little zoom)
2) with the optimum borders (which means optimally proportioned and not missing any area of the graphic, even if that area is empty)

Also, I would like to be able to extract tables, diagrams, etc. in a format in which I can easily replace their text without damaging the layout, structure, etc. of the graph, diagram or table, but I bet this is too much for the PDF format.
« Last Edit: October 07, 2010, 01:18:15 PM by kalos » Logged
rjbull
Charter Member
***
Posts: 2,778

« Reply #42 on: October 07, 2010, 03:24:27 PM »

Here is part of the manual for pdfimages, part of the XPDF suite:

------------------------------------------------------------------------------
pdfimages(1)                                                      pdfimages(1)



NAME
       pdfimages  -  Portable  Document  Format (PDF) image extractor (version
       3.02)

SYNOPSIS
       pdfimages [options] PDF-file image-root

DESCRIPTION
       Pdfimages saves images from a Portable Document Format  (PDF)  file  as
       Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.

       Pdfimages  reads  the  PDF file, PDF-file, scans one or more pages, and
       writes one PPM, PBM, or JPEG file for each  image,  image-root-nnn.xxx,
       where  nnn  is  the image number and xxx is the image type (.ppm, .pbm,
       .jpg).

       NB: pdfimages extracts the raw image data from the  PDF  file,  without
       performing  any  additional  transforms.  Any rotation, clipping, color
       inversion, etc. done by the PDF content stream is ignored.
------------------------------------------------------------------------------
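
In practice the synopsis above boils down to a single command. Here is a small Python wrapper as a sketch, assuming the pdfimages binary from Xpdf is installed and on the PATH. The function names and the file names "manual.pdf" and "img" are made up for illustration. The -j option writes DCT-encoded images as JPEG files instead of converting them to PPM.

```python
import subprocess


def build_pdfimages_command(pdf_path, image_root, keep_jpeg=True):
    """Build the argument list for: pdfimages [options] PDF-file image-root."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        cmd.append("-j")  # save DCT (JPEG) images as .jpg rather than .ppm
    cmd += [pdf_path, image_root]
    return cmd


def extract_images(pdf_path, image_root, keep_jpeg=True):
    """Run pdfimages; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_pdfimages_command(pdf_path, image_root, keep_jpeg),
                   check=True)


# Show the command that would be run for hypothetical file names.
print(" ".join(build_pdfimages_command("manual.pdf", "img")))
# To actually extract: extract_images("manual.pdf", "img")
```

Because the raw image data is written out, the files come at whatever resolution they were stored in the PDF, independent of any zoom level. Vector drawings are not images in this sense and will not be extracted.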
Logged
Curt
Supporting Member
**
Posts: 6,338

« Reply #43 on: October 07, 2010, 05:26:01 PM »

Because of your request only, I have now tested Solid PDF Tools, and I must say that I cannot help thinking you may not yet have fully understood how to use the program. It can do all you asked for. If you still have the program installed, please watch the online tutorials and read the manual. Remember that the program will not edit picture, Excel or Word files; it will only create them. (Look for a new folder!)

http://www.soliddocuments...;subject=CreatePDFtoExcel etcetera.

My Nitro PDF PRO OCR will also do what you ask for. My AnyBizSoft 5-in-1 PDF, as well.

« Last Edit: October 08, 2010, 03:06:13 AM by Curt » Logged
kalos
Member
**
Posts: 1,071

« Reply #44 on: October 08, 2010, 11:04:14 AM »

It was because of you that I tested Solid PDF Tools.

Wait, what procedure do you follow in Nitro PDF?

1)
Click EDIT, then click on the graphic you want to copy in the PDF file, then right-click COPY, then paste into MS Paint?
If so, it doesn't always work. To be honest, it doesn't work with most graphics, maybe because the graphics are 'protected'.

2)
Click "Snapshot", then drag to select an area, then paste into MS Paint?
This way ALL the above-mentioned problems occur (not the optimum resolution, not the optimum borders).

I am curious to read how you do this.
Logged
Curt
Supporting Member
**
Posts: 6,338

« Reply #45 on: October 08, 2010, 12:50:29 PM »

no "copy" or "save as" or ..., but "extract"!

Extract tables, extract images, extract this and that.
Logged
kalos
Member
**
Posts: 1,071

« Reply #46 on: October 08, 2010, 01:28:29 PM »

Oh, I have already tested this!!!

It does nothing with this PDF; it doesn't extract anything!!!

http://ifile.it/78znxpc
Logged
Curt
Supporting Member
**
Posts: 6,338

« Reply #47 on: October 08, 2010, 02:29:09 PM »

your test file does not contain any tables or pictures at all, so there is nothing to extract, except text. It MAY have been Excel tables when it was created, not now, but it was more likely made in Word and Emax Draw, or similar. The big figure is made of many small parts; each column is a figure, each letter is a figure, etcetera. I am sorry for you, but there is no way you can extract all this as a unit.
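
When a figure is drawn as many separate pieces like this, the "optimum borders" asked for earlier are simply the smallest rectangle that encloses all the pieces. Here is a minimal Python sketch, assuming the per-piece rectangles (x0, y0, x1, y1, in PDF points) have already been obtained from some PDF library. The function name and the sample numbers are hypothetical.

```python
def union_bbox(boxes):
    """Smallest rectangle (x0, y0, x1, y1) that encloses every box in `boxes`."""
    if not boxes:
        raise ValueError("need at least one box")
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    return (x0, y0, x1, y1)


# Hypothetical pieces of one chart: two columns and an axis label strip.
parts = [(72, 500, 90, 650), (95, 500, 113, 620), (70, 480, 200, 495)]
print(union_bbox(parts))  # (70, 480, 200, 650)
```

A tool could then render just that rectangle at whatever resolution is wanted. Deciding which pieces belong to the same figure is the hard part and is not covered here.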
Logged
kalos
Member
**
Posts: 1,071

« Reply #48 on: October 08, 2010, 03:24:32 PM »

I know, that's why I need a screenshot-like tool that will take advantage of pdf editor's abilities to mark the appropriate borders and recognize the exact area to be copied

i am also in search of a way to estimate the optimum resolution before taking the snapshot
Logged
tomos
Charter Member
***
Posts: 8,612



« Reply #49 on: October 08, 2010, 04:07:53 PM »

Quote from: kalos
i am also in search of a way to estimate the optimum resolution before taking the snapshot

The optimum resolution for a screenshot would simply be the screen resolution [I presume that's what screenshot tools choose(?)]. If you want to take a screenshot of a vector image/graph/etc. (as in your sample PDF), you could enlarge the image as much as you can before taking the screenshot. That's your best quality there.

It's different with pixel images (JPG, PNG, GIF, etc.), as I mentioned before (you're better off extracting them if possible).
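
For vector content, "enlarge as much as you can" is the same as choosing a higher render resolution. Here is a rough Python sketch of how many pixels an A4 page (210 x 297 mm) covers at a given resolution. The function name is hypothetical, and 96 dpi is only an assumed typical screen value.

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Pixel size of a page rendered at `dpi`, rounded to whole pixels."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return (width_px, height_px)


print(page_pixels(210, 297, 96))   # (794, 1123): roughly a 100% view on a 96 dpi screen
print(page_pixels(210, 297, 300))  # (2480, 3508): print-quality rendering
```

A screenshot can never hold more pixels than the screen shows, so a tool that renders the page (or a cropped area) to a file at a chosen dpi is not limited by zoom the way a screen grab is.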
Logged

Tom
DonationCoder.com Forum | Powered by SMF