extracting info from pdf
Curt:
If you could give an example pdf to do this operation, perhaps we all could experiment with the various tools each of us has.
-cmpm (September 21, 2010, 10:52 AM)
--- End quote ---
-plus of course a much more precise description of what the job is.
kalos:
but I already mentioned this
it is about extracting a photo, diagram, index, etc. from a pdf file, but not by taking a screenshot, which is not precise (since the result varies with the zoom value)
you told me that Solid PDF Tools offers this: to automatically recognize/select a table, graphics, etc. (all pdf editors do this) and to extract/save it as an image file (no pdf editor does this, they only do it if you take a screenshot)
there is no way to work properly with pdf files, I wonder why they created such a format, it is very frustrating
TO SUM UP:
I just need to be able to extract graphics, but to do so properly, which means:
1) at the optimum resolution (the best possible quality, without distortion from too much zoom and without loss of quality from too little zoom)
2) with the optimum borders (correctly proportioned and not missing any area of the graphic, even if that area is empty)
also, I would like to be able to extract tables, diagrams, etc. in a format in which I can easily replace their text without damaging the layout and structure of the graph, diagram or table, but I bet this is too much for the pdf format
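To put numbers on the "optimum resolution" point: a rough sketch of why a screenshot depends on zoom while the embedded image does not. It assumes the viewer renders 100% zoom at 96 pixels per inch (viewers differ on this), and the figure dimensions are made up for illustration.

```python
def effective_dpi(pixel_width: int, placed_width_pt: float) -> float:
    """Resolution of an embedded image as placed on the page (1 pt = 1/72 inch)."""
    return pixel_width / (placed_width_pt / 72.0)


def screenshot_pixels(placed_width_pt: float, zoom: float, screen_dpi: float = 96.0) -> float:
    """Pixel width of a screenshot of the same region at a given zoom (1.0 = 100%).

    screen_dpi = 96 is an assumption about the viewer, not a fixed rule.
    """
    return placed_width_pt / 72.0 * screen_dpi * zoom


# Hypothetical figure: stored 1200 px wide, placed 288 pt (4 inches) wide.
print(effective_dpi(1200, 288.0))     # 300.0
print(screenshot_pixels(288.0, 1.0))  # 384.0 -> far fewer pixels than stored
print(screenshot_pixels(288.0, 2.0))  # 768.0 -> still fewer, and resampled
```

Under these assumptions a screenshot only reaches the stored 1200 pixels at about 312% zoom, and even then the pixels are resampled by the viewer rather than copied from the file. A tool that pulls the image data out of the file avoids the problem.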
rjbull:
Here is part of the manual for pdfimages, part of the XPDF suite:
------------------------------------------------------------------------------
pdfimages(1) pdfimages(1)
NAME
pdfimages - Portable Document Format (PDF) image extractor (version
3.02)
SYNOPSIS
pdfimages [options] PDF-file image-root
DESCRIPTION
Pdfimages saves images from a Portable Document Format (PDF) file as
Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.
Pdfimages reads the PDF file, scans one or more pages, PDF-file, and
writes one PPM, PBM, or JPEG file for each image, image-root-nnn.xxx,
where nnn is the image number and xxx is the image type (.ppm, .pbm,
.jpg).
NB: pdfimages extracts the raw image data from the PDF file, without
performing any additional transforms. Any rotation, clipping, color
inversion, etc. done by the PDF content stream is ignored.
------------------------------------------------------------------------------
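For anyone who wants to script it, here is a minimal sketch of calling pdfimages from Python, following the synopsis above. The -j (keep JPEG data as .jpg), -f and -l (first and last page) options are listed in the full Xpdf manual page; the file names are hypothetical, and the helper functions are mine, not part of Xpdf.

```python
import shutil
import subprocess


def pdfimages_command(pdf_file, image_root, first_page=None, last_page=None, keep_jpeg=True):
    """Build the argument list: pdfimages [options] PDF-file image-root."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        cmd.append("-j")                    # write embedded JPEG data as .jpg
    if first_page is not None:
        cmd += ["-f", str(first_page)]      # first page to scan
    if last_page is not None:
        cmd += ["-l", str(last_page)]       # last page to scan
    cmd += [pdf_file, image_root]
    return cmd


def extract_images(pdf_file, image_root, **options):
    """Run pdfimages; output files are named image-root-nnn.xxx."""
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages was not found on PATH")
    subprocess.run(pdfimages_command(pdf_file, image_root, **options), check=True)


# Hypothetical file names, for illustration only.
print(pdfimages_command("manual.pdf", "img", first_page=3, last_page=5))
# ['pdfimages', '-j', '-f', '3', '-l', '5', 'manual.pdf', 'img']
```

Note the NB in the manual: this gives the raw stored images at their native pixel size, so any cropping or rotation applied on the page is not reflected, and diagrams drawn as vector graphics are not images and will not be extracted this way.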
Curt:
Solely because of your request, I have now tested Solid PDF Tools, and I must say that I cannot help thinking you may not yet have fully understood how to use the program. It can do all you asked for. If you still have the program installed, please watch the online tutorials and read the manual. Remember that the program will not edit picture, Excel or Word files; it will only create them. (Look for a new folder!)
http://www.soliddocuments.com/info.htm?product=SolidPDFTools&id=233&frame=4&subject=CreatePDFtoExcel etcetera.
My Nitro PDF PRO OCR will also do what you ask for. My AnyBizSoft 5-in-1 PDF, as well.
kalos:
it was because of you that I tested Solid PDF Tools
wait, what procedure do you follow in Nitro PDF?
1)
Click EDIT, then click on the graphics you want to copy in the pdf file, then right click COPY, then paste in MS Paint?
if so, it doesn't always work; to be honest, it doesn't work with most graphics, maybe because the graphics are 'protected'
2)
Click "Snapshot" then drag to select an area then paste in MS Paint?
this way ALL the above-mentioned problems occur (not optimum resolution, not optimum borders)
I am curious to read how you do this