command line tool for pdf

kalos:
hello!

I am looking for a command line tool that will:

1) place a specific image (logo), at a specific resolution/size, at a specific position on pdf pages (relative to the top left corner, etc.)
2) extract all the text of pdf pages (and save it in a variable or a file)
3) search and replace specific text in pdf pages (optimally using regex)
4) crop and save the cropped area or delete the cropped area of pdf pages

do you know any??

thanks!

rjbull:
--- Quote from: kalos (November 26, 2011, 08:02 AM) ---
2) extract all the text of pdf pages (and save it in a variable or a file)
--- End quote ---
XPDF. Freeware.

--- Quote from: kalos (November 26, 2011, 08:02 AM) ---
3) search and replace specific text in pdf pages (optimally using regex)
--- End quote ---
PDF Text Replace Tool. It isn't command-line, but it's the only one I know that can do that. Freeware and more advanced payware editions.
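
For point 2, here is a rough sketch of how Xpdf's pdftotext could be called from a script so the text ends up in a variable. It assumes pdftotext is installed and on the PATH; the Python wrapper and its function names are only an illustration and are not part of Xpdf.

```python
import subprocess


def pdftotext_command(pdf_path, keep_layout=False):
    """Build the pdftotext command line (hypothetical helper name)."""
    # -enc UTF-8 asks for UTF-8 output instead of the default encoding.
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout tries to keep the physical layout of the page.
        cmd.append("-layout")
    # "-" as the output file name sends the text to stdout.
    cmd += [str(pdf_path), "-"]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext and return the extracted text as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise if pdftotext reports an error
    )
    return result.stdout.decode("utf-8", errors="replace")


# Example use (needs a real PDF and pdftotext installed):
# text = extract_text("input.pdf")
```

This only reads the text out; it does not change the PDF itself.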

Shades:
It is highly improbable that you will find a single (command line) tool that does all of this. However, there are tools that partially do what you request.

Likely you have an original PDF and you want to create a new PDF using your logo and (most of) the content from the original PDF.
In that case it might be smarter to rethink the order in which you want to do things.

01.) It would be better to first extract the content from the original PDF to file(s) in a separate folder: 
     http://pdftohtml.sourceforge.net/
02.) Again using a different (sub-)folder to strip the html content, leaving you with text file(s) only:
     http://kmachine.home.xs4all.nl/html2txt.htm
03.) Use a graphical editor in cli mode to resize/crop and save graphical content from step 1:
     http://gd.tuwien.ac.at/graphics/xv/html-docu/command-line-options.html
04.) Using a text editor in cli mode to store the differences between HTML content and the stripped text (to retain a simple HTML layout):
     http://sed.sourceforge.net/ (this appears to be the most capable text editor for command line operations)
05.) Using a text editor in cli mode to replace desired specific content inside the stripped text:
     http://sed.sourceforge.net/
06.) Using a text editor in cli mode to store the content from step 4 into a previously created HTML template (with your positioned logo):
     http://sed.sourceforge.net/
07.) Using a text editor in cli mode to store the content from step 5 into a previously created HTML template (with your positioned logo):
     http://sed.sourceforge.net/
08.) Using a text editor in cli mode to store the content from step 7 in a different (sub-)folder as HTML file(s):
     http://sed.sourceforge.net/
09.) Convert the HTML file(s) to PDF:
     http://code.google.com/p/anytopdf/ - uses the OpenOffice/LibreOffice (portable version) PDF conversion capabilities (which are awesome in my opinion)
10.) Clean up the mess created during the conversion process from your hard drive:
     Use a BATCH script or something similar
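
As a rough illustration of step 05 (search and replace in the stripped text), the same idea can be sketched in Python instead of sed. This assumes the text files from step 02 are UTF-8 and sit in one folder; the function names, folder arguments, and the date pattern in the comment are made up for the example.

```python
import re
from pathlib import Path


def replace_in_text(text, pattern, replacement):
    """Regex search and replace on a string; backreferences like \\1 work."""
    return re.sub(pattern, replacement, text)


def replace_in_folder(src_dir, dst_dir, pattern, replacement):
    """Apply the replacement to every .txt file in src_dir.

    Results are written to dst_dir so the originals stay untouched.
    Returns the names of the files whose content actually changed.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    changed = []
    for path in sorted(src.glob("*.txt")):
        old = path.read_text(encoding="utf-8")
        new = replace_in_text(old, pattern, replacement)
        (dst / path.name).write_text(new, encoding="utf-8")
        if new != old:
            changed.append(path.name)
    return changed


# Example use (hypothetical folder names):
# replace_in_folder("stripped", "replaced", r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1")
```

Writing to a separate folder matches the idea of keeping each step's output apart, which also makes the cleanup in step 10 easier.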

All of the above works well on simple PDFs. If you have to use complicated PDFs (heavy on layout), you will have to do things manually, as preserving the layout will be practically impossible when automating.

Have fun puzzling with the command line options of each and every tool...

kalos:
thanks for your reply, but that would be overkill for my task, e.g. to monitor a folder named "logo them all" and, when a pdf is pasted or created in that folder, automatically add a graphics file to each page

is there any script or program that performs actions on pdf files, like those I mentioned in the first post?
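
The folder-monitoring half of this can be sketched with a simple polling loop. This is only a sketch under assumptions: it checks the folder every few seconds rather than reacting to file system events, and the actual logo stamping is left as a hypothetical `handle` function that you would supply with whatever PDF tool you settle on.

```python
import time
from pathlib import Path


def new_pdfs(folder, seen):
    """Return PDFs in folder not seen before, and remember them in seen."""
    found = []
    # Note: "*.pdf" is case-sensitive on some systems, so "X.PDF" may be missed.
    for path in sorted(Path(folder).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            found.append(path)
    return found


def watch(folder, handle, interval=2.0, max_polls=None):
    """Poll folder and call handle(path) once for each new PDF.

    handle is a placeholder for the real work, e.g. adding a logo.
    max_polls=None keeps polling forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in new_pdfs(folder, seen):
            handle(pdf)
        polls += 1
        time.sleep(interval)


# Example use: print each new file dropped into the folder.
# watch("logo them all", print)
```

One caveat: a large file may still be in the middle of being copied when it is first noticed, so a real script would probably want to wait until the file size stops changing before touching it.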
