Author Topic: command line tool for pdf  (Read 3476 times)

kalos

command line tool for pdf
« on: November 26, 2011, 08:02 AM »
Hello!

I am looking for a command line tool that can:

1) place a specific image (a logo) at a specific resolution/size and at a specific position on PDF pages (e.g. measured from the top left corner)
2) extract all the text of PDF pages (and save it in a variable or a file)
3) search and replace specific text in PDF pages (ideally using regex)
4) crop PDF pages and either save the cropped area or delete it

Do you know of any?

Thanks!

rjbull

Re: command line tool for pdf
« Reply #1 on: November 26, 2011, 01:37 PM »
Quote: "2) extract all the text of pdf pages (and save it in a variable or a file)"
XPDF.  Freeware.

Quote: "3) search and replace specific text in pdf pages (optimally using regex)"
PDF Text Replace Tool.  It isn't command-line, but it's the only one I know that can do that.  Freeware and more advanced payware editions.
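
For point 2, Xpdf includes a command-line program called `pdftotext`. The following is a minimal sketch of calling it from Python and then running a regex over the result. It assumes `pdftotext` is installed and on the PATH; the function names and the file name in the usage note are made up for illustration. Note that the regex step only changes the extracted text, not the text inside the PDF itself.

```python
import re
import subprocess


def pdftotext_command(pdf_path, keep_layout=True):
    """Build the argument list for pdftotext, sending the text to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [pdf_path, "-"]     # "-" as the output file means standard output
    return cmd


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text as a string.

    Assumption: pdftotext is installed and reachable through PATH.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


def replace_in_text(text, pattern, replacement):
    """Regex search and replace on the extracted text (not on the PDF)."""
    return re.sub(pattern, replacement, text)
```

With a hypothetical file, `replace_in_text(extract_text("input.pdf"), r"\bDraft\b", "Final")` would give the page text with the whole word "Draft" replaced, which can then be stored in a variable or written to a file.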

Shades

Re: command line tool for pdf
« Reply #2 on: November 27, 2011, 07:05 AM »
It is highly improbable that you will find a single (command line) tool that does all of this. However, there are tools that partially do what you request.

Likely you have an original PDF and you want to create a new PDF using your logo and (most of) the content from the original PDF.
In that case it might be smarter to rethink the order in which you want to do things.

01.) It would be better to first extract the content from the original PDF to file(s) in a separate folder:
     http://pdftohtml.sourceforge.net/
02.) Strip the HTML markup, again writing to a different (sub-)folder, so that you are left with text file(s) only:
     http://kmachine.home.xs4all.nl/html2txt.htm
03.) Use a graphical editor in CLI mode to resize/crop and save the graphical content from step 1:
     http://gd.tuwien.ac.at/graphics/xv/html-docu/command-line-options.html
04.) Use a text editor in CLI mode to store the differences between the HTML content and the stripped text (to retain a simple HTML layout):
     http://sed.sourceforge.net/ (this appears to be the most capable text editor for command line operations)
05.) Use a text editor in CLI mode to replace the desired specific content inside the stripped text:
     http://sed.sourceforge.net/
06.) Use a text editor in CLI mode to store the content from step 4 in a previously created HTML template (with your positioned logo):
     http://sed.sourceforge.net/
07.) Use a text editor in CLI mode to store the content from step 5 in a previously created HTML template (with your positioned logo):
     http://sed.sourceforge.net/
08.) Use a text editor in CLI mode to store the content from step 7 in a different (sub-)folder as HTML file(s):
     http://sed.sourceforge.net/
09.) Convert the HTML file(s) to PDF:
     http://code.google.com/p/anytopdf/ - uses the PDF conversion capabilities of OpenOffice/LibreOffice (portable version), which are awesome in my opinion
10.) Clean up the mess that the conversion process left on your hard drive:
     Use a batch script or something similar

All of the above works well on simple PDFs. If you have to deal with complicated PDFs (heavy on layout), you will have to do things manually, as preserving the layout is practically impossible when automating.

Have fun puzzling with the command line options of each and every tool...
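
As a rough illustration of the middle of the recipe above (steps 2, 5 and 6/7), here is a small sketch that strips HTML down to text, does a regex replacement, and drops the result into an HTML template with a logo. It uses Python's standard library in place of html2txt and sed, which is a substitution for illustration and not what the steps prescribe. The template, the `logo.png` file name and all function names are assumptions. Steps 1 and 9 (PDF to HTML and HTML back to PDF) would still be done by the external tools listed above.

```python
import re
from html import escape
from html.parser import HTMLParser
from string import Template


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(markup):
    """Step 2: reduce HTML to plain text, one text node per line."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "\n".join(parser.parts)


def replace_text(text, pattern, replacement):
    """Step 5: regex search and replace inside the stripped text."""
    return re.sub(pattern, replacement, text)


# Hypothetical template: a logo image followed by the preformatted text.
PAGE_TEMPLATE = Template(
    '<html><body><img src="$logo" alt="logo"><pre>$body</pre></body></html>'
)


def build_page(text, logo="logo.png"):
    """Steps 6/7: place the (escaped) text into the HTML template."""
    return PAGE_TEMPLATE.substitute(logo=logo, body=escape(text))
```

The text is escaped before it goes into the template so that characters such as `<` in the extracted content do not break the generated HTML.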

kalos

Re: command line tool for pdf
« Reply #3 on: November 30, 2011, 03:39 PM »
Thanks for your reply, but that would be overkill for my task. For example, I want to monitor a folder named "logo them all" and, when a PDF is pasted or created in that folder, automatically add a graphics file to each page.

Is there any script or program that performs actions on PDF files like those I mentioned in the first post?
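
The folder-monitoring half of this can be sketched with a simple polling loop. The sketch below only detects new PDF files and hands each one to a callback. The callback `handle_pdf` is hypothetical: it is where whatever tool ends up stamping the logo would be called, and nothing here does the stamping itself. The function names and the polling interval are likewise assumptions.

```python
import os
import time


def find_new_pdfs(folder, seen):
    """Return the PDF paths in `folder` that are not in `seen`, and record them."""
    new = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.lower().endswith(".pdf") and path not in seen:
            seen.add(path)
            new.append(path)
    return new


def watch(folder, handle_pdf, interval=2.0, max_polls=None):
    """Poll `folder` and call handle_pdf(path) once for each new PDF.

    handle_pdf is a hypothetical callback (for example, one that runs a
    logo-stamping tool). With max_polls=None the loop runs until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_pdfs(folder, seen):
            handle_pdf(path)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Something like `watch("logo them all", print)` would just print each new PDF path. Two caveats with this approach: a file may be picked up while it is still being copied, and the stamped output should be written to a different folder, otherwise it would be detected as a new PDF too.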