

rename pdf, word, etc files based on their text content


IainB:
is there any tool that will rename files like pdf, word, etc based on their content text?
ie, if a specific match of a regex exists in the content of a pdf file, then rename the file with that match
-kalos (December 09, 2011, 02:45 AM)
--- End quote ---
Without knowing more of what lies behind your request, it may be difficult to see how to help.

I think you might be able to get this done in some shape or form by using a reference management system that can handle PDF files. From that perspective, I would suggest that Qiqqa might be worth a look. I haven't played with the latest version, so I can't really say any more about it.
Qiqqa could maybe even be modified to provide such a feature, if it wasn't there already and you asked the author to put it in.
Also, you might find that the capabilities of something like Qiqqa lead you to discover a new or changed requirement in place of the current one.
Adobe may have something to help here too, of course.
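
For the regex idea in the quoted question, here is a rough Python sketch of the renaming step only. It assumes the document text has already been extracted to a string by some other tool (for PDFs, pdftotext from the XPDF package would be one option). The function name name_from_text and the example pattern are made up for illustration; they are not part of any existing tool.

```python
import re

def name_from_text(text, pattern, suffix=".pdf"):
    """Build a filename from the first regex match in text.

    Hypothetical helper: returns None when the pattern does not match,
    so the caller can leave the file's name alone.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # Replace anything that is risky in a filename with an underscore.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", match.group(0)).strip("_.")
    return safe + suffix if safe else None
```

Calling name_from_text(text, r"INV-\d{4}-\d{4}") on text containing "INV-2011-0042" would give a candidate name built from that match; the actual rename could then be done with os.rename once you have checked the new name does not clash with an existing file.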

rjbull:
a commandline utility that could take a pdf file, grab its title, author, and date and rename it to some combination of those, would be EXTREMELY useful.-mouser (December 10, 2011, 09:37 AM)
--- End quote ---

@mouser: closest I know of to that would be pdfinfo from the XPDF package.

pdfinfo Help:
pdfinfo version 3.02
  Copyright 1996-2007 Glyph & Cog, LLC
  Usage: pdfinfo [options] <PDF-file>
    -f <int>       : first page to convert
    -l <int>       : last page to convert
    -box           : print the page bounding boxes
    -meta          : print the document metadata (XML)
    -enc <string>  : output text encoding name
    -opw <string>  : owner password (for encrypted files)
    -upw <string>  : user password (for encrypted files)
    -cfg <string>  : configuration file to use in place of .xpdfrc
    -v             : print copyright and version info
    -h             : print usage information
    -help          : print usage information
    --help         : print usage information
    -?             : print usage information

The command line
pdfinfo 2011_velbon_catalog.pdf > info.txt
on the Velbon (tripod) catalogue gives:

Title:          2011_catalog.indd
Creator:        Adobe InDesign CS3_J (5.0.4)
Producer:       Adobe PDF Library 8.0
CreationDate:   03/17/11 17:31:25
ModDate:        03/24/11 11:30:11
Tagged:         no
Pages:          16
Encrypted:      yes (print:yes copy:yes change:no addNotes:no)
Page size:      1274.43 x 273.584 pts
File size:      9668147 bytes
Optimized:      yes
PDF version:    1.6

I used to use this for renaming PDFs of patents at work, with something like an AWK script to extract the relevant information and write a temporary batch file that did the work. The trouble is you can't rely on all those fields being filled in for any particular PDF.
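
The same approach could be sketched in Python rather than AWK plus a batch file. This is only an illustration, not the script I used: the function names (read_pdfinfo, parse_pdfinfo, build_name) are invented here, it assumes pdfinfo is on the PATH, and it assumes output in the "Key:   value" layout shown above. Fields that are missing are simply skipped, and the original name is kept if none of the wanted fields are present.

```python
import re
import subprocess

def read_pdfinfo(path):
    """Run pdfinfo on a file and return its text output (assumes pdfinfo is on PATH)."""
    result = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    return result.stdout

def parse_pdfinfo(output):
    """Parse 'Key:   value' lines into a dict, ignoring empty values."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values like dates keep their colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def build_name(fields, original, keys=("Title", "Author", "CreationDate")):
    """Join whichever of the wanted fields exist; fall back to the original name."""
    parts = [fields[k] for k in keys if k in fields]
    if not parts:
        return original
    # Replace characters that are unsafe in filenames with underscores.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", "_".join(parts)).strip("_.")
    return safe + ".pdf" if safe else original
```

For the Velbon catalogue above there is no Author line, so build_name(parse_pdfinfo(read_pdfinfo("2011_velbon_catalog.pdf")), "2011_velbon_catalog.pdf") would use only the Title and CreationDate fields.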
