rename pdf, word, etc files based on their text content
IainB:
is there any tool that will rename files like pdf, word, etc based on their content text?
ie, if a specific match of a regex exists in the content of a pdf file, then rename the file with that match
-kalos (December 09, 2011, 02:45 AM)
--- End quote ---
Without knowing more of what lies behind your request, it may be difficult to see how to help.
I think you might be able to get this done in some shape or form by using a reference management system that can handle PDF files. From that perspective, I would suggest that Qiqqa might be worth a look. I haven't played with the latest version, so I can't really say any more about it.
If the feature isn't there already, you could ask the author to put it in, and Qiqqa could maybe even be modified to provide it.
Also, you might find that working with the capabilities of something like Qiqqa leads you to discover a new or changed version of this requirement.
Adobe may have something to help here too, of course.
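For what it's worth, the regex part of the original question can be sketched in a few lines of Python. This is only a rough sketch under assumptions: it assumes the document's text has already been extracted into a string by some other tool (for PDFs, something like pdftotext from the Xpdf package), and the function names name_from_text and rename_by_content are made up for illustration, not taken from any existing program.

```python
import re
from pathlib import Path

# Characters that are not allowed in Windows file names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def name_from_text(text, pattern, suffix):
    """Build a file name from the first regex match in text, or return None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # Replace illegal characters and trim leading/trailing spaces and dots.
    stem = _ILLEGAL.sub("_", match.group(0)).strip(" .")
    return f"{stem}{suffix}" if stem else None


def rename_by_content(path, text, pattern):
    """Rename path after the first match of pattern in text.

    The file is left alone if nothing matches or the new name is taken.
    """
    path = Path(path)
    new_name = name_from_text(text, pattern, path.suffix)
    if new_name is None:
        return path
    target = path.with_name(new_name)
    if target.exists():
        return path
    path.rename(target)
    return target
```

For example, name_from_text("Invoice INV-2011/0042 for services", r"INV-\d{4}/\d{4}", ".pdf") gives "INV-2011_0042.pdf", because the slash is not allowed in a file name. Getting the text out of Word files would need a different extractor, which is not covered here.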
rjbull:
a commandline utility that could take a pdf file, grab its title, author, and date and rename it to some combination of those, would be EXTREMELY useful.
-mouser (December 10, 2011, 09:37 AM)
--- End quote ---
@mouser: closest I know of to that would be pdfinfo from the XPDF package.
pdfinfo Help:
pdfinfo version 3.02
Copyright 1996-2007 Glyph & Cog, LLC
Usage: pdfinfo [options] <PDF-file>
-f <int> : first page to convert
-l <int> : last page to convert
-box : print the page bounding boxes
-meta : print the document metadata (XML)
-enc <string> : output text encoding name
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
The command line
pdfinfo 2011_velbon_catalog.pdf > info.txt
on the Velbon (tripod) catalogue gives:
Title: 2011_catalog.indd
Creator: Adobe InDesign CS3_J (5.0.4)
Producer: Adobe PDF Library 8.0
CreationDate: 03/17/11 17:31:25
ModDate: 03/24/11 11:30:11
Tagged: no
Pages: 16
Encrypted: yes (print:yes copy:yes change:no addNotes:no)
Page size: 1274.43 x 273.584 pts
File size: 9668147 bytes
Optimized: yes
PDF version: 1.6
I used to use this for renaming PDFs of patents at work, with something like an AWK script to extract the relevant information and write a temporary batch file that did the work. The trouble is you can't rely on all those fields being filled in for any particular PDF; the catalogue above, for instance, has no Author field at all.
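A rough Python equivalent of that approach might look like the following. It is a sketch, not the original AWK script: the helper names (parse_pdfinfo, build_name, run_pdfinfo) and the "Author - Title - CreationDate" naming scheme are assumptions chosen for illustration, and it expects pdfinfo to be on the PATH. Missing fields are simply skipped, and a fallback name is used when none of them is present.

```python
import re
import subprocess


def run_pdfinfo(pdf_path):
    """Run pdfinfo on a file and return its text output."""
    result = subprocess.run(
        ["pdfinfo", pdf_path],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout


def parse_pdfinfo(output):
    """Turn the 'Key: value' lines printed by pdfinfo into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, since values may contain colons.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def build_name(fields, fallback):
    """Join whichever of Author, Title, CreationDate are present."""
    parts = [fields.get(k) for k in ("Author", "Title", "CreationDate")]
    parts = [p for p in parts if p]
    if not parts:
        return fallback  # nothing usable, keep the old name
    stem = " - ".join(parts)
    # Replace characters that are not allowed in Windows file names.
    stem = re.sub(r'[<>:"/\\|?*]', "_", stem)
    return stem + ".pdf"
```

Fed the Title and CreationDate lines from the Velbon output above, build_name returns "2011_catalog.indd - 03_17_11 17_31_25.pdf", which also shows why the Title field on its own is often not a good file name.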