Author Topic: rename pdf, word, etc files based on their text content  (Read 21559 times)

kalos

  • Member
  • Joined in 2006
  • Posts: 1,820
rename pdf, word, etc files based on their text content
« on: December 09, 2011, 02:45 AM »
hello

is there any tool that will rename files like pdf, word, etc based on their content text?

i.e., if a specific regex matches somewhere in the content of a pdf file, then rename the file to the matched text

thanks!
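
The renaming step being asked about could be sketched as below. This is only an illustration, not an existing tool: it assumes the document text has already been extracted by some other means (for example with a text extractor such as pdftotext from the XPDF package), and `name_from_text` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def name_from_text(text, pattern, original):
    """Build a new file name from the first regex match found in `text`.

    Hypothetical helper: returns the original file name unchanged when
    the pattern does not match or the match is empty after cleaning.
    """
    match = re.search(pattern, text)
    if match is None:
        return Path(original).name
    # Prefer the first capture group if the pattern defines one and it matched.
    found = (match.group(1) if match.re.groups else None) or match.group(0)
    safe = _ILLEGAL.sub("_", found).strip(" .")
    if not safe:
        return Path(original).name
    # Keep the original extension so the file still opens in the right program.
    return safe + Path(original).suffix

print(name_from_text("Invoice No: INV-2011-0042 for ...", r"INV-\d{4}-\d{4}", "scan001.pdf"))
```

Falling back to the original name when nothing matches means a batch run never produces an empty or surprising name for files that don't fit the pattern.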

kalos

Re: rename pdf, word, etc files based on their text content
« Reply #1 on: December 10, 2011, 09:22 AM »
there isn't really any such thing?
it would be so convenient!

Ath

  • Supporting Member
  • Joined in 2006
  • Posts: 3,610
Re: rename pdf, word, etc files based on their text content
« Reply #2 on: December 10, 2011, 09:27 AM »
skwire is still busy I guess ;D ;D

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,896
Re: rename pdf, word, etc files based on their text content
« Reply #3 on: December 10, 2011, 09:37 AM »
Quote from kalos: "is there any tool that will rename files like pdf, word, etc based on their content text?"

i have often thought that a commandline utility that could take a pdf file, grab its title, author, and date and rename it to some combination of those, would be EXTREMELY useful.

Ath

Re: rename pdf, word, etc files based on their text content
« Reply #4 on: December 10, 2011, 09:40 AM »
I'd be going nuts trying to find that file I stored and it got renamed to something totally different :huh:

yksyks

  • Supporting Member
  • Joined in 2006
  • Posts: 476
Re: rename pdf, word, etc files based on their text content
« Reply #5 on: December 10, 2011, 09:51 AM »
Peter's Flexible Renaming Kit (PFrank) has a plugin for inserting title metadata from PDF files. I haven't tried it myself, though.

kalos

Re: rename pdf, word, etc files based on their text content
« Reply #6 on: December 10, 2011, 10:09 AM »
Quote from yksyks: "Peter's Flexible Renaming Kit (PFrank) has a plugin for inserting title metadata from PDF files. Haven't tried by myself, though."

but pdf metadata does not contain the pdf's text content, only author, creation date, etc., AFAIK

skwire

  • Global Moderator
  • Joined in 2005
  • Posts: 5,286
Re: rename pdf, word, etc files based on their text content
« Reply #7 on: December 10, 2011, 10:15 AM »
Quote from Ath: "skwire is still busy I guess ;D ;D"

Based on my own experience working with PDF files in this manner, it's a huge pain in the arse.  Kalos, have you ever taken a look at Calibre to help manage your PDF collection?  I use it to convert stuff to read on my Android phone.

http://calibre-ebook.com/

yksyks

Re: rename pdf, word, etc files based on their text content
« Reply #8 on: December 10, 2011, 12:24 PM »
Quote from kalos: "but pdf metadata does not contain pdf text content, only author, creation date, etc, AFAIK"
You're right, I was thinking more about mouser's request.

sbertolucci

  • Participant
  • Joined in 2006
  • Posts: 1
Re: rename pdf, word, etc files based on their text content
« Reply #9 on: December 11, 2011, 07:21 AM »
Quote from kalos: "is there any tool that will rename files like pdf, word, etc based on their content text?"

Quote from mouser: "i have often thought that a commandline utility that could take a pdf file, grab its title, author, and date and rename it to some combination of those, would be EXTREMELY useful."

In the past I've written a little command line tool which sets the date of a PDF file to the one recorded in its metadata, and it was a difficult task because in practice there's no single standard. Every program which generates a PDF writes that data in its own way.
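
To illustrate the kind of tolerance such a tool needs, here is a minimal sketch of a lenient date parser. It is not the tool mentioned above. It assumes the date string roughly follows the `D:YYYYMMDDHHmmSSOHH'mm'` layout described in the PDF reference, with everything after the year optional, and `parse_pdf_date` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta, timezone

# Year is required; month, day, time and UTC offset are all optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?"
)

def parse_pdf_date(raw):
    """Return a datetime for a PDF-style date string, or None if unusable."""
    m = _PDF_DATE.match(raw.strip())
    if m is None:
        return None
    # Missing month/day default to 1, missing time parts default to 0.
    defaults = (0, 1, 1, 0, 0, 0)
    parts = [int(v) if v is not None else d
             for v, d in zip(m.groups()[:6], defaults)]
    try:
        tz = None
        if m.group(7):
            tz = timezone.utc
        elif m.group(8):
            offset = timedelta(hours=int(m.group(9)),
                               minutes=int(m.group(10) or 0))
            tz = timezone(offset if m.group(8) == "+" else -offset)
        return datetime(*parts, tzinfo=tz)
    except ValueError:
        # Out-of-range values such as month 13 or an offset of 99 hours.
        return None
```

Dates written in a completely different style (such as `03/17/11 17:31:25`) are simply rejected here and would need their own handling.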

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
  • @Slartibartfarst
Re: rename pdf, word, etc files based on their text content
« Reply #10 on: December 11, 2011, 08:16 AM »
Quote from kalos: "is there any tool that will rename files like pdf, word, etc based on their content text?
ie, if a specific match of a regex exists in the content of a pdf file, then rename the file with that match"
Without knowing more of what lies behind your request, it may be difficult to see how to help.

I think you might be able to get this done in some shape or form by using a reference management system that can handle PDF files. From that perspective, I would suggest that Qiqqa might be worth a look. I haven't played with the latest version, so I can't really say any more about it.
Qiqqa could maybe even be modified to provide the feature, if you asked the author to put it in and it wasn't there already.
Also, you might find that the capabilities of something like Qiqqa enable you to discover a new or changed requirement in place of this current one.
Adobe may have something to help here too, of course.

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
Re: rename pdf, word, etc files based on their text content
« Reply #11 on: December 11, 2011, 01:44 PM »
Quote from mouser: "a commandline utility that could take a pdf file, grab its title, author, and date and rename it to some combination of those, would be EXTREMELY useful."

@mouser: closest I know of to that would be pdfinfo from the XPDF package.

pdfinfo Help:
pdfinfo version 3.02
  Copyright 1996-2007 Glyph & Cog, LLC
  Usage: pdfinfo [options] <PDF-file>
    -f <int>       : first page to convert
    -l <int>       : last page to convert
    -box           : print the page bounding boxes
    -meta          : print the document metadata (XML)
    -enc <string>  : output text encoding name
    -opw <string>  : owner password (for encrypted files)
    -upw <string>  : user password (for encrypted files)
    -cfg <string>  : configuration file to use in place of .xpdfrc
    -v             : print copyright and version info
    -h             : print usage information
    -help          : print usage information
    --help         : print usage information
    -?             : print usage information


The command line
pdfinfo 2011_velbon_catalog.pdf > info.txt

on the Velbon (tripod) catalogue gives:

Title:          2011_catalog.indd
Creator:        Adobe InDesign CS3_J (5.0.4)
Producer:       Adobe PDF Library 8.0
CreationDate:   03/17/11 17:31:25
ModDate:        03/24/11 11:30:11
Tagged:         no
Pages:          16
Encrypted:      yes (print:yes copy:yes change:no addNotes:no)
Page size:      1274.43 x 273.584 pts
File size:      9668147 bytes
Optimized:      yes
PDF version:    1.6
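
Output in this `Key:   value` shape is easy to pick apart in a script. A minimal Python sketch, assuming the output was saved to a text file or captured as a string as above (`parse_pdfinfo` is a hypothetical helper, not part of XPDF):

```python
def parse_pdfinfo(output):
    """Turn `Key:   value` lines into a dict; lines with no value are skipped."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so times like 17:31:25 stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

sample = """\
Title:          2011_catalog.indd
CreationDate:   03/17/11 17:31:25
Pages:          16
"""
info = parse_pdfinfo(sample)
print(info["Title"], info["Pages"])
```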


I used to use this for re-naming PDFs of patents at work, with something like an AWK script to extract the relevant information and write a temporary batch file that did the work.  The trouble is you can't rely on all those fields being filled in for any particular PDF.
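
A rough modern equivalent of that approach, sketched in Python rather than AWK plus a batch file. It assumes the metadata fields are already available as a dict, and it skips any field that is missing or blank, which is exactly the unreliable part. `metadata_name` and the field order are illustrative choices, not anything from the original script.

```python
import re
from pathlib import Path

def metadata_name(path, fields, order=("Title", "Author", "CreationDate")):
    """Suggest a file name from whichever metadata fields are actually filled in.

    Hypothetical helper: returns the existing file name when no usable
    field is present, so a file is never renamed to something empty.
    """
    parts = [fields[k].strip() for k in order if fields.get(k, "").strip()]
    if not parts:
        return Path(path).name
    stem = " - ".join(parts)
    # Replace characters that are not allowed in Windows file names.
    stem = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", stem).strip(" .")
    return (stem or Path(path).stem) + Path(path).suffix

print(metadata_name(
    "2011_velbon_catalog.pdf",
    {"Title": "2011_catalog.indd", "CreationDate": "03/17/11 17:31:25"},
))
```

Printing the suggested name first and only renaming after a look at the list is a cheap way to avoid losing track of files that ended up with an unhelpful title.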