Author Topic: rename pdf files  (Read 4852 times)

kalos

  • Member
  • Joined in 2006
  • Posts: 1,824
rename pdf files
« on: January 15, 2011, 02:25 PM »
hello!

I am trying to make some kind of script that will do this:

I have opened a pdf file and have selected some text, which is the title of the article. The script will copy that text and clean it up, which means it will:
  • replace the newlines with spaces
  • replace double spaces with single spaces
  • capitalize sentences appropriately
  • replace characters like ':' that cannot be part of a filename in Windows with '.'
  • if text is too long to be a filename, it will trim the characters accordingly
then it will rename the file to that text (even if the file is 'in use' and thus cannot be renamed)
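
The clean-up steps listed above could be sketched roughly as follows in Python. This is only an illustration, not a finished tool: the function name `clean_title`, the 120-character limit, and the very simple sentence-case rule (which lower-cases acronyms) are all assumptions made for the example.

```python
import re

# Characters Windows does not allow in file names.
INVALID_CHARS = '<>:"/\\|?*'


def clean_title(text: str, max_len: int = 120) -> str:
    """Turn selected PDF text into a Windows-safe file name stem.

    max_len is an arbitrary choice for this sketch, not a Windows limit.
    """
    # Newlines and tabs become spaces, and runs of spaces collapse to one.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive sentence case: lower-case everything, then capitalize the first
    # letter of the text and of each sentence. Acronyms are not preserved.
    text = text.lower()
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Replace characters that are illegal in Windows file names with '.'.
    text = "".join("." if c in INVALID_CHARS else c for c in text)
    # Trim to length; Windows also rejects names ending in a space or a dot.
    return text[:max_len].rstrip(" .")


print(clean_title("a NEW method:\nfor  PDF renaming"))
```

Copying the selection from the PDF viewer and renaming a file that is still open are separate problems that this sketch does not address.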

this way, it will be convenient to archive the pdf articles!

can one do this easily?

thanks!

lanux128

  • Global Moderator
  • Joined in 2005
  • Posts: 6,277
Re: rename pdf files
« Reply #1 on: January 17, 2011, 12:38 AM »
mouser's Clipboard Help+Spell would be of help here..

https://www.donation...pandspell/index.html

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,205
Re: rename pdf files
« Reply #2 on: January 18, 2011, 03:09 PM »
Does your text have any structure?  If so, and your PDF isn't locked, you could do something like

1) Use the pdftotext utility of XPDF to convert to plain text
2) Make an AWK script to parse the text file so made and apply your other corrections.

I used to do something like this at work, but the files were safety data sheets and they had enough structure to be able to capture the relevant titles.
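
As a rough illustration of the two steps above, here is a Python sketch that calls `pdftotext` for the first page and then picks a title from the text (Python stands in for the AWK script here). It assumes `pdftotext` is installed and on the PATH. The "first block of non-empty lines" rule in `guess_title` is a hypothetical heuristic; real files would need a rule matched to their own structure.

```python
import subprocess


def first_page_text(pdf_path: str) -> str:
    """Run pdftotext on page 1 only and return the text it writes to stdout."""
    result = subprocess.run(
        # -f/-l select the first and last page; "-" sends output to stdout.
        ["pdftotext", "-f", "1", "-l", "1", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def guess_title(page_text: str, max_lines: int = 3) -> str:
    """Hypothetical heuristic: join the first block of non-empty lines."""
    lines = []
    for line in page_text.splitlines():
        stripped = line.strip()
        if stripped:
            lines.append(stripped)
            if len(lines) == max_lines:
                break
        elif lines:
            # A blank line after the first block ends the title.
            break
    return " ".join(lines)
```

For example, `guess_title(first_page_text("article.pdf"))` would return the candidate title for a file named `article.pdf` (a made-up name).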

Otherwise, lanux is right about CHS, but other utilities you might find helpful to semi-automate the process include the following.  The first three are convenient ways of manipulating text in the clipboard.

  • Clippy (freeware)
    You can re-align the '>' characters that have been
    concatenated into a single line; Strip HTML tags; Convert the case of
    letters; Remove line breaks; Count the number of lines and words; And
    more.

    Clippy works by taking the text that you have copied to the clipboard,
    sending the text through its conversion engine and pushing the
    reformatted text to the clipboard.

    Partial list of features:
    * Align center
    * Align fill
    * Align left
    * Align quote
    * Align right
    * Case capitalise
    * Case invert
    * Case lower
    * Case upper
    * Count of characters
    * Count of lines
    * Count of words
    * Convert all spaces to tabs
    * Convert leading spaces to tabs
    * Convert tabs to spaces
    * Quote
    * Remove blank lines
    * Remove duplicate lines
    * Remove line
    * Remove line breaks
    * Strip HTML
    * Trim leading spaces
    * Trim trailing spaces
    * Unquote
    * Remove duplicate blank lines
    * Convert DOS to Unix
    * Convert Unix to DOS
    * Delete to end-of-line from column
    * Delete to end-of-line from string
    * Sort lines
    * Search and replace

    The web site is often down: I have attached a copy.  It's only 73k.

  • Free Data Capture Tool (FDC)
    Partial feature list:
        * Trim: spaces at the beginning and end of the line
        * Remove Punctuation: punctuation is removed
        * Strip Parentheticals: data within parentheses/brackets is removed.
        * Parse Words: words are split into new lines on white spaces.
        * Ignore Empty: blank lines are removed.
        * Remove HTML: HTML tags are removed from HTML text.
        * Change Case: the case of all data is changed as selected.
        * Parse: new lines are created at locations that match the specified strings. Note that no further actions are performed on the inserted lines during this phase of processing. Reprocess the capture box with new filters as needed.
        * Accept Lines: only those lines that contain the specified substrings are placed in the captured data box.
        * Reject Lines: lines that do not contain the specified substrings are ignored.
        * Replace/Remove: data that matches selections is replaced. Each row in this grid represents one find-replace pair.
        * Extract/Delete Text in Substrings: data that is found between the specified start and stop substring location is Extracted or Deleted.
        * Suffix & Prefix: data is added to the beginning and/or end of a line as specified.
        * Merge Lines: groups of 2-5 lines are merged into single lines. If specified, separators are inserted between some/all of the merged groups of text.
    This one has been mentioned on DC before; I haven't tried it myself.

  • TextMonkey (free Lite version; more powerful payware Pro version)
    Email Cleanup

        * Sense when quoted email text is copied to the clipboard and clean automatically, or upon confirmation †
        * Sense and remove email quoting symbols, including those with prefix strings such as "Joe>" †
        * Set email quoting symbols
        * Delete extra spaces and blank lines †
        * Sense and preserve formatting of lines that appear not to be flowing text †
        * Convert line breaks to space † -or-
        * Reformat line breaks to user-defined paragraph width

    Web Document Cleanup

        * Delete HTML tags
        * Decode named and numbered HTML character constants, such as "&quot;" or "&#34;"
        * Delete extra blank lines (ie, reduce two or more blank lines to one blank line)
        * Delete leading spaces and tabs
        * Convert line breaks to spaces

    Space Operations

        * Delete leading spaces and tabs
        * Delete trailing spaces and tabs
        * Reduce space runs to one space
        * Apply one space after sentence enders
        * Apply two spaces after sentence enders

    Line Operations

        * Delete extra blank lines (ie, reduce two or more blank lines to one blank line)
        * Delete all blank lines
        * Delete duplicate lines
        * Delete lines that contain user-specified text
        * Delete lines that do not contain user-specified text

    Indent Operations

        * Indent all lines with one space †
        * Indent all lines with one tab
        * Indent all lines with user-specified text
        * Unindent all lines by one character †
        * Unindent all (remove all leading spaces and tabs)

    Case Operations

        * Convert to uppercase †
        * Convert to lowercase †
        * Capitalize each word
        * Capitalize each sentence
        * Swap case of each character

    HTML Operations

        * Delete HTML tags
        * Decode named characters such as "&quot;"
        * Decode numbered characters such as "&#34;"
        * Convert text to HTML document
              o Apply all relevant tags to make a legal HTML document
              o Option to retain line breaks, or convert line breaks to spaces
              o Encode sensitive characters to HTML character constants
        * Encode text to &#nnn; sequences (can be useful to obfuscate email addresses that are to be posted on a website and reduce spam harvesting)

    Sort Operations

        * Sort lines alphabetically, numerically or according to line length
        * Designate the column on which sort comparison should occur
        * Case sensitive or insensitive
        * Ascending, descending or random order
        * Sort according to ANSI value or Locale sort table (for proper handling of accented characters)

    Auto-Number Operations

        * Set starting number
        * Set numbering increment
        * Set trailing symbol, if any
        * Set trailing spaces, if any
        * Set leading zero, if any
        * Number using digits, or roman numerals
        * Left justified or right justified

    Conversions

        * Tabs to spaces
        * Spaces to tabs
        * Set tab display value
        * OEM to ANSI
        * ANSI to OEM

    Strip Operations

        * Strip low ASCII characters
        * Strip high ASCII characters
        * Strip OEM graphics (line drawing) characters
        * Convert OEM graphics (line drawing) characters to +, - and |
        * Strip user-specified characters

    Replace Operations

        * Designate up to four search and replace string pairs
        * Case sensitive or insensitive

    Miscellaneous Operations

        * Compute total, average, median, mode and standard deviation
        * Count characters, words and lines
        * Count occurrences of a user-specified text string
        * Convert text to hex dump format (useful for examining character values)
        * Convert normal text to "Teen Text," as is common in chat environments
        * Convert normal text to "Crazy Text," using alternative but readable substitutes for most characters

    Clipboard Viewer / Editor

        * Examine clipboard content in a resizeable editor window †
        * Set clipboard viewer display font
        * Undo allows previous clipboard content to be restored †
        * View in browser makes it easy to preview clipboard content in a web browser
        * Save clipboard text to a file
        * Visual wrap option allows long lines to wrap to window width †
    Features marked with "dagger" character "†" are present in Lite version.

  • Oscar's File Renamer
    You can't rename a file while it's open.  Oscar's File Renamer is unlike most renamers; it's more like an editor.  You prepare the changes you want to make, close your PDF file(s), then tell Renamer to do its thing.

    The Renamer takes and enhances the idea of editing the file names in a directory in a full-featured text editor and then writing all the changes to the files at once.
    It works simply: open Renamer, select a directory, and the files appear in the File Name Editor, which is a normal full-featured text editor with some restrictions (e.g. you can't add or delete lines).
    You can use all the editor functions like Quick find, Replace, multiple Undo/Redo, Macros and of course normal editing. Each file is on its own line, and you can move between them with the arrow keys as in normal text.
    When the files you wanted to rename are done, simply click Apply Changes and all files will be physically renamed.

    Some benefits:

        * Fast Editing of long list of file names - exactly like in a text editor
        * All changes to files themselves are done at the very end when you press Apply Changes, not during editing.
        * During Editing you can use Undo/Redo, and various tools like Upper Case/Lower Case or numbering.
        * All changed lines are visibly marked
        * It doesn't let you enter wrong characters
        * A File List shows the original names on the disk.
        * You can record a keyboard macro and apply to the file names.
        * You can undo changes even after you write to disk
        * It can integrate into the Windows shell.
Alternatively, use a normal text editor to make a batch file and run that to do the renaming.
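
The same "prepare the new names first, rename once the PDF is closed" idea can also be sketched in Python. This is a hedged illustration, not how Oscar's File Renamer works internally. The function name `apply_renames`, the retry count, and the delay are assumptions for the example.

```python
import os
import time


def apply_renames(plan, retries=5, delay=1.0):
    """Apply (old_path, new_path) pairs, retrying while a file is still open.

    Returns the pairs that could not be renamed after all retries.
    """
    failed = []
    for old, new in plan:
        for _attempt in range(retries):
            try:
                os.rename(old, new)
                break
            except PermissionError:
                # On Windows this usually means another program still has
                # the file open; wait and try again.
                time.sleep(delay)
        else:
            # The loop never hit `break`, so every attempt failed.
            failed.append((old, new))
    return failed
```

A plan is just a list such as `[("paper1.pdf", "A new method.pdf")]` (made-up names). Anything returned in the failed list can be retried after the viewer has been closed.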


kalos

Re: rename pdf files
« Reply #3 on: January 27, 2011, 10:30 AM »
thanks but is it easy for you to implement the solution?

rjbull

Re: rename pdf files
« Reply #4 on: January 27, 2011, 04:15 PM »
The smoothest system I had was when I needed to rename safety data sheets generated by the system used by the company I worked for.  These had structure, so I converted them to text with XPDF, made an AWK script that matched the resulting text file, and ran a batch file that used the output to generate a second temporary batch file that did the renaming.  Then I could just click on the batch file and have the renaming done automatically.  This is, obviously, critically dependent on a particular structure.  If I was dealing with incoming files with no consistent structure, I had to use the kind of tools I listed above.  I have no idea what structure, if any, your files have, so I can't be more specific, and in any case I'm not a coder.  I just lashed things together.  All I can do is give you an idea of what worked for me.
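
For readers who want to try the "generate a temporary batch file that does the renaming" step, a minimal Python sketch might look like this. It is not the original AWK and batch setup, which was not posted. The function names and the default file name `rename_pdfs.bat` are made up for the illustration.

```python
def batch_rename_lines(pairs):
    """Build the lines of a Windows batch file, one `ren` command per pair.

    Each pair is (old_name, new_name). `ren` expects the new name without
    a path, so both names are assumed to be in the current directory.
    """
    lines = ["@echo off"]
    for old_name, new_name in pairs:
        # A literal % must be doubled inside a batch file.
        old_name = old_name.replace("%", "%%")
        new_name = new_name.replace("%", "%%")
        lines.append(f'ren "{old_name}" "{new_name}"')
    return lines


def write_batch_file(pairs, path="rename_pdfs.bat"):
    """Write the generated commands to a batch file with CRLF line endings."""
    with open(path, "w", newline="\r\n") as handle:
        handle.write("\n".join(batch_rename_lines(pairs)) + "\n")
```

The batch file can be reviewed in a text editor before it is run, which keeps the renaming step separate from the title extraction.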