Author Topic: NANY 2013 Release: pdfautomv 0.3  (Read 20274 times)

phitsc

  • Honorary Member
  • Joined in 2008
  • Posts: 1,198
NANY 2013 Release: pdfautomv 0.3
« on: November 29, 2012, 03:04 PM »
NANY 2013 Entry Information

Application Name pdfautomv (or pdfautomv Robot, haven't decided yet ;))
Version 0.3
Short Description Moves PDF files into directories depending on embedded text
Supported OSes Windows (and possibly Linux)
Web Page https://bitbucket.org/phitsc/pdfautomv
Download Link Source is available via the web page link above, but I'll make a zip too.
System Requirements
  • Ruby runtime
Version History
  • 0.3 - Fixed a crash.
  • 0.2 - Rule files now have to be UTF-8. Fixed crash with -v 3 option.


Description
pdfautomv will be a simple command-line utility (although perfectly usable with a simple double-click on a desktop shortcut) for the paperless office aficionado. Its purpose is to move PDF files from one directory to another based on the text embedded in each PDF file. My own primary use case is as follows:

1. Put invoice, receipt, letter, bank statement, whatever on scanner
2. Start scanning process => this will produce a PDF file in directory A
3. Repeat 1 - 2 until everything is scanned and directory A is full of files like Document.pdf, Document001.pdf, Document002.pdf, etc.
4. Double-click shortcut to pdfautomv.rb => marvel at how all the Document bla bla.pdf files get nicely and neatly renamed and sorted into the directories where they belong
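
The four steps above boil down to a loop over the scan directory. Here is a minimal Ruby sketch of that loop. It is not pdfautomv's actual code: `sort_pdfs` is a hypothetical helper, and the block passed to it stands in for whatever decides the destination (in pdfautomv, the rule files).

```ruby
require 'fileutils'

# Hypothetical helper, not part of pdfautomv: move every PDF found directly
# in inbox_dir to the path returned by the block. The block stands in for
# the rule lookup; returning nil leaves the file where it is.
def sort_pdfs(inbox_dir)
  moved = []
  names = Dir.entries(inbox_dir).select do |name|
    File.extname(name).downcase == '.pdf'
  end
  names.sort.each do |name|
    pdf_path = File.join(inbox_dir, name)
    next unless File.file?(pdf_path)
    target = yield(pdf_path)
    next unless target
    FileUtils.mkdir_p(File.dirname(target)) # create the destination directory
    FileUtils.mv(pdf_path, target)          # move and rename in one step
    moved << target
  end
  moved
end
```

A call could look like `sort_pdfs('A') { |pdf| destination_for(pdf) }`, where `destination_for` is again a made-up name for the code that reads the PDF text and applies the rules.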

Usage
Installation
The application will be written in Ruby, so a Ruby runtime has to be installed if it is not already. The application itself is just one Ruby file.

Using the Application
The application will rely on some "rule" files which have to be supplied by the user. A rule file specifies what pdfautomv should look for in a PDF file and where to move it and how to rename it if it finds a matching PDF file.
« Last Edit: July 25, 2013, 11:33 AM by phitsc »

phitsc

Re: NANY 2013 Pledge: pdfautomv
« Reply #1 on: November 29, 2012, 03:04 PM »
I've tried now and again to find an application that does just what pdfautomv will do, but never found one. If someone knows such an application, even a commercial one (if it's not absurdly expensive), please let me know.

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,206
Re: NANY 2013 Pledge: pdfautomv
« Reply #2 on: November 29, 2012, 03:34 PM »
I used to have to rename health & safety data sheet PDFs.  I had a batch file that used pdftotext from xpdf to convert the PDF to text, with an AWK script to parse the file and find what should be the name, and make a temporary .BAT that actually renamed the file.  That only worked because the files had predictable structure.
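
For comparison, the same pdftotext idea can be sketched in Ruby instead of batch plus AWK. This is only an illustration: `extract_text` and `filename_from_text` are hypothetical names, pdftotext is assumed to be on the PATH, and taking the first non-blank line as the title is an assumption that only holds for files with a predictable structure.

```ruby
require 'open3'

# Run pdftotext (from xpdf or poppler) and return the extracted text.
# Assumes pdftotext is on the PATH; "-" sends the text to standard output.
def extract_text(pdf_path)
  text, status = Open3.capture2('pdftotext', '-layout', pdf_path, '-')
  raise "pdftotext failed for #{pdf_path}" unless status.success?
  text
end

# Hypothetical naming rule: use the first non-blank line as the title and
# replace characters that Windows does not allow in file names.
def filename_from_text(text)
  title = text.each_line.map(&:strip).find { |line| !line.empty? }
  return nil unless title
  title.gsub(/[\\\/:*?"<>|]/, '_').squeeze(' ')[0, 80] + '.pdf'
end
```

Renaming would then be something like `File.rename(path, File.join(File.dirname(path), filename_from_text(extract_text(path))))`, with a check for a `nil` name first.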

phitsc

Re: NANY 2013 Pledge: pdfautomv
« Reply #3 on: November 29, 2012, 03:40 PM »
That is exactly what I'm doing. It's also supposed to work with documents with predictable structure (actually, predictable content), hence the rule files.

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,922
Re: NANY 2013 Pledge: pdfautomv
« Reply #4 on: November 29, 2012, 03:54 PM »
will renaming be part of this?

if so, are we saying it might be possible to rename a pdf file based on the text content in the file -- like on the article title and author?  i've long wanted such a thing for downloaded academic articles, which are often named things like fdj4893dfjk48.pdf

phitsc

Re: NANY 2013 Pledge: pdfautomv
« Reply #5 on: November 29, 2012, 04:01 PM »
Yes, just renaming is also possible. I'll post an example tomorrow for you to see if it covers your case (though I think it should).

wraith808

  • Supporting Member
  • Joined in 2006
  • Posts: 11,192
Re: NANY 2013 Pledge: pdfautomv
« Reply #6 on: November 29, 2012, 04:25 PM »
If it does work- that's cool!  When I buy a lot of pdfs, they're named some incomprehensible series of numbers and letters and I have to manually rename them...

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • Posts: 4,642
Re: NANY 2013 Pledge: pdfautomv
« Reply #7 on: November 29, 2012, 05:18 PM »
Quote from wraith808: When I buy a lot of pdfs, ...

Where are you buying lots of PDFs?  :o

phitsc

Re: NANY 2013 Pledge: pdfautomv
« Reply #8 on: November 30, 2012, 06:56 AM »
Here's an example rule file:

[match]
079 123 45 67

[variables]
dateLong=(\d\d)\. (Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember) (\d\d\d\d)
dateShort:5=(\d\d)\.(\d\d)\.(\d\d)

[move]
\\server\pl-office\bills\<dateLong:3>\<dateLong:3>-<dateShort:2> Telco - mobile.PDF

Both the match and the variables are regular expressions. The variables can then be referenced in the move expression (in angle brackets), where a :n suffix selects the nth regex capture group, so <dateLong:3> is the year. The matched text is available via the implicit <match> variable. In the [variables] section, the :5 after dateShort specifies that the 5th match should be assigned to the dateShort variable.
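
To make the rule semantics concrete, here is a rough Ruby sketch of how such a rule could be evaluated against text extracted from a PDF. It is not taken from pdfautomv's source: `parse_rule`, `nth_match` and `target_path` are hypothetical helpers, the sample text is invented, and treating a reference without `:n` as the whole matched text is an assumption.

```ruby
# Illustrative sketch only; these helpers are not functions from pdfautomv.
SECTION_NAMES = %w[match variables move].freeze

# Split rule-file text into its [match], [variables] and [move] sections.
def parse_rule(text)
  sections = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    header = line[/\A\[(\w+)\]\z/, 1]
    if SECTION_NAMES.include?(header)
      current = header
    elsif current
      sections[current] << line
    end
  end
  sections
end

# Return the MatchData of the n-th (1-based) occurrence of regex in text.
def nth_match(regex, text, n)
  pos = 0
  match = nil
  n.times do
    match = regex.match(text, pos)
    return nil unless match
    pos = match.end(0)
    pos += 1 if match.end(0) == match.begin(0) # step past empty matches
  end
  match
end

# Expand the [move] line for the given text, or return nil when the
# [match] expression is not found.
def target_path(rule, text)
  whole = Regexp.new(rule['match'].first).match(text)
  return nil unless whole
  vars = { 'match' => whole } # the implicit <match> variable
  rule['variables'].each do |line|
    parsed = line.match(/\A(\w+)(?::(\d+))?=(.*)\z/)
    next unless parsed
    name, occurrence, pattern = parsed.captures
    vars[name] = nth_match(Regexp.new(pattern), text, (occurrence || 1).to_i)
  end
  rule['move'].first.gsub(/<(\w+)(?::(\d+))?>/) do
    name = Regexp.last_match(1)
    group = Regexp.last_match(2)
    # Assumption: no :n means group 0, i.e. the whole matched text.
    vars[name] ? vars[name][group.to_i].to_s : ''
  end
end

RULE_TEXT = <<~'RULE'
  [match]
  079 123 45 67

  [variables]
  dateLong=(\d\d)\. (Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember) (\d\d\d\d)
  dateShort:5=(\d\d)\.(\d\d)\.(\d\d)

  [move]
  \\server\pl-office\bills\<dateLong:3>\<dateLong:3>-<dateShort:2> Telco - mobile.PDF
RULE

# Invented sample of what the text of a scanned bill might contain.
SAMPLE_TEXT = <<~TEXT
  Mobile 079 123 45 67
  Rechnung vom 05. Dezember 2012
  01.11.12 02.11.12 03.11.12 04.11.12 30.11.12
TEXT

puts target_path(parse_rule(RULE_TEXT), SAMPLE_TEXT)
# prints: \\server\pl-office\bills\2012\2012-11 Telco - mobile.PDF
```

In the sample, `<dateLong:3>` expands to the year of the long date, and `<dateShort:2>` to the month of the fifth short date in the text.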

wraith808

Re: NANY 2013 Pledge: pdfautomv
« Reply #9 on: November 30, 2012, 07:56 AM »
Quote from TaoPhoenix:
    Quote from wraith808: When I buy a lot of pdfs, ...
    Where are you buying lots of PDFs?  :o

DriveThruRPG, Oreilly, Apress, Manning, etc, etc.  Books, you know.

phitsc

Re: NANY 2013 Pledge: pdfautomv
« Reply #10 on: January 03, 2013, 03:01 PM »
Am I already too late? The Ruby script is available (see Web Page link in first post). I still have to document the rule file format though.

mouser

Re: NANY 2013 Release: pdfautomv
« Reply #11 on: January 03, 2013, 03:12 PM »
You are ok, I'm adding it now.  :Thmbsup:

phitsc

Re: NANY 2013 Release: pdfautomv
« Reply #12 on: January 03, 2013, 03:45 PM »
Thanks mouser.

I've now added some documentation in a README, which is displayed on the web page linked above.

phitsc

Re: NANY 2013 Release: pdfautomv 0.2
« Reply #13 on: January 09, 2013, 04:15 PM »
Rule files now have to be UTF-8 encoded. Allows filenames such as 'Pämienabrechnung.PDF'.
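
For anyone wondering what the UTF-8 requirement means in Ruby terms, here is a small sketch of reading a rule file with an explicit encoding and failing early if the bytes are not valid UTF-8. `read_rule_file` is a hypothetical helper, not necessarily how pdfautomv reads its rules.

```ruby
# Hypothetical helper: read a rule file as UTF-8 (skipping a byte order mark
# if one is present) and fail early if it was saved in another encoding.
def read_rule_file(path)
  text = File.read(path, encoding: 'bom|utf-8')
  unless text.valid_encoding?
    raise ArgumentError, "#{path} is not valid UTF-8; re-save the rule file as UTF-8"
  end
  text
end
```

A file saved as Latin-1 or Windows-1252 with an umlaut in it would be rejected here instead of producing a garbled target filename later.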