DonationCoder.com Forum

DonationCoder.com Software => Older DC Contests and Challenges => N.A.N.Y. 2013 => Topic started by: phitsc on November 29, 2012, 03:04 PM

Title: NANY 2013 Release: pdfautomv 0.3
Post by: phitsc on November 29, 2012, 03:04 PM
NANY 2013 (https://www.donationcoder.com/forum/index.php?board=304.0) Entry Information

Application Name: pdfautomv (or pdfautomv Robot, haven't decided yet ;))
Version: 0.3
Short Description: Moves PDF files into directories depending on embedded text
Supported OSes: Windows (and possibly Linux)
Web Page: https://bitbucket.org/phitsc/pdfautomv
Download Link: The source is available via the link above, but I'll make a zip too.
System Requirements
  • Ruby runtime
Version History
  • 0.3 - Fixed a crash.
  • 0.2 - Rule files now have to be UTF-8. Fixed crash with -v 3 option.


Description
pdfautomv will be a simple command-line utility (although perfectly usable via a double-click on a desktop shortcut) for the paperless office aficionado. Its purpose is to move PDF files from one directory to another based on the text embedded in each PDF file. My own primary use case is as follows:

1. Put invoice, receipt, letter, bank statement, whatever on scanner
2. Start scanning process => this will produce a PDF file in directory A
3. Repeat 1 - 2 until everything is scanned and directory A is full of files like Document.pdf, Document001.pdf, Document002.pdf, etc.
4. Double-click shortcut to pdfautomv.rb => marvel how all the Document bla bla.pdf files get nicely and neatly renamed and sorted into the directories where they belong

Usage
Installation
The application will be written in Ruby, so a Ruby runtime has to be installed if it isn't already. The application itself is just one Ruby file.

Using the Application
The application will rely on "rule" files which have to be supplied by the user. A rule file specifies what pdfautomv should look for in a PDF file and, if the file matches, where to move it and how to rename it.
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: phitsc on November 29, 2012, 03:04 PM
I've now and again tried to find an application that does just what pdfautomv will do but failed to find any. If someone knows such an application, even commercial (if it's not absurdly expensive), please let me know.
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: rjbull on November 29, 2012, 03:34 PM
I used to have to rename health & safety data sheet PDFs.  I had a batch file that used pdftotext from xpdf to convert the PDF to text, with an AWK script to parse the file and find what should be the name, and make a temporary .BAT that actually renamed the file.  That only worked because the files had predictable structure.
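
The same convert, parse, rename pipeline can be sketched in Ruby. This is only an illustration, not rjbull's batch file and not pdfautomv's code: it assumes pdftotext is on the PATH, and the "Product name:" label and the helper names (extract_text, name_from_text, rename_pdf) are made up for the example.

```ruby
require "open3"
require "fileutils"

# Runs the external pdftotext tool and returns the PDF's text.
# Passing "-" as the output file makes pdftotext write to stdout.
def extract_text(pdf_path)
  text, status = Open3.capture2("pdftotext", pdf_path, "-")
  raise "pdftotext failed for #{pdf_path}" unless status.success?
  text
end

# Hypothetical naming rule: take whatever follows a "Product name:" label
# and replace characters that are not allowed in Windows file names.
def name_from_text(text)
  title = text[/^Product name:\s*(.+)$/, 1]
  return nil unless title
  title.strip.gsub(/[\\\/:*?"<>|]/, "_") + ".pdf"
end

# Renames the PDF in place; leaves it alone if no name was found
# or if the target name is already taken.
def rename_pdf(pdf_path)
  new_name = name_from_text(extract_text(pdf_path))
  return pdf_path unless new_name
  target = File.join(File.dirname(pdf_path), new_name)
  return pdf_path if File.exist?(target)
  FileUtils.mv(pdf_path, target)
  target
end
```

As in the batch-file version, this only works when the documents have a predictable structure, because the name is found with a fixed pattern.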
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: phitsc on November 29, 2012, 03:40 PM
That is exactly what I'm doing. It's also supposed to work with documents with predictable structure (actually, predictable content), hence the rule files.
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: mouser on November 29, 2012, 03:54 PM
will renaming be part of this?

if so, are we saying it might be possible to rename a pdf file based on the text content in the file -- like on the article title and author?  i've long wanted such a thing for downloaded academic articles, which are often named things like fdj4893dfjk48.pdf
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: phitsc on November 29, 2012, 04:01 PM
Yes, just renaming is also possible. I'll post an example tomorrow for you to see if it covers your case (though I think it should)
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: wraith808 on November 29, 2012, 04:25 PM
If it does work- that's cool!  When I buy a lot of pdfs, they're named some incomprehensible series of numbers and letters and I have to manually rename them...
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: TaoPhoenix on November 29, 2012, 05:18 PM
When I buy a lot of pdfs, ...

Where are you buying lots of PDFs?  :o
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: phitsc on November 30, 2012, 06:56 AM
Here's an example rule file:

[match]
079 123 45 67

[variables]
dateLong=(\d\d)\. (Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember) (\d\d\d\d)
dateShort:5=(\d\d)\.(\d\d)\.(\d\d)

[move]
\\server\pl-office\bills\<dateLong:3>\<dateLong:3>-<dateShort:2> Telco - mobile.PDF

Both the match entry and the variables are regular expressions. Variables can be referenced in the move expression in angle brackets, and a :n suffix there selects the nth regex capture group, so <dateLong:3> is the four-digit year. The matched text is available via the implicit <match> variable. The :5 after dateShort in the [variables] section specifies that the 5th match of that pattern should be assigned to the dateShort variable.
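
To make the rule semantics concrete, here is a minimal Ruby sketch of how such a rule could be parsed and applied. It is not taken from pdfautomv's source. The function names (parse_rule, nth_match, target_path), the sample text, and the reading of ":5" as "the 5th occurrence of the pattern in the text" are assumptions made for the example.

```ruby
# Sample rule in the format shown above.
RULE_TEXT = <<~'RULE'
  [match]
  079 123 45 67

  [variables]
  dateLong=(\d\d)\. (Januar|Februar|März|April|Mai|Juni|Juli|August|September|Oktober|November|Dezember) (\d\d\d\d)
  dateShort:5=(\d\d)\.(\d\d)\.(\d\d)

  [move]
  \\server\pl-office\bills\<dateLong:3>\<dateLong:3>-<dateShort:2> Telco - mobile.PDF
RULE

# Made-up text standing in for what was extracted from a scanned bill.
SAMPLE_TEXT = <<~TEXT
  Rechnung vom 12. November 2012
  Mobile 079 123 45 67
  01.10.12 02.10.12 03.10.12 04.10.12 05.11.12
TEXT

# Splits the rule into its [match], [variables] and [move] sections.
def parse_rule(text)
  sections = Hash.new { |hash, key| hash[key] = [] }
  current = nil
  text.each_line(chomp: true) do |line|
    line = line.strip
    next if line.empty?
    if (header = line[/\A\[(\w+)\]\z/, 1])
      current = header
    elsif current
      sections[current] << line
    end
  end
  variables = sections["variables"].map do |entry|
    name_spec, pattern = entry.split("=", 2)
    name, occurrence = name_spec.split(":", 2)
    [name, { regex: Regexp.new(pattern), occurrence: (occurrence || 1).to_i }]
  end.to_h
  { match: Regexp.new(sections["match"].first),
    variables: variables,
    move: sections["move"].first }
end

# Returns the MatchData of the n-th non-overlapping match, or nil.
def nth_match(text, regex, n)
  pos = 0
  found = nil
  n.times do
    found = regex.match(text, pos)
    return nil unless found
    pos = found.end(0)
  end
  found
end

# Returns the expanded move path, or nil if the rule does not apply.
def target_path(rule, pdf_text)
  matched = rule[:match].match(pdf_text)
  return nil unless matched

  found = { "match" => matched }
  rule[:variables].each do |name, var|
    found[name] = nth_match(pdf_text, var[:regex], var[:occurrence])
  end
  return nil if found.value?(nil)

  # <name> inserts the whole match, <name:n> inserts capture group n.
  rule[:move].gsub(/<(\w+)(?::(\d+))?>/) do
    name = Regexp.last_match(1)
    group = Regexp.last_match(2).to_i
    found.fetch(name)[group].to_s
  end
end

puts target_path(parse_rule(RULE_TEXT), SAMPLE_TEXT)
# prints: \\server\pl-office\bills\2012\2012-11 Telco - mobile.PDF
```

In this sketch a rule that does not match, or a variable that does not occur often enough, yields nil, so the file would simply be left where it is.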
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: wraith808 on November 30, 2012, 07:56 AM
Quote from: wraith808
When I buy a lot of pdfs, ...

Quote from: TaoPhoenix
Where are you buying lots of PDFs?  :o

DriveThruRPG, O'Reilly, Apress, Manning, etc, etc.  Books, you know.
Title: Re: NANY 2013 Pledge: pdfautomv
Post by: phitsc on January 03, 2013, 03:01 PM
Am I already too late? The Ruby script is available (see the Web Page link in the first post). I still have to document the rule file format though.
Title: Re: NANY 2013 Release: pdfautomv
Post by: mouser on January 03, 2013, 03:12 PM
You are ok, I'm adding it now.  :Thmbsup:
Title: Re: NANY 2013 Release: pdfautomv
Post by: phitsc on January 03, 2013, 03:45 PM
Thanks mouser.

I added some documentation now in a README displayed on the above web site.
Title: Re: NANY 2013 Release: pdfautomv 0.2
Post by: phitsc on January 09, 2013, 04:15 PM
Rule files now have to be UTF-8 encoded. This allows filenames such as 'Prämienabrechnung.PDF'.
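
For illustration, a rule file can be read as UTF-8 in Ruby roughly like this. This is a sketch, not pdfautomv's actual code: the read_rule_file helper is hypothetical, and stripping a leading byte order mark is an extra assumption on top of what the post states.

```ruby
# Reads a rule file as UTF-8 and rejects files saved in another encoding
# (for example Latin-1), where umlauts would otherwise be garbled.
def read_rule_file(path)
  text = File.read(path, encoding: "UTF-8")
  raise ArgumentError, "#{path} is not valid UTF-8" unless text.valid_encoding?
  # Some Windows editors prepend a byte order mark; drop it if present.
  text.delete_prefix("\uFEFF")
end
```

Checking the encoding up front means a wrongly saved rule file fails with a clear message instead of producing a mangled target filename.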