Desktop tool for programmatically removing URLs from PDF files

IainB:
He explained it very well, I thought.  Sometimes, people link the words http://www.google.com to http://www.malwareRus.com in PDFs.  He wants to strip out the links.
________________________
-wraith808 (August 25, 2016, 10:24 AM)
--- End quote ---
Duh!
Thank you, and yes, he does explain it very well.
I should have read his entire post rather than just the end bit and 4wd's response.
My apologies. More haste, less speed...    :-[

questorfla:
Thanks to all for the suggestions.  I am looking into both Dr. Andus's and Shades's links.  Surely one of them can do something this simple.  If not, maybe one of the online tools has a desktop version.  The one I used, PDF-du, seems to, but they sell every "piece" of their tool "a la carte".  It looks like DEBENU is a full suite, and they even have a server version (though this is not for a server OS, just a system with lots of storage).  And Shades's offering comes from a company I have heard of that has been around for a long time, so of course it costs the most.
Out of all of this, I hope I can find a permanent fix.  The one-time clean-up is necessary, but the full-time fix is to clean the files when they come in.  I hope this doesn't become a task that must be done on the file server after the fact.  It would be much better as an add-in for Outlook, so the files get stripped clean before I have to deal with them.

Anyway, thanks for all the pointers.  I am kind of surprised that this hasn't happened to anyone else before.  Or maybe it has, just not to anyone who hangs out at DC.  AVAST told me that they have been getting a lot of calls from people running into this kind of thing, so perhaps the scanning of embedded URLs against some sort of website blacklist is a more recent addition?

And before I forget, 4WD's idea is not so totally off the mark.  I just don't know how long it would take to do it that way.  I did see a few references to other people sanitizing PDFs by printing them to a virtual PDF printer program such as BullZip.  I know for sure it could be done; 4WD is the Script-King, judging from everything he has ever sent me in the past.  This would be one of those "For f = 1 to 1 gazillion ... etc. etc." type runs.  Take each PDF, rename it slightly, print it to a new PDF file with the original filename, and delete the renamed copy (or save it long enough to test a few and be sure the copy is really equal to the original), then move on to the next.  Worst case, that may be my only way out.
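The loop described above could be sketched roughly as follows in Python. This is only an outline under stated assumptions: `reprint_all` and the `.orig` backup suffix are names made up for illustration, and `convert` is a placeholder for whatever actually "prints" one file to a new PDF (for example, a wrapper around a virtual PDF printer). No particular printer's command line is assumed here.

```python
from pathlib import Path
from typing import Callable, List


def reprint_all(root: Path,
                convert: Callable[[Path, Path], None],
                keep_backup: bool = True) -> List[Path]:
    """Re-create every PDF under root by calling convert(source, destination).

    convert is a hypothetical hook: it must read the renamed original and
    write a new PDF to the original filename.
    """
    processed = []
    # Build the full list first so that renaming files does not disturb the walk.
    for pdf in sorted(root.rglob("*.pdf")):
        backup = pdf.with_name(pdf.name + ".orig")
        if backup.exists():
            continue  # a backup is already there, so an earlier run handled this file
        pdf.rename(backup)  # "rename it slightly"
        try:
            convert(backup, pdf)  # "print it to a new PDF with the original filename"
            if not pdf.exists() or pdf.stat().st_size == 0:
                raise RuntimeError(f"no output produced for {pdf}")
        except Exception:
            # Put the original back so a failed conversion never loses a file.
            if pdf.exists():
                pdf.unlink()
            backup.rename(pdf)
            raise
        if not keep_backup:
            backup.unlink()  # only once a few copies have been checked by hand
        processed.append(pdf)
    return processed
```

Keeping the backups by default matches the "save it long enough to test a few" idea; they can be deleted in a second pass once the copies have been spot-checked.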

With my luck, all of this will turn out to be caused by "False Positives" on an ever-changing blacklist of websites :(.

Which is why I would feel best if ALL hidden URLs were removed.  I would prefer that any links go to exactly what people can see.  If they choose to go to them anyway, removing the ACTIVE part forces ADOBE to display that warning about "Being sure you know where this link is taking you" etc.  At that point, I think I will have done as much as anyone could expect to warn people about where they are browsing to.
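To make "links go to exactly what people can see" concrete, here is a small Python sketch that compares the host shown in a link's visible text with the host it really points to. It assumes the visible text and the target have already been pulled out of the PDF by some other tool; `host_of` and `is_misleading` are hypothetical helper names, not part of any PDF library.

```python
from urllib.parse import urlsplit


def host_of(text: str) -> str:
    """Return the lower-cased host of a URL-like string, or '' if there is none."""
    text = text.strip()
    if "://" not in text:
        text = "http://" + text  # allow bare "www.example.com" style text
    try:
        return (urlsplit(text).hostname or "").lower()
    except ValueError:
        return ""


def is_misleading(visible_text: str, target_url: str) -> bool:
    """True when the visible text looks like a URL but names a different host."""
    shown = visible_text.strip().lower()
    if not shown.startswith(("http://", "https://", "www.")):
        return False  # plain wording such as "click here" makes no claim to check
    return host_of(visible_text) != host_of(target_url)


print(is_misleading("http://www.google.com", "http://www.malwareRus.com"))
print(is_misleading("http://www.google.com", "http://www.google.com/search"))
```

The comparison is deliberately strict: "google.com" and "www.google.com" count as different hosts here, so a real clean-up would probably want a looser rule.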

 

Shades:
Do you use DNS servers from your ISP or public ones?

A public one, such as OpenDNS (P: 208.67.222.222, S: 208.67.220.220) has (free and commercial) filter options that will help protect your users from themselves when they click on links to bad sites.

But if you are a diehard, run your own/company DNS server and have much more control over what sites your users are able to visit at any given moment.
Or, if you channel internet access for all your users through a hardware or software router, find out whether it supports the use of "blacklists". If it does, add the offending sites to that blacklist and your users are protected as well. Do keep that blacklist up to date, though. None of these pointers require you to batch edit PDF files for bad/hidden links. You might still want to take inventory of the bad links in new PDF files that come in, by batch or manual processing.
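Taking inventory of link targets could look something like the Python sketch below. It is a rough approximation, not a PDF parser: it only finds link actions stored as plain `/URI (...)` strings, so targets inside compressed object streams, hex-encoded strings, or URLs with unescaped nested parentheses will be missed or cut short. The function names and the example blacklist entry are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches uncompressed link actions such as: /URI (http://example.com/page)
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\)])*)\)")


def find_link_targets(pdf_bytes: bytes) -> list:
    """Return the link targets found as plain /URI strings in raw PDF bytes."""
    targets = []
    for match in URI_PATTERN.finditer(pdf_bytes):
        raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))  # undo \( \) \\ escapes
        targets.append(raw.decode("latin-1"))
    return targets


def blacklisted(targets, blocked_hosts) -> list:
    """Return the targets whose host is a blocked host or a subdomain of one."""
    hits = []
    for url in targets:
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            continue  # skip targets that are not parseable URLs
        if any(host == b or host.endswith("." + b) for b in blocked_hosts):
            hits.append(url)
    return hits


sample = b"<< /Subtype /Link /A << /S /URI /URI (http://www.malwareRus.com/x) >> >>"
print(blacklisted(find_link_targets(sample), {"malwarerus.com"}))
```

For real files, the bytes would come from something like `Path(name).read_bytes()`, and the blocked hosts from whatever list the router or DNS filter already uses.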

Some routers even let you make a custom "landing" HTML page that is served to a user who tries to visit any link in that blacklist.

For example: I use an old AMD dual-core white-box PC with 2 GB of RAM and 3 network cards, in combination with OPNsense router software (FreeBSD). I have 2 different ISPs, each using one network card; the last network card connects to a big switch that provides internet to all computers hooked up to that switch (by cable or access points). A rather basic setup...but I like things simple. ;)

You manage this OPNsense router software in your browser, and it gives you a lot of control over what your users can or cannot do on either the network or the internet at any given time. Routing, NAT, Firewall, DNS, DHCP, Blacklists, VPN, graphical overviews of (current and historical) traffic, it does everything. The number of options it comes with might be overwhelming at first, but once it is set up as you like it...you won't want to use any other system anymore. And there are free/open source/commercial expansions available if the standard functionality isn't enough for your intents and purposes.

The OPNsense router software is a fork of the pfSense router software, which is in turn a fork of the m0n0wall software. If you are interested in playing with these software routers, there are a lot of forums and instructional videos available for free support, especially for pfSense. But if so inclined, you are also able to buy books and premium support for whatever exotic wish (regarding network setup) you might have.

My choice of OPNsense 16.1 over pfSense 2.3 is mainly its interface. Although pfSense version 2.3 has a drastically improved interface, I just like the OPNsense one better.
