I am trying to find a tool that can scan a document (usually a PDF file) and do an in-place replacement of any words or images with embedded hyperlinks, so that each active link is replaced by the plain text of the URL it points to.
This may leave some ugly documents when done, but for now it is something I need to find a way to accomplish.
My original idea was to simply remove the link and leave the visible text, but I found that too often the visible text is not the full address: either people type in a short part of the link and embed the real link behind it, or Adobe (or whatever program is used) chops the displayed link down to an acceptable length.
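To make the replacement rule concrete, here is a rough Python sketch of how I picture choosing the plain text for each link. It only covers the text decision, not reading or rewriting the PDF itself (that would still need a PDF tool or library to pull out each link's visible text and target). The names `flatten_link` and `_normalize` are made up for illustration, and the rule for spotting a truncated link is my own guess at a reasonable heuristic.

```python
def _normalize(url: str) -> str:
    """Reduce a URL to a rough form used only for comparison (hypothetical helper)."""
    text = url.strip().lower()
    for prefix in ("https://", "http://"):
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    if text.startswith("www."):
        text = text[4:]
    return text.rstrip("/")


def flatten_link(visible_text: str, target_url: str) -> str:
    """Return the plain text that should replace an active hyperlink.

    visible_text: what the reader sees (empty for an image link).
    target_url:   the address the link actually points to.
    """
    visible = visible_text.strip()
    target = target_url.strip()

    # Image links (or links with no text) become the bare URL.
    if not visible:
        return target

    # Drop a trailing ellipsis left behind when the display text was cut short.
    shown = _normalize(visible.rstrip(".…"))
    looks_like_url = "." in shown and " " not in shown

    # The visible text is the URL itself, possibly shortened: show the full URL.
    if looks_like_url and _normalize(target).startswith(shown):
        return target

    # Otherwise keep the wording and spell out the address after it.
    return f"{visible} ({target})"


if __name__ == "__main__":
    print(flatten_link("our report", "http://example.com/r"))
    print(flatten_link("http://example.com/very/lo...",
                       "http://example.com/very/long/path.pdf"))
```

With a rule like this, a truncated or partial link is always expanded to the full address, and ordinary wording such as "our report" keeps its text with the address written out next to it.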
Most antivirus software is flagging a lot of these documents as containing malware. I found out that while the PDF or DOC itself is clean, it may contain hyperlinks to sites that are currently blacklisted. 90% or more of the time these end up being false positives, but I cannot risk telling people to ignore the warnings because of the 10% that are not.
No one here goes to these sites, but our people do have to post these documents so that they are available to others, and that is where the problem occurs. The people they send them to don't understand the full implications of the warnings. Either they toss the email and its attachments because of the warnings, or they open them and maybe end up getting malware from a site we sent them the link to.
I would like to find a way to remove the ACTIVE part of each hyperlink, replacing it with the address as plain text. I have been told that if a site really does host malware, the viewer's own "web shield" would presumably protect them from going there, and at least the 90% of false positives would no longer pop up on every PDF. Anyone who wants to go there can copy the text and paste it into a browser, or highlight it and let Google look it up for them. Most of the time no one goes there; the link is just provided as a reference.
I am sure there are tools that can do this, but I have not yet found one, other than maybe the full Adobe Acrobat package.