Retrieving Hyperlinks from a Word Document
One of the…joys…of working at a place like Microsoft is that you get plenty of opportunities to brush up on your reading; hardly a day goes by without you being asked to read a white paper, a spec, a proposal, a draft chapter…. As you might expect, any time you read that many things you’re bound to encounter a clunker or two; in fact, there have been several occasions when we said to ourselves, “Well, the paper isn’t very good, but there sure are a lot of useful references scattered throughout it. If only there was a way to extract the hyperlinks and discard the rest of the document.”
But, unfortunately, there’s no way to – what’s that? You say that the Microsoft Word object model includes something called the Hyperlinks collection, a collection that contains all the hyperlinks found in the document? You say that we could write a script that extracts all these hyperlinks? You say that we could then take these hyperlinks and add them to our Internet Explorer Favorites or save them as an HTML page?
Wow. Wish we’d thought of that.
Turns out that you guys were right: we can write a script that extracts all the hyperlinks from a Word document. In fact, here’s a script that does that very thing:
Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("c:\scripts\test.doc")
Set colHyperlinks = objDoc.Hyperlinks
For Each objHyperlink in colHyperlinks
    Wscript.Echo objHyperlink.Address
    Wscript.Echo objHyperlink.TextToDisplay
Next
That’s right: on top of being very handy this script is amazingly simple as well. We begin by creating an instance of the Word.Application object, and then set the Visible property to True. (We do that just so you’ll see Word pop up on the screen and thus get some assurance that the script is hard at work.) We then use the Open method to open the document C:\Scripts\Test.doc.
The rest is easy. We use the Hyperlinks property to return a collection of all the hyperlinks found in the document; those hyperlinks get stashed in a variable named colHyperlinks. Next we set up a For Each loop and loop through all the hyperlinks in the collection; for each hyperlink we echo the value of two properties:
Address, which is the URL of the hyperlink.
TextToDisplay, which is the text you actually click on.
For example, suppose we had a link like this, a link that takes you to the Script Center home page (http://www.microsoft...tcenter/default.mspx):
Script Center
If this is the only hyperlink in Test.doc then our script returns output like this:
http://www.microsoft...tcenter/default.mspx
Script Center
Yes, very cool.
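One wrinkle worth knowing about: a hyperlink that points to a bookmark inside the same document has an empty Address, because its target lives in the SubAddress property instead. Here’s a minimal sketch (our own variation, not a replacement for the script above) that skips those internal links, echoes each remaining link on a single line, and closes Word when it’s finished. It assumes the same C:\Scripts\Test.doc file; opening the document read-only and the cleanup at the end are our additions:

```vbscript
Const wdDoNotSaveChanges = 0

Set objWord = CreateObject("Word.Application")
objWord.Visible = True

' Open(FileName, ConfirmConversions, ReadOnly): read-only, since we only read the links
Set objDoc = objWord.Documents.Open("c:\scripts\test.doc", False, True)

For Each objHyperlink In objDoc.Hyperlinks
    ' Links to bookmarks inside the document have an empty Address
    If objHyperlink.Address <> "" Then
        Wscript.Echo objHyperlink.TextToDisplay & " -> " & objHyperlink.Address
    End If
Next

' Close the document without saving, then shut Word down
objDoc.Close wdDoNotSaveChanges
objWord.Quit
```

Run it under CScript.exe if you’d rather see the links in a command window than click through a series of message boxes.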
But you’re right: not as cool as it could be. It’s nice that we can echo back URLs in a command window or a message box; unfortunately, though, you can’t click a hyperlink in a command window and be transported to that Web page. (We tried.) What would really be cool would be the ability to add these URLs to our Internet Favorites folder or to an HTML document. And because the Scripting Guys and cool are practically synonymous (hey, we said practically), then let’s do something really cool and add these URLs to our Internet Favorites:
Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("c:\scripts\test.doc")
Set colHyperlinks = objDoc.Hyperlinks
For Each objHyperlink in colHyperlinks
    objHyperlink.AddToFavorites
Next
Hey, we said this was cool; we didn’t say it was hard. As you can see we pretty much used the same script as before; the only difference is that we didn’t echo back the properties of each hyperlink. Instead we simply called the AddToFavorites method and had Word add each hyperlink to our Internet Favorites. (Really: try it for yourself and see.) Notice that we don’t even have to pass the AddToFavorites method any parameters; it does all the work for us.
Of course, this also means that AddToFavorites won’t always do things exactly the way we’d like them to. For example, when testing this method we had the following hyperlink (linking to http://www.microsoft...s/qanda/default.mspx) in our document:
Hey, Scripting Guy!
When this was added to our Internet Favorites it looked like this:
Default.mspx (www.microsoft.com)
The link worked fine; it just had a different name than we expected.
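If the name matters to you, one possible workaround is to skip AddToFavorites and create the Internet shortcuts yourself, using each link’s TextToDisplay as the file name. The sketch below is an assumption-laden illustration rather than the method described above: it relies on the Windows Script Host Shell object to find the Favorites folder and to create .url shortcuts, and CleanFileName is a hypothetical helper of our own that replaces characters Windows doesn’t allow in file names:

```vbscript
Set objShell = CreateObject("WScript.Shell")
strFavorites = objShell.SpecialFolders("Favorites")

Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("c:\scripts\test.doc")

For Each objHyperlink In objDoc.Hyperlinks
    ' Skip links that point inside the document (they have no Address)
    If objHyperlink.Address <> "" Then
        strName = CleanFileName(objHyperlink.TextToDisplay)
        If strName = "" Then strName = "Link"

        ' A path ending in .url makes CreateShortcut return an Internet shortcut
        Set objLink = objShell.CreateShortcut(strFavorites & "\" & strName & ".url")
        objLink.TargetPath = objHyperlink.Address
        objLink.Save
    End If
Next

' Hypothetical helper: swap characters not allowed in file names for underscores
Function CleanFileName(ByVal strText)
    arrBad = Array("\", "/", ":", "*", "?", """", "<", ">", "|")
    For Each strChar In arrBad
        strText = Replace(strText, strChar, "_")
    Next
    CleanFileName = Trim(strText)
End Function
```

Keep in mind that two links with the same display text end up with the same file name, so the second shortcut overwrites the first.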
Of course, you might not want every hyperlink automatically added to your Internet Favorites. But that’s OK; it’s very easy to create an HTML page that includes all these links. All you’d have to do then is open that particular page and start clicking links. For example, here’s a script that creates a file named C:\Scripts\Links.htm. The script then grabs all the hyperlinks from our Word document and uses HTML tagging to create a corresponding link in Links.htm:
Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("c:\scripts\test.doc")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.CreateTextFile("c:\scripts\links.htm")
Set colHyperlinks = objDoc.Hyperlinks
For Each objHyperlink in colHyperlinks
    objFile.WriteLine "<a href=" & Chr(34) & objHyperlink.Address & Chr(34) & ">" & _
        objHyperlink.TextToDisplay & "</a><br>"
Next
objFile.Close
We won’t talk about the HTML tagging today; that’s a bit outside the scope of this column. However, you can find more information in the HTML and DHTML Reference on MSDN. In the meantime, here’s what your finished product might look like:
[Screenshot: Links.htm displayed in Internet Explorer]
Now that’s cool.
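One thing to watch for: link text or URLs that contain characters like & or < can confuse the browser if they’re written to the page as-is. Here’s a hedged sketch of how you might guard against that. HtmlEncode is a hypothetical helper we wrote for illustration (it isn’t built into VBScript or Word), and wrapping the links in html and body tags is our own addition as well:

```vbscript
Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("c:\scripts\test.doc")

Set objFSO = CreateObject("Scripting.FileSystemObject")
' The second argument (True) overwrites Links.htm if it already exists
Set objFile = objFSO.CreateTextFile("c:\scripts\links.htm", True)

objFile.WriteLine "<html><body>"
For Each objHyperlink In objDoc.Hyperlinks
    objFile.WriteLine "<a href=" & Chr(34) & HtmlEncode(objHyperlink.Address) & Chr(34) & ">" & _
        HtmlEncode(objHyperlink.TextToDisplay) & "</a><br>"
Next
objFile.WriteLine "</body></html>"
objFile.Close

' Hypothetical helper: escape the characters that have special meaning in HTML
Function HtmlEncode(ByVal strText)
    strText = Replace(strText, "&", "&amp;")    ' must run first
    strText = Replace(strText, "<", "&lt;")
    strText = Replace(strText, ">", "&gt;")
    strText = Replace(strText, Chr(34), "&quot;")
    HtmlEncode = strText
End Function
```

The ampersand has to be replaced first; otherwise the & in the entities the later calls produce, such as &lt;, would itself be escaped.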
This may be another answer; I haven't tried it. I keep trying to locate the script I applied.