Been a looong time since I last checked in to look for a solution... and even longer since I posted a question, but if there's one place on the web where I will find a bunch of interesting and relevant replies, ...

I've got a set of self-edited web pages which link to each other in various ways, where anchors may be hyperlinks, classes or names, and 'bookmarks' using divs with name/id.
<a href="">
<a href="#N0A12bf67">
<div id="tocN0A12bf67">
<a name="N0A12bf67">
<a class="file" href ="foo.txt">

I want to save that set of pages in a document, while keeping the links working, and it seems that I have to rework all the links so they stop pointing to the original web pages.

I have tried various ways of saving to PDF, have C&P into Word and OpenOffice, have even tried some pretty nifty e-book editing apps. So far, some of the links still point to the original URL.

I'm thinking that I may need to ...
a) combine all the pages into one document?
b) find every anchor.
c) decide if it's internal or external to the current document; then ignore internals as they'll continue to work.
d) replace the external links with new internal links, OR, if I have not combined all the documents into one, modify the links to erm, uh, dunno!
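
As a rough sketch of the "find every anchor, then decide if it's internal or external" steps, assuming Python and its standard html.parser module are acceptable (the names AnchorCollector, classify and classify_links are made up for illustration, not part of any tool mentioned here):

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collects every href plus every id/name that a link could point at."""

    def __init__(self):
        super().__init__()
        self.links = []       # href values, in document order
        self.targets = set()  # id= and name= values (possible bookmarks)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href") is not None:
            self.links.append(attrs["href"])
        for key in ("id", "name"):
            if attrs.get(key):
                self.targets.add(attrs[key])


def classify(href):
    """Assumed rule: empty or '#...' is internal, a scheme/host means remote."""
    if href == "" or href.startswith("#"):
        return "internal"
    parts = urlsplit(href)
    if parts.scheme or parts.netloc:
        return "remote"
    return "other file"


def classify_links(html):
    collector = AnchorCollector()
    collector.feed(html)
    return [(href, classify(href)) for href in collector.links]
```

Only the "other file" and "remote" links would need attention; the internal ones can be left alone, as noted above.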

This last bit may depend on whether I'm making a PDF, .doc, or odf, as I suppose I need to make the revised links point to the correct filetype.

I'm pretty sure I need to combine all pages into one document so that when I make the PDF/doc/odf, each link looks within the one document rather than heading off in a failed search of a separate one.

Am thinking I can use cmd.exe to create the concatenated version and save it as HTML.
Thereafter, I would change the URLs to point at itself in the new concatenated format, then print it as a PDF.
Or C&P to word, OO etc.
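
The concatenate-then-repoint idea might look something like this in Python. It is only a sketch under several assumptions: hrefs are double-quoted, pages link to each other by bare filename (no folders), ids/names are unique across all pages, and the "page-..." marker scheme is invented here so that a link to a whole page has somewhere to land. A regex is used for brevity and will miss unusual markup.

```python
import re
from urllib.parse import urlsplit

# Matches href="..." with optional spaces around the equals sign.
HREF_RE = re.compile(r'href\s*=\s*"([^"]*)"')


def page_anchor(filename):
    # Hypothetical naming scheme for the marker placed at the top of each page.
    return "page-" + re.sub(r"[^A-Za-z0-9]+", "-", filename).strip("-")


def rewrite_href(href, page_names):
    parts = urlsplit(href)
    # Leave absolute URLs and links to files outside the set untouched.
    if parts.scheme or parts.netloc or parts.path not in page_names:
        return href
    if parts.fragment:
        return "#" + parts.fragment         # b.html#sec -> #sec
    return "#" + page_anchor(parts.path)    # b.html -> #page-b-html


def combine(pages):
    """pages: dict of filename -> HTML body, in the order they should appear."""
    names = set(pages)
    out = []
    for name, body in pages.items():
        fixed = HREF_RE.sub(
            lambda m: 'href="%s"' % rewrite_href(m.group(1), names), body
        )
        out.append('<div id="%s"></div>\n%s' % (page_anchor(name), fixed))
    return "\n".join(out)
```

Links such as href="#N0A12bf67" and href="foo.txt" pass through unchanged; only links to pages in the combined set are turned into in-document fragments.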

Am I right so far?
If so, all I ask is whether someone knows a tool that will do all this for me!


Happy New Year!

I would make a copy of the complete file/folder structure you have in place for these self-created web pages and start working on that copy. Make a dummy Word document (if that is the format you want to use) containing each type of link etc. that you use in your web pages. Separate those clearly and 'Save as ...' this document as a complete web page. Open that document in your favorite HTML and/or text editor to have the examples you need.

Then use InfoRapid (private use is free) to search-replace for everything you want changed in the complete file/folder structure in one go or one-by-one, whatever you prefer. These queries can be as simple or complex as you want (and can be stored).
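
For anyone who prefers a script to a GUI tool, the same kind of bulk search-replace over a folder tree can be sketched in Python. This is not how InfoRapid works internally; replace_in_tree is a made-up helper, it assumes the pages are UTF-8, and it overwrites files in place, so run it on the copy only.

```python
import re
from pathlib import Path


def replace_in_tree(root, pattern, replacement, suffixes=(".html", ".htm")):
    """Apply one regex replacement to every matching file under root.

    Returns a list of (path, number_of_replacements) for the files changed.
    """
    regex = re.compile(pattern)
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")  # assumption: UTF-8 pages
        new_text, count = regex.subn(replacement, text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed.append((str(path), count))
    return changed
```

A call such as replace_in_tree("site-copy", r'href="page2\.html#', 'href="#') would strip a (hypothetical) page2.html prefix from fragment links in every page under site-copy.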

Although the software is really old, from personal experience I can tell you that it works absolutely fine (and fast) on any version of Windows 2000 and up.

Combine all the edited web pages into Word and store your document in .doc or .docx format, or use Word with your preferred PDF printer to generate a PDF document. Actually, I use a portable version of LibreOffice 4.x to read Word documents and save these in PDF format directly. This standard LibreOffice PDF functionality is vastly superior to any PDF-printing solution, in my not so humble opinion.

If you only want one big HTML formatted document, use any decent HTML editor. Most, if not all, come with an import function that will convert links etc. automatically.

Thanks Shades. I have made progress using a combination of MS Word and Libre Office Writer in reformatting, and then printing to PDF, as you suggested. Am very pleased with the native PDF settings available in LOW.

Am discovering that internal bookmarks are not a well-developed element in any text editor I've come across. It would be interesting to have a plugin of some sort for LO that listed only bookmarks and allowed them to be edited in a separate table.

You could try the trial version of Adobe Acrobat to convert a website to PDF.

There's also Website2PDF if you want to give it a try.


