

Reliable web page capture...


johnk:
How about mirroring the entire site and then picking out the page/pages you want?

"HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility...
-Cuffy (July 16, 2008, 11:20 AM)
--- End quote ---

If you're doing research on the web and darting around from site to site, mirroring entire sites isn't really practical. Take the sample page I used for the test in the first post -- I can't begin to imagine how many thousands of pages there are on the BBC News site.

No, for web research, grabbing single pages as you go is really the only efficient way to work.
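For what it's worth, that grab-a-single-page-as-you-go workflow can be sketched outside any of the tools discussed. A minimal illustration (the `save_page` helper and its title-based filename scheme are my own invention, not part of any program mentioned here):

```python
import re
import urllib.request
from datetime import datetime
from pathlib import Path

def save_page(url: str, dest_dir: str) -> Path:
    """Fetch one page and save it under a filename taken from its <title>."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    m = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
    # Fall back to a timestamp when the page has no <title>
    title = m.group(1).strip() if m else datetime.now().strftime("page-%Y%m%d-%H%M%S")
    safe = re.sub(r"[^\w\- ]", "_", title).strip()[:60] or "page"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    out = dest / f"{safe}.html"
    # Keep a note of the source URL so the capture stays traceable
    out.write_text(f"<!-- saved from {url} -->\n{html}", encoding="utf-8")
    return out
```

This only saves the raw HTML, of course; the dedicated tools also pull in images and stylesheets, which is exactly why they exist.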

Since doing the test, I've come at it from another angle -- I've been having a play with the web research "specialists", Local Website Archive (LWA) and WebResearch Pro (WRP) to see if they could compete with Ultra Recall and Surfulater as "data dumps".

WRP is the more fully-featured, and has the nicer, glossier UI. But it seems very low on the radar -- you see very little discussion about it on the web. And they don't have a user forum on their site, which always gets a black mark from me. One of the big features in the latest version is that its database format is supported by Windows Search, but I had problems getting the two to work reliably together. And my last two emails to their support department went unanswered...

LWA is a more spartan affair, but it does what it says on the tin. And with a bit of tweaking, you can add documents from a variety of sources, including plain text and HTML, and edit them -- which makes it a possible all-round "data dump" for research purposes, on a small scale.

Of course LWA was never intended for such a purpose. It's a web page archive, pure and simple, and a good one. You can add notes to entries. The database is simply separate html and related files for each item. Files and content are not indexed for search. LWA doesn't have tags/categories (WRP does) so you're stuck with folders. And as LWA keeps file information (metadata) and file content in different places, it's problematic for desktop search programs to analyze it unless they know about the structure (I've already asked the Archivarius team if they'll support the LWA database structure).
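To illustrate the search gap: once a desktop indexer knows the layout, full-text search is just a directory walk. A rough sketch, assuming a purely hypothetical archive layout where each entry is a folder containing its page as an .html file (the real LWA structure may well differ, which is the whole point of asking the Archivarius team):

```python
import re
from pathlib import Path

def index_archive(root: str) -> dict:
    """Build an inverted index (word -> set of entry names) over an archive.
    Assumes a hypothetical layout: one folder per entry, page stored as *.html."""
    index = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue
        for page in entry.glob("*.html"):
            # Strip tags crudely, then tokenise into lowercase words
            text = re.sub(r"<[^>]+>", " ",
                          page.read_text(encoding="utf-8", errors="replace"))
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                index.setdefault(word, set()).add(entry.name)
    return index
```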

LWA may be a keeper though...

cmpm:
I understand, John.

Just posting for those interested in the other types of programs.

I'm glad you were specific about what you want in 'clickable links' and the rest.
I wasn't sure of that, not being familiar with the programs listed in your first post.
I just posted what I knew of.

tomos:
LWA may be a keeper though...
-johnk (July 16, 2008, 12:58 PM)
--- End quote ---

I'm not actually familiar with Ultra Recall, but I'm wondering -- seeing as it now has good web capture, is it simply a case of you looking for the best out there, or is there something in particular missing in UR?

Tying in with that question, I'm also curious what you mean by "information management", as mentioned in your first post.

cmpm:
You can save a complete webpage with the file menu in Firefox.
Save as-webpage.
I'm not sure you will get all you want. Plus it needs an internet connection
(perhaps that is the problem).
But if you save it as a webpage the links are clickable.
It will open in Firefox, of course.
But you could put the file in any folder -- which may be the hindrance, being limited to saving in folders.
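On the folder limitation: one small mitigation is to generate an index page over a folder of saved pages, so the collection itself stays clickable in the browser. A sketch (the `build_index` helper is illustrative only, not a feature of Firefox or any tool mentioned here):

```python
import html
import re
from pathlib import Path

def build_index(folder: str) -> Path:
    """Write an index.html linking to every saved page in a folder,
    using each page's <title> as the link text."""
    rows = []
    for page in sorted(Path(folder).glob("*.html")):
        if page.name == "index.html":
            continue  # don't list the index itself
        m = re.search(r"<title>(.*?)</title>",
                      page.read_text(encoding="utf-8", errors="replace"),
                      re.I | re.S)
        title = m.group(1).strip() if m else page.stem
        rows.append(f'<li><a href="{page.name}">{html.escape(title)}</a></li>')
    out = Path(folder) / "index.html"
    out.write_text("<ul>\n" + "\n".join(rows) + "\n</ul>\n", encoding="utf-8")
    return out
```

Re-running it after each save keeps the listing current; it's a poor man's substitute for the tagging and organising the dedicated tools offer.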

Just trying to understand.

No reply needed if irrelevant.

tomos:
You can save a complete webpage with the file menu in Firefox.
Save as-webpage.
-cmpm (July 16, 2008, 04:13 PM)
--- End quote ---

Me, I'm only really familiar with Surfulater & Evernote:
the advantage of these programmes (I think) is that you can save part (or all) of a web page; you also have a link back to the original page; and you can then organise it -- tag it, add a comment, link it to another page or to a file, etc.
I was curious myself what John was after in that sense (apart from the basic web capture).


