Messages - johnk

166
General Software Discussion / Re: Reliable web page capture...
« on: July 19, 2008, 07:08 AM »
John, if I knew how to properly create the ini files, I would.  But Martin doesn't have any instructions for this on his web site.  I guess he designs purely for programmer-types.

Jim -- I can assure you, I'm no programmer. But I know my way around a computer by now and I'm familiar with writing keyboard macros (which is the most difficult bit in creating LWA ini files). The ini files are not too difficult to put together. If you'd like some help, I'm happy to do it by PM. But I agree, Martin should at least offer a wizard to guide people through setting up an ini file. The section on ini files in the LWA help file is, well, not very helpful.
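For anyone else following along: the "keyboard macro" part is nothing exotic -- it's just a recipe of keystrokes for LWA to send to the source program, the same sort of thing you'd write in AutoHotkey. Roughly this kind of idea (an AHK-style sketch only, not LWA's actual ini syntax -- the window title and shortcuts are assumptions):

    SetTitleMatchMode, 2            ; match window titles anywhere in the caption
    IfWinExist, Mozilla Thunderbird
    {
        WinActivate                 ; bring the message window to the front
        Send, ^a                    ; Ctrl+A -- select the whole message
        Send, ^c                    ; Ctrl+C -- copy it to the clipboard
        ClipWait, 2                 ; give the clipboard a couple of seconds to fill
    }

As far as I understand it, the ini file just describes that sort of sequence in a form LWA can replay whenever you trigger a capture from that program.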

The ini files are actually LWA's trump card. LWA's direct rival, WebResearch Pro, is more powerful and advanced in many ways, but it has no equivalent of LWA's ini files, so you can't create your own import filters. WebResearch doesn't support Thunderbird natively, for example, so you have to export messages to .eml first, blah, blah. Swings and roundabouts...

167
General Software Discussion / Re: Reliable web page capture...
« on: July 18, 2008, 06:17 PM »
One thing about all of Martin's applications - he doesn't seem to like adding any niceties at all.  Most tasks have to be done the hard way or the long way. 
I know what you mean -- I was quite amazed when I started using AM-Notebook that there were no shortcut keys either to start a new note or to restore the program from the system tray  -- two of the most basic and most used functions (and this was version 4!). I had to use AutoHotkey to create the shortcuts (thank goodness for AHK). To be fair to Martin, he did add a global restore hotkey when the issue was raised in his forums.
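In case the workaround is useful to anyone else, the restore hotkey is only a few lines of AHK; a new-note hotkey can be done the same way by sending whatever menu keystrokes your version accepts. A rough sketch (the window title is an assumption, so adjust to taste):

    DetectHiddenWindows, On         ; the tray "window" is hidden, so look for hidden windows too
    SetTitleMatchMode, 2            ; match the title anywhere in the caption

    #n::                            ; Win+N -- bring AM-Notebook back from the tray
    WinShow, AM-Notebook
    WinActivate, AM-Notebook
    return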

There are two sides to this, though. On one level, I actually like the .ini file approach to capturing information in LWA. It means that you can generate semi-automated capture from all kinds of programs. In the last couple of days I've created ini files for Word and Thunderbird, and they work fine. At least "the hard way" is better than "no way".

168
General Software Discussion / Re: Reliable web page capture...
« on: July 16, 2008, 05:39 PM »
I'm not actually familiar with Ultra Recall but I'm wondering - seeing as it now has good web capture, is it simply a case of you looking for the best out there, or is there something in particular missing in UR?

Tying in with that question, I'm also curious what you mean by "information management", as mentioned in your first post.

Good questions -- wish I could give a clear answer! I don't think my needs are very complicated. You're right -- now that Ultra Recall has sorted out web capture, it's a very strong contender. The only question mark over UR is speed, which I'd define here as "snappiness" (is that a word?). I used UR for quite a while in versions 1 and 2, and it always had niggling delays in data capture. Nothing horrendous, but saving web pages was a good example -- it would always take a few seconds longer than any other program to save a page.

I haven't used v3.5a long enough to make a decision, and I still have an open mind. But I have noticed, for example, that when you open some stored (archived) pages, loading them takes quite a few seconds. A little dialog pops up saying "please wait -- creating temporary item file". You have plenty of time to read it. Scrapbook or LWA load stored pages pretty much instantly (as they should).

I use information management as a slightly more elegant way of saying "data dump". Somewhere I can stick short, medium and long-term data, text and images, everything from project research to software registration data. I want that data indexed and tagged. I want the database to be scalable. Not industrial strength, but I want it to hold a normal person's data, work and personal, over several years without choking.

The more I search, the more I think that looking for one piece of software to do everything is silly, and maybe even counter-productive. When I think about the pieces of software I most enjoy using, they tend to do one simple task well.  AM-Notebook as my note-taker, for example. Not flawless, but a nice, small focused program (and interestingly, by the same person/team as LWA).

Slightly off the beaten track, but it may be of interest to some following this thread: one program that has been a "slow burn" for me in this area is Connected Text, a "personal wiki". That phrase alone will put some people off, and I know wiki-style editing is not for everyone. But it's a well-thought-out piece of software. I've used it for some research on long-term writing projects, and it's been reliable. Good developer who fixes bugs quickly, and good forums.

169
General Software Discussion / Re: Reliable web page capture...
« on: July 16, 2008, 12:58 PM »
How about mirroring the entire site and then picking out the page/pages you want?

"HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility...

If you're doing research on the web and darting around from site to site, mirroring entire sites isn't really practical. Take the sample page I used for the test in the first post -- I can't begin to imagine how many thousands of pages there are on the BBC News site.

No, for web research, grabbing single pages as you go is really the only efficient way to work.

Since doing the test, I've come at it from another angle -- I've been having a play with the web research "specialists", Local Website Archive (LWA) and WebResearch Pro (WRP) to see if they could compete with Ultra Recall and Surfulater as "data dumps".

WRP is the more fully-featured, and has the nicer, glossier UI. But it seems very low on the radar -- you see very little discussion about it on the web. And they don't have a user forum on their site, which always gets a black mark from me. One of the big features in the latest version is that its database format is supported by Windows Search, but I had problems getting the two to work reliably together. And my last two emails to their support department went unanswered...

LWA is a more spartan affair, but it does what it says on the tin. And with a bit of tweaking, you can add documents from a variety of sources, including plain text and HTML, and edit them -- which makes it a possible all-round "data dump" for research purposes, on a small scale.

Of course LWA was never intended for such a purpose. It's a web page archive, pure and simple, and a good one. You can add notes to entries. The database is simply separate HTML and related files for each item. Files and content are not indexed for search. LWA doesn't have tags/categories (WRP does), so you're stuck with folders. And as LWA keeps file information (metadata) and file content in different places, it's difficult for desktop search programs to analyze it unless they know about the structure (I've already asked the Archivarius team if they'll support the LWA database structure).

LWA may be a keeper though...

170
General Software Discussion / Re: Reliable web page capture...
« on: July 16, 2008, 05:29 AM »
http://www.alcenia.com/webswoon/index.php?page=what
open source

cmpm -- Webswoon and most of the other programs you've mentioned are really completely different animals. They simply capture an image (a picture) of the page. That obviously has some uses, but the programs I looked at in the first post have a different purpose: they actually capture all the page content from the web server (or local cache) and "re-build" the page locally. This has many advantages, as I've mentioned before -- editing, printing, indexing, live (clickable) links, etc.
