DonationCoder.com Forum

Main Area and Open Discussion => General Software Discussion => Topic started by: johnk on July 11, 2008, 12:53 PM

Title: Reliable web page capture...
Post by: johnk on July 11, 2008, 12:53 PM
In my endless quest/obsession to find the perfect information manager, I've decided that one of the key features for me is reliable web page capture. Not pixel perfect. But close enough. There are lots of other features I'm willing to compromise on, but not that one.

Now you wouldn't think that would be a problem. But it is. Most of the information managers we know and love are just not as reliable as they should be. I have licences for three of the best -- Ultra Recall (http://www.kinook.com), Surfulater (http://www.surfulater.com/) and Evernote (http://www.evernote.com/). All claim that web page capture is part of their feature set.

And yet compared to the free Firefox add-on Scrapbook (http://amb.vis.ne.jp/mozilla/scrapbook/), their performance is variable, to say the least. Pictures speak louder than words, so here's a comparison of the three programs I mention above with Scrapbook, and web capture specialists Local Website Archive (http://www.aignes.com/lwa.htm) and WebResearch Pro (http://www.macropool.com/en/products/webresearch/professional/index.html).

I took a page from a mainstream site (BBC News) that I knew would present a decent challenge.

[attachment] (original page in Firefox)

[attachment] (Scrapbook) [attachment] (Local Website Archive) [attachment] (WebResearch Pro)

[attachment] (Ultra Recall) [attachment] (Surfulater) [attachment] (Evernote)

As you can see, the three programs that specialize in web page capture do an excellent job. Scrapbook is faultless as ever.

Ultra Recall, Surfulater and Evernote are all ugly and broken. Yes, all the content is there, but it's not as pleasant or easy to read, and not recognizable as the original page.

If a free browser add-on can manage faultless web capture, I can't see why the power-user information managers can't do the same. WebResearch Pro takes a lazy (but very clever) route to perfect pages -- it uses the Scrapbook engine to capture pages. Why can't other programs do the same thing?

I'm trying to reduce the number of programs I use. I want to use one program for web capture and information management. Seems logical and should be achievable. But I'm still looking...

EDIT: A new version of Ultra Recall improves web page capture -- see further post (https://www.donationcoder.com/forum/index.php?topic=14027.msg120595#msg120595) below.
Title: Re: Reliable web page capture...
Post by: cmpm on July 11, 2008, 01:14 PM
The FireShot add-on for Firefox.
Retains excellent picture quality when zooming in and out.
Scrollable pages, too...

https://addons.mozilla.org/en-US/firefox/addon/5648

Title: Re: Reliable web page capture...
Post by: johnk on July 11, 2008, 01:22 PM
Thanks for your response, cmpm, but that's not quite what I'm after. Fireshot is a screen capture add-on.

I'm perfectly happy with Scrapbook (or Local Website Archive) as a reliable web page capture program. What I want is for one of the heavyweight information managers (named in my first post) to improve their programs and start providing bullet-proof web page capture (which they should be doing already).

What sparked this post was a thread in the Kinook forums where I and others raised this issue about Ultra Recall:
http://www.kinook.com/Forum/showthread.php?s=&postid=13653.
Title: Re: Reliable web page capture...
Post by: cmpm on July 11, 2008, 01:51 PM
Well... hmmm...
You can save the shot to any folder you choose,
even if it belongs to another program.
If that helps.

I know of more that are standalone programs -- Screenshot Capture, for one.

But I don't know of any good ones built into a program like you want, sorry.

If you can find a program that lets you choose your own preferred capture program, then that might work.
Title: Re: Reliable web page capture...
Post by: Shades on July 11, 2008, 06:49 PM
Zotero - my favorite page grabber (plugin) for Firefox. Also free.
Title: Re: Reliable web page capture...
Post by: johnk on July 11, 2008, 07:03 PM
Zotero is certainly interesting. However, although I am a dedicated Firefox user, I am trying to make sure that my long-term home for web page clippings is independent of any browser.

Also with Zotero/Scrapbook etc, it's difficult to mix and match other types of data if you're putting together a research project. That's where programs such as Ultra Recall show their strength.
Title: Re: Reliable web page capture...
Post by: J-Mac on July 11, 2008, 08:40 PM
Thanks for your response, cmpm, but that's not quite what I'm after. Fireshot is a screen capture add-on.

I'm perfectly happy with Scrapbook (or Local Website Archive) as a reliable web page capture program. What I want is for one of the heavyweight information managers (named in my first post) to improve their programs and start providing bullet-proof web page capture (which they should be doing already).

What sparked this post was a thread in the Kinook forums where I and others raised this issue about Ultra Recall:
http://www.kinook.com/Forum/showthread.php?s=&postid=13653.

John,

Notice that the last post there at the Kinook thread is from me.  And I also have licenses for Evernote and Local Website Archive.

I agree completely that none capture web pages well - at least not visually.  I'll be following this thread carefully as my needs seem to match up.

Jim
Title: Re: Reliable web page capture...
Post by: cmpm on July 12, 2008, 09:49 AM
I'm not sure about this one.
It's $50, but seems like a possible.

http://www.milenix.com/index.php
Title: Re: Reliable web page capture...
Post by: ashwken on July 12, 2008, 09:58 AM
John,

Also following on from the UR thread, where I mentioned that the IE .mht format also does the job. It would appear that some methods of capture go deeper into the browser than others -- obviously any method originating from within the browser is going to have an advantage. I don't know enough about the inner workings to offer anything other than observed results.
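For the curious, an .mht file is just a MIME "multipart/related" message: the HTML plus its images in one file, each part tagged with a Content-Location header pointing back at the original URL. Here is a minimal Python sketch of the idea using only the standard library's email package -- illustrative only, not how IE or Ultra Recall build their files, and the example.com URLs and fake PNG bytes are placeholders:

```python
# Build a toy .mht (MHTML) file by hand: a MIME multipart/related
# message whose parts carry Content-Location headers.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

def build_mht(page_url, html, images):
    """images: list of (url, subtype, raw_bytes) fetched alongside the page."""
    msg = MIMEMultipart("related")
    msg["Content-Location"] = page_url

    # The page itself is the root part.
    body = MIMEText(html, "html", "utf-8")
    body.add_header("Content-Location", page_url)
    msg.attach(body)

    # Each image becomes a sibling part, addressable by its original URL.
    for url, subtype, data in images:
        img = MIMEImage(data, _subtype=subtype)
        img.add_header("Content-Location", url)
        msg.attach(img)

    return msg.as_string()

mht = build_mht(
    "http://example.com/",
    "<html><body><img src='http://example.com/logo.png'></body></html>",
    [("http://example.com/logo.png", "png", b"\x89PNG placeholder")],
)
```

Saving the returned string with a .mht extension yields something IE-family browsers can open as a single archived page.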

Thanks for the comparison.
Title: Re: Reliable web page capture...
Post by: cmpm on July 12, 2008, 10:18 AM
http://www.websitescreenshots.com/

The server edition of WebShot comes with a DLL that will allow you to embed WebShot technology in your own applications.

I keep finding more! Love the hunt when something is found.
Title: Re: Reliable web page capture...
Post by: johnk on July 12, 2008, 10:33 AM
I keep finding more! Love the hunt when something is found.
cmpm -- glad you're enjoying the thread. However, programs such as Webshot, Fireshot and Screenshot Capture are very different from the ones I discussed in my first post.

Webshot, Fireshot etc. just take images of the pages -- screen grabs. They don't actually copy the page contents (i.e. they don't make a copy of the text, images, CSS files etc. from the web server).

Programs such as Local Website Archive and WebResearch Pro actually make full copies of the page content -- everything is copied onto your hard drive. This is much more useful: you can cut and paste the content, print it properly, edit it and index it (although one or two programs now use OCR to index screen grabs).
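The "full copy" approach boils down to two steps: scan the saved HTML for every asset it references, then fetch each asset and rewrite the links to point at the local copies. A minimal Python sketch of the first step, asset discovery, using only the standard library (the sample HTML and URLs are made up, and a real capture engine like Scrapbook's also handles CSS-referenced images, frames, and the link rewriting):

```python
# Discover the absolute URLs of images, stylesheets and scripts a page
# references -- the list a capture engine would then download.
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetFinder(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "script" and "src" in attrs:
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.assets.append(urljoin(self.base_url, attrs["href"]))

sample = """<html><head><link rel="stylesheet" href="/css/main.css"></head>
<body><img src="logo.png"><script src="/js/app.js"></script></body></html>"""

finder = AssetFinder("http://news.example.com/story/1.html")
finder.feed(sample)
print(finder.assets)
# → ['http://news.example.com/css/main.css',
#    'http://news.example.com/story/logo.png',
#    'http://news.example.com/js/app.js']
```

Note how `urljoin` resolves the relative `logo.png` against the page's own directory -- getting this resolution right is a big part of why rebuilt pages render correctly offline.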
Title: Re: Reliable web page capture...
Post by: cmpm on July 12, 2008, 10:39 AM
Do you mean that the links on the web page image after the grab are clickable?

(i.e. they don't make a copy of the text, images, css files etc from the web server)

With FireShot they are not. But the image can be saved in different formats.

I really don't think I'm grasping what you are after, because I haven't followed the other thread.
Title: Re: Reliable web page capture...
Post by: cmpm on July 12, 2008, 10:45 AM
Perhaps dragging and dropping the icon from the browser into the database/PIM would work, if it's supported.

This is as close as I could come to something with that possibility.

http://www.cancellieri.org/index.htm
Title: Re: Reliable web page capture...
Post by: ashwken on July 12, 2008, 11:55 AM
John,

I realize that .mht is an MS/IE-only format, but what's puzzling about UR's current state is that UR has always had such tight integration with MS products. It would appear that both browsers have a handle on page capture, though I will admit that there have been times when Save Page As .mht has hung, failing to complete, and a forced shutdown of the browser was required. Sometimes you can re-launch the browser, try again and it will succeed; other times...

I would imagine that it's no small task to go out and grab all the related bits and pieces that determine how a page is rendered.
Title: Re: Reliable web page capture...
Post by: johnk on July 12, 2008, 12:08 PM
I would imagine that it's no small task to go out and grab all the related bits and pieces that determine how a page is rendered.

Perhaps. Yet Scrapbook does it perfectly, time after time. As I said above, WebResearch Pro (a commercial program) chooses to use the Scrapbook engine to save web pages, because it's so reliable. Presumably there's nothing to stop Ultra Recall doing the same thing.
Title: Re: Reliable web page capture...
Post by: johnk on July 12, 2008, 12:35 PM
Credit where it's due...

Kinook (http://www.kinook.com) has just announced v3.5a of Ultra Recall. Fixes include "improved capturing of styles and formatting when storing web pages".  And they're as good as their word (repeat of test in first post):

[attachment] (Ultra Recall v3.5)   [attachment] (Ultra Recall v3.5a)

This is why it's great to support the smaller software developers. They're far more likely to respond to requests for improvements. I mentioned this thread in the Kinook forums. The Kinook team obviously looked at the thread because they mentioned that v3.5a would solve the problems encountered on the page used in the test.

Ultra Recall version history: http://www.kinook.com/Forum/showthread.php?s=&threadid=3696
Title: Re: Reliable web page capture...
Post by: mouser on July 12, 2008, 02:40 PM
This is why it's great to support the smaller software developers. They're far more likely to respond to requests for improvements.

agreed  :up:
Title: Re: Reliable web page capture...
Post by: J-Mac on July 12, 2008, 05:33 PM
Wow!  Using Ultra Recall for a week now and already more impressed than at the start of the week...

Jim
Title: Re: Reliable web page capture...
Post by: cmpm on July 15, 2008, 08:52 PM
http://www.alcenia.com/webswoon/index.php?page=what

open source
Title: Re: Reliable web page capture...
Post by: J-Mac on July 15, 2008, 10:33 PM
Sounds nice, cmpm.  Have you used it yet?  I'm curious as to what kind of results you're seeing.

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: cmpm on July 15, 2008, 11:06 PM
No, I haven't tried it yet.
It was in my bookmarks,
and I was just browsing through them and saw it.
It does look interesting.
But it's late here and I work early.
Why am I still up?

Must have bookmarked it for some reason?
Can't remember though.. lol.
Title: Re: Reliable web page capture...
Post by: J-Mac on July 16, 2008, 01:46 AM
Oh, OK.  It's just that johnk went through so much in testing the others to determine just what they can and cannot do.  I thought that you were trying to show others that work without the drawbacks he mentioned.  But it looks like you're just throwing out the name of any app that mentions web page capture in its feature set.

I don't think I would consider it in the same class as the ones mentioned in the original post.

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: cmpm on July 16, 2008, 03:09 AM
I think it's worth checking out.

And John is not the only one interested in these programs.
His specific needs may not be met by this program.
Others might be interested in it.


Title: Re: Reliable web page capture...
Post by: johnk on July 16, 2008, 05:29 AM
http://www.alcenia.com/webswoon/index.php?page=what
open source

cmpm -- Webswoon (and most of the other programs you have mentioned) is really a completely different animal. It simply captures an image (picture) of the page. That obviously has some uses, but the programs I looked at in the first post have a different purpose: they actually capture all the page content from the web server (or local cache) and "re-build" the page locally. This has many advantages, as I have mentioned before -- editing, printing, indexing, live (clickable) links, etc.
Title: Re: Reliable web page capture...
Post by: Cuffy on July 16, 2008, 11:20 AM
How about mirroring the entire site and then picking out the page/pages you want?

"HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.

It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.

WinHTTrack is the Windows 9x/NT/2000/XP release of HTTrack, and WebHTTrack the Linux/Unix/BSD release. See the download page. "
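For anyone tempted by the mirroring route, the main trick is limiting depth so a grab doesn't swallow the whole site. A small Python sketch that merely builds the command line (-O and -r are standard HTTrack flags for the output directory and mirror depth, but verify them against `httrack --help`; nothing here actually runs a download, and the URL is just an example):

```python
# Assemble an httrack invocation without executing it.
import shlex

def mirror_command(url, out_dir, depth=2):
    # -O sets the output directory; -rN limits recursion depth so a
    # single-section grab doesn't turn into a whole-site mirror.
    return ["httrack", url, "-O", out_dir, f"-r{depth}"]

cmd = mirror_command("http://news.bbc.co.uk/", "./bbc-mirror")
print(shlex.join(cmd))
# → httrack http://news.bbc.co.uk/ -O ./bbc-mirror -r2
```

The list form can be handed straight to `subprocess.run(cmd)` on a machine with HTTrack installed.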
Title: Re: Reliable web page capture...
Post by: johnk on July 16, 2008, 12:58 PM
How about mirroring the entire site and then picking out the page/pages you want?

"HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility...

If you're doing research on the web and darting around from site to site, mirroring entire sites isn't really practical. Take the sample page I used for the test in the first post -- I can't begin to imagine how many thousands of pages there are on the BBC News site.

No, for web research, grabbing single pages as you go is really the only efficient way to work.

Since doing the test, I've come at it from another angle -- I've been having a play with the web research "specialists", Local Website Archive (LWA) and WebResearch Pro (WRP) to see if they could compete with Ultra Recall and Surfulater as "data dumps".

WRP is the more fully-featured, and has the nicer, glossier UI. But it seems very low on the radar -- you see very little discussion about it on the web. And they don't have a user forum on their site, which always gets a black mark from me. One of the big features in the latest version is that its database format is supported by Windows Search, but I had problems getting the two to work reliably together. And my last two emails to their support department went unanswered...

LWA is a more spartan affair, but it does what it says on the tin. And with a bit of tweaking you can add documents from a variety of sources, including plain text and HTML, and edit them, which makes it a possible all-round "data dump" for research purposes, on a small scale.

Of course LWA was never intended for such a purpose. It's a web page archive, pure and simple, and a good one. You can add notes to entries. The database is simply separate html and related files for each item. Files and content are not indexed for search. LWA doesn't have tags/categories (WRP does) so you're stuck with folders. And as LWA keeps file information (metadata) and file content in different places, it's problematic for desktop search programs to analyze it unless they know about the structure (I've already asked the Archivarius team if they'll support the LWA database structure).

LWA may be a keeper though...
Title: Re: Reliable web page capture...
Post by: cmpm on July 16, 2008, 03:04 PM
I understand John.

Just posting for those interested in the other types of programs.

I'm glad you were specific about what you want -- 'clickable links' and the rest.
Of that I wasn't sure,
not being familiar with the programs listed in your first post.
I just posted what I know of.
Title: Re: Reliable web page capture...
Post by: tomos on July 16, 2008, 03:57 PM
LWA may be a keeper though...

I'm not actually familiar with Ultra Recall but I'm wondering - seeing as it now has good web capture, is it simply a case of you looking for the best out there, or is there something in particular missing in UR?

Tying in with that question, I'm also curious what you mean by "information management", as mentioned in the first post.
Title: Re: Reliable web page capture...
Post by: cmpm on July 16, 2008, 04:13 PM
You can save a complete webpage with the file menu in Firefox.
Save as-webpage.
I'm not sure you will get all you want. Plus it needs an internet connection.
(perhaps that is the problem)
But if you save it as a webpage the links are clickable.
It will open in Firefox, of course.
But you could put the file in any folder.
Which may be the hindrance -- being limited to saving in folders.

Just trying to understand.

No reply needed if irrelevant.
Title: Re: Reliable web page capture...
Post by: tomos on July 16, 2008, 04:27 PM
You can save a complete webpage with the file menu in Firefox.
Save as-webpage.

me, I'm only really familiar with Surfulater & Evernote:
the advantage of these programmes (I think) is that you can save part (or all) of a web page, you also have a link to the page, you can then organise it, maybe tag it, add a comment, maybe link it to another page, or link a file to it, etc, etc.
I was curious myself what John was after in that sense (apart from the basic web-capture)

[attachment]

Edit/ added attachment
Title: Re: Reliable web page capture...
Post by: johnk on July 16, 2008, 05:39 PM
I'm not actually familiar with Ultra Recall but I'm wondering - seeing as it now has good web capture, is it simply a case of you looking for the best out there, or is there something in particular missing in UR?

Tying in with that question, I'm also curious what you mean by "information management" as said in first post

Good questions -- wish I could give a clear answer! I don't think my needs are very complicated. You're right -- now that Ultra Recall has sorted out web capture, it's a very strong contender. The only question mark over UR is speed, which I'd define here as "snappiness" (is that a word?). I used UR for quite a while in versions 1 and 2, and it always had niggling delays in data capture. Nothing horrendous, but saving web pages was a good example -- it would always take a few seconds longer than any other program to save a page.

I haven't used v3.5a long enough to make a decision, and I still have an open mind. But I have noticed, for example, that when you open some stored (archived) pages, loading them takes quite a few seconds. A little dialog pops up saying "please wait -- creating temporary item file". You have plenty of time to read it. Scrapbook or LWA load stored pages pretty much instantly (as they should).

I use information management as a slightly more elegant way of saying "data dump". Somewhere I can stick short, medium and long-term data, text and images, everything from project research to software registration data. I want that data indexed and tagged. I want the database to be scalable. Not industrial strength, but I want it to hold a normal person's data, work and personal, over several years without choking.

The more I search, the more I think that looking for one piece of software to do everything is silly, and maybe even counter-productive. When I think about the pieces of software I most enjoy using, they tend to do one simple task well.  AM-Notebook (http://www.aignes.com/notebook.htm) as my note-taker, for example. Not flawless, but a nice, small focused program (and interestingly, by the same person/team as LWA).

Slightly off the beaten track, but it may be of interest to some following this thread: one program that has been a "slow burn" for me in this area is Connected Text (http://connectedtext.com/), a "personal wiki". That phrase alone will put some people off, and I know wiki-style editing is not for everyone. But it's a well-thought-out piece of software. I've used it for research on long-term writing projects, and it's been reliable. A good developer who fixes bugs quickly, and good forums.
Title: Re: Reliable web page capture...
Post by: Shades on July 16, 2008, 07:21 PM
As far as I understand, when a snapshot is made the Zotero plugin stores everything it downloaded to render the page in the browser. When I looked up a specific page for a doctor here in Paraguay, it didn't take much time to collect all the necessary files and put them on his laptop, so his browser showed exactly the same data as mine.

No, the Internet is definitely not available everywhere in this country... (lack of phone lines and cellular antennas), and that is mostly the terrain where this doctor has to use his particular skill (reconstructing bones so people are able to walk and/or use their hands again). That is a bad side effect that occurs when people who are too closely related have babies, but there are small communities like that here.

Title: Re: Reliable web page capture...
Post by: cmpm on July 16, 2008, 08:20 PM
Yes, I can see the intent of being able to do what John wants with what you posted, Shades.

In fact, one could load a ton of info onto a hard drive and mail it, and the receiver would have quite a bit of info ready to go.
Title: Re: Reliable web page capture...
Post by: J-Mac on July 17, 2008, 02:23 AM
How about mirroring the entire site and then picking out the page/pages you want?

For me, that would be grabbing a whole lot extra that I don't want or need, just to get the one page that I do!

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: rjbull on July 17, 2008, 03:24 AM
me, I'm only really familiar with Surfulater & Evernote:
the advantage of these programmes (I think) is that you can save part (or all) of a web page

If I click the EverNote icon in Firefox, it pops up this message:

No text is selected. Do you want to add
an entire page to EverNote?

EverNote seems surprised that I might want to capture a complete page.  Sometimes I do, of course, and then I generally use LWA.  Yet I think EverNote's implication is sensible.  Do I really want to keep the fluff as well as the real content?  No, of course I don't.  In fact I mostly use EverNote at work, for capturing news items on work-related portals.  I only want the particular news article, not all the advertising or other uninteresting (to me) items.  Which makes me wonder, how many other people need complete capture all the time?

Another nice thing about EverNote is that it can output MHT files, so if I have to, I can send potted articles to other people, complete with images and clickable links.  I wish there were a universal standard for "compiled HTML" that Firefox and other browsers used, not just IE.

Title: Re: Reliable web page capture...
Post by: tomos on July 17, 2008, 03:53 AM
I use information management as a slightly more elegant way of saying "data dump". Somewhere I can stick short, medium and long-term data, text and images, everything from project research to software registration data. I want that data indexed and tagged. I want the database to be scalable. Not industrial strength, but I want it to hold a normal person's data, work and personal, over several years without choking.

The more I search, the more I think that looking for one piece of software to do everything is silly, and maybe even counter-productive. When I think about the pieces of software I most enjoy using, they tend to do one simple task well.  AM-Notebook (http://www.aignes.com/notebook.htm) as my note-taker, for example. Not flawless, but a nice, small focused program (and interestingly, by the same person/team as LWA).

I always used Surfulater for information management/"data dump".
Evernote I use for notes & short-term web research (e.g. researching monitors).
I'm now using SQLNotes for information, & the other two are by the wayside but still with loads of stuff in there.

I think I have to take a long look at what I want to do myself & how/if I want to continue using all these programs.
SQLNotes is sticking anyway -- it's okay at web capture, nothing like what you want, but then it is in beta, especially in that respect.
BTW, I agree with all your points. When you enter content, it should be this (simple) way.

Some of the complexities (which I'll resolve) is that the HTML pane can be used in other ways. For example, you can open an HTML file from disk. Then any changes to the content updates the disk file (EN does not have this feature). You work on what looks like SN [SQLNotes] content, but it is really a local file (and eventually an FTP or other web file). You can also open a URL and view it, in this case, editing is disabled. HTM, MHT (and PDF) files are handled differently too, etc. It has many modes and managing all of these... well... needs a bit of improvements  :(

With all three you can export -- I haven't used Evernote that way, but as rj says it can export MHT files.
Surfulater will mail selected articles for you (html) and exports html and MHT
SQLNotes currently exports to html
Title: Re: Reliable web page capture...
Post by: J-Mac on July 17, 2008, 10:48 PM
To be honest, many times a pure and simple screenshot is all I need.  It is only occasionally that I need a true and complete capture of all aspects of the web page.  For a full-page screen capture, it depends on the page itself which application I use.

For web pages that can be captured with one screenshot, I always use mouser's Screenshot Captor - you really can't beat that!  However if the page is longer than one screenshot, and must be scrolled, I then use SnagIt.  (For some odd reason, I cannot capture a scrolling page with Screenshot Captor - when I try, during the capture the window goes blank and gets very light/bright.  The browser becomes unresponsive, requiring me to end the Screenshot Captor process via the Windows Task Manager.  About half the time I also had to restart the browser, and on a few occasions I had to actually reboot!  I suspect it may be an incompatibility between Screenshot Captor and nVidia graphics cards - and possibly AMD dual core processors.)

When I need all objects on a web page, I use Local Website Archive.  More recently I have been trying to use Ultra Recall, but even with the latest fix I cannot capture most secure pages at sites where I am logged in.  Rather than just grabbing the page, UR tries to refresh it (never works, darn it!).

Jim
Title: Re: Reliable web page capture...
Post by: rjbull on July 18, 2008, 09:53 AM
When I need all objects on a web page, I use Local Website Archive.  More recently I have been trying to use Ultra Recall, but even with that latest fix I cannot capture most secure pages at site where I am logged in.   Rather than just grabbing it UR tries to refresh the page (never works, darn it!).

That happens to me when I try it on shareware registration sites and the like.  I assume it's because you have to be securely logged in with the current browser, and the site doesn't recognise UR as being that.  You might try using LWA with the "Send keystrokes" method, where it forces the browser to save a copy of the file to disk, then reads that, rather than trying to go directly to the original page.

Interesting note: Roboform recognises WebSite-Watcher as a mini-browser and attaches a Roboform taskbar when a WSW window appears.  WSW has an option to directly archive files to LWA - at least, I think it does - so you could log in with WSW and Roboform, then use WSW to transfer the page to LWA.  It doesn't look like Roboform sees LWA as a browser in itself, even though they're both from Martin Aignesberger, but I haven't checked thoroughly.

Title: Re: Reliable web page capture...
Post by: J-Mac on July 18, 2008, 02:18 PM
That happens to me when I try it on shareware registration sites and the like.  I assume it's because you have to be securely logged in with the current browser, and the site doesn't recognise UR as being that.  You might try using LWA with the "Send keystrokes" method, where it forces the browser to save a copy of the file to disk, then reads that, rather than trying to go directly to the original page.

I used to do that, but when I reinstalled Windows on this computer I lost the ability.  You have to create an .ini file in order to allow that, and the last I checked Martin had not done anything with that for FF3.

One thing about all of Martin's applications - he doesn't seem to like adding any niceties at all.  Most tasks have to be done the hard way or the long way.  One example is just this - having to create .ini files for sending keystrokes.  Another: if you try to select a folder in LWA to store your capture in and the one you want doesn't exist, there is no standard "New Folder" button.  You have to stop the capture, open the main window of LWA, create and name the new folder, and only then go and do the capture again.  A lot of little touches like that are missing, and he usually isn't real keen on adding them.

Which is one of the reasons I am looking for other ways to get this done.

Interesting note: Roboform recognises WebSite-Watcher as a mini-browser and attaches a Roboform taskbar when a WSW window appears.  WSW has an option to directly archive files to LWA - at least, I think it does - so you could log in with WSW and Roboform, then use WSW to transfer the page to LWA.  It doesn't look like Roboform sees LWA as a browser in itself, even though they're both from Martin Aignesberger, but I haven't checked thoroughly.

Ultra Recall is the same - listed on RF's browser page and the toolbar is there in UR.  Doesn't seem to help, though, regarding these capture issues.

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: johnk on July 18, 2008, 06:17 PM
One thing about all of Martin's applications - he doesn't seem to like adding any niceties at all.  Most tasks have to be done the hard way or the long way. 
I know what you mean -- I was quite amazed when I started using AM-Notebook that there were no shortcut keys either to start a new note or to restore the program from the system tray  -- two of the most basic and most used functions (and this was version 4!). I had to use AutoHotkey to create the shortcuts (thank goodness for AHK). To be fair to Martin, he did add a global restore hotkey when the issue was raised in his forums.

There are two sides to this, though. On one level, I actually like the .ini file approach to capturing information in LWA. It means that you can generate semi-automated capture from all kinds of programs. In the last couple of days I've created ini files for Word and Thunderbird, and they work fine. At least "the hard way" is better than "no way".
Title: Re: Reliable web page capture...
Post by: J-Mac on July 18, 2008, 11:13 PM
John, if I knew how to properly create the ini files, I would.  But Martin doesn't have any instructions for this on his web site.  I guess he designs purely for programmer-types.

BTW, AM-Notebook, which I have owned since, I think, V.2, still requires you to name the note before you write it. I can't handle that!

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: johnk on July 19, 2008, 07:08 AM
John, if I knew how to properly create the ini files, I would.  But Martin doesn't have any instructions for this on his web site.  I guess he designs purely for programmer-types.

Jim -- I can assure you, I'm no programmer. But I know my way around a computer by now and I'm familiar with writing keyboard macros (which is the most difficult bit in creating LWA ini files). The ini files are not too difficult to put together. If you'd like some help, I'm happy to do it by PM. But I agree, Martin should at least offer a wizard to guide people through setting up an ini file. The section on ini files in the LWA help file is, well, not very helpful.

The ini files are actually LWA's trump card. While LWA's direct rival, WebResearch Pro, is much more powerful and advanced in many ways, it doesn't have an equivalent of LWA's ini files, so you can't create your own import filter. So, for example, WebResearch doesn't support Thunderbird natively, so you have to export to eml, blah, blah. Swings and roundabouts...
Title: Better Web Page Capture coming in Surfulater Version 3
Post by: nevf on August 19, 2008, 07:24 PM
I have done extensive updates to the code that captures complete Web pages in Surfulater. The BBC News page for example, now captures without the problems shown in this thread. You can see the results in my blog post. Better Web Page Capture coming in Surfulater Version 3 (http://blog.surfulater.com/2008/08/19/better-web-page-capture-coming-in-surfulater-version-3/)

Surfulater Version 3 is a major upgrade with many important new features. See our Blog (http://blog.surfulater.com) for further information. V3 is planned for release in Sept 2008. Pre-release versions with the new Tagging capability are available for download on the blog.
Title: Re: Reliable web page capture...
Post by: cmpm on October 18, 2008, 10:01 AM
http://pagenest.com/index.html

a free complete web page capture utility
with many options
from freedownloadaday.com

might fit some criteria here
Title: Re: Reliable web page capture...
Post by: kartal on November 01, 2008, 04:34 PM
I am looking for a scrapbook solution that is designed mainly for images and multimedia, has nicer import/export functions, and can work with FF. It should support drag and drop fully (both in and out). I am not talking about taking picture snapshots of the pages. This app should mainly focus on images and multimedia that are embedded or represented in the pages. But an image screenshot could be a nice addition as well.

I love Scrapbook but I hate the way it exports files, and you really have no control over the content of the export.

Title: Re: Reliable web page capture...
Post by: J-Mac on November 01, 2008, 09:49 PM
I am looking for a scrapbook solution that is designed mainly for images and multimedia, has nicer import/export functions, and can work with FF. It should support drag and drop fully (both in and out). I am not talking about taking picture snapshots of the pages. This app should mainly focus on images and multimedia that are embedded or represented in the pages. But an image screenshot could be a nice addition as well.

I love Scrapbook but I hate the way it exports files, and you really have no control over the content of the export.



Are you looking for just the images? Do you even need to have a rendering of the web page itself?

For images alone, SnagIt has a built-in profile that pulls all images from a web page, but the latest version, 9, is terrible. If you decide to use it, try to get version 8.2 instead.

Jim
Title: Re: Reliable web page capture...
Post by: kartal on November 01, 2008, 10:13 PM
I am mainly interested in images. But what I really, really want is that when the app takes images from a site, it puts all the info, like the web site, time of capture, possible hyperlinks, etc., into the IPTC or EXIF of the images (assuming that mainly JPEG images are captured). This would be great because I don't want to keep separate HTML files for images. And PDF cannot be an option because I mainly browse images via thumbnails; PDF capture of the pages would be overkill.
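Just to make the "metadata inside the file" idea concrete (purely as a sketch -- proper EXIF/IPTC writing needs a real library or a tool like exiftool, which is more than a forum post can cover): JPEG at least has a simple COM (comment) segment where capture info could be stashed with nothing but stock Python. This is not EXIF or IPTC, just the simplest possible illustration that the info can travel with the image:

```python
import struct

def add_jpeg_comment(jpeg_bytes: bytes, text: str) -> bytes:
    """Insert a COM (comment) segment right after the JPEG SOI marker."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    payload = text.encode("utf-8")
    # the segment length field counts itself (2 bytes) plus the payload
    com = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg_bytes[:2] + com + jpeg_bytes[2:]
```

A capture app could call this with something like "src=<page URL> captured=<timestamp>" at save time, and the provenance would survive any file copy.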
Title: Re: Reliable web page capture...
Post by: J-Mac on November 01, 2008, 11:50 PM
I don’t know...

I generally use the Scrapbook extension in Firefox. I can save the images individually from the captured web page, and I can also use SnagIt to grab all the images from the Scrapbook capture of a web page.

Come to think of it, if I wanted to do what I believe you are looking to do, I'd just grab the page with Scrapbook and then perform a SnagIt capture using the "Images from Web page" profile. That's all I can think of right now, mainly because that works for me.

I understand that's possibly not what you are seeking though.

Jim
Title: Re: Reliable web page capture...
Post by: cmpm on November 02, 2008, 08:44 AM
a couple of webpage rippers
for portions of the web page

https://addons.mozilla.org/en-US/firefox/search/?q=Clipmarks

http://file2hd.com/
Title: Re: Reliable web page capture...
Post by: Darwin on November 02, 2008, 09:24 AM
I've used Evernote 2, NetSnippets, and now Ask Sam 7. All of them do a good job. Ask Sam is the best of the bunch WRT retaining the exact formatting of the page being saved. Of course, it's VERY expensive (I got it when it was on sale)...
Title: Re: Reliable web page capture...
Post by: kartal on November 02, 2008, 10:31 AM
I actually like Evernote (even the latest one), but it is a pain in the neck to get multiple data out of it. I tried Ultra Recall as well and that one was pretty cool. But again, that app was more than capturing web pages. I am actually looking for something more specialized, something that just focuses on capturing content and exporting.
Title: Re: Reliable web page capture...
Post by: tomos on November 02, 2008, 03:08 PM
I actually like Evernote (even the latest one), but it is a pain in the neck to get multiple data out of it. I tried Ultra Recall as well and that one was pretty cool. But again, that app was more than capturing web pages. I am actually looking for something more specialized, something that just focuses on capturing content and exporting.

have you tried Surfulater - I've no idea if it does what you want but
I see the new version 3 has been released and it sounds very interesting
http://blog.surfulater.com/2008/10/20/surfulater-version-3-released/
Title: Re: Reliable web page capture...
Post by: kartal on November 02, 2008, 03:33 PM
Some people seem to love surfulater.  Personally, I tried it but it is nowhere near what I am looking for. I could not drag and drop any image from any place. It accepts them as attachments only which is totally useless really.
Title: Re: Reliable web page capture...
Post by: PPLandry on November 02, 2008, 04:09 PM
I actually like Evernote (even the latest one), but it is a pain in the neck to get multiple data out of it. I tried Ultra Recall as well and that one was pretty cool. But again, that app was more than capturing web pages. I am actually looking for something more specialized, something that just focuses on capturing content and exporting.

I know you've tried InfoQube (aka SQLNotes) a while back. It does reliable web-page capture. What would be missing to be the solution for your current needs:

1- FF and IE context menu (to copy content)?
2- Simplified UI?
3- Integrated editing pane (à la EN) where items and content can be created in a single window?
4- Other...
Title: Re: Reliable web page capture...
Post by: kartal on November 02, 2008, 04:44 PM
PPLandry, I love SQLNotes. But I just have not had time to integrate it into my workflow. I am hoping that when I have time I will move to it at some point. I just have an already-established setup, and you know it really takes time to move to another pipeline.

The main thing I hate about most scrapbook and info managers is that they want to gobble up everything, but when it comes to giving data back (exporting), things are not that easy. Exporting and sharing data is very overlooked, which makes for a very rigid workflow, especially if you choose one that has virtually no export features. That is why I like flat-file database stuff: at least I can actually get into the folders and delete stuff, check out files, etc. For that reason I have stuck with WikidPad for a long time, because it is just simple plain text with some advanced tagging. If I want to import into something else I do not need developers to program an exporter. I can just copy and paste the plain text or import it as plain text.
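Just to show what I mean about plain text being easy to get out: a few lines of stock Python (a sketch only -- the folder layout and file names are made up) can dump a whole folder of .txt notes into one HTML page, with no exporter needed from the developer:

```python
import html
from pathlib import Path

def export_notes(notes_dir: str, out_file: str) -> int:
    """Concatenate every .txt note in a folder into one HTML page.
    Illustrative only: real wiki markup (WikidPad etc.) would need its own parser."""
    parts = ["<html><body>"]
    count = 0
    for note in sorted(Path(notes_dir).glob("*.txt")):
        # file name becomes the heading, escaped file body becomes the content
        parts.append(f"<h2>{html.escape(note.stem)}</h2>")
        parts.append(f"<pre>{html.escape(note.read_text(encoding='utf-8'))}</pre>")
        count += 1
    parts.append("</body></html>")
    Path(out_file).write_text("\n".join(parts), encoding="utf-8")
    return count
```

Point it at any folder of notes and you get a browsable page. That is the whole appeal of flat files: the data is never hostage to one program's export menu.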

What I want is to be able to drag and drop data (mainly image files). Dragging and dropping should work both ways. For example, I want to be able to drag images from FF to the app, but also drag images from the app to Explorer or another program (if it supports that). I am not interested in going through save dialogs, folders, etc. Sometimes I need to capture tens of images for a half-day project, so my time is generally very limited when it comes to finding enough reference material to start and finish my job. Can you imagine having to right-click, open and save every single image, under a very tight project deadline? It really does not make sense and is too much effort for quick stuff.

- Drag and drop (multiple directions)

- Option to embed the image as a thumbnail or full image. For example, if I drag with Ctrl pressed it embeds into the app's page as a thumbnail (while also keeping the original image); otherwise it embeds as the full image (normal d&d)

- Ability to record the time of capture, originating web page, originating document or name of the exe file, and any other information that can be extracted from the origin. And I really want this app to put all the info about the image into the IPTC field of the image, if the format supports it. Optionally, a snapshot of the page could be captured beside the image so that the user has a sense of where the image came from

- Ability to paste images from the clipboard

- Ability to paste images to the clipboard

- Ability to edit the image without going through the save-and-import loop. Basically I will select right-click > Edit Image, open it in Photoshop and save it. When the image is saved, the file is updated in the scrapbook document

- Simple cropping and resizing (to keep things simple and file sizes smaller)

- Ability to create a thumbnail or storyboard page from the reference images

- Ability to export selected images and pages in various formats (HTML, PDF, images only, text only), with or without subfolders, etc.

- Capture mode, which would work like this: when I press capture mode, the app asks me to choose a window, for example one FF window. When I select the FF window, the app resizes both itself and the selected FF window, so I have two windows open sitting next to each other. In this mode I can drag from FF to the host app easily

- Ability to drag and drop into different topics (if the app has a tree)

- Ability to drag and drop onto the tree, with the app asking whether I want to create a new topic (node), embed into the page, or just attach as a file to the topic (node)

These are the things I basically look for. I can come up with a longer list, and better usability ideas, if you are seriously interested, since it takes time to make a list and discuss :)

Title: Re: Reliable web page capture...
Post by: J-Mac on November 02, 2008, 07:32 PM
I actually like Evernote (even the latest one), but it is a pain in the neck to get multiple data out of it. I tried Ultra Recall as well and that one was pretty cool. But again, that app was more than capturing web pages. I am actually looking for something more specialized, something that just focuses on capturing content and exporting.

I know you've tried InfoQube (aka SQLNotes) a while back. It does reliable web-page capture. What would be missing to be the solution for your current needs:

1- FF and IE context menu (to copy content)?
2- Simplified UI?
3- Integrated editing pane (à la EN) where items and content can be created in a single window?
4- Other...

Pierre,

How can I capture web pages with IQ (SQLNotes)? The default hotkey, Win+N, doesn't work for me, and it can't be changed - or at least it couldn't in the 9.23 versions. Don't know if it was fixed in the latest version - has it been?

Actually you could change the hotkey in Options but it doesn’t work after changing it.

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: PPLandry on November 02, 2008, 09:59 PM
It was fixed. You need to restart the program after changing the hot key. What hot key did you set it to?
Title: Re: Reliable web page capture...
Post by: J-Mac on November 02, 2008, 11:12 PM
It was fixed. You need to restart the program after changing the hot key. What hot key did you set it to?

Hadn't restarted yet this time. When we talked about it on the IQ forum, I tried several different combinations but none seemed to take. IIRC, Armando tried it also and found it wasn't working at that time for him either. I hadn't tried again till now, since I never saw anything posted anywhere that it had been fixed.

Jim

PS - I sure wish that there was a way in Windows to see all hotkeys that are already assigned to applications. Some apps grab hotkeys without even telling you about it; you don't see them till you accidentally invoke one. Hope I pick a "free" one.
Title: Re: Reliable web page capture...
Post by: J-Mac on November 02, 2008, 11:19 PM
Pierre,

Not working still. I tried a few: Shift+Alt+N, Ctrl+Alt+N. Nothing happens at all.

When did you try to fix this? It was reported as non-working on your forum on October 23.

Jim

OK, after a restart it worked. Kinda. It opens a dialog for me to paste text into, so I have to first highlight and copy content from a web page. However it also wants a URL pasted, so I have to bring up the page again, highlight and copy the URL, then focus back on the dialog and paste the URL. The result does not look anything like the web page; just a cell filled with plain text.

It appears to just be copy/paste of text into a cell. Not sure I would compare that to capturing web pages, unless there is more to it, though I cannot find any instruction or documentation of this anywhere except your post. Is there more somewhere?

Thank you.

Jim
Title: Re: Reliable web page capture...
Post by: tsaint on November 03, 2008, 12:53 AM

PS - I sure wish that there was a way in Windows to see all hotkeys that are already assigned to applications. Some apps grab hotkeys without even telling you about it; you don't see them till you accidentally invoke one. Hope I pick a "free" one.
Is qliner at http://qliner.com/hotkeys/overview.htm of any use?
Title: Re: Reliable web page capture...
Post by: J-Mac on November 03, 2008, 03:43 AM

PS - I sure wish that there was a way in Windows to see all hotkeys that are already assigned to applications. Some apps grab hotkeys without even telling you about it; you don't see them till you accidentally invoke one. Hope I pick a "free" one.
Is qliner at http://qliner.com/hotkeys/overview.htm of any use?

Don’t see how. Do you know something about that app that I am not seeing there?

Thanks!

Jim
Title: Re: Reliable web page capture...
Post by: tsaint on November 03, 2008, 04:38 AM
Qliner site:
"Just hold the Windows key for three seconds and up pops a on screen Keyboard with icons on the keys that are configured. This you can use , not only to remind you of hotkey combinations...." and "Use to quickly look up key combinations "
Your post:
"I sure wish that there was a way in Windows to see all hotkeys that are already assigned to applications"

I don't use qliner ... thought if you wanted to see key combos in use it may be relevant.... feel free to ignore it.



Title: Re: Reliable web page capture...
Post by: tomos on November 03, 2008, 07:46 AM
OK, after a restart it worked. Kinda. It opens a dialog for me to paste text into, so I have to first highlight and copy content from a web page. However it also wants a URL pasted, so I have to bring up the page again, highlight and copy the URL, then focus back on the dialog and paste the URL. The result does not look anything like the web page; just a cell filled with plain text.

It appears to just be copy/paste of text into a cell. Not sure I would compare that to capturing web pages, unless there is more to it, though I cannot find any instruction or documentation of this anywhere except your post. Is there more somewhere?

It is copy and paste at the moment (FF and IE plugins are being worked on).
I don't ever copy much from a web page so I can't say much about how layout gets copied, but all formatting comes through perfectly here, including e.g. tables.
You're not using Opera by any chance, are you? You can only copy plain text from Opera...
Title: Re: Reliable web page capture...
Post by: cranioscopical on November 03, 2008, 08:24 AM
Qliner site:
"Just hold the Windows key for three seconds and up pops a on screen Keyboard with icons on the keys that are configured. This you can use , not only to remind you of hotkey combinations...." and "Use to quickly look up key combinations "

Thanks tsaint, not long ago I was trying to squeeze out that name from my memory!
Title: Re: Reliable web page capture...
Post by: rjbull on November 03, 2008, 08:25 AM
It was fixed. You need to restart the program after changing the hot key. What hot key did you set it to?
I sure wish that there was a way in Windows to see all hotkeys that are already assigned to applications. Some apps grab hotkeys without even telling you about it; you don't see them till you accidentally invoke one. Hope I pick a "free" one.

Nir Sofer's ShortCutsMan (http://www.nirsoft.net/utils/shman.html) (Short Cuts Manager) is pretty good, but you're lost if you start defining things in a hotkey program.

Title: Re: Reliable web page capture...
Post by: PPLandry on November 03, 2008, 12:24 PM
It appears to just be copy/paste of text into a cell. Not sure I would compare that to capturing web pages, unless there is more to it, though I cannot find any instruction or documentation of this anywhere except your post. Is there more somewhere?

When you paste the HTML content, make sure you paste in the HTML area (click on HTML button or Alt-H to view it).

As for having to paste both content and URL, you are correct and this will be fixed when the FF/IE add-ons are done. It is impossible to know the URL from what is in the clipboard...

It is simpler to capture the entire page. Simply copy the URL (or drag-drop it) to the URL textbox and check "Copy Content". You can choose HTML or MHT formats
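For anyone curious what the MHT format actually is: it's just MIME multipart/related, the same container an email with attachments uses. A rough stdlib-Python sketch (not InfoQube's actual code, just an illustration of the format; the inputs are assumed to be already downloaded) that bundles a page and its resources into a single .mht:

```python
from email.message import EmailMessage

def build_mht(url: str, html_body: str, resources: dict) -> bytes:
    """Pack a fetched page plus its resources into one MHTML archive.
    resources maps URL -> (maintype, subtype, raw bytes)."""
    msg = EmailMessage()
    msg["Subject"] = "Saved web page"
    # the root part carries the page itself, tagged with its original URL
    msg.set_content(html_body, subtype="html",
                    headers=[f"Content-Location: {url}"])
    for location, (maintype, subtype, data) in resources.items():
        # each resource keeps its original URL so a viewer can resolve links;
        # add_related also converts the message to multipart/related
        msg.add_related(data, maintype=maintype, subtype=subtype,
                        headers=[f"Content-Location: {location}"])
    return msg.as_bytes()
```

Write the returned bytes to a .mht file and IE (or any MHTML-aware viewer) will open it as the original page, images and all.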
Title: Re: Reliable web page capture...
Post by: J-Mac on November 03, 2008, 04:34 PM
It appears to just be copy/paste of text into a cell. Not sure I would compare that to capturing web pages, unless there is more to it, though I cannot find any instruction or documentation of this anywhere except your post. Is there more somewhere?

When you paste the HTML content, make sure you paste in the HTML area (click on HTML button or Alt-H to view it).

As for having to paste both content and URL, you are correct and this will be fixed when the FF/IE add-ons are done. It is impossible to know the URL from what is in the clipboard...

It is simpler to capture the entire page. Simply copy the URL (or drag-drop it) to the URL textbox and check "Copy Content". You can choose HTML or MHT formats


Thanks Pierre.

Is there a way to paste directly to the HTML pane within the window that the hotkey brings up? I just pasted it in the indicated "Paste" pane on that window.

Thanks again,

Jim
Title: Re: Reliable web page capture...
Post by: J-Mac on November 03, 2008, 04:37 PM
Qliner site:
"Just hold the Windows key for three seconds and up pops a on screen Keyboard with icons on the keys that are configured. This you can use , not only to remind you of hotkey combinations...." and "Use to quickly look up key combinations "
Your post:
"I sure wish that there was a way in Windows to see all hotkeys that are already assigned to applications"

I don't use qliner ... thought if you wanted to see key combos in use it may be relevant.... feel free to ignore it.





I did download it and give it a brief try - not sure what it looks at, but it doesn't show existing hotkeys other than the ones defined by Windows. I'm afraid the Windows default hotkeys are the only ones that can be viewed. I looked into this thoroughly last year and was told quite definitively at the AHK and AutoIt forums that Windows doesn't keep a record of third-party hotkeys. Which is really a pain IMO!

Thanks for the help.

Jim
Title: Re: Reliable web page capture...
Post by: tomos on November 04, 2008, 03:35 AM
Is there a way to paste directly to the HTML pane within the window that the hotkey brings up? I just pasted it in the indicated "Paste" pane on that window.

When you paste the HTML content, make sure you paste in the HTML area (click on HTML button or Alt-H to view it).

Once the add-new-item window is open, the Alt+H shortcut will give the HTML area focus - then you can paste.
When you first open the new-item window, the focus is in the "item" field, which I guess is the equivalent of the title of the new item. I usually write something there and then proceed to the HTML area and paste.
Title: Re: Reliable web page capture...
Post by: J-Mac on November 04, 2008, 12:39 PM
Thanks Tom.

All things considered, I think I'll use my other web clipping tools until IQ has a true clipper.

Jim
Title: Re: Reliable web page capture...
Post by: MerleOne on November 05, 2008, 01:26 AM
I personally rely on Metaproducts Inquiry (http://www.metaproducts.com/mp/Inquiry_Standard_Edition.htm), which is excellent.
Title: Re: Reliable web page capture...
Post by: J-Mac on November 05, 2008, 01:46 AM
I have a few apps that clip just fine. I do want to get the data into SQLNotes/IQ, and when its clipper is on par with others like Evernote, OneNote 2007, or Ultra Recall I'll use it. In the meantime it is easier for me to clip with one or more of those - depending on the type of content - and then import, copy or cut & paste, or drag & drop it into SQLNotes/IQ.   :)

That'll work for now.

Jim
Title: Re: Reliable web page capture...
Post by: PPLandry on November 09, 2008, 11:13 PM
I have a few apps that clip just fine. I do want to get the data into SQLNotes/IQ, and when its clipper is on par with others like Evernote, OneNote 2007, or Ultra Recall I'll use it. In the meantime it is easier for me to clip with one or more of those - depending on the type of content - and then import, copy or cut & paste, or drag & drop it into SQLNotes/IQ.   :)

That'll work for now.

Jim

Expect the FF extension for InfoQube in 1-2 days. It is working and I'm testing it right now.
Title: Re: Reliable web page capture...
Post by: J-Mac on November 10, 2008, 01:07 AM
Sounds good Pierre!

Jim
Title: Re: Reliable web page capture...
Post by: tomos on November 13, 2008, 04:07 AM
just installing it now

Version 0.9.24 Pre-release 3 is out

New in this (pre) release:

    * Firefox extension to ease web page clippings. Details here: http://sqlnotes.wikispaces.com/WebClipper

 :)

more info on installation and use here
http://sqlnotes.wikispaces.com/Webclipper
Title: Re: Reliable web page capture...
Post by: vizacc on November 14, 2008, 03:33 PM

the new kid on the block...

Big attachment: the whole CNet site as one big PNG image
[attachthumb=#1][/attachthumb]

Big attachment: the whole DonationCoder site as one big PNG image
[attachthumb=#2][/attachthumb]

coming soon