Topic: Reliable web page capture...  (Read 71772 times)

johnk

  • Charter Member
  • Joined in 2005
  • Posts: 245
Reliable web page capture...
« on: July 11, 2008, 12:53 PM »
In my endless quest/obsession to find the perfect information manager, I've decided that one of the key features for me is reliable web page capture. Not pixel perfect. But close enough. There are lots of other features I'm willing to compromise on, but not that one.

Now you wouldn't think that would be a problem. But it is. Most of the information managers we know and love just are not as reliable as they should be. I have licences for three of the best -- Ultra Recall, Surfulater and Evernote. All claim that web page capture is part of their feature set.

And yet compared to the free Firefox add-on Scrapbook, their performance is variable, to say the least. Pictures speak louder than words, so here's a comparison of the three programs I mention above with Scrapbook, and web capture specialists Local Website Archive and WebResearch Pro.

I took a page from a mainstream site (BBC News) that I knew would present a decent challenge.

[Screenshot: original page in Firefox]

[Screenshots: Scrapbook, Local Website Archive, WebResearch Pro]

[Screenshots: Ultra Recall, Surfulater, Evernote]

As you can see, the three programs that major on web page capture do an excellent job. Scrapbook is faultless as ever.

Ultra Recall, Surfulater and Evernote are all ugly and broken. Yes, all the content is there, but it's not as pleasant or easy to read, and not recognizable as the original page.

If a free browser add-on can manage faultless web capture, I can't see why the power user information managers can't do the same. WebResearch Pro takes a lazy (but very clever) route to perfect pages -- it uses the Scrapbook engine to capture pages. Why can't other programs do the same thing?

I'm trying to reduce the number of programs I use. I want to use one program for web capture and information management. Seems logical and should be achievable. But I'm still looking...

EDIT: A new version of Ultra Recall improves web page capture -- see further post below.
« Last Edit: July 14, 2008, 05:11 AM by johnk »

cmpm

  • Charter Member
  • Joined in 2006
  • Posts: 2,026
Re: Reliable web page capture...
« Reply #1 on: July 11, 2008, 01:14 PM »
FireShot add on for Firefox.
Retains excellent picture zooming in and out also.
Scrollable...

https://addons.mozil...S/firefox/addon/5648


johnk

Re: Reliable web page capture...
« Reply #2 on: July 11, 2008, 01:22 PM »
Thanks for your response, cmpm, but that's not quite what I'm after. Fireshot is a screen capture add-on.

I'm perfectly happy with Scrapbook (or Local Website Archive) as a reliable web page capture program. What I want is for one of the heavyweight information managers (named in my first post) to improve their programs and start providing bullet-proof web page capture (which they should be doing already).

What sparked this post was a thread in the Kinook forums where I and others raised this issue about Ultra Recall:
http://www.kinook.co...?s=&postid=13653.

cmpm

Re: Reliable web page capture...
« Reply #3 on: July 11, 2008, 01:51 PM »
Well..hmmmm...
You can save the shot to any folder you choose,
even if it's in another program.
If that helps.

I know of more that are stand alone programs.
Screenshot Capture for one.

But I don't know of any good ones built into a program like you want, sorry.

If you can find a program that lets you choose your own preferred capture program, then that might work.

Shades

  • Member
  • Joined in 2006
  • Posts: 2,939
Re: Reliable web page capture...
« Reply #4 on: July 11, 2008, 06:49 PM »
Zotero - my favorite page grabber (plugin) for firefox. Also free.

johnk

Re: Reliable web page capture...
« Reply #5 on: July 11, 2008, 07:03 PM »
Zotero is certainly interesting. However, although I am a dedicated Firefox user, I am trying to make sure that my long-term home for web page clippings is independent of any browser.

Also with Zotero/Scrapbook etc, it's difficult to mix and match other types of data if you're putting together a research project. That's where programs such as Ultra Recall show their strength.

J-Mac

  • Supporting Member
  • Joined in 2007
  • Posts: 2,918
Re: Reliable web page capture...
« Reply #6 on: July 11, 2008, 08:40 PM »
Quote from johnk: "Thanks for your response, cmpm, but that's not quite what I'm after. Fireshot is a screen capture add-on.

"I'm perfectly happy with Scrapbook (or Local Website Archive) as a reliable web page capture program. What I want is for one of the heavyweight information managers (named in my first post) to improve their programs and start providing bullet-proof web page capture (which they should be doing already).

"What sparked this post was a thread in the Kinook forums where I and others raised this issue about Ultra Recall:
http://www.kinook.co...?s=&postid=13653."

John,

Notice that the last post there at the Kinook thread is from me.  And I also have licenses for Evernote and Local Website Archive.

I agree completely that none capture web pages well - at least not visually.  I'll be following this thread carefully as my needs seem to match up.

Jim

cmpm

Re: Reliable web page capture...
« Reply #7 on: July 12, 2008, 09:49 AM »
I'm not sure about this one.
It's $50, but seems like a possibility.

http://www.milenix.com/index.php

ashwken

  • Participant
  • Joined in 2008
  • Posts: 16
Re: Reliable web page capture...
« Reply #8 on: July 12, 2008, 09:58 AM »
John,

Also following over from the UR thread where I mentioned that the IE .mht format also does the job. It would appear that there are some methods of capture that go deeper into the browser than others - obviously any method originating from (within) the browser is going to have an advantage. I don't know enough about the inner workings to offer anything other than observed results.

Thanks for the comparison.

cmpm

Re: Reliable web page capture...
« Reply #9 on: July 12, 2008, 10:18 AM »
http://www.websitescreenshots.com/

The server edition of WebShot comes with a DLL that will allow you to embed WebShot technology in your own applications.

I keep finding more! Love the hunt when something is found.

johnk

Re: Reliable web page capture...
« Reply #10 on: July 12, 2008, 10:33 AM »
Quote from cmpm: "I keep finding more! Love the hunt when something is found."

cmpm -- glad you're enjoying the thread. However, programs such as WebShot, FireShot and Screenshot Capture are very different from the ones I discussed in the first post.

WebShot, FireShot etc just take images of the pages -- screen grabs. They don't actually copy the page contents (i.e. they don't make a copy of the text, images, css files etc from the web server).

Programs such as Local Website Archive and WebResearch Pro actually make full copies of the page content -- the page content is copied on to your hard drive. This is much more useful. You can cut and paste the content, print it properly, edit it and index it (although one or two programs now use OCR to index screen grabs).
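
To make the difference concrete, here is a minimal Python sketch of the first job a content-capturing program has to do and a screen grabber never does: finding the images, stylesheets and scripts a page refers to, so they can be copied alongside the HTML. It is only an illustration, not how Scrapbook, Local Website Archive or WebResearch Pro actually work; the `AssetCollector` class, the `list_assets` function and the example.com URLs are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect the URLs of files a saved copy of the page would need."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # address the page was loaded from
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Images and external scripts are referenced through src.
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only <link> tags that declare a stylesheet are of interest.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.assets.append(urljoin(self.base_url, attrs["href"]))


def list_assets(html, base_url):
    """Return absolute URLs of images, scripts and stylesheets, in page order."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.assets


if __name__ == "__main__":
    page = (
        '<html><head><link rel="stylesheet" href="/css/main.css"></head>'
        '<body><img src="pic.png"></body></html>'
    )
    for url in list_assets(page, "http://example.com/news/story.html"):
        print(url)
```

A real capture tool would go on to download each of these files and rewrite the page to point at the local copies. Things like CSS @import rules, background images and content loaded by scripts are not handled in this sketch, which is presumably where the less reliable programs come unstuck.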

cmpm

Re: Reliable web page capture...
« Reply #11 on: July 12, 2008, 10:39 AM »
Do you mean that the links on the web page image after the grab are clickable?

Quote from johnk: "(i.e. they don't make a copy of the text, images, css files etc from the web server)"

With FireShot they are not. But the image can be saved in different formats.

I really don't think I'm grasping what you are after, cause I haven't followed the other thread.

cmpm

Re: Reliable web page capture...
« Reply #12 on: July 12, 2008, 10:45 AM »
Perhaps dragging and dropping the icon from the browser to the database/PIM would work, if it's supported.

This is as close as I could come to something that actually has the possibility.

http://www.cancellieri.org/index.htm
« Last Edit: July 12, 2008, 10:47 AM by cmpm »

ashwken

Re: Reliable web page capture...
« Reply #13 on: July 12, 2008, 11:55 AM »
John,

I realize that .mht is a MS/IE only format, but what's puzzling about UR's current state is that UR has always had such tight integration with MS products. It would appear that both browsers have a handle on page capture, but I will admit that there have been times when the Save Page As .mht has hung, failing to complete, and a forced shutdown of the browser is required. Sometimes you can re-launch the browser, try again and it will succeed, other times...

I would imagine that it's no small task to go out and grab all the related bits and pieces that determine how a page is rendered.

johnk

Re: Reliable web page capture...
« Reply #14 on: July 12, 2008, 12:08 PM »
Quote from ashwken: "I would imagine that it's no small task to go out and grab all the related bits and pieces that determine how a page is rendered."

Perhaps. Yet Scrapbook does it perfectly, time after time. As I said above, WebResearch Pro (a commercial program) chooses to use the Scrapbook engine to save web pages, because it's so reliable. Presumably there's nothing to stop Ultra Recall doing the same thing.

johnk

Re: Reliable web page capture...
« Reply #15 on: July 12, 2008, 12:35 PM »
Credit where it's due...

Kinook has just announced v3.5a of Ultra Recall. Fixes include "improved capturing of styles and formatting when storing web pages".  And they're as good as their word (repeat of test in first post):

[Screenshots: Ultra Recall v3.5, Ultra Recall v3.5a]

This is why it's great to support the smaller software developers. They're far more likely to respond to requests for improvements. I mentioned this thread in the Kinook forums. The Kinook team obviously looked at the thread because they mentioned that v3.5a would solve the problems encountered on the page used in the test.

Ultra Recall version history: http://www.kinook.co...s=&threadid=3696
« Last Edit: July 14, 2008, 05:19 AM by johnk »

mouser

  • First Author
  • Administrator
  • Joined in 2005
  • Posts: 40,914
Re: Reliable web page capture...
« Reply #16 on: July 12, 2008, 02:40 PM »
Quote from johnk: "This is why it's great to support the smaller software developers. They're far more likely to respond to requests for improvements."

agreed  :up:

J-Mac

Re: Reliable web page capture...
« Reply #17 on: July 12, 2008, 05:33 PM »
Wow!  Using Ultra Recall for a week now and already more impressed than at the start of the week...

Jim

cmpm

Re: Reliable web page capture...
« Reply #18 on: July 15, 2008, 08:52 PM »

J-Mac

Re: Reliable web page capture...
« Reply #19 on: July 15, 2008, 10:33 PM »
Sounds nice, cmpm.  Have you used it yet?  I'm curious as to what kind of results you're seeing.

Thanks!

Jim

cmpm

Re: Reliable web page capture...
« Reply #20 on: July 15, 2008, 11:06 PM »
No, I haven't tried it yet.
It was in my bookmarks.
And I was just browsing through them and saw it.
It does look interesting.
But it's late here and I work early.
Why am I still up?

Must have bookmarked it for some reason?
Can't remember though..lol.

J-Mac

Re: Reliable web page capture...
« Reply #21 on: July 16, 2008, 01:46 AM »
Oh, OK.  It's just that johnk went through so much in testing the others to determine just what they can and cannot do.  I thought that you were trying to show others that work without the drawbacks that he mentioned.  But it looks like you're just throwing names out of any app that mentions web page capture in its feature set.

I don't think I would consider it in the same class as the ones mentioned in the original post.

Thanks!

Jim

cmpm

Re: Reliable web page capture...
« Reply #22 on: July 16, 2008, 03:09 AM »
I think it's worth checking out.

And John is not the only one interested in these programs.
His specific needs may not be met by this program,
but others might be interested in it.



johnk

Re: Reliable web page capture...
« Reply #23 on: July 16, 2008, 05:29 AM »
Quote from cmpm: "http://www.alcenia.c.../index.php?page=what
open source"

cmpm -- Webswoon (and most of the other programs you have mentioned) are really completely different animals. They simply capture an image (picture) of the page. That obviously has some uses, but the programs I looked at in the first post have a different purpose. They actually capture all the page content from the web server (or local cache) and "re-build" the page locally. This has many advantages as I have mentioned before -- editing, printing, indexing, live (clickable) links, etc.

Cuffy

  • Participant
  • Joined in 2007
  • Posts: 392
Re: Reliable web page capture...
« Reply #24 on: July 16, 2008, 11:20 AM »
How about mirroring the entire site and then picking out the page/pages you want?

"HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.

It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.

WinHTTrack is the Windows 9x/NT/2000/XP release of HTTrack, and WebHTTrack the Linux/Unix/BSD release. See the download page."
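
The "relative link-structure" part of that description can be sketched in a few lines of Python. This is not HTTrack's code, just a rough illustration under an assumed layout (one folder per host name, index.html for directory URLs, query strings ignored); `local_path` and `relative_link` are hypothetical helper names.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a file path inside the mirror folder (assumed layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Directory-style URLs are stored as index.html in that directory.
        path += "index.html"
    return parts.netloc + path


def relative_link(from_url, to_url):
    """Link to put in the saved copy of from_url so it reaches the saved copy of to_url."""
    start = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start)


if __name__ == "__main__":
    print(relative_link("http://example.com/news/story.html",
                        "http://example.com/css/main.css"))
```

With the example URLs above, the saved story page would link to its stylesheet as ../css/main.css, so the mirrored copy keeps working when opened from the local directory.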