Any ideas for a website crawler to save sites for offline reading?


mwb1100:
Maybe WebSite-Watcher will work? 

   - http://www.aignes.com/features.htm

Carol Haynes:
I used to have a copy of WebSite Watcher - I didn't know it could archive whole sites. I'll have another look.

Update: WSW uses an add-on (Local Website Archive) to do this, but from reading the description even the 'Pro' version only seems to be able to archive individual pages. OK, the Pro version lets you queue them for download, but I need to collect hundreds of pages and retain the links for offline viewing. I don't see any way that Local Website Archive is set up to do that.

I could actually do that using OneNote.
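
To be clear about what I mean by "retain the links": every saved page needs its links rewritten to point at the local copies. Roughly like this sketch (Python, with made-up URLs, just to illustrate the requirement - it ignores logins, images and stylesheets):

--- Code: ---
# Sketch of "retain the links for offline viewing": save each page
# locally and rewrite its links to point at the saved copies.
# The URLs are made up; logins, images and CSS are ignored.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

PAGES = [
    "http://www.example.com/index.html",
    "http://www.example.com/articles/one.html",
]

def local_name(url):
    # Flatten a URL path into a safe local filename
    return urlparse(url).path.strip("/").replace("/", "_") or "index.html"

saved = {url: local_name(url) for url in PAGES}

for url, filename in saved.items():
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for tag in soup.find_all("a", href=True):
        target = urljoin(url, tag["href"])
        if target in saved:              # link to another saved page:
            tag["href"] = saved[target]  # point it at the local copy
    with open(filename, "w", encoding="utf-8") as f:
        f.write(str(soup))
--- End code ---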

mwb1100:
Have you tried these suggestions from the HTTrack FAQ? (Rough script sketches of each answer follow the quote.)

--- Quote ---

Q: I cannot access several pages (access forbidden, or redirect to another location), but I can with my browser - what's going on?
A: You may need cookies! Cookies are specific data (for example, your username or password) that are sent to your browser once you have logged in to certain sites, so that you only have to log in once. For example, after having entered your username on a website, you can view pages and articles, and the next time you go to this site you will not have to re-enter your username/password.
To "merge" your personal cookies into an HTTrack project, just copy the cookies.txt file from your Netscape folder (or the cookies located in the Temporary Internet Files folder for IE) into your project folder (or even the HTTrack folder).


Q: Can HTTrack perform form-based authentication?
A: Yes. See the URL capture abilities (--catchurl for command-line release, or in the WinHTTrack interface)


Q: Can I use username/password authentication on a site?
A: Yes. Use user:password@your_url (example: http://foo:[email protected]/private/mybox.html)
- http://www.httrack.com/html/faq.html
--- End quote ---
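
To make the cookies answer concrete: the same trick works from any script, not just HTTrack, because cookies.txt is the standard Netscape format. Here's a sketch using Python's requests library (the member-page URL is made up):

--- Code: ---
# Reuse a browser's exported Netscape-format cookies.txt -- the same file
# the HTTrack FAQ says to copy into the project folder.
import http.cookiejar
import requests

jar = http.cookiejar.MozillaCookieJar("cookies.txt")
jar.load(ignore_discard=True, ignore_expires=True)

session = requests.Session()
session.cookies = jar  # requests accepts any cookielib-style jar

# Pages that previously redirected to a login form should now load normally
response = session.get("http://www.example.com/members/page.html")
print(response.status_code)
--- End code ---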
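
And the two authentication answers boil down to a one-time POST of the login form (which is what the --catchurl capture automates for HTTrack) and HTTP Basic authentication (the user:password@host form). Again a sketch, with made-up URLs and form field names:

--- Code: ---
import requests

# Form-based login: POST the credentials once, then crawl with the session
session = requests.Session()
session.post(
    "http://www.example.com/login",               # hypothetical login URL
    data={"username": "foo", "password": "bar"},  # hypothetical field names
)
page = session.get("http://www.example.com/private/mybox.html")

# HTTP Basic auth: equivalent to http://foo:[email protected]/...
resp = requests.get("http://www.example.com/private/mybox.html",
                    auth=("foo", "bar"))
--- End code ---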

kfitting:
Local Website Archive is only for archiving single pages at a time. I asked the author about handling multiple pages a year or two ago, and he responded that he has no plans for it.

HTTrack... I am potentially an incompetent user, but it takes me forever to set up my crawl depth correctly. I don't find it intuitive at all. Not saying it isn't powerful - it certainly is! I just haven't taken the time to figure it out properly.
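
The idea itself is simple even if the interface isn't: a depth-limited crawl is just a queue of (URL, depth) pairs. A bare-bones sketch of what the depth setting controls (Python, made-up start URL, no login or robots.txt handling):

--- Code: ---
# Bare-bones depth-limited crawl -- a sketch of what a "mirror depth"
# option controls, not a replacement for HTTrack.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START = "http://www.example.com/index.html"  # made-up start URL
MAX_DEPTH = 2  # 0 = just the start page, 1 = pages it links to, and so on

host = urlparse(START).netloc
seen = set()
queue = [(START, 0)]

while queue:
    url, depth = queue.pop(0)
    if url in seen or depth > MAX_DEPTH:
        continue
    seen.add(url)

    page = requests.get(url)
    print(f"depth {depth}: {url} ({len(page.content)} bytes)")

    # Queue same-host links one level deeper
    for tag in BeautifulSoup(page.text, "html.parser").find_all("a", href=True):
        link = urljoin(url, tag["href"])
        if urlparse(link).netloc == host:
            queue.append((link, depth + 1))
--- End code ---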

Carol Haynes:
I tried HTTrack - it was the first one I tried - but I can't get it to work on sites with web-form password protection. I don't think the site uses a session cookie; I think it is doing something with JavaScript, but I am not sure.
