Author Topic: spidering search results from a website  (Read 874 times)

Target

  • Honorary Member
  • Joined in 2006
  • **
  • Posts: 1,606
    • View Profile
    • Donate to Member
spidering search results from a website
« on: October 28, 2011, 01:03:04 AM »
I use the local library extensively and I want to be able to browse the catalogue off line, so I'm looking for some means of spidering or otherwise saving the output of whatever my search might be for further study.

There might be only a few results, in which case I'll just browse them online, but there could also be a few hundred (or a few thousand) if I'm doing very generic searches (their classification process leaves a bit to be desired :( ).

I've used tools like HTTrack before, but I'm pretty sure they won't work with dynamically generated content like this.

Any suggestions?

40hz

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 11,768
    • View Profile
    • Donate to Member
Re: spidering search results from a website
« Reply #1 on: October 28, 2011, 06:58:19 AM »
You could always save a clipping of your search results pages for later use.

I'd suggest ReadItLater or Instapaper if you want to save pages to the cloud. The categorization and organizing tools are pretty lightweight, however. Their biggest plus IMO is that they both have extensive support for smartphones ("There's a bloody app for that!"). That matters to me since I do a lot of research using my phone whenever I get a spare minute and I'm away from my desk - which is almost always.

Both ReadItLater and Instapaper will give you a free account, so it makes sense to try them both to see which, if either, you prefer. Each works slightly differently, and most people have a strong preference for one or the other despite the small differences between them. (Note: Instapaper currently has better integration, i.e. there's a button for it, in most iOS-based RSS readers. But RiL is rapidly closing that gap.)

For your local machine you could take a look at Canaware NetNotes, Microsoft's OneNote, the web clipper feature in Evernote, the Scrapbook extension for Firefox, or the more academically oriented Zotero. Each has its strong points. Each has maddening limitations. All have their rabid fans.

If you go this route, resign yourself to the likelihood that you'll only find something close to what you're actually looking for unless you write your own.
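If you do end up writing your own, here is a minimal sketch in Python of what that might look like. It assumes the catalogue serves its search results as ordinary HTML pages reachable through a GET URL with a query parameter and a page-number parameter. The address `catalogue.example.org`, the parameter names `q` and `page`, and the function names are all placeholders I made up for illustration; you'd have to read the real ones off your library's search page. If the results are built by JavaScript in the browser, this approach probably won't work as-is.

```python
import pathlib
import time
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the real search address of the catalogue.
BASE_URL = "https://catalogue.example.org/search"


def build_page_url(base_url, query, page, query_param="q", page_param="page"):
    """Return the URL for one page of search results.

    The parameter names are assumptions; check the address bar after
    running a search by hand to see what the site actually uses.
    """
    params = urllib.parse.urlencode({query_param: query, page_param: page})
    return f"{base_url}?{params}"


def fetch(url):
    """Download one page and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_results(query, pages, out_dir, fetcher=fetch, base_url=BASE_URL, delay=1.0):
    """Save result pages 1..pages as numbered HTML files in out_dir.

    Returns the list of files written. The delay between requests is
    there to avoid hammering the library's server.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for page in range(1, pages + 1):
        url = build_page_url(base_url, query, page)
        target = out / f"results_{page:03d}.html"
        target.write_bytes(fetcher(url))
        saved.append(target)
        if delay and page < pages:
            time.sleep(delay)
    return saved
```

Something like `save_results("local history", pages=5, out_dir="catalogue_dump")` would then leave five HTML files in `catalogue_dump` that you can open offline. The `fetcher` argument is only there so the download step can be swapped out, for example to try the script without touching the network.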

Depending on which of the above you go with, you may be able to park a copy of your datafile up on Dropbox or some other cloud storage provider and access it from there. If you use OneNote you can take advantage of the 25 GB of storage and sharing Microsoft will give you for the asking and sync your notebooks through SkyDrive. Clipping webpages isn't as fluid as it could be with OneNote. But if you're collaborating with others, the SkyDrive feature alone is worth its weight in gold. Microsoft intends to go hog wild with it once Windows 8 is out. Look here for their official propaganda.

I personally prefer and use NetNotes in conjunction with ReadItLater for most of what I do. I use ReadItLater as my 'junk drawer' inbox. From there it goes into NetNotes, which handles my neatly organized collections. Use that in conjunction with a good RSS reader and it's amazing the amount of "good stuff" you can stay on top of. (Fantastic if you author a blog and are always on the lookout for good news or topics to blather on about.)

Maybe the above is not the most elegant combo. But I've been using it for years so I've got it down cold by now and I'm happy with it.

Your mileage may vary...

Luck! :Thmbsup:

« Last Edit: October 28, 2011, 07:18:25 AM by 40hz »