Author Topic: Idea request: "Site discovery tool"  (Read 3840 times)
iphigenie
« on: November 05, 2007, 08:03:14 AM »

I'm trying to find a pragmatic solution to the following task:

Find new pages/sites/posts around a set of topics in as automated a way as possible, in an ongoing way.
By new I mean either newly arrived/posted, or not seen before because it is too "deep" in the "dark" web, so we haven't gotten to the 21000th page on Google where it sits...

So in a way it would be a tool that
- is able to trawl a bunch of directories, blogs, feeds
- is also able to run a whole list of searches in google/yahoo/ask etc. or some aggregator sites, possibly starting at page 100 to avoid all the noisy stuff at the top
- gathers all the URLs referenced in these
- is then able to discard any site/URL previously seen (aka a 'greylist'), so as to produce a list of new things to review
- updates the greylist
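
The "discard what was seen, then update the greylist" steps in the list above could be sketched roughly as follows. This is only a minimal illustration in Python, not an existing tool: the names `normalize` and `filter_new` are made up for the example, the greylist is assumed to be a plain in-memory set, and the normalization rules (lowercase host, drop `www.`, drop trailing slash and fragment) are assumptions about what should count as "the same URL".

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Reduce a URL to a canonical form so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; keep the query, since it often selects the page.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def filter_new(candidates, greylist):
    """Return the URLs not yet in the greylist and record them as seen."""
    fresh = []
    for url in candidates:
        key = normalize(url)
        if key not in greylist:
            greylist.add(key)  # update the greylist as we go
            fresh.append(url)
    return fresh
```

In practice the greylist would have to be saved between runs (a text file or a small database would do), and the list of candidates would come from whatever crawler or search step feeds it.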

In a way this combines features from a search tool, an update watcher tool, a web ferret, a URL catcher, etc.

If I had to write it I would probably write a Perl application with a topic manager, a web crawler, possibly even plug in a Bayesian toy to help ignore spammy sites, a results list with preview... But I don't have time, and I am not sure this is something I can just commission and hope to get something workable (= better than using manual methods) at a reasonable cost.

So it got me thinking about how some of the benefits of this tool might be achieved by combining existing tools, say by feeding WebSite-Watcher (WSW) a set of search engine results URLs, with the different search terms hard-coded in each. Now this doesn't quite work, as WSW would highlight too many other changes in content rather than purely new stuff, but there just might be ways...

So it's not really a coding snack, it's more of a software jigsaw puzzle.
Veign
« Reply #1 on: November 05, 2007, 08:17:09 AM »

What about using something like Google Alerts:
http://www.google.com/alerts

I have 100's of alerts I monitor.  Mainly my stuff but also some competitor and client stuff too.

iphigenie
« Reply #2 on: November 05, 2007, 08:25:04 AM »

I can't remember why I didn't like it the last time; it could have just been my mistrust of Google, or it didn't go "deep" enough. But I will try it again, since you say it can work with hundreds of them.

Thanks!
justice
« Reply #3 on: November 05, 2007, 08:42:34 AM »

What about http://technorati.com/? Search for a subject and you see the updates regarding that subject on your screen; there's even a watcher. Note, however, that most if not all of the results are from blogs, so it might not be suitable for your needs.

iphigenie
« Reply #4 on: November 05, 2007, 08:46:52 AM »

Technorati has quite a niche focus, but I see what you mean.

We already use a lot of feed aggregators and the like. Our guy already checks a lot of these, and hangs out in forums etc. But I am trying to make his job easier, as it can easily take hours out of the day just to find a few tidbits of missing information.

I guess what I want is something to help us find the bits that don't make it on to these. The pages that are too specialist, or have nothing funny or trendy, but have real useful information on our topic. Pages that are at the bottom of Technorati or similar sites, sites that are on page 2000 on Google, sites that haven't yet made it, information-rich pages hidden on company websites.

icekin
« Reply #5 on: November 05, 2007, 09:20:02 AM »

Some time back, I saw this site, Searchbots (http://www.searchbots.net/). The idea is that you list some topics and build a searchbot, which goes off and searches around for content. Over time, it gathers more content and aggregates it. Every day, it returns fresh content.

Pubsub (http://www.pubsub.com) was another search engine that could return results based on recently indexed news content. For finding those high quality, yet hard to locate sites, I suggest directories over search engines. Complete Planet (http://www.completeplanet.com) has a listing of several specialist directories on the internet.

I wrote an article about searching the internet about 2 years ago, but I failed to maintain my site and it was taken down. You can find its last version on the WaybackMachine : http://web.archive.org/we...f2o.org/look_at_searching

Some of the stuff is a bit outdated, but much of it can still be used.
iphigenie
« Reply #6 on: November 05, 2007, 10:57:47 AM »

Thanks icekin, I can see you spent a lot of time thinking about these things.
choicefresh
« Reply #7 on: November 19, 2007, 07:11:10 PM »

Have you heard of StumbleUpon??? That sounds like exactly what you're looking for...
iphigenie
« Reply #8 on: November 21, 2007, 05:24:36 PM »

Google Alerts certainly cannot do the job - they illustrate everything that is wrong about Google's approach when you go for more topical/targeted information.

I have heard of StumbleUpon, but... I want the bits that aren't popular enough, in topics that aren't "light" enough, to make it through filters like StumbleUpon... we're talking professional topics, and not in web dev or something popular like that, and StumbleUpon stumbles a bit there...

Or in a way I am looking for a way to automate what we are doing now, which equates to trawling numerous blogs and forums, subscribing to newsfeeds, blogrolls and press release services, doing regular deep searches, watching 100 companies... to find tidbits of information, pages, white papers etc. which are in the dark/deep web. The opposite of what Digg/Technorati/StumbleUpon do, which is to float some things to the surface because they are fun/cool/intriguing to a large number of people.

Some bits I know how to improve, using watchers, feed readers, search tools - but especially when it comes to the search tool I would like something more clever than the usual search. Something that remembers what I have already found before and doesn't show it again unless it has changed a lot. Something that can blacklist things and ignore things and make it easier for me to see stuff that I haven't seen before...
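
One way the "don't show it again unless it has changed a lot" idea might be sketched in Python is below. It is only an illustration under stated assumptions: `changed_a_lot` and `should_show` are hypothetical names, the store of previously reviewed pages is assumed to be a plain dict mapping URL to page text, the blacklist is a list of substrings to match against the URL, and the 0.7 similarity threshold is an arbitrary starting point that would need tuning.

```python
from difflib import SequenceMatcher


def changed_a_lot(old_text: str, new_text: str, threshold: float = 0.7) -> bool:
    """True when word-level similarity of the two versions is below threshold."""
    ratio = SequenceMatcher(None, old_text.split(), new_text.split()).ratio()
    return ratio < threshold


def should_show(url, text, seen, blacklist=()):
    """Decide whether a fetched page deserves review; update the seen store."""
    if any(blocked in url for blocked in blacklist):
        return False  # blacklisted: always ignored
    previous = seen.get(url)
    if previous is not None and not changed_a_lot(previous, text):
        return False  # already reviewed and not changed much since
    seen[url] = text  # remember the version that is being shown
    return True
```

The stored text is only replaced when a page is actually shown, so a page that drifts a little at a time is still compared against the last version someone reviewed.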

I guess it's an "I wish I still had time to code" moment :S
« Last Edit: November 21, 2007, 05:28:16 PM by iphigenie »
app103
« Reply #9 on: November 22, 2007, 08:20:38 PM »

You might want to give Copernic Agent a try. The Professional edition uses multiple search engines to find results, including specialized search engines based on the type of info you are looking for (you can even do patent and ftp searches with it), tracks results, etc. This was THE way of finding info back when there wasn't a Google to use.

You could combine it with something like Newzie, which has features for alerting on certain keywords in rss feeds, which it can do at every feed update. It can also act as a site watcher, notifying you of changes on a site, and I believe it can also notify you of changes that trigger the same keyword alerts you have set up for feeds. (Newzie is pretty powerful!)
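
For anyone who would rather script the keyword-alert part, a rough Python sketch of the general idea follows. It says nothing about how Newzie itself works: `matching_items` is a made-up name, the feed is assumed to be plain RSS 2.0 already downloaded into a string, and matching is a simple case-insensitive substring test on each item's title and description.

```python
import xml.etree.ElementTree as ET


def matching_items(rss_xml: str, keywords):
    """Return (title, link) pairs for RSS items mentioning any keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for item in ET.fromstring(rss_xml).iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        body = item.findtext("description", default="")
        haystack = f"{title} {body}".lower()
        if any(k in haystack for k in wanted):
            hits.append((title, link))
    return hits
```

Run at every feed update, the links it returns could then be passed through a "seen before" filter so that only new items raise an alert.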

iphigenie
« Reply #10 on: November 26, 2007, 10:35:48 AM »

What I really really dream of is the Copernic Agent but with the extra feature of "hide sites I have already seen"