Author Topic: links collector  (Read 4915 times)
kalos
Member
« on: July 08, 2007, 12:20:57 AM »

hello

1)
Is there a way to grab and store (in a txt file) all the links, or URLs appearing in the text, of all the webpages I visit that match a specific mask, e.g. URLs like www.*.com/*.pdf?

The program must scan the text and links of every webpage I visit and, whenever it finds a URL matching the mask above, store it in a text file.

2)
I would also like a program that will store (in a text file) the URLs of the webpages I visit that match a specific mask, e.g. www.google.com/*
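A mask like www.*.com/*.pdf maps naturally onto shell-style wildcards. Here is a minimal Python sketch of just the matching step (the function name and the scheme-stripping are my own illustration, not part of any existing tool):

```python
from fnmatch import fnmatch

def matches_mask(url, mask):
    """Case-insensitively test a URL against a wildcard mask like 'www.*.com/*.pdf'."""
    # Strip the scheme so 'http://www.foo.com/a.pdf' still matches a mask
    # written without one.
    for scheme in ("http://", "https://"):
        if url.lower().startswith(scheme):
            url = url[len(scheme):]
            break
    return fnmatch(url.lower(), mask.lower())

# Keep only the links that match the mask.
urls = ["http://www.example.com/docs/manual.pdf", "http://www.example.org/page.html"]
kept = [u for u in urls if matches_mask(u, "www.*.com/*.pdf")]
```

Note that fnmatch's `*` also crosses `/` boundaries, which is exactly what a mask like www.*.com/*.pdf needs.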

thanks!

jgpaiva
Global Moderator
« Reply #1 on: July 08, 2007, 05:43:54 AM »

I know this'd be pretty easy: the only problem is getting the links themselves; the parsing and storing part is pretty trivial.

I don't know how to get that info, though. If anyone knows how, please post here.

steeladept
Supporting Member
« Reply #2 on: July 08, 2007, 10:27:45 AM »

If you are only asking about links, why not parse out just the <a> tags?  The anchor tags have to exist with the correct URL for the page to link anywhere, so that would quickly and easily filter out everything else.  The only case left to figure out is when someone posts a URL in plain text without a link.

The logical expression for the filter would be: find <a, then find href=, then return the URL. Match this against the mask and write it to the output file.

Unfortunately I don't know any coding for that beyond the HTML for the links themselves, but I hope that helps.
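The anchor-tag filter described above (find <a, find href=, return the URL) could be sketched like this with Python's standard-library HTML parser (the class name and sample page are illustrative only):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<p>See <a href="http://www.example.com/a.pdf">the manual</a>.</p>'
parser = LinkExtractor()
parser.feed(page)
# parser.links now holds every href found in the page
```

As steeladept notes, this catches only real links; URLs pasted as plain text would need a separate text scan.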

jgpaiva
Global Moderator
« Reply #3 on: July 08, 2007, 11:57:32 AM »

steeladept: I think kalos is referring to collecting the links as he browses the web, so the links would have to be collected directly from the web browser or something like that.
At least, that's how I understand it :)

steeladept
Supporting Member
« Reply #4 on: July 08, 2007, 03:05:33 PM »

I see what you mean.  Still, isn't there a way to grab the href from a link "on click" or something like that?

kalos
Member
« Reply #5 on: July 08, 2007, 05:05:39 PM »

mm, I can't imagine what the event that would trigger the scanning and grabbing would be

I suppose it needs some kind of browser integration, or a way to see what webpages I visit from within the browser


steve_rb
Participant
« Reply #6 on: July 09, 2007, 07:34:33 AM »

URL Snooper can nicely grab and list all links on the pages you visit. Just click Sniff, go to your browser, and start browsing. You can even tell URL Snooper to filter the listed links with any text you want. It's great software, but unfortunately I wanted it to go through all the pages and save the links without me surfing those pages myself. Pity it can't do that :(

jgpaiva
Global Moderator
« Reply #7 on: July 09, 2007, 08:43:31 AM »

Very good point, steve!
I haven't tried it, but that method does seem to work :D

kalos
Member
« Reply #8 on: July 10, 2007, 09:35:53 AM »

thanks steve

however, I doubt that sniffing is accurate and reliable enough

as for what you need, a web spider/crawler would do that, but I don't know a good one

kalos
Member
« Reply #9 on: July 13, 2007, 05:22:57 PM »

I get this error :(


mouser
First Author
Administrator
« Reply #10 on: July 13, 2007, 07:07:44 PM »

Someone else was posting about this error before too; what is causing this mysterious npptools.dll error!

mouser
First Author
Administrator
« Reply #11 on: July 13, 2007, 07:11:44 PM »

saw this post recently:
http://jwsecure.com/dan/2006/12/

kalos,
Can you try this:

kalos
Member
« Reply #12 on: October 30, 2007, 08:39:14 AM »

mmm

URL Snooper is a great program, but not exactly the way to go in this situation, for three reasons:

1) it doesn't integrate with the browser
2) it doesn't auto-save files, text, or links
3) it sniffs the network, trying to get "hidden" files, while I only need to see what is seen within the browser, nothing more

I was looking for a Firefox extension or an Opera user JavaScript that will see the text and links of each webpage I visit and save specific links/text/files

there is no need to sniff the network in order to catch "hidden" URLs etc; it's overkill

the problem is that most programs that attempt to do what I need (offline browsers, etc) require you to enter a starting web address, specify retrieval options, and then let the program do the job

but what I want is a browser-integrated solution that does this within my browser "as I browse"

an auto-bookmarker, an auto-file-saver, an auto-text-saver that will save info automatically as I browse the net

thanks
« Last Edit: October 30, 2007, 08:41:31 AM by kalos »

kalos
Member
« Reply #13 on: November 17, 2007, 06:20:09 PM »

first, I need something that will "monitor" every webpage I visit as I browse the net
this monitoring has to be very accurate of course, which means it must not miss any webpage, even webpages that are only partially loaded, etc

by "monitoring webpages" I mean grabbing the text, links and files of every webpage I visit
by "the text of the webpage" I mean the text that is highlighted/selected when we press Ctrl+A in a webpage (including any other "hidden" text, etc)
by "the links of the webpage" I mean the links that are grabbed when we press Ctrl+Alt+L in Opera, or by any other method that shows all the links of the webpage (including any "hidden" links, JavaScript links, etc)
by "the files of the webpage" I mean the files that end up in the folder that is created when we save a webpage as an html file plus a folder (and any other hidden files, embedded files, etc)

as far as I know (and if you know something else, please inform me) the available methods that can monitor web browser traffic are these:
JavaScript can monitor webpages as I browse the net (Opera, for example, can run a user script with document.addEventListener('DOMContentLoaded', function() { ... }) that does things when a webpage is loaded)
an internet connection sniffer can monitor webpages as I browse the net, by sniffing URLs
a web proxy can monitor webpages as I browse the net, since it works as a caching proxy

then I need to apply filters to specify which of the text, links and files are useful, and then save the filtered information
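Whichever monitoring method is used, the filter-and-save step at the end is the easy part. A minimal Python sketch (`save_matching_links` and the regex argument are my own illustration, not part of any tool mentioned in the thread):

```python
import re

def save_matching_links(links, pattern, outfile):
    """Append every link that matches the regex pattern to a text file, one per line."""
    matcher = re.compile(pattern)
    hits = [link for link in links if matcher.search(link)]
    with open(outfile, "a", encoding="utf-8") as f:
        for link in hits:
            f.write(link + "\n")
    return hits
```

Opening the file in append mode means the collector can be called once per page load and the text file keeps growing as you browse.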

any help would be much appreciated

thank you

Ralf Maximus
Supporting Member
« Reply #14 on: November 17, 2007, 08:48:07 PM »

Does it need to be real-time?

If you're using IE6 or 7, the browser's cache is just a collection of files that can be accessed via the file system.  I imagine any file-search utility that does regular expressions (FileLocator Pro?) could suss out the patterns you've described.  I know for a fact that UltraEdit's file-search feature will do this.

This is not real-time scanning, but you could kick off such a search after your browsing session is complete.
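The after-the-fact cache scan described above can also be scripted. A minimal Python sketch (the pattern and function name are illustrative; it simply reads every cache file as text and ignores files it cannot open):

```python
import os
import re

# Illustrative pattern for the earlier www.*.com/*.pdf example.
URL_PATTERN = re.compile(r'https?://www\.[\w.-]+\.com/[\w/.-]*\.pdf', re.IGNORECASE)

def scan_cache(cache_dir):
    """Search every file under cache_dir for URLs matching URL_PATTERN."""
    found = set()
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as f:
                    found.update(URL_PATTERN.findall(f.read()))
            except OSError:
                continue  # unreadable file: skip it
    return found
```

Pointed at the browser's cache folder after a session, this recovers the matching URLs without any real-time hooking.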

rjbull
Charter Member
« Reply #15 on: November 18, 2007, 11:35:00 AM »

I doubt it will satisfy kalos, but don't forget this:

Quote

VisitURL: Flexible, efficient, lightweight bookmark manager

VisitURL is not a fully-fledged bookmark manager. It is not a replacement for Netscape's bookmark file or Internet Explorer's Favorites. It does not organize bookmarks into categories.

VisitURL is designed to help maintain a handy (as in: at hand) list of URLs that you intend to visit. For instance, if a friend sends you an email recommending a particular URL, Visit is a good place to store the URL until you are ready to launch your browser and go surfing. If you copy the URL to the clipboard, Visit will automatically intercept it and save it to its database. If you copy several URLs at once, Visit will get all of them. (You may also add URLs manually or directly from an open browser window.) If you copy the URL with some text around it, Visit will optionally treat that text as a description for the URL you copied.

To access the bookmarked site, you can either view the HTML page that Visit creates in your browser, or click a toolbar button to launch the browser directly from Visit. There is no limit to how many URLs you may store, though the program is primarily designed to hold, view and edit a short, temporary list. Netscape and Explorer tend to consume so many system resources that it's not practical to keep them loaded at all times - this is where Visit comes in.

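The clipboard-interception trick VisitURL describes boils down to spotting URLs in whatever text was copied. A minimal sketch of that detection step in Python (the regex here is my own and deliberately loose):

```python
import re

# Matches anything starting with http(s):// or a bare www. prefix.
URL_RE = re.compile(r'https?://\S+|www\.\S+')

def extract_urls(text):
    """Pull every URL-looking token out of a blob of copied text."""
    return URL_RE.findall(text)
```

A clipboard monitor would call this on each clipboard change and store any hits, which is essentially the "copy several URLs at once" behaviour the quote describes.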