

links collector


kalos:
hello

1)
Is there a way to grab and store (in a text file) all the links, or URLs appearing in the text, of all the webpages I visit that match a specific pattern, e.g. URLs like www.*.com/*.pdf?

The program would have to scan the text and links of every webpage I visit, and if it finds a URL matching the above mask, store it in a text file.

2)
I would also like a program that stores (in a text file) the URLs of the webpages I visit that match a specific mask, e.g. www.google.com/*.

thanks!
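
For the mask-matching part of both requests, a wildcard such as www.google.com/* or www.*.com/*.pdf can be checked with Python's standard fnmatch module. This is only a minimal sketch; the masks, output file name, and helper name are assumptions for illustration, not something from the thread.

from fnmatch import fnmatch

MASKS = ["www.google.com/*", "www.*.com/*.pdf"]  # example masks from the question
OUTPUT_FILE = "visited_urls.txt"                 # hypothetical output path

def record_if_match(url: str) -> bool:
    """Append the URL to OUTPUT_FILE if it matches any of the masks."""
    # Drop the scheme so "http://www.google.com/search" matches "www.google.com/*".
    bare = url.split("://", 1)[-1]
    if any(fnmatch(bare, mask) for mask in MASKS):
        with open(OUTPUT_FILE, "a", encoding="utf-8") as f:
            f.write(url + "\n")
        return True
    return False

# Example: record_if_match("http://www.google.com/search?q=test") returns True
# and appends the URL to visited_urls.txt.

Note that * in fnmatch also matches across slashes, which is what masks like www.*.com/*.pdf need.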

jgpaiva:
This would be pretty easy: the only real problem is getting the links themselves; the parsing and storing part is pretty trivial.

I don't know how to get that info, though. If anyone knows how, please post here.

steeladept:
If you are only asking about links, why not parse out just the <a> tags? The anchor tags have to contain the correct URL for the page to link anywhere, so that would quickly and easily filter out everything else. The only case left to figure out is when someone posts a URL in plain text without a link.

The logic for the filter would be: find <a, then find href=, then return the URL. Match it against the mask and place it in the xxxx file.

Unfortunately I don't know how to code that beyond the HTML for the links themselves, but I hope that helps.
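
Here is a minimal Python sketch of that filter (not from the thread; the mask, output path, and sample page are assumptions). It parses out the <a> tags, checks each href against the wildcard mask, and appends matches to a text file:

from fnmatch import fnmatch
from html.parser import HTMLParser

MASK = "www.*.com/*.pdf"             # example mask from the question
OUTPUT_FILE = "collected_links.txt"  # hypothetical output path

class LinkCollector(HTMLParser):
    """Collects href values of <a> tags whose URL matches MASK."""
    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Strip the scheme so the mask can match "www.example.com/...".
                bare = value.split("://", 1)[-1]
                if fnmatch(bare, MASK):
                    self.matches.append(value)

def collect_links(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    return parser.matches

if __name__ == "__main__":
    page = '<a href="http://www.example.com/docs/report.pdf">report</a>'
    with open(OUTPUT_FILE, "a", encoding="utf-8") as f:
        for link in collect_links(page):
            f.write(link + "\n")

This only handles real <a> links; URLs pasted as plain text would need a separate regex pass over the page text, as steeladept notes.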

jgpaiva:
steeladept: I think kalos is referring to collecting the links as he browses the web, so the links would have to be collected directly from the web browser or something like that.
At least, that's how I understand it :)
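
One hedged way to do that, purely as a sketch and not something mentioned in the thread, is to point the browser at a local intercepting proxy such as mitmproxy and scan every HTML response as it passes through. The mask and output path below are assumptions; the same hook could also check flow.request.pretty_url against a mask to cover kalos's second request.

# Hypothetical mitmproxy addon: run with "mitmdump -s collect_links.py"
# and set the browser's HTTP proxy to the address mitmdump listens on.
import re
from fnmatch import fnmatch
from mitmproxy import http

MASK = "www.*.com/*.pdf"           # example mask from the question
OUTPUT_FILE = "browsed_links.txt"  # hypothetical output path

def response(flow: http.HTTPFlow) -> None:
    # Only look at HTML pages.
    if "text/html" not in flow.response.headers.get("content-type", ""):
        return
    text = flow.response.get_text() or ""
    # A simple regex for href values is enough for a sketch.
    for href in re.findall(r'href=["\']([^"\']+)["\']', text):
        bare = href.split("://", 1)[-1]
        if fnmatch(bare, MASK):
            with open(OUTPUT_FILE, "a", encoding="utf-8") as f:
                f.write(href + "\n")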

steeladept:
I see what you mean. Still, isn't there a way to grab the href from a link "on click", or something like that?
