
how to save all *.url links inside of an entire site?


steve_rb:
I have been trying to solve this problem for a long time, but so far I haven't found a solution:

I want to find a way, or a piece of software, to go through every page of an entire website, find all links in *.url format for example, and save those links to a text file. URL Snooper can do this, but I have to go through the pages manually one by one and open each page at least once. That isn't feasible for a site with thousands of pages. Any idea how to do this?

 :(

jgpaiva:
Using httrack, you can download the whole site at once, excluding the non-important parts (like images and such) and keeping only the HTML pages.
Then there are programs that can pick out all the links from those HTML files using a regular expression or the like. One example of such a program is PowerGREP.
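For anyone who would rather script the same crawl-then-extract idea instead of using httrack and PowerGREP, here is a minimal Python sketch. It assumes the third-party requests package is installed; the start URL, output filename, and page limit are placeholders, not anything from the thread:

import re
from collections import deque
from urllib.parse import urljoin, urlparse

import requests

START_URL = "http://example.com/"  # hypothetical start page
URL_PATTERN = re.compile(r'href="([^"]+\.url)"')  # links ending in .url
HREF_PATTERN = re.compile(r'href="([^"]+)"')      # all links, for crawling

def crawl(start_url, out_path="links.txt", max_pages=1000):
    """Breadth-first crawl of one site, saving every *.url link found."""
    seen, found = {start_url}, set()
    queue = deque([start_url])
    host = urlparse(start_url).netloc
    while queue and len(seen) <= max_pages:
        page = queue.popleft()
        try:
            html = requests.get(page, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that fail to download
        # Collect the *.url links, resolved against the current page.
        found.update(urljoin(page, m) for m in URL_PATTERN.findall(html))
        # Queue further same-site pages to visit.
        for href in HREF_PATTERN.findall(html):
            link = urljoin(page, href)
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    with open(out_path, "w") as fh:
        fh.write("\n".join(sorted(found)))

crawl(START_URL)

This only follows links within the starting host, so it won't wander off-site the way an unrestricted crawler would.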

steve_rb:
httrack is very complicated and seems to download whole pages. The site I am dealing with is very big, probably thousands of pages, so it is not possible to download all of them. I just want software that surfs all the pages and copies certain links into a text file. I am sure such software exists somewhere; I have to find it. If anyone has experience with this, please comment.

 :(

jgpaiva:
But wasn't your idea to avoid manually surfing through each page?

httrack would allow you to download only the HTML files, which would not take much space (this page, for example, takes 70-80 KB).
Then PowerGREP would retrieve the links all at once from all the pages. All you'd have to do is give it a regular expression (or a plain boolean expression) to match against the HTML files.
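That extraction step, the part PowerGREP would do, can also be sketched in a few lines of Python over a folder of downloaded pages. The folder name "httrack_output" is a placeholder for wherever httrack saved the site:

import re
from pathlib import Path

# Match any href ending in .url inside the downloaded pages.
pattern = re.compile(r'href="([^"]+\.url)"')

links = set()
# httrack may also save *.htm files; adjust the glob if needed.
for page in Path("httrack_output").rglob("*.html"):
    links.update(pattern.findall(page.read_text(errors="ignore")))

Path("links.txt").write_text("\n".join(sorted(links)))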

steve_rb:
I tried for two days but couldn't get httrack to save the pages I wanted. It saves every other page but not the pages I want. I know this is a lot to ask, but if you have time, please give this a try and see if you can make it work. I am after URLs in the following format, for example, from this site:

"http://gigapedia.org/redirect.id:e27d2065095006853f16fbd84d2dfaa3.url"

Only the ID part changes from link to link. I want these links written into a text file.
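Going by the one sample link above, the changing part looks like 32 hexadecimal characters, so a regular expression along these lines should match such links (this is an assumption from a single example, not a confirmed format):

import re

# Assumed format: "redirect.id:" followed by 32 hex characters, then ".url"
LINK_RE = re.compile(r'http://gigapedia\.org/redirect\.id:[0-9a-f]{32}\.url')

# "page.html" is a hypothetical saved copy of one of the site's pages.
links = LINK_RE.findall(open("page.html").read())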

The site is "h**p://gigapedia.org".
When you enter the site, use the following ID to log in, or you can register with your own ID and password:

ID: steve_rb
pass: ******

Then click on Browse, then on Articles & Whitepapers for example, then click on the first item on page 1 and open its Links tab. You will see a few download links on that page. If you put the mouse pointer on one of those links, you will see a link in the above format at the bottom of your browser, or you can right-click it and use "Copy URL" to copy it and paste it into a text file. I want these links extracted into a text file. If you don't see any links, go back, choose the second item, and open its Links page.

 :(
