Author Topic: how to save all *.url links inside of an entire site?  (Read 8866 times)

steve_rb

  • Participant
  • Joined in 2006
  • Posts: 14
how to save all *.url links inside of an entire site?
« on: July 09, 2007, 07:29 AM »
I have been trying to solve this problem for a long time, but so far I haven't found any solution:

I want to find a way, or a piece of software, to go through all the pages of an entire website, find all links in *.url format (for example), and save those links into a text file for me. URL Snooper can do this, but I have to go through all the pages manually, one by one, and open each page at least once. That is not practical for a site with thousands of pages. Any idea how to do this?

 :(

jgpaiva

  • Global Moderator
  • Joined in 2006
  • Posts: 4,727
Re: how to save all *.url links inside of an entire site?
« Reply #1 on: July 09, 2007, 08:17 AM »
Using httrack, you can download the whole site at once, skipping the unimportant parts (images and such) and keeping only the HTML pages.
Then there are programs that, using a regular expression or similar, can pick out all the links from those HTML files. One example of such a program is powergrep.
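
If powergrep isn't at hand, that second step can also be scripted in a few lines. A tiny sketch of the idea (the sample HTML line and the pattern are just illustrative assumptions about how the links appear in the pages):

```python
import re

# Made-up sample of what a line in a downloaded HTML page might look like.
sample = '<a href="http://example.com/files/abc123.url">download</a>'

# Quoted href values ending in .url -- the same kind of pattern you would feed to powergrep.
pattern = re.compile(r'href="([^"]*\.url)"', re.IGNORECASE)

print(pattern.findall(sample))  # ['http://example.com/files/abc123.url']
```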

steve_rb

  • Participant
  • Joined in 2006
  • Posts: 14
Re: how to save all *.url links inside of an entire site?
« Reply #2 on: July 09, 2007, 10:03 AM »
httrack is very complicated and seems to download whole pages. The site I am dealing with is very big, probably thousands of pages, and it is not possible to download them all. I just want the software to go through all the pages and copy certain links into a text file. I am sure there must be such software around; I just have to find it. If anyone has experience with this, please comment.

 :(

jgpaiva

  • Global Moderator
  • Joined in 2006
  • Posts: 4,727
Re: how to save all *.url links inside of an entire site?
« Reply #3 on: July 09, 2007, 10:07 AM »
But wasn't your idea to avoid having to surf through each page manually?

httrack would allow you to download only the HTML files, which would not take much space (this page, for example, takes 70-80 KB).
Then powergrep would retrieve the links all at once from all the pages. All you'd have to do is give it a regular expression (or a normal boolean expression) to match against the HTML files.
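
As a rough sketch of that second step, assuming httrack has already mirrored the HTML pages into a local folder (the folder and output file names here are just placeholders):

```python
import re
from pathlib import Path

# Quoted href values ending in .url (same kind of pattern as in my earlier reply).
pattern = re.compile(r'href="([^"]*\.url)"', re.IGNORECASE)

links = set()
# "mirror" stands in for whatever folder httrack wrote the pages into.
for page in Path("mirror").rglob("*.html"):
    links.update(pattern.findall(page.read_text(encoding="utf-8", errors="ignore")))

# One unique link per line in a plain text file.
Path("links.txt").write_text("\n".join(sorted(links)), encoding="utf-8")
```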

steve_rb

  • Participant
  • Joined in 2006
  • Posts: 14
Re: how to save all *.url links inside of an entire site?
« Reply #4 on: July 12, 2007, 05:38 AM »
I tried for two days but couldn't get httrack to save the pages I wanted. It saves all the other pages, but not the ones I want. I know this is too much to ask, but if you have time please give this a try and see if you can make it work. I am after URLs in the following format, for example, from this site:

"http://gigapedia.org...3f16fbd84d2dfaa3.url"

Only the ID part changes from one link to another. I want these links written to a text file.

Site is "h**p://gigapedia.org"
When you enter this site, use the following ID to log in, or you can register with your own ID and password:

ID: steve_rb
pass: ******

Then click on Browse, then on Articles & Whitepapers (for example), and then click the first item on page 1. Click the Links tab and you will see a few download links on that page. If you hover the mouse pointer over one of those links, you will see a link in the above format at the bottom of your browser, or you can right-click it and use "Copy URL" to copy and paste it into a text file. I want these links extracted into a text file. If you don't see any links, go back, choose the second item, and open its Links page.
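
For reference, the links I'm after should all match a pattern roughly like this (only a sketch; I'm assuming everything between the domain and the .url extension can vary):

```python
import re

# Assumption: anything may sit between the domain and the ".url" extension.
pattern = re.compile(r'https?://gigapedia\.org\S*\.url', re.IGNORECASE)
```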

 :(
« Last Edit: July 14, 2007, 12:38 AM by steve_rb »

jgpaiva

  • Global Moderator
  • Joined in 2006
  • Posts: 4,727
Re: how to save all *.url links inside of an entire site?
« Reply #5 on: July 12, 2007, 06:07 AM »
OK, I now understand your problem.
Those pages need a login to be accessed, and apparently httrack doesn't support that. (I'm actually surprised; I thought it did.)
There should be similar programs out there that do support authentication.

Anyone?

steve_rb

  • Participant
  • Joined in 2006
  • Posts: 14
Re: how to save all *.url links inside of an entire site?
« Reply #6 on: July 12, 2007, 07:14 AM »
No, I tried httrack and it works OK. After logging in, the ID and password go into cookies, and I think that's why httrack works. It saved a lot of pages and they are all fine; it just doesn't save the Links pages. It even saves the description pages OK. This shows that httrack does get inside the site and starts saving HTML pages, so I think the right choice of parameters and options might make it save the Links pages too. I tried for two days but couldn't do it. Some httrack experts should help here.

 :(
« Last Edit: July 12, 2007, 07:17 AM by steve_rb »

jgpaiva

  • Global Moderator
  • Joined in 2006
  • Posts: 4,727
Re: how to save all *.url links inside of an entire site?
« Reply #7 on: July 12, 2007, 12:58 PM »
I see, steve. It must be using the cookies from Internet Explorer.
It probably doesn't save those pages because they require you to click on that Links tab, which appears to be JavaScript.

I'm sorry, but I see no simple way to retrieve those links :(
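
The only other thing I can think of: if the .url links are actually present in each item page's HTML and the JavaScript only switches tabs, a small script that reuses the login cookie could fetch the pages directly and grep them. A very rough sketch (the cookie name, the item-page URLs, and the pattern below are guesses, not the site's real values):

```python
import re
import urllib.request

# All hypothetical: the session cookie copied from the browser after logging
# in, and a list of item-page URLs collected beforehand.
SESSION_COOKIE = "PHPSESSID=paste-your-session-id-here"
item_pages = ["http://gigapedia.org/item/1", "http://gigapedia.org/item/2"]

pattern = re.compile(r'https?://gigapedia\.org\S*\.url', re.IGNORECASE)

found = set()
for page_url in item_pages:
    request = urllib.request.Request(page_url, headers={"Cookie": SESSION_COOKIE})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode("utf-8", errors="ignore")
    # Only works if the links are in the served HTML, not generated client-side.
    found.update(pattern.findall(html))

with open("links.txt", "w", encoding="utf-8") as out:
    out.write("\n".join(sorted(found)))
```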