
Author Topic: Download managers, spiders/spyders, link mining/harvesting  (Read 6224 times)

new.46h7456

  • Participant
  • Joined in 2011
  • Posts: 3
Download managers, spiders/spyders, link mining/harvesting
« on: November 10, 2011, 11:57 AM »
I've tried learning to use some download managers. The problem I face is an issue that I believe must already be solved, but I can't find where.
A good download manager can import a list of downloads, or grab everything from a web page that is already open.
There seems to be plenty of software for downloading whole web sites. But many users would not wish to download a WHOLE site (especially if the job could take 5-15 times longer and fill your disk, maybe before it finishes); the spidering I tried was too automatic.
If you can predict what the next links in a big collection of content should be, e.g.
example.net/page1/File_Download_PDF_Part1.pdf (.doc, .jpg, .mp3, .mov, .flv)
example.net/page1/File_Download_PDF_Part2.pdf
example.net/page1/File_Download_PDF_Part3.pdf
example.net/page1/File_Download_PDF_Part4.pdf (sites can of course have hundreds of these, and I've seen the referrer be the homepage, or even absent),
then you could build the link list yourself, add it to a download manager, see how many are actually available (by querying their sizes), and choose how many to queue.
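
A minimal sketch of that first case in Python (the example.net pattern is the made-up one above): generate the candidate URLs, query each one's size with a HEAD request, and write the ones that exist to a text file a download manager can import.

Code:
# Sketch: build a numbered link list for a predictable URL pattern,
# check which parts exist by querying their size, and save them.
# PATTERN is the hypothetical example.net series from above.
import urllib.request

PATTERN = "http://example.net/page1/File_Download_PDF_Part%d.pdf"

def remote_size(url):
    """Return Content-Length in bytes without downloading, or None on failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return int(resp.headers.get("Content-Length", 0))
    except Exception:
        return None  # missing part, or the server refuses HEAD requests

with open("links.txt", "w") as out:
    for n in range(1, 101):            # guess an upper bound for the series
        url = PATTERN % n
        size = remote_size(url)
        if size is None:
            break                      # assume the series has ended
        out.write(url + "\n")
        print(url, size, "bytes")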

But for
example.net/pageName/download1.ext
example.net/pageSomeOtherPageName/download1.ext
example.net/pageYetSomeOtherPageName/download1.ext
you need to find the URLs first. If they are all on one page, it's easy.
If there are hundreds of links and they are not all listed on one page (but spread across page1, page2, ... page89), then users need something to mine/harvest/spider the URLs
and write them to a txt file. It would be far better to harvest links without having to browse dozens of pages by hand. The user can then take the partial or full list, import it into their installed download manager, choose how much or which parts to download, set the order, and even throttle down (FreeDownloadManager.org has the option File > Import list from clipboard or text file).
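
A rough sketch of that harvesting step in Python (standard library only; the page1..page89 pattern is the hypothetical one above): fetch each listing page, collect every href, and append the wanted file types to a text file for the download manager.

Code:
# Sketch: harvest links from a run of numbered listing pages into links.txt.
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href=...> tag on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

WANTED = (".pdf", ".doc", ".jpg", ".mp3", ".mov", ".flv")

with open("links.txt", "w") as out:
    for n in range(1, 90):                         # page1 .. page89
        page = "http://example.net/page%d" % n     # hypothetical pattern
        html = urllib.request.urlopen(page, timeout=10).read()
        collector = LinkCollector(page)
        collector.feed(html.decode("utf-8", "replace"))
        for link in collector.links:
            if link.lower().endswith(WANTED):
                out.write(link + "\n")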

Many download managers are out there, built to handle queueing dozens (even hundreds) of files, but they appear to be missing this link mining/harvesting function for filling their queue from large freeware MP3 and image/video promo collections.

I've tried http://tools.seobook.com/link-harvester/ ( http://tools.seobook...klinks/backlinks.php ), but for some reason no URL I give it works.

The Firefox add-on Link Gopher would be good, but it only works from the page you are on; it goes zero pages deep (and you can't feed it a list).
It would be much better if a plugin (or app) could be given the starting URL(s), let the user choose the number of pages deep and whether to follow both example.net and hosted.example.net in the same query, and leave downloading all the bulky content to download managers.

OutWit Hub (Light) is a much bigger app/plugin. I've not yet seen it compile a URL list from multiple pages.

If a developer wants to make it complete, I'd say have the above selections in one box, adding checkboxes to control how to spider/harvest links (a small sketch of this logic follows the list):

[ ] follow to other domains
[ ] down
example.net/SomeCategory/ may return
example.net/SomeCategory/somesubcategory/FILE.PDF and
example.net/SomeCategory/someothersubcategory/OtherFileName.PDF

[ ] sideways
example.net/SomeCategory/ may return
example.net/SomeOtherCategory/FILE.PDF

[ ] up and sideways
example.net/SomeCategory/ may return
example.net/FILE.PDF

(if the user just types example.net/ and checks spider downwards, then [ ] sideways and [ ] up and sideways would be unnecessary, since everything is already below the starting point)
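
A minimal sketch of those checkbox rules in Python (the example.net URLs are the made-up ones above): classify each harvested link relative to the starting URL, so links in unchecked directions can be dropped before the list goes to the download manager.

Code:
# Sketch: decide whether a harvested link points down, sideways, or up
# relative to the starting URL, with subdomain/other-domain handling.
from urllib.parse import urlparse

def classify(start_url, link):
    """Return 'down', 'sideways', 'up', 'subdomain', or 'other-domain'."""
    s, l = urlparse(start_url), urlparse(link)
    if l.hostname != s.hostname:
        # hosted.example.net alongside example.net counts as a subdomain
        if l.hostname and s.hostname and l.hostname.endswith("." + s.hostname):
            return "subdomain"
        return "other-domain"
    start_dir = s.path.rsplit("/", 1)[0] + "/"      # e.g. /SomeCategory/
    if l.path.startswith(start_dir):
        return "down"                                # deeper under the start dir
    if l.path.count("/") >= start_dir.count("/"):
        return "sideways"                            # another branch, same level
    return "up"                                      # closer to the site root

# The ticked boxes become a simple filter:
wanted = {"down", "sideways"}
links = ["http://example.net/SomeCategory/somesubcategory/FILE.PDF",
         "http://example.net/SomeOtherCategory/FILE.PDF",
         "http://example.net/FILE.PDF"]
kept = [u for u in links
        if classify("http://example.net/SomeCategory/", u) in wanted]
# kept now holds only the "down" and "sideways" links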

So many users have downloaded and installed download managers. I would think something a step up from Link Gopher (one that goes at least one page deep, or accepts a start list) would be popular.

If anyone has something more than Link Gopher, please post it here. I also think this would be a much easier program to create than the spiders that attempt to download whole sites.


new.46h7456

  • Participant
  • Joined in 2011
  • Posts: 3
Re: Download managers, spiders/spyders, link mining/harvesting
« Reply #1 on: November 11, 2011, 12:53 AM »
URL Snooper appears useful for capturing the download link of some things, but as I read it, it captures the URLs of download streams (only) in current network traffic (and maybe links on a page that is open). I do not see anything that says URL Snooper will spider one or more pages deep.

app103

  • That scary taskbar girl
  • Global Moderator
  • Joined in 2006
  • Posts: 5,885
Re: Download managers, spiders/spyders, link mining/harvesting
« Reply #2 on: November 11, 2011, 05:26 AM »
How about something like HTTrack? It can filter by file type, so you can point it at a website and have it download just the pdf files, if that is all you want. And it can update your local copy, downloading only what is new since the last time you downloaded the site.
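
For reference, a command-line run along those lines might look like the sketch below (flags from HTTrack's documentation, worth double-checking against httrack --help; the URL is a placeholder): -O sets the output folder, the +*.pdf scan rule keeps the PDF files, -r6 limits the crawl depth, and --update refreshes an existing mirror.

Code:
httrack "http://example.net/" -O ./mirror "+*.pdf" -r6 --update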

new.46h7456

  • Participant
  • Joined in 2011
  • Posts: 3
Re: Download managers, spiders/spyders, link mining/harvesting
« Reply #3 on: November 11, 2011, 08:07 AM »
Quote from: app103 on November 11, 2011, 05:26 AM
How about something like HTTrack? It can filter by file type, so you can point it at a website and have it download just the pdf files, if that is all you want. And it can update your local copy, downloading only what is new since the last time you downloaded the site.
Glad to see your suggestion. Hmm... I've read the partial screen capture of HTTrack options from the homepage.
I see a low limit in bytes (before aborting) can be chosen.
Very good to see the number of connections can be chosen. I've started to download HTTrack. I hope to find I can copy/export the links it finds and/or throttle down its max throughput. Of course, a good network manager could throttle this and other applications individually, so I believe copying/exporting the links it finds would be more useful than self-throttling.
(I hope I do not need to enter many video extensions to get them all, but I think .mov, .flv, .wmv, .wma, .mpg, and .mpeg are the most common.)
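
Hedged sketches of those points, again from HTTrack's documented options (verify against the manual): one + scan rule per video extension, -A to cap the transfer rate in bytes per second, -c to set the number of connections, and -r3 to limit the depth. If I read the docs right, the URLs a run actually fetched are also recorded in hts-cache/new.txt inside the mirror folder, which could serve as the exported link list.

Code:
httrack "http://example.net/" -O ./mirror "+*.mov" "+*.flv" "+*.wmv" "+*.wma" "+*.mpg" "+*.mpeg" -A25000 -c4 -r3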

I read that URL Snooper appears useful for capturing the download link of some things, but as I read it, it captures (only?) the URLs of download streams in current network traffic (and maybe? links on an open web page).
URL Snooper should really be useful for some URLs (but I have not yet seen anything that says URL Snooper will spider one or more pages deep).

I'll be listening to this thread for a long while, at least till I find I can copy/export the links from a spider, and I'll post applications for users to follow.

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,859
Re: Download managers, spiders/spyders, link mining/harvesting
« Reply #4 on: November 11, 2011, 08:13 AM »
Quote from: app103 on November 11, 2011, 05:26 AM
How about something like HTTrack? It can filter by file type, so you can point it at a website and have it download just the pdf files, if that is all you want. And it can update your local copy, downloading only what is new since the last time you downloaded the site.

+1 w/App103 on HTTrack. :Thmbsup:

Ten minutes perusing the docs, followed by a quick recon to see how your target website's structure works, and you can fine tune HTTrack to do just about anything you'd want it to. At worst, you'll download a little more (or a little less) than what you actually wanted. In which case, just tweak your settings and run it again.

Very sweet little utility program. Highly recommended. 8)

app103

  • That scary taskbar girl
  • Global Moderator
  • Joined in 2006
  • Posts: 5,885
Re: Download managers, spiders/spyders, link mining/harvesting
« Reply #5 on: November 11, 2011, 09:05 AM »
Quote from: new.46h7456 on November 11, 2011, 08:07 AM
I read that URL Snooper appears useful for capturing the download link of some things, but as I read it, it captures (only?) the URLs of download streams in current network traffic (and maybe? links on an open web page).
URL Snooper should really be useful for some URLs (but I have not yet seen anything that says URL Snooper will spider one or more pages deep).

URL Snooper is not designed to spider sites. It is for obtaining the real URLs of streaming media content, such as a video or audio that might be playing in a flash player where the URL of the actual media file might be hidden, not obvious or obtainable any other way. You start playing it and URL Snooper tells you the website's secrets so you can download the file.  ;)