DonationCoder.com Forum
Main Area and Open Discussion => General Software Discussion => Topic started by: kalos on August 27, 2008, 10:20 PM
-
hello
first, I am talking about webpages that have 1, 2, 3, next etc. links at the bottom (like Google results)
I need a 'bot' that will click 'next' on a webpage, go to the next page, save it, then click 'next' again, go to the next, save it, etc.
is there anything like this?
thanks
PS: in any browser, I have no specific preference
-
I think you can use httrack (http://www.httrack.com/) for that. Just pass it the page's address and configure it to download all the pages linked by that one at a depth of '1'.
If the page has other links not related to the search, they will be downloaded too, but I suppose you could delete those manually or something. I think httrack can exclude domains, so if those other pages are all in the same domain as the original page, you could just exclude that one and you'd get only the interesting pages ;)
-
the problem is that I need to do this inside the web browser, because the website requires authentication, which is not easy to achieve in offline webpage downloaders (it doesn't accept webpages in http://user:[email protected] format; it requires web-form authentication)
-
the problem is that I need to do this inside the web browser, because the website requires authentication, which is not easy to achieve in offline webpage downloaders (it doesn't accept webpages in http://user:[email protected] format; it requires web-form authentication)
-kalos
Sounds like a job for GreaseMonkey, AutoIt, or AutoHotkey, but unless you're willing to provide some details I don't think anyone will be able to help:
eg.
GreaseMonkey - you'd need to provide access to the site so a userscript can be written to do the actions you want.
AutoIt/AutoHotkey - you might get away with providing a screenshot of the site as a reference for the mouse movements/actions and/or key input.
I think these are the most likely automated options barring a dedicated program.
If the website is using a form for verification then it most likely sets a cookie and you could use a website downloader that can use the cookie.
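A rough sketch of that cookie idea, using only Python's standard library (the login URL and the form field names "user"/"pass" are hypothetical placeholders - you'd have to check the real form's markup):

```python
import urllib.request
import urllib.parse
import http.cookiejar

def make_session():
    """Build an opener that remembers cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar

def login(opener, login_url, username, password):
    """POST the login form; the server's session cookie lands in the jar
    and is sent automatically on every later request."""
    data = urllib.parse.urlencode({"user": username, "pass": password}).encode()
    return opener.open(login_url, data)

if __name__ == "__main__":
    opener, jar = make_session()
    # Hypothetical usage (left commented out):
    # login(opener, "https://example.com/login", "me", "secret")
    # html = opener.open("https://example.com/results?start=10").read()
```

Once logged in through `login()`, every page fetched through the same opener carries the session cookie, just like the browser would.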
Try Firefox with DownThemAll! (https://addons.mozilla.org/en-US/firefox/addon/201) - it can supposedly download all links on a page.
-
unless you're willing to provide some details I don't think anyone will be able to help
-4wd
let's say I search Google for 'something' and it returns a webpage that displays the Google results, with 1, 2, 3, 4, next at the bottom
each of the numbered Google results pages has a URL like:
http://www.google.com/search?q=something&start=10
http://www.google.com/search?q=something&start=20
etc
what I want to do is save the Google results webpage (the one with the numbers at the bottom), then click to go to the next results page, save it, go to the next, save, etc. (in other words, I need to save all the webpages at the above-mentioned URLs)
all the above must be done within the web browser, because the website needs me to first authenticate via a web form
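Since those URLs only differ in the start parameter, the whole list can be generated instead of clicked through - a small sketch (the 10-results-per-page step matches the URLs quoted above):

```python
def page_urls(query, pages, step=10):
    """Return the result-page URLs for the first `pages` pages,
    stepping the start parameter by `step` each time."""
    base = "http://www.google.com/search?q={}&start={}"
    return [base.format(query, n * step) for n in range(pages)]

# The first page starts at 0, then 10, 20, ...
urls = page_urls("something", 3)
```

Such a list could then be fed to a downloader in one go instead of navigating page by page - though, as noted, that only helps once the authentication problem is solved.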
-
you can use Repagination (https://addons.mozilla.org/en-US/firefox/addon/2099) to combine all the pages into one and then save. just a thought. :)
• https://addons.mozilla.org/en-US/firefox/addon/2099
-
very interesting!
you can do miracles with JavaScript and GreaseMonkey, but unfortunately it's hard to code and there are not many JS developers
I will test it asap, thanks
-
Related:
https://addons.mozilla.org/en-US/firefox/addon/4925
http://antipagination.googlepages.com/index.html
-
it works, but for the 400+ webpages of results that I need to save... it will crash the browser
a web-navigation automation script or bot would be the ultimate solution
is there any?
-
it works, but for the 400+ webpages of results that I need to save... it will crash the browser
-kalos
wow, that is a lot of pages. :) there is one other add-on that i have in my bookmarks but haven't tried it before.
• https://addons.mozilla.org/en-US/firefox/addon/3262/
-
i totally forgot about this - iMacros for Firefox. :)
• http://www.iopus.com/imacros/firefox/
-
unfortunately macros won't work, because when I try to save each results webpage, the filename is always the same
is there a way to auto-rename them?
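The renaming part is easy to script if the saving is done outside the browser - a minimal sketch that numbers each saved page so filenames never collide (the "page_" prefix is just an assumption):

```python
import os

def unique_name(folder, index, prefix="page_", ext=".html"):
    """Zero-padded sequential filename: page_001.html, page_002.html, ..."""
    return os.path.join(folder, "{}{:03d}{}".format(prefix, index, ext))

def save_page(folder, index, html):
    """Write one page's HTML under a unique sequential name."""
    os.makedirs(folder, exist_ok=True)
    path = unique_name(folder, index)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

With 400+ pages, zero-padding keeps the files in order when sorted alphabetically.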
-
I'd think you would have to use Foxmarks to sync your bookmarks.
Then go to your Foxmarks web site where all your links are and work with them from there.
Of course you need Firefox also which I guess you have.
Would the addon, 'Download Them All', work?
Or you can use a download manager and the addon 'Copy all Links'.
Copy and paste them into the manager - whichever one you like, and there are a few. Which one to choose would depend on the options you need.
-
Scrapbook (https://addons.mozilla.org/en-US/firefox/addon/427), maybe?
If the sequential pages have some sort of numbering rule in their URL (most do, I think), then you could copy the starting URL, duplicate it as many times as required in an editor, change the numbering as required for each URL (with 400+ items, I would probably do this step in Excel or something similar), and ask Scrapbook to download them all into a folder.
I did a small test with one of the long threads on this forum.
If you can't or don't want to produce the URLs in advance, you can still do it with Scrapbook, but this time with the help of a Scrapbook add-on called AutoSave (http://amb.vis.ne.jp/mozilla/scrapbook/addons.php?lang=en#AutoSave) and the iMacros extension mentioned above or something similar. I didn't try this approach though.
-
thanks
these are interesting, but I wonder if it is possible for the program to know when the webpage is 100% loaded and only save it afterwards (so that no incomplete webpages are saved)
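One crude way a script could check this after the fact: a fully delivered HTML page normally ends with its closing </html> tag, so a truncated download is easy to spot. A sketch of that heuristic (it's not a guarantee - some valid pages omit the tag):

```python
def looks_complete(html):
    """Heuristic completeness check: treat the page as fully downloaded
    only if the closing </html> tag made it to the end of the file."""
    return html.rstrip().lower().endswith("</html>")
```

A saving loop could re-fetch any page for which this returns False instead of keeping the partial copy.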
-
If you use the first method I mentioned (giving Scrapbook a list of URLs to save), it saves the web pages in the background, meaning it doesn't load the pages into Firefox. There's a small pop-up showing the progress.
It saves one page at a time, with a small delay (a couple of seconds) in between, so it won't overwhelm the server. You may safely ignore the progress dialog (which would take some time if you give it a long list) and continue to use Firefox.
When it's done, the progress dialog goes away and another small message box pops up from the lower-right corner telling you "capture completed".
-
what if the next results webpage has a URL that cannot be seen? eg. if to go there you click a button and the new URL is never shown? then I cannot build the list of URLs
is there any JavaScript bot that can auto-browse under specific commands, wait for pages to load and then save them?
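In spirit, such a bot is just a loop: fetch a page, save it, find the 'next' link, repeat until there is none. A minimal sketch of that loop (the fetching and saving steps are passed in as functions so the navigation logic stands alone, and the next-link regex is an assumption about the page's markup - a real site's 'next' button may need different matching):

```python
import re

def find_next(html):
    """Look for a link whose visible text is 'next' (markup assumption)."""
    m = re.search(r'<a\s+href="([^"]+)"[^>]*>\s*next\s*</a>', html, re.I)
    return m.group(1) if m else None

def crawl(fetch, save, start_url, limit=500):
    """Fetch pages, save each one, and follow the 'next' link
    until it disappears or the page limit is reached."""
    url, count = start_url, 0
    while url and count < limit:
        html = fetch(url)          # e.g. opener.open(url).read().decode()
        save(url, html)
        url = find_next(html)
        count += 1
    return count
```

The `limit` guard matters here: with 400+ pages you want the loop to stop on its own even if the site keeps producing 'next' links forever.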
-
The auto-saving part can be taken care of by Scrapbook (with the AutoSave plugin), as I mentioned above. There are other extensions that do this as well.
As for the auto-clicking part, you'll probably need the help of iMacros (also mentioned above) or something like that. I've never tried it though, so can't help you there.
-
Just out of curiosity, why do you need 400+ Google results pages?
-
Just out of curiosity, why do you need 400+ Google results pages?
-Paul Keith
it's not about Google results, Google was just an example
-
Oh ok. Thanks for clarifying that.