... it gives me errors 426, 401 for some websites especially HTTPS.
-kalos
426 is "Upgrade Required": the destination server refuses to serve the request over the current protocol and wants you to switch (e.g. to a newer TLS version).
401 is "Unauthorized": the resource requires authentication credentials.
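For the 401 case, supplying credentials usually clears the error. Here is a minimal sketch using only Python's standard library; the URL, user name, and password are placeholders, and Basic auth is just the simplest scheme (the server's WWW-Authenticate header tells you which one it actually expects):

```python
import base64
import urllib.request

def request_with_basic_auth(url, user, password):
    """Build a request carrying HTTP Basic credentials.

    A server answering 401 includes a WWW-Authenticate header
    naming the scheme it expects; Basic is the simplest one.
    """
    req = urllib.request.Request(url)
    # Basic auth is just "user:password" base64-encoded.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# Placeholder URL and credentials -- the header is attached
# before the request is ever sent.
req = request_with_basic_auth("https://example.com/private", "alice", "secret")
```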
You may want to try another scraper, but since "cloud based" means using someone else's server resources, they are likely to ask for some $ at some point. There is a list of "cloud based web scraping solutions" at:
You can find some free tiers there.
Do note that cloud scrapers can be way overpriced compared with hosting your own.
The most efficient way is of course running your own web scraper on a server under your control, even if the hosting/server fee itself is free. The important part is having the ability to modify the scraper to account for HTML changes, and staying flexible in how you interact with the destination server as time goes by.
Relevant questions:
What frequency do you need? (daily, hourly, other)
When you say "monitor a webpage for changes", do you mean any change on the page, or a certain portion of the contents within the page?
...You can actually get away with using only the "Last-Modified" header in the first case.
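A change monitor built on that header can be as small as this stdlib sketch. The URL is a placeholder, and note the hedge in the comparison: some servers omit Last-Modified entirely, in which case you get no signal and have to fall back to scraping the body:

```python
import urllib.request

def fetch_last_modified(url):
    """HEAD the page and return its Last-Modified header (or None)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Last-Modified")

def has_changed(previous, current):
    """Report a change only when both timestamps exist and differ.

    A missing header means the server gives no signal, so we
    return None ("unknown") rather than claiming a change.
    """
    if previous is None or current is None:
        return None  # no usable signal; scrape the body instead
    return previous != current
```

Poll `fetch_last_modified()` on whatever schedule fits your frequency answer above and feed consecutive values to `has_changed()`.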
If it has to do with monitoring specific contents, then yes, a regular web scraper is due.
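For that second case, the scraper only needs to pull out the portion you care about. A sketch with the standard library's html.parser; the `span` tag and `price` id are made-up examples, not anything from your target page:

```python
from html.parser import HTMLParser

class PriceExtractor(HTMLParser):
    """Collect the text of a <span id="price"> element.

    The tag name and id are invented for this example; point
    them at whatever portion of the real page you monitor.
    """
    def __init__(self):
        super().__init__()
        self._inside = False
        self.value = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "span" and ("id", "price") in attrs:
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.value = data.strip()
            self._inside = False

page = '<html><body><span id="price">19.99</span></body></html>'
parser = PriceExtractor()
parser.feed(page)
# parser.value now holds "19.99"; compare it against the last run
```

Store the extracted value between runs and you have your change monitor without depending on any header the server may or may not send.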
BTW, I'm coding a "Webpage to Address book" program right now, so I can be of help/assistance, since the first part of parsing data from the web is essentially scraping.
Cheers!
Vic