General Software Discussion / Re: Scraper too expensive at 20 bucks
« on: January 18, 2015, 02:58 PM »
It took a couple of minutes to create a project file for A1 Website Download that could download, and allow offline surfing of, the wanted pages and images (including the big images) while excluding the rest of the website. I figured it would be easier if you saw the project, in case there was a misunderstanding (either by me or by you) about whether A1WD could handle the scenario you outlined.
I think we are talking past each other: Reading your posts again, I think you may be more interested in scraping specific data/images than in downloading websites or large parts of them. A1 Website Download is not meant to scrape bits of each page - e.g. summaries or whatever people like to scrape - and it never will be, since that is not the problem it is meant to solve.
I think we are talking past each other: A1WD would not download excluded files... But yes, it would have to download the content pages to discover the links/references to the images. If that is your criticism, I understand - i.e. if your proposed solution does not crawl pages to discover the files to download, but instead downloads files from a pre-generated list. For those who have the time to write custom downloaders for specific websites, that is of course always a good solution if you are not interested in downloading the websites themselves.
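To make the distinction concrete, here is a minimal sketch (in Python, using only the standard library) of the "crawl to discover" step: to download a site's images, a downloader must first fetch and parse the content pages to find the image references. This is illustrative only - the names and structure are my own, not A1 Website Download's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImageLinkFinder(HTMLParser):
    """Collects absolute URLs of images referenced by a content page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Image references live in <img src="...">; resolve them
        # against the page URL so relative links become absolute.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.image_urls.append(urljoin(self.base_url, value))

def discover_images(page_url, html):
    """Parse one content page and return the image URLs it references."""
    finder = ImageLinkFinder(page_url)
    finder.feed(html)
    return finder.image_urls

page = '<p>Gallery</p><img src="/pics/big1.jpg"><img src="thumb2.png">'
print(discover_images("http://example.com/gallery/", page))
# ['http://example.com/pics/big1.jpg', 'http://example.com/gallery/thumb2.png']
```

With a pre-generated list, by contrast, this whole discovery step disappears: you would feed the URLs straight to a download loop without ever fetching the content pages - which is exactly the trade-off discussed above.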
For reference, I wrote my first "proof of concept" (but working) crawler and sitemapper in less than two days in 2003. It generated nice HTML sitemaps for my test websites... In 2005 I picked it up again and started working on the core crawler engine and everything around it, and I have never stopped developing it since. (It powers multiple sibling tools, among them A1 Website Download, so that is partly why.)
Anyhow, it seems you have a different idea of how users should define what they want scraped/downloaded. And as a general statement: if you or anyone else has a plan to release a product that somehow improves on or complements what is already out there, then I think you should go for it and release it!
I believe more in my own design (which I think is pretty much unparalleled in flexibility and configuration compared to the other website downloaders currently out there), but from my point of view it is a moot point to discuss. (And you are certainly entitled to believe your proposed design is better.) I was mainly interested in hearing if anyone could present a practical real-world problem - if you recall, I immediately agreed that websites that rely heavily on AJAX can be a problem.
When I started out, I joined http://asp-software.org/, but I do not really engage much in developer forums anymore, except those specific to the tools and problems I encounter.
I wish you well on all your future endeavours!