how to download the text of webpages?

kalos:
Are you looking to pay for this?  Or are you looking for something free?
-wraith808 (November 21, 2013, 04:51 PM)
--- End quote ---

Something free would be ideal, but if I have to pay a reasonable amount, that would be okay.

patteo:
Are you looking to pay for this?  Or are you looking for something free?
-wraith808 (November 21, 2013, 04:51 PM)
--- End quote ---

Something free would be ideal, but if I have to pay a reasonable amount, that would be okay.
-kalos (November 22, 2013, 04:31 AM)
--- End quote ---

Browser Scripting, Data Extraction and Web Testing by iMacros
Use iMacros® 9 to create solutions for web automation, web scraping or web testing in just five minutes.
http://www.iopus.com/imacros/

There's a free Firefox add-on:
https://addons.mozilla.org/en-US/firefox/addon/imacros-for-firefox/

and a Chrome extension:
https://chrome.google.com/webstore/detail/imacros-for-chrome/cplklnmnlbnpmjogncfgfijoopmnlemp?hl=en

wraith808:
Mozenda was favorably reviewed in our trials, and it offers a free account for trying it out (which might be enough for low-throughput use). We still use it for some things.

iRobotSoft is free. It was nowhere near what we needed for our uses, but it might work for you.

Automation Anywhere: again, it didn't serve our purposes, but it is feature-rich. We still use it for some low-intensity things.

IainB:
Consider Scrapbook: I'm not sure if it is exactly what you want, but the Firefox Scrapbook extension is quite good at scraping individual web pages and the pages nested underneath them. You can tell it how "deep" to go in the nested site, and which files to pick up or ignore as it goes. It tidies all the links up when done, so you end up with a relatively self-contained copy of the downloaded pages.
I have highlighted in the screenshot below - using red boxes/arrow - the relevant bits of the download option (which pops up on a drag/drop save to a Scrapbook folder):

[Screenshot: Scrapbook download options dialog, with the relevant settings highlighted]

Scrapbook is useful where you want to be able to search/retrieve the content easily. Not only can you use Scrapbook to index/search everything it downloads, but also, since it saves the content in a non-proprietary format (HTML), you can index/search it with the standard Windows Desktop Search and - in my case - xplorer² (a Windows Explorer replacement tool). Any files it creates or downloads also immediately show up in the file/folder search tool Everything.
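
For anyone who would rather script the same idea than use an extension, here is a rough Python sketch of a depth-limited capture that keeps only the text of each page. It is not how Scrapbook works internally; the names `TextAndLinks`, `fetch`, `crawl` and `max_depth` are made up for illustration, and it assumes you only want to follow links that stay on the starting host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class TextAndLinks(HTMLParser):
    """Collects visible text and link targets from one HTML page."""

    SKIP = {"script", "style"}  # content of these tags is not page text

    def __init__(self):
        super().__init__()
        self.text, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download one page and decode it (hypothetical helper)."""
    with urlopen(url, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(start_url, max_depth=1, fetch=fetch):
    """Return {url: text} for start_url and same-host pages up to max_depth links away."""
    host = urlparse(start_url).netloc
    pages, queue = {}, [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in pages:
            continue
        parser = TextAndLinks()
        parser.feed(fetch(url))
        pages[url] = "\n".join(parser.text)
        if depth < max_depth:
            for href in parser.links:
                target = urljoin(url, href).split("#")[0]  # drop fragments
                if urlparse(target).netloc == host:  # stay on the same site
                    queue.append((target, depth + 1))
    return pages
```

Calling `crawl("http://example.com/", max_depth=2)` would return a dictionary of page text that you could then write out as plain `.txt` files for your search tool of choice. A real run should also respect robots.txt and pause between requests, which this sketch leaves out.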

IainB:
You could use the Mozilla Archive Format extension to save individual pages, or several/all tabs, in either MAFF or MHTML format.
Depending on the constraints, using search/index tools on these files may be problematic as:

* The MAFF format files are based on the .ZIP format.
* The MHTML format files are based on the MIME format.
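
If your search tool cannot look inside these containers, one workaround is to pull the HTML back out yourself. The Python sketch below is only an illustration under a couple of assumptions: that the saved page inside a MAFF archive is stored as `.html`/`.htm` members encoded as UTF-8, and that the MHTML file carries its page as `text/html` MIME parts. The function names `html_from_maff` and `html_from_mhtml` are hypothetical.

```python
import email
import zipfile
from email import policy


def html_from_maff(path):
    """Yield (member_name, html) for each HTML file inside a ZIP-based MAFF archive."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".htm")):
                # Assumption: members are UTF-8; undecodable bytes are replaced.
                yield name, archive.read(name).decode("utf-8", errors="replace")


def html_from_mhtml(path):
    """Yield the decoded body of each text/html part in a MIME-based MHTML file."""
    with open(path, "rb") as handle:
        message = email.message_from_binary_file(handle, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            yield part.get_content()
```

The extracted HTML can then be saved as ordinary files, which desktop search tools generally index without trouble.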


When opened, these files may not give as "exact" a copy as (say) the Scrapbooked pages, and the result may also differ from browser to browser.
By the way, note that the Scrapbook extension currently only works in Firefox.
