a program to batch-convert html to plain text?
urlwolf:
This is something I needed a long time ago and now need again.
I wish there were a program to batch-process HTML to text in an easy way,
stripping tables, ads, sitemaps, frames, etc. and keeping only the article text.
Since there is huge variability in how people use HTML and each site has its
own formatting, a general solution is difficult, I guess. I used to use
Perl plus some parsing modules, but those solutions broke easily (a bit more
robust than simple regular expressions, but still fragile).
Then, I think, I moved to using dumps from text-based browsers. 'Links' actually
supports frames, which is nice. The most popular one is 'lynx'.
http://links.sourceforge.net/download/binaries/
Still, it is a lot of work and it needs fine-tuning.
Do you know of any program that offers that, together with a GUI and some other
niceties? There must be one out there...
Thanks a lot
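
For illustration, the parser-based approach described above could look roughly like the sketch below. It uses only Python's standard `html.parser` module rather than Perl. The `SKIP_TAGS` and `BLOCK_TAGS` sets and the names `TextExtractor`, `html_to_text`, and `convert_folder` are assumptions made up for this example; real sites would need their own tuning.

```python
from html.parser import HTMLParser
from pathlib import Path

# Assumption: elements whose contents are treated as non-article material.
SKIP_TAGS = {"script", "style", "noscript", "table", "nav", "header",
             "footer", "aside", "form", "iframe"}
# Assumption: elements that should start a new line in the output.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6",
              "blockquote", "pre", "article", "section"}


class TextExtractor(HTMLParser):
    """Collects text that is not nested inside any SKIP_TAGS element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # how many skipped elements we are inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split())
             for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def convert_folder(src_dir, dst_dir):
    """Write a .txt file into dst_dir for every *.html file in src_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for html_path in sorted(src.glob("*.html")):
        html = html_path.read_text(encoding="utf-8", errors="replace")
        out_path = dst / (html_path.stem + ".txt")
        out_path.write_text(html_to_text(html), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling `convert_folder("pages", "text")` would process a whole folder (both folder names are placeholders). An unclosed `<table>` or `<nav>` would suppress everything after it, which is the kind of fragility mentioned above.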
369:
http://www.jetman.dircon.co.uk/software/web2text.html ?
gjehle:
maybe try this one
http://html2text.sourceforge.net/
besides there should be enough perl modules too
and yes, links2 does it too
easy to get a batch script using those
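
A batch script along those lines might look like the sketch below. It assumes `lynx` or `links` is installed and on the PATH, and it relies on their `-dump` option (plus lynx's `-nolist`, which omits the link list at the end). The function names `dump_command` and `dump_folder` and the folder layout are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def dump_command(tool, html_path):
    """Build the command line that prints a page as plain text."""
    if tool == "lynx":
        return ["lynx", "-dump", "-nolist", str(html_path)]
    if tool == "links":
        return ["links", "-dump", str(html_path)]
    raise ValueError(f"unsupported tool: {tool}")


def dump_folder(src_dir, dst_dir, tool="lynx"):
    """Run the text browser on every *.html file and save the output."""
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} was not found on PATH")
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for html_path in sorted(src.glob("*.html")):
        result = subprocess.run(dump_command(tool, html_path),
                                capture_output=True, text=True, check=True)
        (dst / (html_path.stem + ".txt")).write_text(result.stdout)
```

The browser dump keeps navigation menus and other page furniture, so it would still need the fine-tuning mentioned earlier in the thread to get down to article text only.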
tinyvillager:
Source: http://www.highdots.com/html-code-export/
"Copy and Paste"
Looking for a simple and fast way to indent and export your HTML code into various file formats? Look no further than HTML Code Export, a unique and easy to use software to quickly and easily reindent, export (10+ formats supported) and print your HTML documents, convert them to PDF, RTF, images and more!
Convert your HTML code to the following formats:
HTML
PDF
RTF
BMP
PNG
JPG
Lotus
SVG
QUATTRO Pro
Excel
It's freeware too.
mouser:
nice find tinyvillager!
other good software from that company as well: http://www.highdots.com/
( sending a couple credits your way :) )