
Page data harvest


Thank you also, TJ. I'm taking a look at Wget and HTTrack... great packages.
I do appreciate you guys helping me!  :Thmbsup:


HTML is often poorly written. It can be hard to parse if, for example, end tags are missing. I highly recommend "sanitizing" it with HTML Tidy before you start parsing. Set the "output-xml" option to get well-formed XML, which can then be parsed with any XML parser library (DOM/SAX). This is often easier than using regular expressions.
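To make that concrete, here is a rough Python sketch of the tidy-then-parse idea. It assumes the HTML Tidy command-line tool is installed and on the PATH as `tidy`. The helper names (`tidy_to_xml`, `extract_links`) and the sample markup are made up for illustration, and pulling out links is just one example of what you might harvest.

```python
import subprocess
import xml.etree.ElementTree as ET


def tidy_to_xml(html_text):
    """Pipe HTML through the HTML Tidy CLI and return well-formed XML text.

    Assumption: the `tidy` executable is installed and on the PATH.
    """
    result = subprocess.run(
        ["tidy", "-q", "--output-xml", "yes", "--numeric-entities", "yes"],
        input=html_text,
        capture_output=True,
        text=True,
    )
    # Tidy exits with 1 when it only had warnings and 2 when it hit errors.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def extract_links(xml_text):
    """Return (href, link text) pairs for every <a href=...> element."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        if local_name(element.tag) == "a" and "href" in element.attrib:
            text = "".join(element.itertext()).strip()
            links.append((element.attrib["href"], text))
    return links


# Hypothetical, already well-formed markup standing in for Tidy's output.
SAMPLE = (
    "<html><head><title>Demo</title></head><body>"
    '<p>See <a href="https://example.com/a">first <b>link</b></a> '
    'and <a href="/b">second</a>.</p>'
    "</body></html>"
)

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, text)
```

On a real page you would call `extract_links(tidy_to_xml(raw_html))`. The point is that once the markup is well formed, walking the tree replaces a pile of fragile regular expressions.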

