DonationCoder.com Software > Coding Snacks
Page data harvest
jpcook:
Thank you too, TJ. I'm taking a look at Wget and HTTrack... great packages.
I really appreciate you guys helping me! :Thmbsup:
crono:
Hi,
HTML is often poorly written, which can make it hard to parse, for example when end tags are missing. I highly recommend sanitizing it with HTML Tidy before you start parsing. Set the "output-xml" option to get well-formed XML, which can then be parsed with any XML parser library (DOM or SAX). This is often easier than using regular expressions.
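Here is a minimal sketch of the Tidy-then-parse approach in Python. It assumes the `tidy` command-line tool (HTML Tidy) is installed and on the PATH. `tidy_to_xml`, `extract_links` and `local_name` are hypothetical helper names. The parsing uses the standard library's tree-based `xml.etree.ElementTree` rather than a full DOM or SAX library, and harvesting links is only an example of what you might pull out of a page.

```python
import subprocess
import xml.etree.ElementTree as ET


def tidy_to_xml(html: bytes) -> bytes:
    """Run HTML Tidy over raw HTML and return well-formed XML (assumes `tidy` on PATH)."""
    result = subprocess.run(
        [
            "tidy",
            "-quiet",
            "--output-xml", "yes",        # the option recommended above
            "--numeric-entities", "yes",  # e.g. &nbsp; -> &#160; so XML parsers accept it
            "--show-warnings", "no",
        ],
        input=html,          # Tidy reads the document from stdin...
        capture_output=True,  # ...and writes the cleaned version to stdout
    )
    # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors); treat only 2 as fatal.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return result.stdout


def local_name(tag: str) -> str:
    # Tidy's output may or may not put elements in the XHTML namespace; ignore it.
    return tag.rsplit("}", 1)[-1]


def extract_links(xml_doc) -> list[tuple[str, str]]:
    """Return (href, link text) pairs from a tidied, well-formed document."""
    root = ET.fromstring(xml_doc)
    links = []
    for el in root.iter():
        # Skip anything that is not a plain element (defensive; comments are dropped by default).
        if not isinstance(el.tag, str):
            continue
        if local_name(el.tag) == "a" and "href" in el.attrib:
            text = "".join(el.itertext()).strip()  # includes text of nested tags like <b>
            links.append((el.attrib["href"], text))
    return links
```

Usage would look something like `extract_links(tidy_to_xml(open("page.html", "rb").read()))`. Matching on element names in a parsed tree sidesteps the attribute ordering, quoting and nesting cases that a regular expression has to anticipate.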