Dear fellow DC members,
I am looking for software tools that would enable me to extract information from a large number of html pages. These html pages have already been downloaded to my computer's hard drive, so no web access is needed. For each file, I want to extract a person's name, email address, and phone number. Not all this information will exist on each html page. Each html page contains the information for one individual.
I now extract information manually by:
Loading an html page into Note Tab Standard
Stripping html tags
Cutting and pasting the desired information into another text file
Repeating ad nauseam
Currently there is a subdirectory of approximately 500 such html pages. I want to automate this process so it can be executed on other data sources. I am NOT a programmer, so if someone suggests using a scripting language and regular expressions, please be prepared to walk me through the process, but I am not afraid of the command line. I would prefer to use one or more software tools, whether command line or GUI-based.
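For anyone who wants a concrete starting point, the manual steps above (load page, strip tags, pull out the contact details) could be automated with a short script. Below is a minimal sketch in Python using only the standard library. It is built on assumptions, not on Ted's actual files: the folder name `pages`, the output file `contacts.csv`, and the US-style phone pattern are all placeholders to adjust, and the person's name is left out entirely because reliably finding a name depends on how each page is laid out.

```python
# Hypothetical batch extractor: strips HTML tags from every .htm/.html file
# in a folder and pulls the first email address and phone number it finds.
# Folder name, output file, and phone pattern are assumptions to adjust.
import csv
import re
from html.parser import HTMLParser
from pathlib import Path

class TextExtractor(HTMLParser):
    """Collects only the visible text, discarding all HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return " ".join(self.chunks)

# Common email pattern: word characters, dots, plus, hyphen around an "@".
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# US-style phone numbers such as 202-555-0147 or (202) 555-0147.
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]\d{3}[-.\s]\d{4}")

def extract(html_text):
    """Return (email, phone) found in one page; empty strings if absent."""
    parser = TextExtractor()
    parser.feed(html_text)
    text = parser.text()
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return (email.group(0) if email else "",
            phone.group(0) if phone else "")

def run(folder="pages", out="contacts.csv"):
    """Process every HTML file in `folder` and write one CSV row per file."""
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "email", "phone"])
        for path in sorted(Path(folder).glob("*.htm*")):
            email, phone = extract(path.read_text(errors="ignore"))
            writer.writerow([path.name, email, phone])
```

Run from the command line with `python extract.py` after adding `run()` at the bottom; the resulting CSV opens directly in a spreadsheet, which replaces the manual cut-and-paste step for all 500 pages at once.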
If anyone can suggest either free or commercial (not too expensive!) software tools to accomplish this task, I will be grateful. I have been a member of Donation Coder for a while, and downloaded some tools (use FARR every day), but this is the first time I have asked for your help in solving a problem. I hope you guys and gals can come up with something. If I have posted this request in an inappropriate location, please let me know.
Thanks for your time.
Ted Rose
[EMAIL REMOVED TO AVOID SPAMBOTS FINDING IT -- SEND A MESSAGE ON THE FORUM TO TED TO CONTACT HIM DIRECTLY]