Targeted Web clipping or scraping and formatting

sphere:

What is the best way to automate the collection of specific data, i.e. contact information, when visiting a webpage? Years ago I used to methodically copy or enter the person's or company's name, address, web URL, telephone number, email, and notes into a contact manager or address book. More recently, I have simply saved a copy of their contact page. I can link to the saved page, tag it, and/or search for it. This works, but it is a little chaotic.

It occurred to me that maybe it would be possible for Clipboard Help + Spell to identify different types of text strings and sort them neatly into categories based on the attributes of the text. A ".com", ".org", etc. would indicate that the text string is part of a web URL. An "@" sign would indicate an email address or an Instagram handle, etc.
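To illustrate the kind of attribute-based sorting described above, here is a minimal Python sketch. It is not how Clipboard Help + Spell works; the patterns, the category names, and the `classify` function are all assumptions made up for illustration, and the rules are deliberately loose.

```python
import re

# Hypothetical patterns, chosen only to illustrate the idea; real-world
# contact data would need more careful rules than these.
EMAIL = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
HANDLE = re.compile(r"^@\w[\w.]*$")
URL = re.compile(
    r"^(https?://)?[\w-]+(\.[\w-]+)*\.(com|org|net)(/\S*)?$", re.IGNORECASE
)
PHONE = re.compile(r"^\+?[\d\s().-]{7,}$")


def classify(text: str) -> str:
    """Guess which kind of contact detail a single text string is."""
    s = text.strip()
    if EMAIL.match(s):
        return "email"
    if HANDLE.match(s):
        # A leading "@" with no domain part looks like a social media handle.
        return "social handle"
    if URL.match(s):
        return "url"
    # Require at least seven digits so stray punctuation is not a "phone".
    if PHONE.match(s) and sum(c.isdigit() for c in s) >= 7:
        return "phone"
    return "unknown"


if __name__ == "__main__":
    for sample in ["jane@example.com", "@acme_shop", "www.example.org",
                   "+1 (555) 010-0199", "Acme Widgets Ltd"]:
        print(f"{sample!r}: {classify(sample)}")
```

Names and street addresses have no such telltale characters, so a text-only approach like this would probably leave them in the "unknown" bucket.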

It would be great to be able to pull:
Name
Address
Telephone
Mobile
Email
Facebook page
Instagram (more and more businesses are on Instagram)
Etc.
A description
Page URL

It seems like an easier way to identify which text is which would be to use the page's code (the HTML).
I have looked into some Customer Relationship Managers (CRMs) to see if they offer a way to pull contact information into their systems, and I have not found anything yet. I have also looked into web scraping tools, but most seem to target an entire site. I would like to indicate a page and pull the information I want.

Any ideas?

sphere:
So I realize this post is old, but once again I am looking for options.  Never found anything really the first time around.

publicdomain:
I would like to indicate a page and pull the information I want.

Any ideas?
-sphere (February 12, 2020, 05:33 PM)
--- End quote ---

A user-configurable custom web scraper should be able to perform this in a very smooth way.

Basically, you configure the HTML tags to extract the information from (as they appear on the source page) and then send the collected data into the target contact manager program via automation (e.g. winapi's WM_SETTEXT, for a traditional Windows program).
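As a rough sketch of the "configure the HTML tags" half of that idea, the Python below uses only the standard library's `html.parser`. The `CONFIG` mapping, the tag and attribute names in it, and the `FieldExtractor` and `extract` names are hypothetical; a real page would need its own configuration. Sending the result on to a contact manager is left out here.

```python
from html.parser import HTMLParser

# Hypothetical per-site configuration: field name -> (tag, attribute, value).
# These tag and attribute names are assumptions, not taken from any real page.
CONFIG = {
    "name": ("h1", "class", "company-name"),
    "telephone": ("span", "itemprop", "telephone"),
    "email": ("a", "class", "email"),
}


class FieldExtractor(HTMLParser):
    """Collect the text inside the first element matching each configured rule."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.fields = {}
        self._current = None  # field currently being captured, if any
        self._tag = None      # tag name of the element being captured
        self._depth = 0       # nesting depth of same-named tags

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            if tag == self._tag:
                self._depth += 1
            return
        attr_map = dict(attrs)
        for field, (want_tag, attr, value) in self.config.items():
            if field in self.fields:
                continue  # keep only the first match per field
            # Split so that class="company-name big" still matches.
            if tag == want_tag and value in (attr_map.get(attr) or "").split():
                self._current, self._tag, self._depth = field, tag, 1
                self.fields[field] = ""
                break

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.fields[self._current] = self.fields[self._current].strip()
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract(html, config=CONFIG):
    """Return a dict of field name -> extracted text for one page."""
    parser = FieldExtractor(config)
    parser.feed(html)
    parser.close()
    return parser.fields
```

Calling `extract(page_source)` on one saved contact page returns a plain dictionary, which matches the "indicate a page and pull the information I want" workflow rather than crawling a whole site.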

This is perfectly doable :up:

publicdomain:
A user-configurable custom web scraper should be able to perform this in a very smooth way.
-publicdomain (February 07, 2024, 06:42 AM)
--- End quote ---

Talks with @sphere are advanced regarding the creation of such a custom web scraper :)

We've been brainstorming back and forth, and there's a pretty good idea as to what's needed to achieve a proper release :Thmbsup:

(Official dedicated thread for the new "AddressBooker" program is to be posted!)

publicdomain:
(Official dedicated thread for the new "AddressBooker" program is to be posted!)
-publicdomain (February 16, 2024, 12:16 AM)
--- End quote ---

Done! Thread is published @ https://www.donationcoder.com/forum/index.php?topic=53987.0
