DonationCoder.com Software > PublicDomainVic
AddressBooker - Webpage to Address Book
publicdomain:
This is based on fellow member @sphere's request:
* Targeted Web clipping or scraping and formatting
After some brainstorming & discussion, the planned feature set for the program is:
AddressBooker v1 roadmap:
* Extract contact information, contact photo (see original forum post) from web page (by monitoring the clipboard) and bulk process saved pages from a folder directory ( https://github.com/gildas-lormeau/SingleFile/wiki )
* Pull information from the '+' link (e.g. on Instagram)
* The user will be presented with a dialog when a URL matching a website in the template/definitions list is copied.
* The dialog box will have:
* Option to select and/or add a group/category, tags and a short note.
* Option to include webpage title and source url when collecting information.
* User-definable hotkey to pause clipboard monitoring in case the user wants to copy an address without triggering the app.
Additional settings:
* Ability to edit and create new page templates.
Information will be collected into an address book, to be developed later, for grouping/sorting contacts, adding notes, etc.
-sphere
--- End quote ---
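To make the clipboard part of the roadmap concrete, here is a minimal TypeScript sketch of the "copied URL matches a template" check together with the pause toggle. Everything named here (`PageTemplate`, `matchTemplate`, `createWatcher`, the two sample host patterns) is a hypothetical illustration and not taken from the AddressBooker repo; how clipboard changes and the hotkey are actually detected is left out.

```typescript
// Hypothetical template definition; the real template format is not decided yet.
interface PageTemplate {
  name: string;        // label for the site, e.g. "instagram"
  hostPattern: RegExp; // tested against the hostname of the copied URL
}

// Sample definitions list (assumed entries, for illustration only).
const templates: PageTemplate[] = [
  { name: "instagram", hostPattern: /(^|\.)instagram\.com$/i },
  { name: "facebook", hostPattern: /(^|\.)facebook\.com$/i },
];

// Returns the first template matching the clipboard text, or undefined when
// the text is not an http(s) URL or no template applies.
function matchTemplate(
  clipboardText: string,
  defs: PageTemplate[]
): PageTemplate | undefined {
  let url: URL;
  try {
    url = new URL(clipboardText.trim());
  } catch {
    return undefined; // clipboard content is not a parseable URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return undefined;
  }
  return defs.find((t) => t.hostPattern.test(url.hostname));
}

// Small watcher: the hotkey would call togglePause(), and whatever reports
// clipboard changes (not shown) would call handleClipboardChange().
function createWatcher(
  defs: PageTemplate[],
  onMatch: (url: string, template: PageTemplate) => void
) {
  let paused = false;
  return {
    togglePause(): void {
      paused = !paused;
    },
    handleClipboardChange(text: string): void {
      if (paused) return; // monitoring is paused, ignore the copy
      const template = matchTemplate(text, defs);
      if (template) onMatch(text.trim(), template); // this is where the dialog would open
    },
  };
}
```

Matching on the hostname rather than the whole string keeps a template from firing on text that merely mentions a site name, and the paused flag covers the "copy an address without triggering the app" case.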
GitHub Repo @ https://github.com/publicdomain/addressbooker
Development of AddressBooker has officially begun! :Thmbsup:
Cheers!
Vic
sphere:
That's great news! Looking forward to it. Thank you.
publicdomain:
That's great news! Looking forward to it. Thank you.
-sphere (February 24, 2024, 01:42 AM)
--- End quote ---
I'm happy to assist in bringing your brainchild to life! :-* The goal for this weekend is to have the first ALPHA of AddressBooker available so you can comment on its functionality :Thmbsup:
Stay tuned!
publicdomain:
Extract contact information, contact photo (see original forum post) from web page (by monitoring the clipboard) and bulk process saved pages from a folder directory ( https://github.com/gildas-lormeau/SingleFile/wiki )
-sphere (February 23, 2024, 09:48 PM)
--- End quote ---
As the actual implementation of this main feature happens, there have been some considerations:
- Libraries that scrape the raw HTML directly were all discarded (sites can and regularly do load their content dynamically via scripts).
- We originally discussed Selenium + chromedriver. I have been familiar with good ol' Selenium for many years, but I'm using this first web scraper of 2024 to have a go at some of the more modern libraries, particularly Google's Puppeteer (https://github.com/puppeteer/puppeteer), which is looking like the way to go.
It's a bulky package when working with Chrome for testing, but it is guaranteed to work since the integration is made at the developer level as a first-party tool (an "indivisible entity with Chromium").
It may mean a bit of an upfront learning curve for a couple more days, but it seems that investing in Google's Puppeteer can make AddressBooker a proper "headless" scraper for the modern web (given that the requirements include the likes of Instagram and Facebook, and the program is likely going to be used mostly with modern social sites).
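As a rough sketch of what "headless" scraping means here: load the page in the browser, wait for the scripts to finish, and only then read the rendered HTML. The `PageLike` interface below is a hand-written subset meant to mirror Puppeteer's `page.goto(url, { waitUntil })` and `page.content()` calls, so the sketch runs without the package installed; `fetchRenderedHtml` and `extractEmails` are hypothetical helper names, and the e-mail regex is a deliberately simple assumption, not the extraction logic planned for AddressBooker.

```typescript
// Minimal subset of a browser page that this sketch relies on. It is intended
// to be structurally compatible with Puppeteer's Page (goto + content).
interface PageLike {
  goto(
    url: string,
    options?: {
      waitUntil?: "load" | "domcontentloaded" | "networkidle0" | "networkidle2";
    }
  ): Promise<unknown>;
  content(): Promise<string>;
}

// Navigates to the URL, waits until network activity has mostly settled
// (so script-loaded content is present), then returns the rendered HTML.
async function fetchRenderedHtml(page: PageLike, url: string): Promise<string> {
  await page.goto(url, { waitUntil: "networkidle2" });
  return page.content();
}

// Hypothetical extractor: collects unique e-mail-looking strings from HTML.
// A real template would target specific elements instead of using a regex.
function extractEmails(html: string): string[] {
  const found = html.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g) || [];
  return Array.from(new Set(found)); // de-duplicate, keep first-seen order
}
```

With the real library, the page object would come from something like `const browser = await puppeteer.launch(); const page = await browser.newPage();` and the browser would be closed afterwards. Keeping the helper dependent only on `goto` and `content` also makes it easy to test against a fake page.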
publicdomain:
@sphere
- What's your current Windows version?
I'm flip-flopping between C# and JavaScript/TypeScript. I have more experience with C# for the desktop, but it sounds like some form of Node.js integration is unavoidable here.
My aim is doing this right, so any webpage that Chrome can display we can scrape correctly (regardless of size/megabytes or old Windows compatibility). This solves the many (many!) configuration issues that working with multiple browsers and versions of such browsers can bring, right off the bat.