Extract contact information and the contact photo (see original forum post) from a web page (by monitoring the clipboard), and bulk-process saved pages from a folder ( https://github.com/g...meau/SingleFile/wiki )
As implementation of this main feature gets underway, a few considerations have come up:
- Libraries that scrape raw HTML directly were all discarded, since sites can and regularly do load their content dynamically via scripts.
- We originally discussed Selenium + chromedriver. I have been familiar with good ol' Selenium for many years, but I'm using this first web scraper of 2024 to have a go at some of the more modern libraries, particularly Google's Puppeteer (https://github.com/puppeteer/puppeteer), which is looking like the way to go. It's a bulky package when working with Chrome for Testing, but it should work reliably, since the integration is made at the developer level as a first-party tool (an "indivisible entity with Chromium").
It may mean a bit of an upfront learning curve for a couple more days, but investing in Google's Puppeteer seems likely to make AddressBooker a proper "headless" scraper for the modern web (given that the requirements include the likes of Instagram and Facebook, and the program is likely to be used with more of the modern social sites).
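
For the bulk half of the feature (pages already saved to a folder), a browser may not be needed at all, assuming the saved file already contains the rendered page. Below is a minimal sketch of what that extraction step could look like. Everything in it is an assumption for illustration, not AddressBooker's actual code: the `ContactInfo` shape, the function names, and the deliberately simple email/phone patterns. Reading the files from disk is left out; the `pages` object stands in for file names mapped to their HTML.

```typescript
// Hypothetical record shape for one page's results (not from AddressBooker).
interface ContactInfo {
  emails: string[];
  phones: string[];
}

// Deliberately simple patterns: fine for a sketch, too naive for production.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE_RE = /\+?\d[\d\s().-]{7,}\d/g;

// Drop <script>/<style> bodies and all remaining tags so that only
// text a visitor would see is scanned.
function visibleText(html: string): string {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Remove duplicates while keeping first-seen order.
function unique(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// Extract email addresses and phone-like strings from one page.
function extractContacts(html: string): ContactInfo {
  const text = visibleText(html);
  return {
    emails: unique(text.match(EMAIL_RE) ?? []),
    phones: unique(text.match(PHONE_RE) ?? []),
  };
}

// Bulk step: `pages` maps a file name to its HTML content.
function processSavedPages(
  pages: Record<string, string>
): Record<string, ContactInfo> {
  const results: Record<string, ContactInfo> = {};
  for (const name of Object.keys(pages)) {
    results[name] = extractContacts(pages[name]);
  }
  return results;
}
```

Stripping script bodies first keeps addresses that only appear inside tracking code out of the results. For live pages that build their content via scripts, this step would still need the rendered HTML from a headless browser as its input.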