Author Topic: AddressBooker - Webpage to Address Book (Read 13297 times)

publicdomain · « **on:** February 23, 2024, 09:48 PM »

This is based on fellow member @sphere's request:

Targeted Web clipping or scraping and formatting

After some brainstorming & discussion, the planned feature set for the program is:

AddressBooker v1 roadmap:

Extract contact information and contact photo (see original forum post) from a web page (by monitoring the clipboard), and bulk-process saved pages from a folder ( https://github.com/gildas-lormeau/SingleFile/wiki ).
Pull information from the '+' link (e.g. on Instagram).
The user will be presented with a dialog when a URL that contains a website from the template/definitions list is copied.
The dialog box will have:
An option to select and/or add a group/category, tags and a short note.
An option to include the webpage title and source URL when collecting information.
A user-definable hotkey to pause clipboard monitoring, in case the user wants to clip an address without triggering the app.

Additional settings:
Ability to edit and create new page templates.

Information will be collected into an address book that will be developed later to group/sort contacts, add notes, etc.
-sphere
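For anyone following along, the "dialog on copied URL" and "pause hotkey" items above could be sketched roughly as below. This is only a minimal sketch under assumptions: the `definitions` list, `matchDefinition` and `createClipboardMonitor` are hypothetical names made up for illustration (not taken from the repo), and the clipboard reader is passed in as a plain function rather than tied to any particular clipboard library.

```javascript
// Hypothetical definitions list: one entry per supported site (assumption).
const definitions = [
  { name: 'instagram', hosts: ['instagram.com'] },
  { name: 'facebook', hosts: ['facebook.com'] },
];

// Return the definition whose host matches the clipboard text, or null when
// the text is not an http(s) URL or its host is not in the list.
function matchDefinition(clipboardText, defs = definitions) {
  if (typeof clipboardText !== 'string') return null;
  let url;
  try {
    url = new URL(clipboardText.trim());
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return null;
  const host = url.hostname.toLowerCase().replace(/^www\./, '');
  return (
    defs.find((d) =>
      d.hosts.some((h) => host === h || host.endsWith('.' + h))
    ) || null
  );
}

// Poll-based monitor. `readClipboard` is any function returning the current
// clipboard text; `onMatch` stands in for "show the dialog".
function createClipboardMonitor(readClipboard, onMatch) {
  let paused = false;
  let lastSeen = null;
  return {
    // What the pause hotkey would call; returns the new paused state.
    togglePause() {
      paused = !paused;
      return paused;
    },
    // One polling step; a real app would call this on a timer.
    tick() {
      const text = readClipboard();
      if (text === lastSeen) return; // clipboard unchanged
      lastSeen = text;
      if (paused) return; // remember the clip, but do not trigger
      const def = matchDefinition(text);
      if (def) onMatch(def, text);
    },
  };
}
```

Note that the sketch still records what was copied while paused, so a clip taken during a pause does not fire the dialog once monitoring resumes.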

GitHub Repo @ https://github.com/publicdomain/addressbooker

Development of AddressBooker has officially begun!

Cheers!
Vic

sphere · « **Reply #1 on:** February 24, 2024, 01:42 AM »

That's great news! Looking forward to it. Thank you.

publicdomain · « **Reply #2 on:** February 24, 2024, 10:46 AM »

That's great news! Looking forward to it. Thank you.
-sphere (February 24, 2024, 01:42 AM)

I'm happy to assist in bringing your brainchild to life!

The goal for this weekend is to have the first ALPHA of AddressBooker available for comments on its functionality.

Stay tuned!

publicdomain · « **Reply #3 on:** February 25, 2024, 04:23 PM »

Extract contact information, contact photo (see original forum post) from web page (by monitoring the clipboard) and bulk process saved pages from a folder directory ( https://github.com/g...meau/SingleFile/wiki )
-sphere (February 23, 2024, 09:48 PM)

As the actual implementation of this main feature happens, there have been some considerations:

- Libraries that scrape the raw HTML directly were all discarded (sites can and regularly do load content dynamically via scripts).

- We originally discussed Selenium + chromedriver. I've been familiar with good ol' Selenium for many years, but I'm using this first web scraper of 2024 to have a go at some of the more modern libraries, particularly Google's Puppeteer (https://github.com/puppeteer/puppeteer), which is looking like the way to go.

It's a bulky package when working with Chrome for Testing, but it is guaranteed to work since the integration is made at the developer level as a first-party tool (an "indivisible entity with Chromium").

It may mean a bit of an upfront learning curve for a couple more days, but it seems that investing in Google's Puppeteer can make AddressBooker a proper "headless" scraper for the modern web (given that the requirements include the likes of Instagram and Facebook, and the program is likely to be used with more of the modern social sites).

publicdomain · « **Reply #4 on:** February 25, 2024, 08:10 PM »

@sphere

- What's your current Windows version?

I'm flipping and flopping between C# and JavaScript/TypeScript. I have more experience with C# for the desktop but it sounds like some form of node integration is unavoidable here.

My aim is doing this right, so any webpage that Chrome can display we can scrape correctly (regardless of size/megabytes or old Windows compatibility). This solves the many (many!) configuration issues that working with multiple browsers and versions of such browsers can bring, right off the bat.

sphere · « **Reply #5 on:** February 25, 2024, 11:46 PM »

- What's your current Windows version?
-publicdomain (February 25, 2024, 08:10 PM)

I run Windows 10 currently but expect to have to shift to Windows 11 at some point. I also have a Linux Mint machine but was not expecting this to work there.

publicdomain · « **Reply #6 on:** February 29, 2024, 05:06 PM »

I run Windows 10 currently but expect to have to shift to Windows 11 at some point. I also have a Linux Mint machine but was not expecting this to work there.
-sphere (February 25, 2024, 11:46 PM)

Thank you. The first ALPHA version should be coming up. I'm running an Ubuntu-based distro (20.04) so it might as well be Linux since I'm using it with good results.

Puppeteer programming is logical and I'm confident this is the way forward since basically all the sites & webmasters account for Chrome/Chromium. The only complaint is the ~400MB overall release package size but we'll cope with it.

sphere · « **Reply #7 on:** February 29, 2024, 08:19 PM »

Thank you. The first ALPHA version should be coming up. I'm running an Ubuntu-based distro (20.04) so it might as well be Linux since I'm using it with good results.

That's great. I have Linux Mint.

Puppeteer programming is logical and I'm confident this is the way forward since basically all the sites & webmasters account for Chrome/Chromium. The only complaint is the ~400MB overall release package size but we'll cope with it.

That is pretty large, but totally doable. You had mentioned it being portable. So I am assuming if need be I could run it off of an external USB 3.0 or more.

publicdomain · « **Reply #8 on:** March 05, 2024, 11:54 PM »

That is pretty large, but totally doable. You had mentioned it being portable. So I am assuming if need be I could run it off of an external USB 3.0 or more.
-sphere (February 29, 2024, 08:19 PM)

Currently doing all development on Linux. Portable looks doable, but for now I'm focused only on getting it going (and doing it right, including userDataDir for a consistent profile + XPath definitions).
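To illustrate what "userDataDir for a consistent profile + XPath definitions" could look like, here is a rough sketch, not the actual AddressBooker code. The `exampleDefinition` object and the helper names (`buildLaunchOptions`, `extractByXPath`, `scrape`) are assumptions made up for this example. The XPath lookup uses the browser's standard `document.evaluate` inside `page.evaluate`, so it does not depend on any Puppeteer-specific XPath helper.

```javascript
const path = require('path');

// Hypothetical definition: field name -> XPath expression (assumption).
const exampleDefinition = {
  name: 'example',
  fields: {
    title: '//title',
    heading: '//h1',
  },
};

// Launch options that pin the browser profile to one folder, so cookies and
// logins persist between runs.
function buildLaunchOptions(profileDir) {
  return { headless: true, userDataDir: path.resolve(profileDir) };
}

// Runs inside the page: resolve each XPath to the trimmed text of its first
// matching node, or null when nothing matches.
function extractByXPath(fields) {
  const out = {};
  for (const [key, xpath] of Object.entries(fields)) {
    const node = document.evaluate(
      xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null
    ).singleNodeValue;
    out[key] = node ? node.textContent.trim() : null;
  }
  return out;
}

// Open the URL in headless Chrome and collect the fields of one definition.
async function scrape(url, definition, profileDir) {
  // Loaded lazily so the helpers above work without the package installed;
  // this line needs `npm i puppeteer`.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(buildLaunchOptions(profileDir));
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.evaluate(extractByXPath, definition.fields);
  } finally {
    await browser.close();
  }
}
```

Usage would be along the lines of `scrape('https://example.com/', exampleDefinition, './profile').then(console.log)`. Keeping the profile in a relative folder like this is also one possible route to the portable setup later.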

Please tell me if this command runs on your end (Mint):

npm i puppeteer

That should be all the setup needed to run the program once it's cloned from GitHub.

(Let's handle making it portable later).

sphere · « **Reply #9 on:** March 06, 2024, 01:03 PM »

Worrying about portable later makes sense. I will try and run it tonight when I get home, but it will be pretty late. Otherwise I might not get to it until tomorrow.
Hope that does not delay you too much.
Thanks

publicdomain · « **Reply #10 on:** March 06, 2024, 02:33 PM »

No worries AMIGO

I just want to see where your Linux box is at (the full list of commands is larger; the AddressBooker program is a Node script at this time hence it needs you to have a suitable runtime set-up).

Realistically, it won't be until Friday & the weekend that I devote full days to finishing DC programs entirely, so go at ease.

(Thanks to you for accepting the penguin as a platform; it streamlined things.)

publicdomain · « **Reply #11 on:** March 11, 2024, 03:59 PM »

Notice: part of the testing is being handled by PM, but the working result is definitely being made open upon release

publicdomain · « **Reply #12 on:** March 11, 2024, 06:02 PM »

npm i puppeteer

That should be all the setup needed to run the program once it's cloned from GitHub.
-publicdomain (March 05, 2024, 11:54 PM)

In case someone else is following, this is the current command line setup:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 18
nvm use 18
npm i puppeteer

We're about to start the actual public testing of definitions, etc.

publicdomain · « **Reply #13 on:** March 23, 2024, 07:33 PM »

App with clipboard monitoring and XPath definitions for the holidays.

This is the final "larger" release to close the month (after delivering this one, we open our new software project).

publicdomain · « **Reply #14 on:** March 25, 2024, 03:43 PM »

App with clipboard monitoring and XPath definitions for the holidays.
-publicdomain (March 23, 2024, 07:33 PM)

Okay! I've started pushing versions to GitHub incrementally, for confirmation by at least one (1) user, likely fellow member "sphere".

First ALPHA confirms the program is not detected as a bot:

GitHub tag: https://github.com/publicdomain/addressbooker/releases/tag/v0.1.0-alpha.1

Setup:

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
source ~/.bashrc
nvm install 18
nvm use 18
npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

=====

Clone:

git clone https://github.com/publicdomain/addressbooker.git

=====

Run:

cd addressbooker
node addressbooker.js

sphere · « **Reply #15 on:** March 26, 2024, 12:26 AM »

Thanks.

Ran it.
