DonationCoder.com Software > Post New Requests Here

REQ: Convert/Export Certain Browser's "Groups" as Chrome Bookmarks

Cocoa:
Hello, all. I'm new here. Found the site while searching for a specific program and registered. I recently ran into this small problem, and as an average computer user who has absolutely no coding background, I'm at a total loss despite the simplicity of the issue. After many hours of frustration while trying various things with my limited knowledge, I decided to try posting here in the hopes the community can give me a hand.

I use Green Browser, which is more of a Trident/IE-based browser frontend. It can create these "groups", which are basically a more organized collection of favorites/bookmarks. I would like to have a small program to convert/export these groups into an html file such as those used by Chrome/Chromium to import as bookmarks. (More specifically, I use another custom Blink/Chromium browser called Slimjet, and would like to import to there, but I believe the format is probably universal for all Chrome/Chromium variants.)

Each Green Browser group is a bunch of urls, with each url given a name/label, and saved in a customized order. The info is stored as what I believe to be ANSI encoded text, with each group saved as an individual Windows file, the base file name being the group name and with the "cgp" custom extension. Here's an example (see attachment for original file):

File name: "Example Group.cgp"

--- Code: Text ---
[Group]
name0=BBC - Homepage
url0=http://www.bbc.com/
name1=Yahoo
url1=https://www.yahoo.com/
name2=Google
url2=https://www.google.com/?gws_rd=ssl
name3=CNN.co.jp : NASAの「空飛ぶ円盤」、ハワイで飛行実験
url3=http://www.cnn.co.jp/fringe/35065380.html?tag=rcol;editorSelect

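For anyone who wants to script this, here is a minimal sketch in Python of reading the group format shown above. It assumes the file has already been decoded to text and that entries come as matching nameN/urlN pairs; `parse_cgp` is a made-up helper name for illustration, not something Green Browser provides.

```python
import re

def parse_cgp(text):
    """Return (name, url) pairs from .cgp text, ordered by their numeric index."""
    names, urls = {}, {}
    for line in text.splitlines():
        # Split on the first "=" only, since URLs may contain "=" themselves.
        key, sep, value = line.partition("=")
        match = re.fullmatch(r"(name|url)(\d+)", key.strip())
        if not sep or not match:
            continue  # skips "[Group]" and blank lines
        target = names if match.group(1) == "name" else urls
        target[int(match.group(2))] = value.strip()
    # Keep only indices that have both a name and a url, in saved order.
    return [(names[i], urls[i]) for i in sorted(names) if i in urls]

sample = """[Group]
name0=BBC - Homepage
url0=http://www.bbc.com/
name2=Google
url2=https://www.google.com/?gws_rd=ssl
"""
entries = parse_cgp(sample)
```

Sorting by the numeric suffix is what preserves the saved order, even if the lines in the file were shuffled or an index is missing.
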
As for the Chrome bookmarks html file, I've also included a sample export. (Please see the attached file for reference; the contents are too long to post here.) I don't know much about html formatting, but after analyzing the content, I believe the main thing required for Slimjet (Chrome) to recognize the urls and their names is to reformat each entry from the above into this format:


--- Code: Text ---
<DT><A HREF="(url)">(name)</A>

The key things are to preserve each url, its name/label, and of course the order in which the urls are stored for each group. I think you can safely ignore and exclude the other tags, such as "add date". The final file still needs the proper title/meta tags to designate it as html, but I'm not sure what exactly is required, as I don't know enough about html formatting.
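As a rough illustration of that per-entry reformatting, the sketch below builds one `<DT><A HREF=...>` line in Python. The helper name `bookmark_line` is hypothetical. It escapes `&`, `<`, `>` and quotes on the assumption that the importing browser parses the file as ordinary html and turns the entities back into the original characters.

```python
from html import escape

def bookmark_line(name, url, indent=""):
    """Build one <DT><A HREF=...> line, escaping &, <, > and quotes."""
    return f'{indent}<DT><A HREF="{escape(url, quote=True)}">{escape(name)}</A>'

line = bookmark_line("Q&A <beta>", "http://example.com/?a=1&b=2")
```

Without the escaping, a name containing `<` or a url containing `"` could break the surrounding tags.
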

Additionally, each group name can be stored as a folder name for the bookmarks, which would be at a different level than the individual urls in the html formatting. Lastly, everything should be stored under a unique top level folder name, perhaps based on the date/time generated or on user input, so as to avoid overwriting or mixing in with existing bookmarks when importing.

Finally, a crucial thing to note is that although the groups are stored in *what I believe* to be ANSI encoding (someone can examine the file and correct me if I'm wrong), they can actually contain any number of characters, some of which are not included in ANSI, such as ・ (U+30FB) or ･ (U+FF65). (If you cannot correctly display the symbol, see the wikipedia link here and jump to the Japanese section.) They are sometimes stored as just a "?" in a Green Browser group file due to the limits of the ANSI encoding, and there can be quite a few of them. The program needs to be able to handle the symbols, but does not need to preserve or restore the original symbols. It merely needs to be able to at least put a space in the place of these, or remove them (ideally, a program setting to decide which would be best). I have on hand a small program that's supposed to be able to convert from Green Browser groups into IE favorites, but it generates errors and fails at the conversion randomly, likely due to these characters. I've attached it here as well if someone wants a reference point. The program I'm requesting to convert to Chrome bookmarks html needs to be able to handle these symbols without failing in the same way.
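One possible shape for that "space or remove" setting, sketched in Python. `clean_label` and its `mode` values are invented for illustration. Note the big assumption: a stored "?" cannot be told apart from a genuine question mark, so this treats every "?" in a name as a placeholder, and it should only ever be applied to names, never to urls (where "?" starts the query string).

```python
def clean_label(label, mode="space"):
    """Replace or drop "?" placeholders in a bookmark name (never in a URL)."""
    if mode not in ("space", "remove"):
        raise ValueError("mode must be 'space' or 'remove'")
    replacement = " " if mode == "space" else ""
    cleaned = label.replace("?", replacement)
    # Collapse any doubled spaces left behind and trim the ends.
    return " ".join(cleaned.split())

spaced = clean_label("News?Sports")
removed = clean_label("News?Sports", mode="remove")
```
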

Once again, the main goal is I really need something to help me convert a large number of Green Browser groups files into an html file that can be imported for Chrome (Slimjet) bookmarks. If anyone can help me out with this, then after that, perhaps other things can be added, such as support for converting back from bookmarks into groups, and perhaps adding support for other browsers that have groups with similar formatting, IE favorites, etc. Your help would be tremendously appreciated.

4wd:
This is something quick and dirty until someone comes up with something better.  It pays no attention to strange character encodings so the Japanese text in your sample will be output as basically garbage but the HTML generated will still work.

Powershell version below.

Cocoa:
--- Quote ---
This is something quick and dirty until someone comes up with something better.  It pays no attention to strange character encodings so the Japanese text in your sample will be output as basically garbage but the HTML generated will still work.

CGP2HTML.cmd
.....
-4wd (June 17, 2015, 10:06 PM)
--- End quote ---

Hi, 4wd

Thank you so much for your prompt response. I honestly did not expect a reply so fast, so I didn't see the response until really late and didn't have time to work on testing until this evening.

I've tried the cmd converter. It generated an html file for the cgp I tested just fine, but unfortunately the importing afterwards didn't work. I tried manually comparing the html with an original html generated by Slimjet, and did some editing... After several hours of mounting frustration and pointless searches on the internet, I gave myself a facepalm :D and suddenly realized what was missing. (annotated code at bottom)

Then about the number of files processed. I have several dozen cgp files that need to be converted. Is it possible to make the converter batch-process multiple cgp files (such as all those in the current folder) into one html file? The <H3> tags in the html file represent "folder" names for the bookmarks under Slimjet/Chrome. Each cgp file's name can be used to create an additional <H3> folder tag of the same level, and thus multiple cgps can be incorporated into one html file. Everything also needs to be given a unique top level <H3> "folder" name, such as the date & time generated, so they will be imported under one folder and avoid mixing in with pre-existing bookmarks in Slimjet/Chrome when imported. (There can be lower level "folder" names within "folder" names, but their tag remains the same (H3), so the hierarchy and boundaries of each folder are probably determined by the corresponding opening and closing <DL><p> tag pairs.)

Lastly, there is the problem with the charset encoding. I'm sorry I wasn't clearer in my original post, but the non-Romance language text needs to be preserved. The only things that are not needed are some of the extra symbols, such as the interpunct dot (・ (U+30FB) or ･ (U+FF65)), that are not included in the ANSI charset. The symbols are not stored properly under ANSI, resulting in their appearance as a "?" in the cgp files. Since they are only some minor symbols, it's not necessary to retain them during the conversion. The words in the non-Romance languages such as Japanese still need to be preserved, however, as they carry important info for the labels of the urls. Right now the conversion results in mojibake (garbled nonsense due to incorrect character encoding). When I view source and open the generated html as text, I can view the characters just fine, but once imported into Slimjet, the browser automatically selects the encoding, and it isn't the right one, resulting in mojibake.

I may have discerned a simpler solution that will deal with both the mojibake and the extra symbols outside the ANSI charset. The problem lies with choosing the right character encoding for the imported text. The original meta tag header in a Slimjet-generated html file lets you set the encoding, which in this case isn't UTF-8. However, this is a very inelegant solution that requires manually setting the encoding each time, depending on what the encoding of the imported cgp text is. A much better way to fix everything is to have the converter first convert the text obtained into UTF-8 encoding, then process everything into the html formatting. That way, in theory, regardless of the encoding used for the cgp files, everything should display properly, even the extra symbols.
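The "convert to UTF-8 first" idea could look something like this in Python. This is only a sketch under assumptions: the thread never establishes which codepage the cgp files really use, so the candidate list here (UTF-8, then cp932, the usual "ANSI" codepage on Japanese Windows) is a guess, and `decode_cgp` is a hypothetical helper name.

```python
def decode_cgp(raw, candidates=("utf-8", "cp932")):
    """Decode raw .cgp bytes by trying each candidate encoding in order."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: cp1252 with undecodable bytes turned into U+FFFD.
    return raw.decode("cp1252", errors="replace")

raw = "name0=NASAの「空飛ぶ円盤」".encode("cp932")
text = decode_cgp(raw)
utf8_bytes = text.encode("utf-8")  # what would be written to the .html file
```

Once the text is decoded correctly, writing it out as UTF-8 matches the `charset=UTF-8` meta tag, so the browser has no reason to guess. Characters already saved as a literal "?" cannot be recovered this way, since the original symbol is no longer in the file.
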



Code here:

--- Code: Text ---
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>(Top level unique folder name. Use date&time?)</H3>
    <DL><p>
        <DT><H3>(1st cgp group file name)</H3>
        <DL><p>
            <DT><A HREF="(url)">(name)</A>
        </DL><p>
        <DT><H3>(2nd cgp group file name)</H3>
        <DL><p>
            <DT><A HREF="(url)">(name)</A>
        </DL><p>
    </DL><p>
</DL><p>
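To make the nesting concrete, here is a hedged Python sketch that generates a file following the annotated template: one top level folder, one `<H3>` folder per group, and one `<DT><A>` line per url. The function name, the default folder label and the input shape (a list of group name plus entries) are all assumptions made for this example.

```python
from datetime import datetime
from html import escape

def build_bookmarks_html(groups, top_folder=None):
    """groups: list of (group_name, [(name, url), ...]) in the desired order."""
    if top_folder is None:
        # Assumed default label; any unique name would do.
        top_folder = datetime.now().strftime("Green Browser %Y-%m-%d %H%M%S")
    out = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
        f"    <DT><H3>{escape(top_folder)}</H3>",
        "    <DL><p>",
    ]
    for group_name, entries in groups:
        out.append(f"        <DT><H3>{escape(group_name)}</H3>")
        out.append("        <DL><p>")
        for name, url in entries:
            out.append(f'            <DT><A HREF="{escape(url)}">{escape(name)}</A>')
        out.append("        </DL><p>")
    # Close the top level folder, then the outermost list.
    out += ["    </DL><p>", "</DL><p>"]
    return "\n".join(out) + "\n"

html_text = build_bookmarks_html(
    [("Example Group", [("Yahoo", "https://www.yahoo.com/")])],
    top_folder="Imported 2015-06-18",
)
```

The result would be saved with something like `open(path, "w", encoding="utf-8")` so the bytes on disk agree with the charset declared in the meta tag.
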

Cocoa:
Added in additional notes in my original second post regarding the top level "folder" name, etc. Also updated the annotations and code to clarify things a bit more.

4wd:
Here's a WIP version; run it over your files and see if there is anything wrong.  I already know I have to fix URLs with = in them, so I'm more interested in character encoding and output structure (character encoding worked here).

BTW, I don't know if it's your archiver or my extractor but the example files in your archive had the Japanese characters munged - only place where they appear as they should is in your OP.  To elucidate a bit more, the characters in your OP are UTF-8, the files in your archives seem to be ANSI.

Just double-click the command file and choose a folder (full of .cgp files) from the requester, output will be to CGP2HTML-HHMMSS.html (where HHMMSS is current time) in the folder that the command file resides in.
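For comparison, the same folder scan and output naming can be sketched in Python. This is not 4wd's script, only an illustration of the behaviour described: `find_groups` and `output_path` are hypothetical names, and the group name is taken from the file name without its .cgp extension, as in the original request.

```python
from datetime import datetime
from pathlib import Path

def find_groups(folder):
    """Return (group_name, path) for every .cgp file directly inside folder."""
    return [(p.stem, p) for p in sorted(Path(folder).glob("*.cgp"))]

def output_path(base_dir, now=None):
    """Build base_dir/CGP2HTML-HHMMSS.html from the given (or current) time."""
    now = now or datetime.now()
    return Path(base_dir) / f"CGP2HTML-{now:%H%M%S}.html"
```
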

What it generated imported into Comodo Dragon OK.

Update: Fixed URLs with = in them (well seems to have)
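The post does not say how the fix was made in the command file, but the underlying problem is easy to show in Python: splitting a line at every "=" cuts the url into pieces, while splitting at the first "=" only keeps it whole.

```python
line = "url2=https://www.google.com/?gws_rd=ssl"

naive = line.split("=")              # breaks the URL apart at every "="
key, _, value = line.partition("=")  # splits at the first "=" only
```
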

See below
