Author Topic: REQ: Convert/Export Certain Browser's "Groups" as Chrome Bookmarks  (Read 26107 times)

Cocoa

  • Supporting Member
  • Joined in 2015
  • Posts: 16
Hello, all. I'm new here. Found the site while searching for a specific program and registered. I recently ran into this small problem, and as an average computer user who has absolutely no coding background, I'm at a total loss despite the simplicity of the issue. After many hours of frustration while trying various things with my limited knowledge, I decided to try posting here in the hopes the community can give me a hand.

I use Green Browser, which is more of a Trident/IE-based browser frontend. It can create these "groups", which are basically a more organized collection of favorites/bookmarks. I would like a small program to convert/export these groups into an html file such as those used by Chrome/Chromium to import bookmarks. (More specifically, I use another custom Blink/Chromium browser called Slimjet, and would like to import to there, but I believe the format is probably universal for all Chrome/Chromium variants.)

Each Green Browser group is a bunch of urls, with each url given a name/label, and saved in a customized order. The info is stored as what I believe to be ANSI encoded text, with each group saved as an individual Windows file, the base file name being the group name and with the "cgp" custom extension. Here's an example (see attachment for original file):

File name: "Example Group.cgp"
Code: Text
  [Group]
  name0=BBC - Homepage
  url0=http://www.bbc.com/
  name1=Yahoo
  url1=https://www.yahoo.com/
  name2=Google
  url2=https://www.google.com/?gws_rd=ssl
  name3=CNN.co.jp : NASAの「空飛ぶ円盤」、ハワイで飛行実験
  url3=http://www.cnn.co.jp/fringe/35065380.html?tag=rcol;editorSelect
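
To make the layout above concrete, here is a minimal PowerShell sketch of pulling the name/url pairs out of a group's lines. It is only an illustration under stated assumptions: the function name `ConvertFrom-CgpLines` is hypothetical, the lines are assumed to be already decoded to text, and PowerShell 3 or later is assumed for `[pscustomobject]`. Each line is split at the first `=` only, so URLs that themselves contain `=` stay intact.

```powershell
# Hypothetical helper (not part of Green Browser or of any script in this thread):
# turns the text lines of a .cgp group into ordered name/url pairs.
function ConvertFrom-CgpLines {
    param([string[]]$Lines)

    $names = @{}
    $urls  = @{}
    foreach ($line in $Lines) {
        # Only nameN=... and urlN=... lines matter; [Group] and blank lines are skipped.
        # (.*) captures everything after the first '=', including any later '='.
        if ($line -match '^(name|url)(\d+)=(.*)$') {
            $index = [int]$Matches[2]
            if ($Matches[1] -eq 'name') { $names[$index] = $Matches[3] }
            else                        { $urls[$index]  = $Matches[3] }
        }
    }

    # Emit the pairs in their stored order (0, 1, 2, ...).
    foreach ($index in ($urls.Keys | Sort-Object)) {
        [pscustomobject]@{
            Name = $names[$index]
            Url  = $urls[$index]
        }
    }
}

$sample = @(
    '[Group]',
    'name0=BBC - Homepage',
    'url0=http://www.bbc.com/',
    'name1=Google',
    'url1=https://www.google.com/?gws_rd=ssl'
)
$bookmarks = @(ConvertFrom-CgpLines -Lines $sample)
$bookmarks | Format-Table -AutoSize
```

Keying the pairs by their number rather than by line position means a missing or reordered line does not shift every later bookmark.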

As for the Chrome bookmarks html file, I've also included a sample export. (Please see the attached file for reference; the contents are too long to post here.) I don't know much about html formatting, but after analyzing the content, I believe the main thing required for Slimjet (Chrome) to recognize the urls and their names is to reformat each pair from the above into this format:

Code: Text
  <DT><A HREF="(url)">(name)</A>

The key things are to preserve each url, its name/label, and of course the order in which the urls are stored in each group. I think you can safely ignore and exclude the other tags, such as "add date". Of course the final file still needs the proper title/meta tags to designate it as html, but I'm not sure what exactly is required, as I don't know enough about html formatting.

Additionally, each group name can be stored as a folder name for the bookmarks, which would be at a different level than the individual urls in the html formatting, and lastly everything should be stored under a unique top level folder name, perhaps based on the date/time generated, or allow user input, so as to avoid overwriting or mixing in with existing bookmarks when importing.

Finally, a crucial thing to note is that although the groups are stored in *what I believe* to be ANSI encoding (someone can examine the file and correct me if I'm wrong), they can actually contain any number of characters, some of which are not included in ANSI, such as ・ (U+30FB) or ･ (U+FF65). (If you cannot correctly display the symbol, see the wikipedia link here and jump to the Japanese section.) They are sometimes stored as just a "?" in a Green Browser group file due to the limits of the ANSI encoding, and there can be quite a few of them. The program needs to be able to handle the symbols, but does not need to preserve or restore the original symbols. It merely needs to put a space in place of these, or remove them (ideally, with a program setting to decide which). I have on hand a small program that's supposed to convert Green Browser groups into IE favorites, but it generates errors and fails at the conversion randomly, likely due to these characters. I've attached it here as well if someone wants a reference point. The program I'm requesting to convert to Chrome bookmarks html needs to be able to handle these symbols without failing in the same way.

Once again, the main goal is I really need something to help me convert a large number of Green Browser groups files into an html file that can be imported for Chrome (Slimjet) bookmarks. If anyone can help me out with this, then after that, perhaps other things can be added, such as support for converting back from bookmarks into groups, and perhaps adding support for other browsers that have groups with similar formatting, IE favorites, etc. Your help would be tremendously appreciated.
« Last Edit: June 19, 2015, 02:36 AM by Cocoa »

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,644
This is something quick and dirty until someone comes up with something better.  It pays no attention to strange character encodings so the Japanese text in your sample will be output as basically garbage but the HTML generated will still work.

Powershell version below.
« Last Edit: June 25, 2015, 10:51 PM by 4wd »

Cocoa

This is something quick and dirty until someone comes up with something better.  It pays no attention to strange character encodings so the Japanese text in your sample will be output as basically garbage but the HTML generated will still work.

CGP2HTML.cmd
.....

Hi, 4wd

Thank you so much for your prompt response. I honestly did not expect a reply so fast, so I didn't see the response until really late and didn't have time to work on testing until this evening.

I've tried the cmd converter. It generated a html file for the cgp I tested just fine, but unfortunately the importing afterwards didn't work. I tried manually comparing the html with an original html generated by Slimjet, and did some editing... After several hours of mounting frustrations and pointless searches on the internet, I gave myself a facepalm :D and suddenly realized what was missing. (annotated code at bottom)

Then about the number of files processed. I have several dozen cgp files that need to be converted. Is it possible to make the converter mass process multiple cgp files (such as all under the current folder) into one html file? The <H3> tags in the html file represent "folder" names for the bookmarks under Slimjet/Chrome. Each cgp file's name can be used to create an additional <H3> folder tag of the same level, and thus multiple cgps can be incorporated into one html file. Everything also needs to be given a unique top level <H3> "folder" name, such as the date&time generated, so they will be imported under 1 folder and avoid mixing in with pre-existing bookmarks in Slimjet/Chrome when imported. (There can be lower level "folder" names within "folder" names, but their tag remains the same (H3), so the hierarchy and boundaries of each folder are probably determined by each corresponding opening and closing <DL><p> tag pair.)

Lastly, there is the problem with the charset encoding. I'm sorry I wasn't clearer in my original post, but the non-Romance language text needs to be preserved. The only things that are not needed are some of the extra symbols, such as the interpunct dot (・ (U+30FB) or ･ (U+FF65)), that are not included in the ANSI charset. The symbols are not stored properly under ANSI, resulting in their appearance as a "?" in the cgp files. Since they are only some minor symbols, it's not necessary to retain them during the conversion. The words in the non-Romance languages such as Japanese still need to be preserved, however, as they carry important info for the labels of the urls. Right now the conversion results in mojibake (garbled nonsense due to incorrect character encoding). When I view source and open the generated html as text, I can view the characters just fine, but once imported into Slimjet, the browser automatically selects the encoding, and it isn't the right one, resulting in mojibake.

I may have discerned a simpler solution that will deal with both the mojibake and the extra symbols outside the ANSI charset. The problem lies with choosing the right character encoding for the imported text. The original meta tag header in a Slimjet generated html file lets you set the encoding, which, in this case, isn't UTF-8. However, this is a very inelegant solution that requires manually setting the encoding each time, depending on what the encoding of the imported cgp text is. A much better way to fix everything is to have the converter first convert the text obtained into UTF-8 encoding, then process everything into the html formatting. That way, in theory, regardless of the encoding used for the cgp files, everything should display properly, even the extra symbols.

Annotated Code fin.PNG

Code here:
Code: Text
  <!DOCTYPE NETSCAPE-Bookmark-file-1>
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
  <TITLE>Bookmarks</TITLE>
  <H1>Bookmarks</H1>
  <DL><p>
      <DT><H3>(Top level unique folder name. Use date&time?)</H3>
      <DL><p>
          <DT><H3>(1st cgp group file name)</H3>
          <DL><p>
              <DT><A HREF="(url)">(name)</A>
          </DL><p>
          <DT><H3>(2nd cgp group file name)</H3>
          <DL><p>
              <DT><A HREF="(url)">(name)</A>
          </DL><p>
      </DL><p>
  </DL><p>
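
The nesting in the template above can be sketched in PowerShell as follows. This is a rough illustration, not the converter from this thread: the function name `New-BookmarkHtml`, the top folder label, and the shape of the input (an ordered dictionary of group name to bookmark objects) are all assumptions. HTML-encoding the names with `[System.Net.WebUtility]::HtmlEncode` is also an addition that the thread does not discuss; it assumes the importer decodes entities such as `&amp;`. PowerShell 3 or later is assumed.

```powershell
# Hypothetical helper, not taken from the scripts in this thread.
# $Groups maps each group (folder) name to an array of objects with Name and Url.
function New-BookmarkHtml {
    param(
        [string]$TopFolder,
        [System.Collections.IDictionary]$Groups
    )

    $lines = New-Object 'System.Collections.Generic.List[string]'
    $lines.Add('<!DOCTYPE NETSCAPE-Bookmark-file-1>')
    $lines.Add('<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">')
    $lines.Add('<TITLE>Bookmarks</TITLE>')
    $lines.Add('<H1>Bookmarks</H1>')
    $lines.Add('<DL><p>')

    # Top level folder that keeps the import separate from existing bookmarks.
    $line = '    <DT><H3>{0}</H3>' -f [System.Net.WebUtility]::HtmlEncode($TopFolder)
    $lines.Add($line)
    $lines.Add('    <DL><p>')

    foreach ($groupName in $Groups.Keys) {
        # One <H3> folder per group, immediately followed by its own <DL><p>.
        $line = '        <DT><H3>{0}</H3>' -f [System.Net.WebUtility]::HtmlEncode($groupName)
        $lines.Add($line)
        $lines.Add('        <DL><p>')
        foreach ($bookmark in $Groups[$groupName]) {
            # Assumption: only the visible name is encoded; the url is written as stored.
            $name = [System.Net.WebUtility]::HtmlEncode($bookmark.Name)
            $line = '            <DT><A HREF="{0}">{1}</A>' -f $bookmark.Url, $name
            $lines.Add($line)
        }
        $lines.Add('        </DL><p>')
    }

    $lines.Add('    </DL><p>')   # closes the top level folder
    $lines.Add('</DL><p>')       # closes the outermost list
    $lines
}

$groups = [ordered]@{
    'Example Group' = @(
        [pscustomobject]@{ Name = 'BBC - Homepage'; Url = 'http://www.bbc.com/' },
        [pscustomobject]@{ Name = 'Yahoo';          Url = 'https://www.yahoo.com/' }
    )
}
$html = @(New-BookmarkHtml -TopFolder 'Imported from Green Browser' -Groups $groups)
$html
```

Every `<DL><p>` opened here has a matching `</DL><p>`, which is what marks where each folder begins and ends.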
« Last Edit: June 19, 2015, 03:44 PM by Cocoa »

Cocoa

Added in additional notes in my original second post regarding the top level "folder" name, etc. Also updated the annotations and code to clarify things a bit more.

4wd

Here's a WIP version, run it over your files and see if there is anything wrong.  I already know I have to fix URLs with = in them, more interested in character encoding and output structure, (character encoding worked here).

BTW, I don't know if it's your archiver or my extractor but the example files in your archive had the Japanese characters munged - only place where they appear as they should is in your OP.  To elucidate a bit more, the characters in your OP are UTF-8, the files in your archives seem to be ANSI.

Just double-click the command file and choose a folder (full of .cgp files) from the requester, output will be to CGP2HTML-HHMMSS.html (where HHMMSS is current time) in the folder that the command file resides in.

What it generated imported into Comodo Dragon OK.

Update: Fixed URLs with = in them (well seems to have)

See below
« Last Edit: June 25, 2015, 10:52 PM by 4wd »

Cocoa

Hey, 4wd

Thanks so much for your continued efforts in helping and supporting this app. I got caught up with some unexpected stuff over the weekend, didn't get a chance to test the new version until now. I double-clicked the script to run it and ran into an error after choosing the folder for groups. It seems many of the commands are not supported in my Windows 7 SP1 x64 (Ult). Is there some extra dependency?

And to answer your earlier question about encoding: the characters in my first post are UTF-8 because they are copied and displayed in a webpage. When displayed in the web browser that you're currently viewing, they will display according to the html and/or browser settings. Chrome and many modern browsers will automatically select the correct code page for displaying text that's not in Unicode (assuming you have the code page installed), but you can also set your own display encoding manually in your browser, and that can also result in mojibake when it doesn't match the actual encoding that's being used. Since there can be multiple code pages even for the same language (due to regional differences), it makes for a fairly troublesome affair, which is why most use Unicode nowadays. Anyway, what you see in a webpage is entirely separate from the cgp files, which are probably still in ANSI, the encoding that Green Browser uses by default for generating them (and which I have no control over). It's also why the mismatch in encoding causes mojibake when the text from a cgp gets imported into Chrome/Slimjet.

Took a closer look and it seems generation of the html was actually in the process of working when that error caused it to fail. I have here the html file generated that contains the contents of 1 full group, as well as partial contents from a second group. I've also uploaded the original cgp files of those two groups. You said the archive/extract process might be messing with the encoding. I know this is possible for the file names, but unlikely for the contents of a file. However, just to eliminate all possibilities, I've used an outside hosting site to upload the original files without compression, since the attachment format restrictions of this forum don't allow cgp or html.
(*links removed since they're no longer needed)

1. Unfortunately encoding conversion still isn't working right. The non-Romance characters still end up as gibberish when imported. Since the mojibake ends up with even more unusual symbols than normal being displayed when using a wrong encoding, perhaps it was one such character/symbol that caused the app to generate an error and exit? This appears to be strangely similar to what happened with the converter for cgp to IE favorites.

2. The unique top level "folder" name needs to be an <H3> tag, not <H4>, or it will not be properly imported into Chrome/Slimjet. Any and all following folders also need to be <H3> tags, preceded by <DT>, with the contents enclosed by a pair of <DL><p> and </DL><p> tags. The importing process doesn't recognize anything else. The only things denoting the boundaries of each folder are the <DL><p> and </DL><p> tag pairs. Everything following the DL tag and before the /DL tag, regardless of whether it's an <H3> folder or a url, is inside that particular "folder". So if you want to create different layers of folders, you only need to make sure the pairs are placed appropriately, which means the "folder" names always need to be <H3> tags and MUST be followed immediately by the opening DL, p tags. The current html is also missing the DL, p tags after the top level folder. The basic code I've included in my second post is what I have tested to work, so it should work as long as the format of the generated html closely follows that.

3. If it's not too much trouble, it's probably best to have the file name of the html generated include date as well as time. Otherwise there will still be a small but distinct possibility that it will conflict with an already generated file name. e.g. instead of "CGP2HTML-%hour%%min%%sec%.html", how about "CGP2HTML-%year%%month%%day%.%hour%%min%%sec%.html"?

4. One additional issue that just occurred to me when looking over my cgp files. The file names of some cgp files themselves can be in non-Romance characters. Let's consider together how we should handle this once the other issues have been resolved.
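
For the file name suggested in point 3, a small PowerShell sketch might look like this. The `CGP2HTML-` prefix comes from the thread; the exact layout of the stamp is only one possible choice.

```powershell
# One Get-Date call, so every field comes from the same instant.
# 'HH' is the 24-hour clock; 'hh' would repeat the same names twelve hours apart.
$stamp   = Get-Date -Format 'yyyyMMdd.HHmmss'
$outName = 'CGP2HTML-{0}.html' -f $stamp
$outName
```

Two runs within the same second would still produce the same name, so this narrows the chance of a clash rather than removing it.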
« Last Edit: July 14, 2015, 03:58 PM by Cocoa »

4wd

It seems many of the commands are not supported in my Windows 7 SP1 x64 (Ult). Is there some extra dependency?

They are not commands, they're fragments of the url/name that is read - something that wasn't a problem in the abbreviated .cgp in your OP.

You can't convert from ANSI characters to UTF-8 characters from what I've tried and read.  Sure you can convert the file format but that won't fix the original input characters.
(EDIT: I'm stoopid :-[)

Here's the problem, you're not talking about just converting a text file to another text file in a different format/encoding.  You're talking about having a program read each URL, fetch that page, parse it, and then store the page title plus the URL in another bookmark format and character encoding.

This is why both the converter program and command file aren't providing you with what you want - a "format" converter is not what is required.

1. Unfortunately encoding conversion still isn't working right. The non-Romance characters still end up as gibberish when imported. Since the mojibake ends up with even more unusual symbols than normal being displayed when using a wrong encoding, perhaps it was one such character/symbol that caused the app to generate an error and exit?

Exactly, it was the gibberish that is the name field - there is no way around that, it starts out as gibberish, it'll remain gibberish.  The only option is to refetch the page and save the title with the correct encoding.

I'll have a play in Powershell since it supports Unicode.

EDIT: Exactly where are you located and what locale/language is your system set to?

I've been playing around in Powershell and quite frankly, I can't get it to work consistently with some of the sites in your .cgp file.

eg.  Reading the page title through Powershell:

Code: Text
  http://www.cnn.co.jp/fringe/35065380.html?tag=rcol;editorSelect
correctly gives:
CNN.co.jp : NASAの「空飛ぶ円盤」、ハワイで飛行実験

Code: Text
  http://www.excite.co.jp/world/chinese/
always gives gibberish even though the page is supposedly UTF-8:
中国語翻訳 - エキサイト 翻訳

Not only that but there is mixed encoding, it's not all UTF-8:
Code: Text
  http://www.messe.gr.jp/girls/index.html?category_id=&kw=%2588%25E4%258F%25E3%2598a%2595F&pageID=5
this page is Shift-JIS which means the title will be illegible when output as UTF-8.
« Last Edit: June 24, 2015, 07:30 PM by 4wd »

Cocoa

You're talking about having a program read each URL, fetch that page, parse it, and then store the page title plus the URL in another bookmark format and character encoding.
...
Exactly, it was the gibberish that is the name field - there is no way around that, it starts out as gibberish, it'll remain gibberish.  The only option is to refetch the page and save the title with the correct encoding.

I've been playing around in Powershell and quite frankly, I can't get it to work consistently with some of the sites in your .cgp file.

At first I was very confused about the first sentence, since from the beginning, I was talking about just converting text from the cgp files into text stored in the html format, but now I understand where you're coming from. The thing is, fetching titles from the sites themselves wouldn't work, and not just because you have no control over the encoding any particular site chooses to use (yes, some use Unicode, but many will use a specific encoding which varies depending on the language). Most importantly, it's because my original "names" for the urls are more than just the titles of the pages. I used them more for notes than anything else, summarizing the site's content in a way that makes sense to me, and they often include important reminders of other things. That's why it's crucial to retain the original "name" fields.

I truly appreciate your patience and varied approach in finding a solution to this challenge. ;D

I thought it would be possible to convert directly from ANSI to Unicode, but as someone without any technical background in computers, I don't really know how the process works, so I guess I oversimplified things. You said if it starts out as gibberish, it would remain gibberish. Thinking back on some of the little built-in tools I've occasionally used in other software to fix mojibake (such as this Chacon plugin for foobar2000), they have generally always required you to choose a character encoding for the source text, then an output encoding.

In Chacon, which is used to fix tags for audio files, a preview window lets you switch between encodings and see immediately whether the characters are displayed correctly, i.e. whether the correct source character encoding is chosen, prior to conversion and the final overwrite of the tags themselves. It allows non-Unicode characters that were generated on one computer to be displayed correctly on another computer with a different system locale. The preview is useful because not only do you have no idea what system locale was used on the computer from which the file originated, but even when you know the actual language the characters are supposed to be in, the encoding itself doesn't always match the language of the characters used. With a large enough character set, it's possible to display nearly all characters from other languages in a character set intended for another language, e.g. it's possible to display Japanese characters using a Chinese encoding, and maybe vice versa. (It's also why most European languages only have one "Western" charset.) However, sometimes there are some characters not included, and another important function of the preview is to let you see what might be lost if you choose a certain encoding, which encoding might retain the most info, etc. I didn't think it would be necessary to do it this way for converting from ANSI into Unicode, but I overlooked the fact that Unicode is ultimately still just another encoding, albeit one that is able to include all languages.
Chacon scrnsht.PNG

I will understand if you consider this too involved to implement. There is always the option to set it aside. I can try manually editing the "UTF-8" field for the right encoding. In your first version of the tool CGP2HTML, the html generated was able to be displayed correctly after I edited the "UTF-8" field to the correct encoding that I assume the text was originally saved in. If you just obtain the text without attempting to convert (or however it was done in the first version),  I can manually edit this field into what I think might be the proper encoding, based on the non-Unicode settings of the system it was generated on.

Case in point, on my system, the "system locale" or what Windows calls "the language setting for non-Unicode programs" varies, changing from time to time, depending on what language I need to display. It is often set to "Chinese (Simplified, PRC)", and sometimes "Japanese (Japan)". As I said, there are many different code pages even just for Simplified Chinese, and I'm not sure exactly what that translates to code page wise, but when I open a cgp file directly in the Green Browser window as a "webpage", "GB2312" is the encoding it selects automatically, and appears to display correctly. When I try to open the cgp in Slimjet, I have to manually select an encoding to have it display properly, and since GB2312 wasn't available, I chose "GBK", which is apparently just an extended version of the GB2312 charset. It also appears to display properly. In the html generated by the first version of CGP2HTML, all I did was edit UTF-8 to GBK and it displayed correctly when imported into Slimjet.
« Last Edit: June 24, 2015, 05:17 PM by Cocoa »

4wd

Case in point, on my system, the "system locale" or what Windows calls "the language setting for non-Unicode programs" varies, changing from time to time, depending on what language I need to display. It is often set to "Chinese (Simplified, PRC)", and sometimes "Japanese (Japan)".

Bingo!  I wish you'd said that at the start.  OK, let's see how far I get this time :)

I was wrong about the character encoding conversion, oh well ... happens sometimes   ;)  Anyway, found a program that will do it at the command line so I can work it into the command file or you can bulk convert all your .cgp before running the reformat.  One or the other, I'll see what happens.

Cocoa

 :-[ Sorry, besides the fact I switch between different system locales depending on what I need, I didn't think it was necessary to know the initial encoding in order to convert from ANSI to Unicode. I apologize for any unnecessary detours caused.  :P

I'm glad you found something to simplify the encoding conversion process. :) I think ultimately, all data is stored in binary format, so while it may appear to be gibberish, it's not actually gibberish, as long as it hasn't been corrupted. As I understand it, that data just needs the proper key to be "translated" into human readable form, which in this case is what the character encodings are for. That's why the characters in the cgp files are consistently displayed in Green Browser on my computer, and also show up correctly in Slimjet once the proper encoding is selected. (They might not show up properly on your system until you have the proper encoding installed, though I'm pretty sure most post-WinXP OSes already have most encodings installed by default.) It's only when the data gets translated with the wrong encoding and then stored in the "translated" form that it gets corrupted and likely loses portions, if not all, of the original info, even if translated correctly again. After all, some things are "lost in translation", especially incorrect translation. That's when gibberish will truly stay gibberish.
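
The "proper key" idea can be illustrated with a few lines of PowerShell. This is a sketch under assumptions: the sample text (two Chinese characters) is made up, and the `gb2312`, `shift_jis` and Windows-1252 code pages are assumed to be available, as they normally are in Windows PowerShell. The first two lines are a guard for newer PowerShell versions, where legacy code pages may need registering; they are skipped where that provider type does not exist.

```powershell
# Register legacy code pages where the provider type exists (PowerShell 6+);
# on Windows PowerShell the type is normally absent and this is skipped.
$provider = 'System.Text.CodePagesEncodingProvider' -as [type]
if ($provider) { [System.Text.Encoding]::RegisterProvider($provider::Instance) }

$gb2312 = [System.Text.Encoding]::GetEncoding('gb2312')
$sjis   = [System.Text.Encoding]::GetEncoding('shift_jis')

# Two Chinese characters (U+4E2D, U+6587), built from code points so the
# example does not depend on how this script file itself is saved.
$original = -join ([char]0x4E2D, [char]0x6587)

$bytes = $gb2312.GetBytes($original)   # the bytes as they would sit in a file
$right = $gb2312.GetString($bytes)     # decoded with the matching encoding
$wrong = $sjis.GetString($bytes)       # same bytes, wrong encoding: mojibake

"Right encoding matches original: $($right -eq $original)"
"Wrong encoding matches original: $($wrong -eq $original)"

# Data is only lost once text is saved in an encoding that lacks the
# characters: Windows-1252 has no Chinese characters, so the encoder
# substitutes a replacement character for each one.
$western = [System.Text.Encoding]::GetEncoding(1252)
$lossy   = $western.GetString($western.GetBytes($original))
"After a round trip through Windows-1252: $lossy"
```

The bytes themselves never change in the first half; only the interpretation does, which is why picking the right source encoding is enough to recover the text.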

Please take your time and try things out to see what works. I'll look forward to seeing the result and testing things again.

4wd

Powershell version, don't know what the minimum version of Powershell it requires is since I'm using v4 (Win 8.1).

It's set to Simplified Chinese input but that can be changed by editing one value or I could change it to use whatever the console is using, (entering a value gives more flexibility I think).  Line 27 in script is where it's set, eg. gb2312 = Simplified Chinese, (PRC).

Double-click the shortcut to run it, (needs to be in same folder as the script unless you edit the shortcut properties).

Code: PowerShell
  1. <#
  2. .SYNOPSIS
  3.    Converts Green Browser Group files to HTML bookmarks (Mozilla/Chrome)
  4. .DESCRIPTION
  5.    Reads all ANSI .cgp files in a folder according to encoding specified, saves
  6.    to temp UTF-8 file, then reads file and outputs to Mozilla/Chrome compatible
  7.    bookmark format.
  8. .PARAMETER <paramName>
  9.    None
  10. .EXAMPLE
  11.    PS> CGP2HTML.ps1
  12. #>
  13.  
  14. Function Get-Folder {
  15.   Add-Type -AssemblyName System.Windows.Forms
  16.   $FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
  17.   [void]$FolderBrowser.ShowDialog()
  18.   Return $FolderBrowser.SelectedPath
  19. }  
  20.  
  21. Function Convert-UTF {
  22.   Param(
  23.     [String]$SourceFile
  24.   )
  25.   $Buffer = Get-Content $SourceFile -Encoding byte
  26.  
  27.   $Encoding = [System.Text.Encoding]::GetEncoding("gb2312")
  28.   $String = $Encoding.GetString($Buffer)
  29.   Out-File -FilePath .\C2Htmp.txt -Encoding utf8 -InputObject $String
  30. }
  31.  
  32. Function Write-Header {
  33.   Param(
  34.     [String]$OutFile,
  35.     [String]$Folder
  36.   )
  37.   '<!DOCTYPE NETSCAPE-Bookmark-file-1>' | Out-File -Encoding utf8 -FilePath $OutFile -NoClobber
  38.   '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  39.   '<TITLE>Bookmarks</TITLE>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  40.   '<H1>Bookmarks</H1>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  41.   '<DL><P>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  42.   '<DT><H3>{0} --- {1}</H3>' -f $Folder, (Get-Date) | Out-File -Encoding utf8 -FilePath $OutFile -Append
  43.   '<DL><P>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  44. }
  45.  
  46. Function Out-Bookmark {
  47.   Param(
  48.     [Object]$File,
  49.     [String]$Outfile
  50.   )
  51.   for($i=0; $i -lt $File.Length; $i++) {
  52.     $tmpFile = $File[$i] | % { $_.BaseName }
  53.     Write-Host "Converting: " $File[$i]
  54.     Convert-UTF $File[$i]
  55.     '<DT><H3>' + $tmpFile + '</H3></DT>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  56.     '<DL><P>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  57.     $Lines = (Get-Content ".\C2Htmp.txt")
  58.     for($j=2; $j -lt $Lines.Length; $j+=2) {
  59.       '<DT><A HREF="{0}">{1}</A>' -f $Lines[$j].Substring($Lines[$j].IndexOf("=") + 1), `
  60.                                      $Lines[$j - 1].Substring($Lines[$j - 1].IndexOf("=") + 1) `
  61.                                      | Out-File -Encoding utf8 -FilePath $OutFile -Append
  62.     }
  63.     '</DL><P>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  64.     Remove-Item '.\C2HTmp.txt'
  65.   }
  66.   '</DL><P>' | Out-File -Encoding utf8 -FilePath $OutFile -Append
  67. }
  68.  
  69. If($PSVersionTable.PSVersion.Major -lt 4) {
  70.   Write-Host '** Script requires at least Powershell V4 **'
  71. } else {
  72.   $sFolder = (Get-Folder)
  73.   If($sFolder -eq '') {Exit}
  74.   Write-Host '-- Conversion started --'
  75.   $aFiles = @(Get-ChildItem -Include *.cgp -Path ($sFolder + "\*"))  # @() keeps a lone match as an array
  76.   If(($sFolder.LastIndexOf('\') + 1) -eq $sFolder.Length) {
  77.   $sOutfile =  ($sFolder | % { $_.Substring(0, $_.LastIndexOf('\') + 1) }) `
  78.                 + 'CGP2HTML-' + ("{0:yyyy}" -f (Get-Date)) + ("{0:MM}" -f (Get-Date)) `
  79.                 + ("{0:dd}" -f (Get-Date)) + ('{0:HH}' -f (Get-Date)) `
  80.                 + ('{0:mm}' -f (Get-Date)) + ('{0:ss}' -f (Get-Date)) + '.html'
  81.   } else {
  82.   $sOutfile =  $sFolder + '\' + 'CGP2HTML-' + ("{0:yyyy}" -f (Get-Date)) `
  83.                 + ("{0:MM}" -f (Get-Date)) + ("{0:dd}" -f (Get-Date)) `
  84.                 + ('{0:HH}' -f (Get-Date)) + ('{0:mm}' -f (Get-Date)) `
  85.                 + ('{0:ss}' -f (Get-Date)) + '.html'
  86.   }
  87.   Write-Header $sOutfile $sFolder
  88.   Out-Bookmark $aFiles $sOutfile
  89.   Write-Host '-- Conversion finished --'
  90. }
  91. Write-Host ''
  92. Write-Host 'Output file: ' $sOutfile -foregroundcolor "blue" -backgroundcolor "yellow"
  93. Write-Host ''
  94. Write-Host 'Press a key to exit ...'
  95. cmd /c pause | out-null
« Last Edit: August 31, 2015, 09:03 PM by 4wd »

Cocoa

Hi 4wd,
I'm getting a window with errors when double-clicking the shortcut. It was gone too fast, but after several tries I managed to capture this screenshot. After some research, it seems this is because the default security settings in Windows do not allow the execution of unsigned scripts. Is there a way you can sign it? Or do I need to permanently set the security level to "unsigned" in order to run the script? I hope there is a more secure workaround to avoid opening a permanent security risk for all unsigned scripts on my system.

error scrnsht.gif

I found this while searching for a solution. Perhaps you can make use of it, or help me understand how to use it?

Quote:
On my machine that I use to dev scripts, I will use -unrestricted as above. When deploying my scripts however, to an end user machine, I will just call powershell with the -executionpolicy switch:
powershell.exe -noprofile -executionpolicy bypass -file .\script.ps1
« Last Edit: June 27, 2015, 10:14 PM by Cocoa »

4wd

Quote:
On my machine that I use to dev scripts, I will use -unrestricted as above. When deploying my scripts however, to an end user machine, I will just call powershell with the -executionpolicy switch:
powershell.exe -noprofile -executionpolicy bypass -file .\script.ps1

Try adding the -noprofile -executionpolicy bypass switches to the command line in the shortcut.

EDIT: This worked, it allowed the script to run after I changed my policy back to AllSigned, (mine's normally at RemoteSigned).

2015-06-28 13_31_32.png

BTW, you did amend any path in the shortcut to point to wherever you've put the script, right?

Quote from: Cocoa
Is there a way you can sign it?

Yes, but that means you'd need to import a certificate into your system that I create - bit much for one script.
« Last Edit: June 28, 2015, 11:32 PM by 4wd »

Cocoa

Quote from: 4wd
Try adding the -noprofile -executionpolicy bypass switches to the command line in the shortcut.

Tried it, but this time all it shows is a blinking white input cursor  :huh:


Quote from: 4wd
BTW, you did amend any path in the shortcut to point to wherever you've put the script, right?

Which path exactly do you mean? If you mean the "start in" box for the shortcut, I removed the old path. Based on previous experience with shortcuts in general, I didn't fill in any new ones because I didn't think it was necessary, but now I have updated it. As with the previous result, running the shortcut gets the same white blinking cursor and nothing else.

After further consideration, I've also tried replacing the -File with -"C:\Program Files (x86)\Green Browser\Groups", which is where the cgp files are stored, but this time I once again got a window with red error messages that disappeared too fast for me to read or screenshot. I tried running it repeatedly after that to capture a screenshot, but it seems all I'm getting now are blank windows that last less than a second, without the error messages anymore.

4wd

Quote from: Cocoa
-File with -"C:\Program Files (x86)\Green Browser\Groups", which is where the cgp files are stored, ...

The -File argument points to where the script is which is why I said this:

Quote from: 4wd
BTW, you did amend any path in the shortcut to point to wherever you've put the script, right?

So, if you extracted the script into the folder C:\Powershell Scripts and put the shortcut on your Desktop, then the shortcut needs to be edited to show:

%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe -noprofile -executionpolicy bypass -File "C:\Powershell Scripts\CGP2HTML.ps1"

I updated the shortcut in the archive with the necessary parameters.

You could also start a CLI, change to the folder where the script is, and type in the command:

powershell -noprofile -executionpolicy bypass -File "CGP2HTML.ps1"

2015-06-30 13_47_31.png

FWIW, running via the shortcut works on all the machines I have access to here, when run from a flash drive.

Cocoa

Hey, 4wd

I had both the script and the shortcut in the same folder and you had said I didn't need to change anything if I did that. After I followed your instructions in the latest post and amended the target path in the shortcut properties to

%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe -noprofile -executionpolicy bypass -File "C:\Temp\CGP2HTML\v3\CGP2HTML.ps1"
(Which is where I stored both the script and the shortcut)

I got the same result as before, a white blinking cursor in a powershell window with nothing else.

I decided to try updating Powershell to 4.0, the version that was in your system, Windows 8.1. This time, the cursor window also brought up a box to choose a folder, similar to your screenshot.

I pointed it to the folder where the cgp files were stored, the conversion happened, then a prompt said "conversion finished, press any key to exit".

I did that.

The problem now is... I can't find the html file that was supposed to be generated after that. It's not in the cgp folder, or in the folder where the script was. I ran it a second time with the updated files in your previous post, same result. I tried using Windows Search to do a full hard drive search for the file, but since I turned off indexing, don't know the file name, and have no other more advanced file search tool, it's like trying to find a needle in a haystack. Where is the html file??

4wd

Quote from: Cocoa
I decided to try updating Powershell to 4.0, the version that was in your system, Windows 8.1. This time, the cursor window also brought up a box to choose a folder, similar to your screenshot.

I guess it does use something particular to Powershell v4, I suppose I should test for that.
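For readers wondering what such a test might look like, here is a minimal sketch. It assumes nothing about the real script beyond the V4 requirement stated in this thread; `$MinMajor` and `$HaveMajor` are made-up names for illustration.

```powershell
# Sketch only: $MinMajor and $HaveMajor are hypothetical names, not taken from CGP2HTML.
$MinMajor = 4
# $PSVersionTable is an automatic variable that reports the running PowerShell version
$HaveMajor = $PSVersionTable.PSVersion.Major
If ($HaveMajor -lt $MinMajor) {
  Write-Host "** Script requires at least Powershell V$MinMajor (found V$HaveMajor) **"
} else {
  Write-Host "Powershell V$HaveMajor is new enough"
}
```

Printing the version that was found, and not only the one required, should make it easier to tell at a glance why a script refused to run.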

Quote from: Cocoa
I pointed it to the folder where the cgp files were stored, the conversion happened, then a prompt said "conversion finished, press any key to exit".
I did that.

Don't take it the wrong way but a small piece of advice: read slower ;)

2015-07-01 13_19_15.png

The file will be named CGP2HTML-YYYYMMDDHHMMSS.html and will be in the parent of the folder you chose, (or if it was a root folder it will be in that).  See update.
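If the console window closes before the output path can be read, the file can still be tracked down by its name pattern. A rough sketch, assuming the CGP2HTML-YYYYMMDDHHMMSS.html naming described above; `Find-ConverterOutput` is a hypothetical helper name, and the search root is the parent of the Groups folder mentioned earlier in the thread.

```powershell
# Find-ConverterOutput is a hypothetical helper name, not part of CGP2HTML.
function Find-ConverterOutput {
  Param([String]$Root)
  # Look for the converter's timestamped files anywhere below $Root, newest first
  Get-ChildItem -Path $Root -Filter 'CGP2HTML-*.html' -Recurse -File -ErrorAction SilentlyContinue |
    Sort-Object LastWriteTime -Descending
}

# Example: start from the parent of the Groups folder so both locations are covered
Find-ConverterOutput 'C:\Program Files (x86)\Green Browser' |
  Select-Object FullName, LastWriteTime
```

Starting one level above the chosen folder should cover both the parent folder and the selected folder itself, whichever version of the script wrote the file.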

And no, it won't appear as yellow on blue, that's just the highlighter of the screen capture program :)

UPDATED:
Added: Colour output
Added: Check for Powershell version
Fixed:  Output is to the selected folder
« Last Edit: July 01, 2015, 02:30 AM by 4wd »

Cocoa

(^ ^!) I think with about a hundred lines flashing past in the span of a few seconds for all the cgps I had to convert, I would need to have superhumanly fast reading speed, rather than slower, to catch that 1 line at the beginning. I did check the last few lines very carefully to look for some info on the final conversion results, as that's where the info usually is, but unfortunately didn't expect that to be at the very beginning or I would have tried to go back and see it.

I've tested importing and all has gone very smoothly this time. As far as I can tell, the nearly a thousand urls and their names in all the groups are accounted for and any Japanese or Chinese characters have also been transferred intact.  Even the non-Latin file names of the cgp groups have been handled just fine. ;D

Thank you so much, 4wd for all your work and unwavering support that finally created this tool. Being able to retain all the urls and info collected over the years is invaluable and I have thoroughly enjoyed the process of our tackling the challenges together.  :D

In case you or anyone else feels like adding additional features in the future, such as reverse converting back from html to cgp, I'll still more than welcome them.

BTW, I'm curious about the workings of the program you found that could perform encoding conversions. Could you provide a link to the original?

4wd

Quote from: Cocoa
(^ ^!) I think with about a hundred lines flashing past in the span of a few seconds for all the cgps I had to convert, I would need to have superhumanly fast reading speed, rather than slower, to catch that 1 line at the beginning.

DOH!  Sorry, forgot I was dealing with a serial bookmarker  ;D

UPDATED: I've moved it to the bottom.

Quote from: Cocoa
In case you or anyone else feels like adding additional features in the future, such as reverse converting back from html to cgp, I'll still more than welcome them.

Should be easy enough to go the other way since the .cgp format is even simpler.

Quote from: Cocoa
BTW, I'm curious about the workings of the program you found that could perform encoding conversions. Could you provide a link to the original?

I didn't need it, Powershell does the necessary conversion - all you need to do is tell it what the input character encoding is (in this case gb2312 - here's a list) and what output encoding you want (UTF-8).
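To make the idea concrete, here is a small in-memory sketch of that round trip. It is an illustration only, not code from CGP2HTML: the sample text is built from two Chinese code points so the snippet does not depend on how the script file itself is saved, and all variable names are made up.

```powershell
# Look up the source encoding by its .NET name
$Source = [System.Text.Encoding]::GetEncoding('gb2312')
# Sample line in the style of a .cgp entry; the two characters are built from code points
$Sample = "name0=" + [char]0x4E2D + [char]0x6587
# These are the bytes as they would sit in a gb2312 (ANSI) file
$AnsiBytes = $Source.GetBytes($Sample)
# Decode with the correct input encoding, then re-encode the text as UTF-8
$Text = $Source.GetString($AnsiBytes)
$Utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($Text)
Write-Host ('gb2312 bytes: ' + $AnsiBytes.Length + ', UTF-8 bytes: ' + $Utf8Bytes.Length)
```

The byte counts differ because the two encodings store the same characters differently; the decoded text in between is identical, which is why naming the right input encoding is the only thing that matters.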

I was going to use the other program if I continued on with using a normal command file.  I've since deleted it but here's a simple Powershell script that does the same thing - paste below into a file called X2UTF8.ps1

There's no error checking, so you need to get encoding and file paths correct.

Code: PowerShell [Select]
<#
.SYNOPSIS
   Converts text files to UTF-8
.DESCRIPTION
   Reads text file according to encoding specified, saves to UTF-8 file.
.PARAMETER Encoded
   Encoding of original file, e.g. gb2312.
.PARAMETER SourceFile
   File to convert.
.PARAMETER Destfile
   Output file.
.EXAMPLE
   PS> X2UTF8.ps1 gb2312 input.txt output.txt
#>

Param(
    [String]$Encoded,
    [String]$SourceFile,
    [String]$Destfile
)
# Read the raw bytes so nothing is decoded prematurely
# (-Encoding Byte is the Windows PowerShell spelling)
$Buffer = Get-Content $SourceFile -Encoding Byte
# Look up the source encoding by name
$Encoding = [System.Text.Encoding]::GetEncoding($Encoded)
# Decode the bytes to a string, then write it back out as UTF-8
$String = $Encoding.GetString($Buffer)
Out-File -FilePath $Destfile -Encoding utf8 -InputObject $String

Open a Powershell console and call as: .\x2utf8.ps1 <encoding> <input file> <output file>

eg.  .\X2UTF8.ps1 gb2312 D:\test\12_31-03.47.29.cgp K:\test.txt
« Last Edit: August 31, 2015, 09:04 PM by 4wd »

4wd

Quote from: Cocoa
In case you or anyone else feels like adding additional features in the future, such as reverse converting back from html to cgp, I'll still more than welcome them.

Something for you to test: HTML2CGP

Still in Powershell, has a simple GUI:

2015-07-10 15_30_44.png

Extract all files to a folder somewhere and double-click the shortcut to run.  You should see the normal PoSh command window open then hide and then the GUI opens.

Two buttons, self-explanatory, the Output: line will have simple instructions, (choose file/folder), progress messages, (converting file xxx, etc), as well as show where the output went.

I've changed it so that it uses whatever the system character encoding is for the input and output, (at least I think I have, it's a bit hard to test since I'm stuck with English :) ).

eg. If your system is set for Simplified Chinese then it should read/write the .cgp files in that language.  However, if your system is set for OEM437, (or something), and the files are Simplified Chinese characters and ANSI encoded, then you'll get something indecipherable.

That's what I got anyway. Even though I'm set to OEM850, the output files were byte-identical to the original input files after converting to HTML and then back to CGP, so in theory it should work.

Requires Powershell v4, should open a requester if you haven't got it (haven't got anything to test it on).

Cocoa

Thanks so much for providing the UTF8 conversion script and the latest version of the HTML2CGP tool. I tested both and here are the results.

HTML2CGP:

Like before, the conversion works perfectly for converting from cgp to html, but when converting back from html to cgp, it's actually not quite as straightforward due to the more complicated format of the html file.

There are several things that need to be addressed:

1. Slimjet exports ALL bookmarks as a single html file, without the option to exclude, but I neither need, nor want the entire bookmark collection to be converted back into cgp groups wholesale. So I would really like an option to specify a folder name that the exporter should look for, for example "Export to GB", under which I will copy over all the bookmarks and their folders that I actually want to be converted into cgp. Additionally, the folders under this folder might have multiple levels (nested folders), and the export needs to be able to export only those that have urls (bookmarks) DIRECTLY under them, while retaining the folder name as the cgp group file name. (Please see screenshot for clarification on what the format might be like.)

2. Since the folder names are to be converted into cgp group file names, some folder names may contain characters that are illegal for Windows file names. If the converter runs into any of them, it would be immensely helpful to have it auto-replace the illegal characters with an underscore "_".
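The underscore replacement in point 2 could be sketched roughly as below. This is only an illustration of the request, not code from HTML2CGP: `ConvertTo-SafeFileName` is a hypothetical name, and the character list is the usual set Windows rejects in file names, written out by hand.

```powershell
# ConvertTo-SafeFileName is a hypothetical helper name, not part of HTML2CGP.
function ConvertTo-SafeFileName {
  Param([String]$Name)
  # Characters Windows does not allow in file names: \ / : * ? " < > |
  # plus the control characters 0x00-0x1F
  $Pattern = '[\\/:*?"<>|\x00-\x1F]'
  return ($Name -replace $Pattern, '_')
}

# Example: a bookmark folder name that would not be a valid .cgp file name
Write-Host (ConvertTo-SafeFileName 'News: UK/World?')
```

Two different folder names can collapse to the same file name after replacement, so a real converter would probably also need to check for clashes before writing.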

[attachimg=#1][/attachimg]
[attachurl=#3][/attachurl] (html format of the above example)
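For point 1, one possible approach is a line-by-line walk of the exported file that tracks the current folder nesting. The sketch below assumes the usual Chrome-style export layout, with each folder as a `<DT><H3>` line followed by a `<DL>` line and each bookmark as a `<DT><A HREF=...>` line; `Get-BookmarkGroups` and its parameters are hypothetical names. It also assumes links sitting directly in the root folder form a group of their own, and it does not decode HTML entities.

```powershell
# Get-BookmarkGroups is a hypothetical helper name, not part of HTML2CGP.
function Get-BookmarkGroups {
  Param([String[]]$Lines, [String]$RootName)
  $Stack = New-Object System.Collections.Stack   # folder names from outermost to current
  $Groups = [ordered]@{}                         # folder name -> list of bookmarks
  $Pending = ''                                  # folder title waiting for its <DL>
  foreach ($Line in $Lines) {
    if ($Line -match '<H3[^>]*>(.*?)</H3>') {
      $Pending = $Matches[1]
    } elseif ($Line -match '^\s*<DL>') {
      # A new level opens; it belongs to the most recent folder title
      $Stack.Push($Pending)
      $Pending = ''
    } elseif ($Line -match '^\s*</DL>') {
      if ($Stack.Count -gt 0) { [void]$Stack.Pop() }
    } elseif ($Line -match '<A\s+HREF="([^"]*)"[^>]*>(.*?)</A>') {
      $Url = $Matches[1]
      $Title = $Matches[2]
      # Keep only links that live somewhere inside the chosen root folder
      if ($Stack.Count -gt 0 -and ($Stack.ToArray() -contains $RootName)) {
        $Folder = [String]$Stack.Peek()
        if (-not $Groups.Contains($Folder)) { $Groups[$Folder] = @() }
        $Groups[$Folder] += [pscustomobject]@{ Name = $Title; Url = $Url }
      }
    }
  }
  return $Groups
}

# Usage (the file name is a placeholder):
#   $Groups = Get-BookmarkGroups -Lines (Get-Content 'bookmarks.html') -RootName 'Export to GB'
#   $Groups.Keys    # one entry per folder that has links directly under it
```

Because a folder only gets an entry when a link is found directly under it, folders that merely contain other folders drop out on their own. Folders with the same name at different levels would be merged in this sketch.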

X2UTF8:

I hate to bother you about this in addition to the converter, but I tested the script with different files, and it seems it's not accepting the encoding names. First I tried Shift-JIS, and then I switched to a cgp file, using the "gb2312" you gave as an example, but I kept getting the same error as below:
[attachimg=#2][/attachimg]
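The error text is not reproduced here, so the cause can't be confirmed. One thing that can be ruled out quickly is whether a given name is accepted by `GetEncoding()` on the machine in question. A small sketch, with `Test-EncodingName` as a hypothetical helper name:

```powershell
# Test-EncodingName is a hypothetical helper name, not part of X2UTF8.
function Test-EncodingName {
  Param([String]$Name)
  try {
    # GetEncoding() throws if the name is not recognised
    [void][System.Text.Encoding]::GetEncoding($Name)
    return $true
  } catch {
    return $false
  }
}

# Try the spellings mentioned in this thread, plus the underscore form of Shift-JIS
foreach ($Name in 'gb2312', 'shift_jis', 'Shift-JIS') {
  Write-Host ($Name + ' accepted: ' + (Test-EncodingName $Name))
}
```

If the names are accepted here but the script still fails, the problem is more likely in how the script is being called than in the encoding name itself.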
« Last Edit: July 14, 2015, 04:47 PM by Cocoa »