DonationCoder.com Forum

Main Area and Open Discussion => General Software Discussion => Topic started by: me_7834539 on July 01, 2012, 11:56 AM

Title: sorting units of text
Post by: me_7834539 on July 01, 2012, 11:56 AM
Hello,

I usually type all kinds of information into plain text files, categorizing every memo/info/todo with a simple tag, e.g. "car" for car-related stuff or "hou" for house-related stuff. I also type current todo items into the same text. The text may look like this:
>>
cat3[tab]blablabla
[tab](..continued blabla

cat2[tab]blabla

cat3[tab]blabla

cat1[tab]blabla....
[tab]continued blabla.....
[tab]continued blabla.......

cat2[tab]blabla
<<

after some time, I have a huge text with unsorted units of text. I would like to sort the entries alphabetically/numerically so that it looks like this:


>>
cat1[tab]blabla

cat2[tab]blabla

cat2...

cat3...

cat3...
<<

After an hour of research, I found the only program capable of doing this was an online service, http://sortmylist.com/
This service has the feature to define blank lines as a separator. (However, they seem to send the info over the
internet, which I object to.)

(even more helpful would be "a new item starts with a linebreak that is directly
followed by up to 7 alphanumeric characters, that are again followed by another tab".
this would not require a blank line between items, or any other manually added separator.)
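
For readers who want to try this rule without a dedicated program, here is a minimal Python sketch of it. It is only an illustration, not something posted in this thread: the function name `sort_items` is made up, the tag is assumed to be 1 to 7 ASCII letters or digits followed by a tab, and blank lines are simply dropped.

```python
import re

# Assumption: an item starts at a line that begins with 1-7 ASCII
# alphanumeric characters followed by a tab. Every other non-blank
# line is treated as a continuation of the item above it.
ITEM_START = re.compile(r"^[0-9A-Za-z]{1,7}\t")


def sort_items(text: str) -> str:
    """Group lines into items and sort the items by their tag."""
    items = []
    for line in text.splitlines():
        if not line.strip():
            continue                  # blank lines are not needed as separators
        if ITEM_START.match(line) or not items:
            items.append([line])      # a new item begins here
        else:
            items[-1].append(line)    # continuation line of the current item
    # sorted() is stable, so items sharing a tag keep their original order.
    items = sorted(items, key=lambda item: item[0].split("\t", 1)[0])
    return "\n".join("\n".join(item) for item in items)
```

Sorting on the tag alone (rather than the whole first line) is a design choice here: it keeps entries of the same category in the order they were typed.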

perhaps the program could simply work as a one-click-program (or one shortcut, like win-Q or so), converting the
current contents of the clipboard. text should not be altered, i.e. special /international characters and tabs etc.
should all be retained.

I think I am probably not the only person who simply types their stuff into a text, with a tag + tab to
at least categorize it a little. I am aware there are programs that will do this, but using plain text and
a lightweight editor has so many advantages.

-> my question: can anyone please point me to a locally installable program that can do this? I was unable to find one.
Or, does anyone feel this might be a coding idea worth pursuing?

Thanks!
Title: Re: sorting units of text
Post by: eleman on July 01, 2012, 12:24 PM
Check out Cleanhaven (http://www.holymackerelsoftware.com/CleanHaven/CleanHaven.html)
Title: Re: sorting units of text
Post by: MilesAhead on July 01, 2012, 01:21 PM
Instead of a plain text file you may consider using TreePadLite (http://www.treepad.com/treepadfreeware/)

If you are using text only then the free version should do all you need. The paid versions can do things like embed graphics.  In the free version hyperlinks are not underlined, but if you put the caret on one and hit Control-h it will browse to the url.

The "cat1" you use as tags would instead be nodes in the tree and show on the left.  Here's a screen shot

[Screenshot attachment not available]


Of course all the work would be transfer of the plain text to the TreePad file. But they do publish their format, at least for the TreePadLite plain text.  They also have links to 3rd party software that works with TreePad. Most of it is paid though.

edit: initially you may have to do a lot of cut & paste.  But there is a button in the toolbar to Sort a subtree.   You can add subtree nodes as you go and paste in the text, then just hit the Sort button to put the nodes in order.

edit2: here's the spec for TreePad file format:
http://www.treepad.com/docs/fileformat.txt

To create a file of nodes with a paragraph of text for each node is not difficult.  In TreePadGen the text for each node was blank. I just got all the filenames in a folder and added a node for each. You could do something similar by detecting "cat1" tags and then adding the associated paragraph.  For a bunch of nodes on the same level it's not hard to add the file markers between each node in a loop.  The tough part would likely be digging out the tags, then determining when the text was finished. Handling dupes might also be an issue.

It's one approach.  Also supposedly other note taking type apps can import TreePad files if you change in the future. (I say "supposedly" because I didn't spend any money to test any of the paid apps that claim to be able to import TreePad. Just standard skepticism if I haven't tried it myself.) :)
Title: Re: sorting units of text
Post by: rjbull on July 01, 2012, 03:54 PM
Do you really want what you said, and sort the items, or do you just want to find them using the categories as keywords/tags?  If the latter, you might look at NoteFrog (http://notefrog.com/indexgo.html).  The current version can import files delimited with markers.

If you really want to keep your text organised, then you might try a single-pane outliner like Noteliner (http://www.noteliner.org/i/Main.html).  It's free, but .NET if that bothers you, and uses its own file format.

Otherwise, I concur with MilesAhead, though I normally reach first for MemPad (http://www.horstmuc.de/wmem.htm).  MemPad has particularly nice input/output from/to delimited files.  It shouldn't be too hard to massage your files into forms that NoteFrog or MemPad can import.

FWIW, following MilesAhead's comment, I do know that Ultra Recall (http://www.kinook.com/index.html) Standard or Professional can import Treepad Lite .HJT files.  Some of the other outliners, like RightNote (http://www.bauerapps.com/RightNote.html) and AllMyNotes (http://www.vladonai.com/), only accept KeyNote (http://www.tranglos.com/free/keynote.html) .KNT files, but KeyNote is free and can import .HJT files itself, so you can convert in two stages.
Title: Re: sorting units of text
Post by: me_7834539 on July 01, 2012, 03:58 PM
thanks a lot for your input so far.

however, cleanhaven apparently cannot do what I described. Treepad seems like a nice idea, even though this could be implemented even better with a tabbed notepad like e.g. notepad++ and a bunch of auto-loaded text files. problem is, my method has these advantages:
- having just one plain text file and sorting the items at a later point in time enables you to have items written in the last days in one place, and not scattered around lots of tabbed texts or tree folders.
- it is possible to type some text and only later categorize it.
- having just one plain text file gives you a maximum of options to choose the text editor that you like best.
- you need not use the mouse/need not navigate
- etc.pp.

:)
Title: Re: sorting units of text
Post by: me_7834539 on July 01, 2012, 04:36 PM
(Once again thank you for your comments.)

| Do you really want what you said, and sort the items, or do you just want to find them using the categories as keywords/tags? 

It's about sorting. In the end, I want bunches of e.g. 15 - 30 text items per category.

| If the latter, you might look at NoteFrog.  The current version can import files delimited with markers.

Seems similar to Treepad... same problems therefore...

| Noteliner.  It's free, but .NET if that bothers you, and uses its own file format.

Doesn't seem to be able to do what I described, either...

| MemPad. 

ditto.

| Ultra Recall , RightNote, AllMyNote.  KeyNote.

ditto.ditto.ditto.ditto...  :)

however, in general these are nice recommendations. I did not know of all of these. Thanks.
Title: Re: sorting units of text
Post by: MilesAhead on July 01, 2012, 04:40 PM
Try vim (http://www.vim.org/)

I don't know of any other free Windows text editors that can match it when it comes to manipulating paragraphs. But you'd have to absorb the learning curve.

The "sort it afterwards" part of your plan seems to be the sticking point.

As far as free form text entry in TreePad I would create a node for it.  Perhaps "Uncategorized" and just type paragraphs in that node.

A programmable editor such as vim seems to be the best chance of unscrambling the current mish mash.
Title: Re: sorting units of text
Post by: MilesAhead on July 01, 2012, 05:24 PM
Vim has a page for user contributed scripts. You may find one you can modify to your scheme

http://www.vim.org/scripts/script_search_results.php

Title: Re: sorting units of text
Post by: skwire on July 01, 2012, 06:13 PM
Hi, me_7834539, and welcome to DonationCoder.  Are you familiar with AutoHotkey?
Title: Re: sorting units of text
Post by: me_7834539 on July 01, 2012, 06:31 PM
hello! cool, so much help! thanks!

thanks for pointing to vim. that's interesting. I couldn't find a script there that does what I described (though I might have missed one), but I might find some more help there if this thread won't yield an answer. Notepad++ plugins might also be a resource, though there are no plugins that can do paragraph sorting either.

Yes, I am actually using AHK for very simple tasks. (I can't code, obviously.) I was thinking of posting my question there, as well, but I remembered this place here as very friendly... that's why I came back here first :-* ;)

Title: Re: sorting units of text
Post by: me_7834539 on July 01, 2012, 06:48 PM
had the idea of googling for "paragraph sorting".
that has yielded these:
http://www.kahrel.plus.com/indesign/sort.html
http://www.linuxmisc.com/12-unix-shell/b7ccbdb46ef0b9ce.htm
http://backreference.org/2010/08/07/sorting-by-paragraph/

in general, much of this seems to come close...

just as a side note, I'd spend 30-50 USD on a solution...
Title: Re: sorting units of text
Post by: skwire on July 01, 2012, 06:57 PM
| Yes, I am actually using AHK for very simple tasks.

Good, I was just making sure before posting a simple script that should accomplish what you want.  I'd normally just paste the script into a codebox right into this post but the code uses a special character that our forum keeps converting erroneously.  Grab the actual code from this link:

http://skwire.dcmembers.com/apps/snacks/cat-parser.ahk

It's just a simple hotkey script so you can merge it into your own scripts or save it out as a stand-alone script.  Once you have it how you like, copy your category text to the clipboard, run the hotkey (default Win+1, or however you incorporate it), and the text should be sorted and placed back on your clipboard.  Let us know how it goes.  Thanks.

Title: Re: sorting units of text
Post by: MilesAhead on July 01, 2012, 07:09 PM
Another approach may be using a pattern matching language/program. One that's pretty easy to pick up is awk/nawk/gawk .. you can find various free implementations of awk for Windows. In a nutshell it operates on a line of text at a time. When you encounter a pattern, such as some particular thing found in the line like a Tab, colon, substring or whatever, you can set a variable to indicate doing some particular action until the state changes. For example if you find a start tag, then do such and such with the following lines of text until some terminating condition(an empty line for example) is encountered.

It's so long since I used it I totally forget the syntax.  But it's really designed for sifting through text input, rearranging or skipping certain portions, and then writing the output.

Plus you can find a ton of "awk one liners" on the web.  There have to be zillions of canned scripts for awk out there for free.

http://www.pement.org/awk/awk1line.txt
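
The line-at-a-time idea described above (collect lines until a terminating condition such as an empty line, then act on the collected block) could look roughly like the sketch below. It is written in Python rather than awk, which is a substitution for illustration only, and the names `paragraphs` and `sort_paragraphs` are hypothetical.

```python
def paragraphs(lines):
    """Yield blocks of consecutive non-blank lines, one block per item."""
    block = []                      # state: lines of the item being collected
    for line in lines:
        if line.strip():
            block.append(line)      # still inside an item
        elif block:
            yield block             # blank line: the current item is finished
            block = []
    if block:
        yield block                 # last item without a trailing blank line


def sort_paragraphs(text: str) -> str:
    """Sort blank-line separated items by their first line."""
    blocks = sorted(paragraphs(text.splitlines()), key=lambda b: b[0])
    return "\n\n".join("\n".join(b) for b in blocks)
```

Because blank lines only ever end a block, extra blank lines before, between, or after the items do not end up inside any item.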

Title: Re: sorting units of text
Post by: me_7834539 on July 02, 2012, 07:59 AM
milesahead, thanks for pointing to awk!

however, skwire's quick ahk script actually works, that's so great, thank you, thank you! (you've got email from paypal, skwire).

I would never have thought that the script would be soo short!

I still have some questions / remarks:

- the script seemed to work perfectly, except for the very first item. the very first item always stayed on top (like this: dfbaacbc -> daabbccf).

- for your information, win-1 is a shortcut that by default is registered by windows itself, to switch to the first taskbar application tab. I had to change it (e.g. to win-i).

- I have not yet thoroughly tested the script, but so far it seems to work very nicely. I did a test if any characters (or tabstops) are lost (German keyboard, all keys incl. using modifiers shift and alt-gr), but everything seemed fine.

Title: Re: sorting units of text
Post by: me_7834539 on July 02, 2012, 08:32 AM
oh... and would it be difficult to add to the script that blank lines be removed after the sorting?  :-[ :Thmbsup:
Title: Re: sorting units of text
Post by: skwire on July 02, 2012, 08:44 AM
| - the script seemed to work perfectly, except for the very first item. the very first item always stayed on top (like this: dfbaacbc -> daabbccf).

That's odd as I didn't experience this in my, albeit minimal, testing.  Can you PM me a short sample text file that this happens with?

| - I have not yet thoroughly tested the script, but so far it seems to work very nicely. I did a test if any characters (or tabstops) are lost (German keyboard, all keys incl. using modifiers shift and alt-gr), but everything seemed fine.

If you run into situations where this occurs, you will probably have to switch to AutoHotkey_L which handles Unicode.

| oh... and would it be difficult to add to the script that blank lines be removed after the sorting?  :-[ :Thmbsup:

Sure, can do. Did you mean ALL blank lines, e.g.,:

cat1    Here is some cat1 text.
        Here is some continued cat1 text.
cat2    Here is some cat2 text.
        Here is some continued cat2 text.
        Here is some more continued cat2 text.
        Here is even more continued cat2 text.
cat3    Here is some cat3 text.
        Here is some continued cat3 text.
Title: Re: sorting units of text
Post by: me_7834539 on July 02, 2012, 09:37 AM
hello jody,

what do you mean by "all" blank lines? only one "type" of blank lines would have been possible in the text example you gave, which are blank lines that separate each item from another.

different "types" of blank lines are, afaics, only thinkable if there are several items of the same category involved (which typically is the case, of course). if this is what you meant, here's an example:

this is original, non-transformed text:

Code: Text
cat1    Here is some cat1 text.
        Here is some continued cat1 text.

cat3    Here is some cat3 text.
        Here is some continued cat3 text.

cat2    Here is some cat2 text.
        Here is some continued cat2 text.
        Here is some more continued cat2 text.

cat2    Here is ANOTHER cat2 item.

this is what your original script currently does:

Code: Text
cat1    Here is some cat1 text.
        Here is some continued cat1 text.

cat2    Here is ANOTHER cat2 item.

cat2    Here is some cat2 text.
        Here is some continued cat2 text.
        Here is some more continued cat2 text.

cat3    Here is some cat3 text.
        Here is some continued cat3 text.

here is what I meant by removing blank lines:

Code: Text
cat1    Here is some cat1 text.
        Here is some continued cat1 text.
cat2    Here is ANOTHER cat2 item.
cat2    Here is some cat2 text.
        Here is some continued cat2 text.
        Here is some more continued cat2 text.
cat3    Here is some cat3 text.
        Here is some continued cat3 text.

here is what it would look like if it differentiated between blank lines that separate items and blank lines that separate blocks of same-category items. I suppose this would be a bit difficult to code and is not really necessary, though it would be neat, of course.

Code: Text
cat1    Here is some cat1 text.
        Here is some continued cat1 text.

cat2    Here is ANOTHER cat2 item.
cat2    Here is some cat2 text.
        Here is some continued cat2 text.
        Here is some more continued cat2 text.

cat3    Here is some cat3 text.
        Here is some continued cat3 text.
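
The "blank line only between categories" layout shown above might be produced along these lines. This is a hedged Python sketch, not part of the script discussed in this thread: `tag_of` and `join_by_category` are made-up names, and each item is assumed to be a string whose tag is the text before its first tab.

```python
from itertools import groupby


def tag_of(item: str) -> str:
    """Return the category tag: the text before the first tab of the item."""
    return item.split("\t", 1)[0]


def join_by_category(items):
    """Sort items, then put a blank line only where the tag changes."""
    ordered = sorted(items)                        # sort by the full item text
    groups = ("\n".join(group) for _, group in groupby(ordered, key=tag_of))
    return "\n\n".join(groups)
```

Since the items are already sorted when `groupby` sees them, all items with the same tag are adjacent, so each group corresponds to exactly one category.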

(side remark: I am now outing myself as a total idiot, I just noticed that the script temporarily replaces blank lines by §, I am curious as to why this does not break text that originally contains §.)

and here is an example of text where your script will not sort the first item:

Code: Text
qqq     this is a test line (q)

aaa     this is a test line (a)

ccc     this is a test line (c1)

bbb     this is a test line (b)

ccc     this is a test line (c2)


-------------------------------

here's the faulty result:

qqq     this is a test line (q)

aaa     this is a test line (a)

bbb     this is a test line (b)

ccc     this is a test line (c1)

ccc     this is a test line (c2)

thanks for your efforts!!

matthias
Title: Re: sorting units of text
Post by: skwire on July 02, 2012, 10:21 AM
| here is what I meant by removing blank lines:
|
| Code: Text
| cat1    Here is some cat1 text.
|         Here is some continued cat1 text.
| cat2    Here is ANOTHER cat2 item.
| cat2    Here is some cat2 text.
|         Here is some continued cat2 text.
|         Here is some more continued cat2 text.
| cat3    Here is some cat3 text.
|         Here is some continued cat3 text.

Yep, I was just making sure you wanted it this way.

| (side remark: I am now outing myself as a total idiot, I just noticed that the script temporarily replaces blank lines by §, I am curious as to why this does not break text that originally contains §.)

If that character is in the original text, it should break the script.  I chose that character since it's rarely used.  However, if you do happen to use it on a regular basis, we can change it to some other, rarely-used character.
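
One way to sidestep the collision, sketched here in Python purely as an illustration, is to pick the placeholder at run time from a list of candidates and use the first one that does not occur in the text. The names `pick_delimiter` and `sort_with_delimiter` and the candidate list are assumptions, and exactly one blank line is assumed between items.

```python
SECTION_SIGN = "\u00a7"  # the § character mentioned in the thread


def pick_delimiter(text: str, candidates=SECTION_SIGN + "\x1f\x1e\x00") -> str:
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("every candidate delimiter already occurs in the text")


def sort_with_delimiter(text: str) -> str:
    """Mark blank lines with a safe placeholder, sort on it, restore them."""
    delim = pick_delimiter(text)
    items = text.strip("\n").replace("\n\n", delim).split(delim)
    return "\n\n".join(sorted(items))
```

The fallback candidates are ASCII control characters, which are unlikely to appear in hand-typed notes; if all of them do occur, the sketch raises an error instead of silently mangling the text.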

| and here is an example of text where your script will not sort the first item:

That's very strange as it sorts fine for me.   :huh:  You could try changing the sorting line in the script to this:

    Sort, myText, CL D§

| thanks for your efforts!!

You're welcome.  Thanks very much for the donation.   :D
Title: Re: sorting units of text
Post by: me_7834539 on July 02, 2012, 11:57 AM
hello, the script's behavior is all a bit too voodoo to me... hmm. first of all:

this:
qqq   testtest

ppp   testtest

will sort correctly for you? that's really strange. I am using windows notepad and win-i as a shortcut; win 7 32bit (German). autohotkey recently installed.

(side remark, I suppose the capital D in your suggestion is of no importance, I seem to get identical results). so, adding CL yields lots of errors:

qqq   test1a

qqq   test1b

aaa   test2

ppp   test3

....is turned into......:

§aaa   test2

ppp   test3

qqq   test1bÂqqq   test1a

however, note that aaa is now the first item.

what is strange, too: text is NOT broken when it contains "§". (I tested this in your original script, not in the one with CL added).
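
A possible reading of the stray "Â" in the output above, offered only as a guess that this thread does not confirm: if the script file were saved as UTF-8 but read as a single-byte Windows code page, the one-character delimiter § would turn into the two characters "Â§". The Python snippet below shows just that encoding effect; the scenario itself is an assumption.

```python
SECTION_SIGN = "\u00a7"  # the § character used as a delimiter in the thread

# Hypothetical scenario: the character is stored as UTF-8 on disk...
utf8_bytes = SECTION_SIGN.encode("utf-8")   # two bytes: C2 A7

# ...but the bytes are then interpreted as Windows-1252 (ANSI).
misread = utf8_bytes.decode("cp1252")

print(utf8_bytes.hex())   # c2a7
print(repr(misread))      # 'Â§'
```

If that is what happens, a "replace blank lines with §" step and a "split on §" step would no longer agree on what the delimiter is, which could leave fragments such as "Â" or "§" in the result.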

Title: Re: sorting units of text
Post by: me_7834539 on July 02, 2012, 12:00 PM
btw, I am wondering if donation coder is the right forum for discussing AHK... seems like the ahk forums might solve the problem faster at this point... just an idea though.
Title: Re: sorting units of text
Post by: skwire on July 02, 2012, 02:07 PM
| hello, the script's behavior is all a bit too voodoo to me... hmm. first of all:
|
| this:
| qqq   testtest
|
| ppp   testtest
|
| will sort correctly for you?

Yep, works perfectly here.  I wonder if this is a "German OS versus English OS" issue.  Do me a favour, please, and run this script:

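Code: Autohotkey
; Test text: one letter per line, deliberately out of order.
myText =
(
n
q
p
r
a
e
)

; Sort the lines of myText alphabetically (newline is the default delimiter).
Sort, myText

; Show the sorted result.
MsgBox, % myText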

Please let me know the results.  Thank you.
Title: Re: sorting units of text
Post by: tomos on July 02, 2012, 02:11 PM
| btw, I am wondering if donation coder is the right forum for discussing AHK... seems like the ahk forums might solve the problem faster at this point... just an idea though.
I think most of the "coding snacks" here are made using AHK.
Skwire's a meister at any rate ;-)


BTW I'm using a German keyboard with an English OS (7).
If it's any help I can test something - just say the word - keeping in mind you'll have to tell me exactly what to do :-[ :)
Title: Re: sorting units of text
Post by: me_7834539 on July 02, 2012, 04:29 PM
hello, it will show a message box displaying aenpqr. so yes, it starts with a, if this is what you wanted to find out. :)
Title: Re: sorting units of text
Post by: me_7834539 on July 05, 2012, 04:51 PM
@skwire
just out of curiosity, did you find out why the sorting does not work for the first line...?

oh, and, from my point of view, the script did not seem completed because of this "bug", when in fact, from your pov it was. so, since this is just a minor annoyance, I think it's time to say a huge thank you for the script which essentially solved my problem! thank you!!!!!!!  :-*
Title: Re: sorting units of text
Post by: skwire on July 05, 2012, 05:49 PM
| @skwire
| just out of curiosity, did you find out why the sorting does not work for the first line...?

Apologies for the delay.  I did some more testing and I think I know what the problem is.  I'm going to assume that you are using the regular vanilla AutoHotkey build.  Can you install the AutoHotkey_L build and then try the cat-parser.ahk script, please?

http://l.autohotkey.net/
http://l.autohotkey.net/AutoHotkey_L_Install.exe
Title: Re: sorting units of text
Post by: me_7834539 on July 05, 2012, 05:59 PM
hello, thanks a lot. I will be installing it on a test machine on sunday. I don't like overwriting (or later reinstalling) the vanilla version, due to the compatibility warnings. (I can't understand why this replaces AHK. they should make a separate installation and define a separate extension, like ".ahkl" or so. But I may be missing something.)
Title: Re: sorting units of text
Post by: me_7834539 on July 06, 2012, 08:20 AM

hello,

i did tests with

(a) the vanilla install
(b) the l version, unicode
(c) the l version, ansi

and two snippets of text which differ only in the line breaks before and after the text;
the text starts immediately after the ">>>" delimiter and ends immediately before the "<<<" delimiter.

(1)
>>>>>>>>


qqq   aldkjf

baa   aldfkjad

abb   aldjfaldfj

<<<<<<<<

(2)
>>>>>>>>>>>qqq   aldkjf

baa   aldfkjad

abb   aldjfaldfj<<<<<<<<


results, reproducible:

(a)(1) wrong sorting
(b)(1) wrong sorting
(c)(1) wrong sorting
(a)(2) wrong sorting
(b)(2) correct sorting
(c)(2) correct sorting

wrong sorting = first line wrongly stays on top, following lines are sorted correctly
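
For snippet (1), one possible contributing factor (an assumption, not something established in this thread) is the blank lines in front of the first item: if they stay attached to it, the item starts with a line break, and a line break sorts before any letter. The Python sketch below reproduces only that effect; `sort_blocks` is a hypothetical helper and says nothing about how the actual script works internally.

```python
def sort_blocks(text: str, strip_outer: bool) -> str:
    """Sort blank-line separated items, optionally trimming outer newlines."""
    if strip_outer:
        text = text.strip("\n")    # drop line breaks before and after the text
    return "\n\n".join(sorted(text.split("\n\n")))


# Shaped like snippet (1): line breaks before the first item and after the last.
sample = "\n\n\nqqq\tx\n\nbaa\ty\n\nabb\tz\n\n"

naive = sort_blocks(sample, strip_outer=False)    # first item keeps a leading "\n"
cleaned = sort_blocks(sample, strip_outer=True)   # items are clean before sorting
```

In the naive variant "qqq" stays ahead of "abb" and "baa" because its block begins with a newline; trimming the text first gives the expected order.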

editor used = notepad.

personally, I am fine with the script as it is, despite the minor inconvenience. perhaps it would be more efficient
to ask for help in the ahk forums, at this point, really, if you are curious what the problem is...  

For the record, i am using a (German) Sony Vaio X.

(re)installing ahk l version / vanilla version seemed, and was in fact, very easy and unproblematic, so i didnt wait for the test machine. :)

ps. i might not find time to reply/do more tests these days. thx for understanding.
Title: Re: sorting units of text
Post by: skwire on July 06, 2012, 08:55 AM
Right on.  That does confirm my hunch that the AHK_L Unicode version would work properly.  Thanks very much for testing.  =]

| personally, I am fine with the script as it is

If you're happy, I'm happy.   :D
Title: Re: sorting units of text
Post by: me_7834539 on July 06, 2012, 09:26 AM
the L ansi version works properly, too. and both unicode and ansi version do not work properly in described case...
I'd like to ask an important question: Do you think it's likely that your script will (on autohotkey vanilla) produce even more problems, other than the first-line-sorting bug? Would you find it interesting if I re-posted my tests in the ahk forum? I would do so, if you think it's a good idea.
Title: Re: sorting units of text
Post by: skwire on July 06, 2012, 09:36 AM
The script is super simple so I don't foresee any future issues.  It doesn't matter to me, but you're welcome to post it on the AHK forums.
Title: Re: sorting units of text
Post by: me_7834539 on July 06, 2012, 09:41 AM
great!!

i posted the "bug" here
http://www.autohotkey.com/community/viewtopic.php?f=1&t=88471

so, let's see if anyone knows the answer!