Topic: sorting units of text  (Read 19337 times)

me_7834539

  • Participant
  • Joined in 2010
  • Posts: 33
sorting units of text
« on: July 01, 2012, 11:56 AM »
Hello,

I usually type all kinds of information into plain text files, categorizing every memo/info/todo with a simple tag, e.g. "car" for car-related stuff, "hou" for house-related stuff, etc. I also type current todo items into the same text. The text may look like this:
>>
cat3[tab]blablabla
[tab](..continued blabla

cat2[tab]blabla

cat3[tab]blabla

cat1[tab]blabla....
[tab]continued blabla.....
[tab]continued blabla.......

cat2[tab]blabla
<<

after some time, I have a huge text with unsorted units of text. I would like to sort the entries alphabetically/numerically so that it looks like this:


>>
cat1[tab]blabla

cat2[tab]blabla

cat2...

cat3...

cat3...
<<

After an hour of research, I found the only program capable of doing this was an online service, http://sortmylist.com/
This service has the feature to define blank lines as a separator. (However, they seem to send the info over the
internet, which I object to.)

(even more helpful would be "a new item starts with a linebreak that is directly
followed by up to 7 alphanumeric characters, that are again followed by another tab".
this would not require a blank line between items, or any other manually added separator.)
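For anyone curious about the mechanics, here is a minimal Python sketch of the two splitting rules described above: blank lines as separators, or a new item at every line that starts with up to 7 letters/digits followed by a tab. It is not one of the programs mentioned in this thread, and the function names are hypothetical. Items are sorted by tag only; because Python's sort is stable, same-tag items keep the order in which they were typed.

```python
from __future__ import annotations

import re

# Rule 1: items are separated by a blank line (a line with nothing but spaces/tabs).
BLANK_LINE = re.compile(r"\n[ \t]*\n")

# Rule 2: a new item starts at any line that begins with 1-7 letters or digits
# followed by a tab. [^\W_] means "word character minus underscore", so German
# umlauts count as letters.
ITEM_START = re.compile(r"^(?=[^\W_]{1,7}\t)", re.MULTILINE)


def split_on_blank_lines(text: str) -> list[str]:
    text = text.replace("\r\n", "\n")  # clipboard text on Windows uses CRLF
    return [item.strip("\n") for item in BLANK_LINE.split(text) if item.strip()]


def split_on_tags(text: str) -> list[str]:
    text = text.replace("\r\n", "\n")
    # A zero-width split (Python 3.7+) cuts the text just before each tag line.
    return [item.strip("\n") for item in ITEM_START.split(text) if item.strip()]


def tag_of(item: str) -> str:
    """The text before the first tab, compared case-insensitively."""
    return item.split("\t", 1)[0].casefold()


def sort_items(items: list[str]) -> str:
    # sorted() is stable, so items with the same tag keep their original order.
    return "\n\n".join(sorted(items, key=tag_of))
```

Usage: `print(sort_items(split_on_blank_lines(text)))`. Note that under the second rule, a continuation line that happens to start with a short word and a tab would be mistaken for a new item.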

perhaps the program could simply work as a one-click-program (or one shortcut, like win-Q or so), converting the
current contents of the clipboard. text should not be altered, i.e. special /international characters and tabs etc.
should all be retained.

I think I am probably not the only person who simply types their stuff into a text, with a tag + tab to
at least categorize it a little. I am aware there are programs that will do this, but using plain text and
a lightweight editor has so many advantages.

-> my question: can anyone please point me to a locally installable program that can do this? I was unable to find one.
Or, does anyone feel this might be a coding idea worth pursuing?

Thanks!

eleman

  • Spam Killer
  • Supporting Member
  • Joined in 2009
  • Posts: 413
Re: sorting units of text
« Reply #1 on: July 01, 2012, 12:24 PM »
Check out Cleanhaven

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
Re: sorting units of text
« Reply #2 on: July 01, 2012, 01:21 PM »
Instead of a plain text file you may consider using TreePadLite

If you are using text only then the free version should do all you need. The paid versions can do things like embed graphics.  In the free version hyperlinks are not underlined, but if you put the caret on one and hit Control-h it will browse to the url.

The "cat1" you use as tags would instead be nodes in the tree and show on the left.  Here's a screen shot

[Screenshot: TreePad.jpg]


Of course all the work would be transfer of the plain text to the TreePad file. But they do publish their format, at least for the TreePadLite plain text.  They also have links to 3rd party software that works with TreePad. Most of it is paid though.

edit: initially you may have to do a lot of cut & paste.  But there is a button in the toolbar to Sort a subtree.   You can add subtree nodes as you go and paste in the text, then just hit the Sort button to put the nodes in order.

edit2: here's the spec for TreePad file format:
http://www.treepad.c.../docs/fileformat.txt

To create a file of nodes with a paragraph of text for each node is not difficult.  In TreePadGen the text for each node was blank. I just got all the filenames in a folder and added a node for each. You could do something similar by detecting "cat1" tags and then adding the associated paragraph.  For a bunch of nodes on the same level it's not hard to add the file markers between each node in a loop.  The tough part would likely be digging out the tags, then determining when the text was finished. Handling dupes might also be an issue.

It's one approach.  Also supposedly other note taking type apps can import TreePad files if you change in the future. (I say "supposedly" because I didn't spend any money to test any of the paid apps that claim to be able to import TreePad. Just standard skepticism if I haven't tried it myself.) :)
« Last Edit: July 01, 2012, 02:41 PM by MilesAhead »

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
Re: sorting units of text
« Reply #3 on: July 01, 2012, 03:54 PM »
Do you really want what you said, and sort the items, or do you just want to find them using the categories as keywords/tags?  If the latter, you might look at NoteFrog.  The current version can import files delimited with markers.

If you really want to keep your text organised, then you might try a single-pane outliner like Noteliner.  It's free, but .NET if that bothers you, and uses its own file format.

Otherwise, I concur with MilesAhead, though I normally reach first for MemPad.  MemPad has particularly nice input/output from/to delimited files.  It shouldn't be too hard to massage your files into forms that NoteFrog or MemPad can import.

FWIW, following MilesAhead's comment, I do know that Ultra Recall Standard or Professional can import Treepad Lite .HJT files.  Some of the other outliners, like RightNote and AllMyNotes, only accept KeyNote .KNT files, but KeyNote is free and can import .HJT files itself, so you can convert in two stages.

me_7834539

Re: sorting units of text
« Reply #4 on: July 01, 2012, 03:58 PM »
thanks a lot for your input so far.

however, Cleanhaven apparently cannot do what I described. TreePad seems like a nice idea, even though this could be implemented even better with a tabbed notepad like e.g. Notepad++ and a bunch of auto-loaded text files. problem is, my method has these advantages:
- having just one plain text file and sorting the items at a later point lets you keep the items written in the last few days in one place, instead of scattering them across lots of tabbed text files or tree folders.
- it is possible to type some text and only later categorize it.
- having just one plain text file gives you a maximum of options to choose the text editor that you like best.
- you need not use the mouse/need not navigate
- etc.

:)

me_7834539

Re: sorting units of text
« Reply #5 on: July 01, 2012, 04:36 PM »
(Once again thank you for your comments.)

| Do you really want what you said, and sort the items, or do you just want to find them using the categories as keywords/tags? 

It's about sorting. In the end, I want categories, each holding a bunch of e.g. 15-30 text items.

| If the latter, you might look at NoteFrog.  The current version can import files delimited with markers.

Seems similar to Treepad... same problems therefore...

| Noteliner.  It's free, but .NET if that bothers you, and uses its own file format.

Doesn't seem to be able to do what I described, either...

| MemPad. 

ditto.

| Ultra Recall, RightNote, AllMyNotes.  KeyNote.

ditto.ditto.ditto.ditto...  :)

however, in general these are nice recommendations. I did not know of all of these. Thanks.

MilesAhead

Re: sorting units of text
« Reply #6 on: July 01, 2012, 04:40 PM »
Try vim

I don't know of any other free Windows text editors that can match it when it comes to manipulating paragraphs. But you'd have to absorb the learning curve.

The "sort it afterwards" part of your plan seems to be the sticking point.

As far as free form text entry in TreePad I would create a node for it.  Perhaps "Uncategorized" and just type paragraphs in that node.

A programmable editor such as vim seems to be the best chance of unscrambling the current mish mash.

MilesAhead

Re: sorting units of text
« Reply #7 on: July 01, 2012, 05:24 PM »
Vim has a page for user contributed scripts. You may find one you can modify to your scheme

http://www.vim.org/s...t_search_results.php


skwire

  • Global Moderator
  • Joined in 2005
  • Posts: 5,286
Re: sorting units of text
« Reply #8 on: July 01, 2012, 06:13 PM »
Hi, me_7834539, and welcome to DonationCoder.  Are you familiar with AutoHotkey?

me_7834539

Re: sorting units of text
« Reply #9 on: July 01, 2012, 06:31 PM »
hello! cool, so much help! thanks!

thanks for pointing to vim. that's interesting. I couldn't find a script there that does what I described (though I might have missed one), but I might find some more help there if this thread won't yield an answer. Notepad++ plugins might also be a resource, though there are no plugins that can do paragraph sorting either.

Yes, I am actually using AHK for very simple tasks. (I can't code, obviously.) I was thinking of posting my question there, as well, but I remembered this place here as very friendly... that's why I came back here first :-* ;)


me_7834539

Re: sorting units of text
« Reply #10 on: July 01, 2012, 06:48 PM »
had the idea of googling for "paragraph sorting".
that has yielded these:
http://www.kahrel.pl...m/indesign/sort.html
http://www.linuxmisc...b7ccbdb46ef0b9ce.htm
http://backreference...orting-by-paragraph/

in general, much of this seems to come close...

just as a side note, I'd spend 30-50 USD on a solution...

skwire

Re: sorting units of text
« Reply #11 on: July 01, 2012, 06:57 PM »
| Yes, I am actually using AHK for very simple tasks.

Good, I was just making sure before posting a simple script that should accomplish what you want.  I'd normally just paste the script into a codebox right into this post but the code uses a special character that our forum keeps converting erroneously.  Grab the actual code from this link:

http://skwire.dcmembers.com/apps/snacks/cat-parser.ahk

It's just a simple hotkey script so you can merge it into your own scripts or save it out as a stand-alone script.  Once you have it how you like, copy your category text to the clipboard, run the hotkey (default Win+1, or however you incorporate it), and the text should be sorted and placed back on your clipboard.  Let us know how it goes.  Thanks.


MilesAhead

Re: sorting units of text
« Reply #12 on: July 01, 2012, 07:09 PM »
Another approach may be using a pattern matching language/program. One that's pretty easy to pick up is awk/nawk/gawk .. you can find various free implementations of awk for Windows. In a nutshell it operates on a line of text at a time. When you encounter a pattern, such as some particular thing found in the line like a Tab, colon, substring or whatever, you can set a variable to indicate doing some particular action until the state changes. For example if you find a start tag, then do such and such with the following lines of text until some terminating condition (an empty line for example) is encountered.

It's so long since I used it I totally forget the syntax.  But it's really designed for sifting through text input, rearranging or skipping certain portions, and then writing the output.

Plus you can find a ton of "awk one liners" on the web.  There have to be zillions of canned scripts for awk out there for free.

http://www.pement.org/awk/awk1line.txt
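The awk idea described above (read one line at a time, switch into an "inside an item" state when a start tag appears, and stay there until a terminating condition such as an empty line) can be sketched as a small state machine. This is Python rather than awk syntax; the tag pattern (1-7 letters or digits followed by a tab) and the blank-line terminator are assumptions taken from the first post, and `collect_items` is a hypothetical name.

```python
import re

# Assumed tag format: 1-7 letters/digits at the start of a line, then a tab.
START_TAG = re.compile(r"^([^\W_]{1,7})\t")


def collect_items(lines):
    """Group lines into (tag, text) items with a two-state loop, awk style."""
    items, tag, current = [], "", None  # current is None while "outside an item"
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = START_TAG.match(line)
        if match or not line.strip():
            # A new start tag or the blank-line terminator closes the open item.
            if current is not None:
                items.append((tag, "\n".join(current)))
                current = None
            if match:
                tag, current = match.group(1), [line]
        elif current is None:
            tag, current = "", [line]  # untagged text: keep it rather than drop it
        else:
            current.append(line)  # continuation line of the open item
    if current is not None:  # end of input also closes the open item
        items.append((tag, "\n".join(current)))
    return items


if __name__ == "__main__":
    text = "cat3\tb\n\tcont\n\ncat2\tx\n\ncat1\ty\n"
    items = collect_items(text.splitlines())
    print("\n\n".join(body for _, body in sorted(items, key=lambda it: it[0].casefold())))
```

Because a start tag also closes the previous item, this version does not need blank lines between items at all.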


me_7834539

Re: sorting units of text
« Reply #13 on: July 02, 2012, 07:59 AM »
milesahead, thanks for pointing to awk!

however, skwire's quick ahk script actually works, that's so great, thank you, thank you! (you've got email from paypal, skwire).

I would never have thought that the script would be soo short!

I still have some questions / remarks:

- the script seemed to work perfectly, except for the very first item. the very first item always stayed on top (like this: dfbaacbc -> daabbccf).

- for your information, win-1 is a shortcut that by default is registered by windows itself, to switch to the first taskbar application tab. I had to change it (e.g. to win-i).

- I have not yet thoroughly tested the script, but so far it seems to work very nicely. I did a test if any characters (or tabstops) are lost (German keyboard, all keys incl. using modifiers shift and alt-gr), but everything seemed fine.


me_7834539

Re: sorting units of text
« Reply #14 on: July 02, 2012, 08:32 AM »
oh... and would it be difficult to add to the script that blank lines be removed after the sorting?  :-[ :Thmbsup:

skwire

Re: sorting units of text
« Reply #15 on: July 02, 2012, 08:44 AM »
| - the script seemed to work perfectly, except for the very first item. the very first item always stayed on top (like this: dfbaacbc -> daabbccf).

That's odd as I didn't experience this in my, albeit minimal, testing.  Can you PM me a short sample text file that this happens with?

| - I have not yet thoroughly tested the script, but so far it seems to work very nicely. I did a test if any characters (or tabstops) are lost (German keyboard, all keys incl. using modifiers shift and alt-gr), but everything seemed fine.

If you run into situations where this occurs, you will probably have to switch to AutoHotkey_L which handles Unicode.

| oh... and would it be difficult to add to the script that blank lines be removed after the sorting?  :-[ :Thmbsup:

Sure, can do. Did you mean ALL blank lines, e.g.,:

cat1    Here is some cat1 text.
        Here is some continued cat1 text.
cat2    Here is some cat2 text.
        Here is some continued cat2 text.
        Here is some more continued cat2 text.
        Here is even more continued cat2 text.
cat3    Here is some cat3 text.
        Here is some continued cat3 text.

me_7834539

Re: sorting units of text
« Reply #16 on: July 02, 2012, 09:37 AM »
hello jody,

what do you mean by "all" blank lines? only one "type" of blank lines would have been possible in the text example you gave, which are blank lines that separate each item from another.

different "types" of blank lines are, afaics, only conceivable if several items of the same category are involved (which typically is the case, of course). if this is what you meant, here's an example:

this is original, non-transformed text:

    cat1    Here is some cat1 text.
            Here is some continued cat1 text.

    cat3    Here is some cat3 text.
            Here is some continued cat3 text.

    cat2    Here is some cat2 text.
            Here is some continued cat2 text.
            Here is some more continued cat2 text.

    cat2    Here is ANOTHER cat2 item.

this is what your original script currently does:

    cat1    Here is some cat1 text.
            Here is some continued cat1 text.

    cat2    Here is ANOTHER cat2 item.

    cat2    Here is some cat2 text.
            Here is some continued cat2 text.
            Here is some more continued cat2 text.

    cat3    Here is some cat3 text.
            Here is some continued cat3 text.

here is what I meant by removing blank lines:

    cat1    Here is some cat1 text.
            Here is some continued cat1 text.
    cat2    Here is ANOTHER cat2 item.
    cat2    Here is some cat2 text.
            Here is some continued cat2 text.
            Here is some more continued cat2 text.
    cat3    Here is some cat3 text.
            Here is some continued cat3 text.

here is what it would look like if it distinguished between blank lines that separate individual items and blank lines that separate blocks of same-category items. I suppose this would be a bit harder to code and is not really necessary, though it would be neat, of course.

    cat1    Here is some cat1 text.
            Here is some continued cat1 text.

    cat2    Here is ANOTHER cat2 item.
    cat2    Here is some cat2 text.
            Here is some continued cat2 text.
            Here is some more continued cat2 text.

    cat3    Here is some cat3 text.
            Here is some continued cat3 text.
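The two output styles discussed here (no blank lines at all, or blank lines only between categories) can be sketched in Python, assuming the items have already been split and sorted by tag. `join_compact` and `join_grouped` are hypothetical names; neither is taken from skwire's script.

```python
from itertools import groupby


def tag_of(item: str) -> str:
    """Assumed tag: the text before the first tab, case-insensitive."""
    return item.split("\t", 1)[0].casefold()


def join_compact(items):
    """Remove all blank lines: each item follows the previous one directly."""
    return "\n".join(items)


def join_grouped(items):
    """Put blank lines only between blocks of different categories.

    groupby() merges only *adjacent* items with the same key, so the
    items must already be sorted by tag.
    """
    blocks = ["\n".join(group) for _, group in groupby(items, key=tag_of)]
    return "\n\n".join(blocks)
```

The grouped style is not much harder to code than the compact one once the items are sorted: it only needs to notice where the tag changes.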

(side remark: I am now outing myself as a total idiot; I just noticed that the script temporarily replaces blank lines with §, and I am curious why this does not break text that originally contains §.)

and here is an example of text where your script will not sort the first item:

    qqq     this is a test line (q)

    aaa     this is a test line (a)

    ccc     this is a test line (c1)

    bbb     this is a test line (b)

    ccc     this is a test line (c2)


    -------------------------------

    here's the faulty result:

    qqq     this is a test line (q)

    aaa     this is a test line (a)

    bbb     this is a test line (b)

    ccc     this is a test line (c1)

    ccc     this is a test line (c2)

thanks for your efforts!!

matthias

skwire

Re: sorting units of text
« Reply #17 on: July 02, 2012, 10:21 AM »
| here is what I meant by removing blank lines:
|
|     cat1    Here is some cat1 text.
|             Here is some continued cat1 text.
|     cat2    Here is ANOTHER cat2 item.
|     cat2    Here is some cat2 text.
|             Here is some continued cat2 text.
|             Here is some more continued cat2 text.
|     cat3    Here is some cat3 text.
|             Here is some continued cat3 text.

Yep, I was just making sure you wanted it this way.

| (side remark: I am now outing myself as a total idiot; I just noticed that the script temporarily replaces blank lines with §, and I am curious why this does not break text that originally contains §.)

If that character is in the original text, it should break the script.  I chose that character since it's rarely used.  However, if you do happen to use it on a regular basis, we can change it to some other, rarely-used character.
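The "replace blank lines with a rare character, then split and sort on that character" technique, and the collision problem mentioned here, can be sketched in Python. Instead of hard-coding §, this sketch picks the first candidate character that does not occur in the text, so a § in the notes cannot break the split. This is an assumption of mine, not what the AutoHotkey script does; the candidate list and function names are hypothetical.

```python
# Hypothetical list of rarely typed characters to try as the placeholder;
# \x1e and \x1f are the ASCII "record separator" and "unit separator" controls.
CANDIDATES = "§¤¶\x1e\x1f"


def pick_placeholder(text: str) -> str:
    """Return the first candidate that does not already occur in the text."""
    for ch in CANDIDATES:
        if ch not in text:
            return ch
    raise ValueError("every candidate placeholder already occurs in the text")


def sort_paragraphs(text: str) -> str:
    text = text.replace("\r\n", "\n")
    mark = pick_placeholder(text)
    marked = text.replace("\n\n", mark)  # each blank line becomes one placeholder
    items = [item.strip("\n") for item in marked.split(mark) if item.strip("\n")]
    items.sort(key=str.casefold)  # case-insensitive comparison of whole items
    return "\n\n".join(items)
```

For example, text containing "§" gets "¤" as its placeholder instead, and the § in the notes survives untouched.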

| and here is an example of text where your script will not sort the first item:

That's very strange as it sorts fine for me.   :huh:  You could try changing the sorting line in the script to this:

    Sort, myText, CL D§

| thanks for your efforts!!

You're welcome.  Thanks very much for the donation.   :D

me_7834539

Re: sorting units of text
« Reply #18 on: July 02, 2012, 11:57 AM »
hello, the script's behavior is all a bit too voodoo to me... hmm. first of all:

this:
qqq   testtest

ppp   testtest

will sort correctly for you? that's really strange. I am using windows notepad and win-i as a shortcut; win 7 32bit (German). autohotkey recently installed.

(side remark, I suppose the capital D in your suggestion is of no importance, I seem to get identical results). so, adding CL yields lots of errors:

qqq   test1a

qqq   test1b

aaa   test2

ppp   test3

....is turned into......:

§aaa   test2

ppp   test3

qqq   test1bÂqqq   test1a

however, note that aaa is now the first item.

what is strange, too: text is NOT broken when it contains "§". (I tested this in your original script, not in the one with CL added).


me_7834539

Re: sorting units of text
« Reply #19 on: July 02, 2012, 12:00 PM »
btw, I am wondering if DonationCoder is the right forum for discussing AHK... seems like the ahk forums might solve the problem faster at this point... just an idea though.

skwire

Re: sorting units of text
« Reply #20 on: July 02, 2012, 02:07 PM »
| hello, the script's behavior is all a bit too voodoo to me... hmm. first of all:
|
| this:
| qqq   testtest
|
| ppp   testtest
|
| will sort correctly for you?

Yep, works perfectly here.  I wonder if this is a "German OS versus English OS" issue.  Do me a favour, please, and run this script:

    myText =
    (
    n
    q
    p
    r
    a
    e
    )

    ; Sort the linefeed-separated lines (default options: alphabetical, ignoring A-Z case)
    Sort, myText

    ; Show the sorted result in a message box
    MsgBox, % myText

Please let me know the results.  Thank you.

tomos

  • Charter Member
  • Joined in 2006
  • Posts: 11,959
Re: sorting units of text
« Reply #21 on: July 02, 2012, 02:11 PM »
| btw, I am wondering if DonationCoder is the right forum for discussing AHK... seems like the ahk forums might solve the problem faster at this point... just an idea though.
I think most of the "coding snacks" here are made using AHK.
Skwire's a meister at any rate ;-)


BTW I'm using a German keyboard with an English OS (7).
If it's any help I can test something - just say the word - keeping in mind you'll have to tell me exactly what to do :-[ :)
Tom

me_7834539

Re: sorting units of text
« Reply #22 on: July 02, 2012, 04:29 PM »
hello, it will show a message box displaying aenpqr. so yes, it starts with a, if this is what you wanted to find out. :)

me_7834539

Re: sorting units of text
« Reply #23 on: July 05, 2012, 04:51 PM »
@skwire
just out of curiosity, did you find out why the sorting does not work for the first line...?

oh, and, from my point of view, the script did not seem complete because of this "bug", when in fact, from your pov it was. so, since this is just a minor annoyance, I think it's time to say a huge thank you for the script which essentially solved my problem! thank you!!!!!!!  :-*

skwire

Re: sorting units of text
« Reply #24 on: July 05, 2012, 05:49 PM »
| @skwire
| just out of curiosity, did you find out why the sorting does not work for the first line...?

Apologies for the delay.  I did some more testing and I think I know what the problem is.  I'm going to assume that you are using the regular vanilla AutoHotkey build.  Can you install the AutoHotkey_L build and then try the cat-parser.ahk script, please?

http://l.autohotkey.net/
http://l.autohotkey....Hotkey_L_Install.exe
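One plausible explanation for the "first item never moves" symptom, not confirmed in this thread, is an encoding mismatch. Suppose cat-parser.ahk was saved as UTF-8 but read by a non-Unicode AutoHotkey build as Windows-1252. In that case the § in the script would be seen as the two characters "Â§". Assume further that the script replaces each blank line with that placeholder, that it sorts with the `D§` option, and that Sort's D option uses only one delimiter character (here "Â"). Under those assumptions, every item except the first begins with "§", which sorts after ordinary letters. The Python model below simulates that assumed behavior. `simulate_ansi_sort` is a hypothetical name, and the model is not skwire's code. The same assumptions would also fit the stray "Â" in the `CL` output and the observation that a § in the notes does not break anything.

```python
def simulate_ansi_sort(text: str) -> str:
    """Model of the *assumed* behaviour, not skwire's actual script."""
    # "§" saved as UTF-8 is the byte pair C2 A7; read as Windows-1252 it becomes "Â§".
    placeholder = "§".encode("utf-8").decode("cp1252")
    # Assumption: Sort's D option uses only one character, so the delimiter is "Â".
    delimiter = placeholder[0]
    marked = text.replace("\n\n", placeholder)  # blank lines -> "Â§"
    items = marked.split(delimiter)
    # Every item except the first now begins with "§" (U+00A7), which sorts after
    # ASCII letters, so the first item stays on top. (Python compares code points;
    # AutoHotkey's default sort ignores A-Z case, which changes nothing here.)
    items.sort()
    return delimiter.join(items).replace(placeholder, "\n\n")


if __name__ == "__main__":
    before = "qqq\t(q)\n\naaa\t(a)\n\nccc\t(c1)\n\nbbb\t(b)\n\nccc\t(c2)"
    print(simulate_ansi_sort(before))
```

With the qqq/aaa/ccc/bbb/ccc sample posted earlier, this keeps qqq on top and sorts the rest, which is the same pattern as the faulty result. The model is consistent with skwire's suggestion to try the AutoHotkey_L build.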