Author Topic: Finished Coding Snack: Separate .txt file at any specified punctuation mark  (Read 11068 times)
Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« on: November 24, 2011, 01:09:25 PM »

I need to be able to separate text at any specified punctuation mark onto its own line and then insert a blank line between it and the next separation.  If this can be done simply (I only use a computer, I have no idea of how they do what they do).  I assume this can be done via something like the find & replace box.  I would be happy to make a decent donation if anyone can devise a way of doing this (and tell me how to use it).

Quidnunc  
« Last Edit: November 28, 2011, 09:03:52 PM by mouser » Logged
rjbull
Charter Member
***
Posts: 2,749

View Profile Give some DonationCredits to this forum member
« Reply #1 on: November 24, 2011, 04:35:36 PM »

Could you give a before-and-after example of what you mean, with dummy text if necessary?
Logged
skwire
Moderator
*****
Posts: 4,038



Another Coding Snack request? Om nom nom...

see users location on a map View Profile WWW Give some DonationCredits to this forum member
« Reply #2 on: November 24, 2011, 04:40:41 PM »

Also, which text editor are you using?
Logged

AbteriX
Charter Honorary Member
***
Posts: 1,049


Member #520

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #3 on: November 25, 2011, 01:40:36 AM »

You can do that with a regular expression, for example:

FROM:
I need to be able to separate text at any specified punctuation mark onto it's own line and then insert a blank line between it and the next separation.  If this can be done simply (I only use a computer, I have no idea of how they do what they do).  I assume this can be done via something like the find & replace box.  I would be happy to make a decent donation if anyone can devise a way of doing this (and tell me how to use it).

TO:
Quote
I need to be able to separate text at any specified punctuation mark onto it's own line and then insert a blank line between it and the next separation.
If this can be done simply (I only use a computer,
I have no idea of how they do what they do).
I assume this can be done via something like the find & replace box.
I would be happy to make a decent donation if anyone can devise a way of doing this (and tell me how to use it).

USE:
[X] Regular Expression

Search for: (.+?)(\.|,)\s*

With EmEditor replace with \1\2\n
With HippoEDIT replace with $1$2\n


Explanation:
(.+?) means: match one or more of any character, non-greedy, and store the match in group no. 1
(\.|,) means: match a literal dot "\." OR a comma, and store it in group no. 2
\s* means: match zero or more whitespace characters; we don't store that match, so they are dropped (if any)

Then we put back what is in group 1 by using \1 or $1; that's the matched sentence.
Then we put back what is in group 2 by using \2 or $2; that's the matched punctuation mark.
Then we add a line break or two by using \n

Please note that this may not work like that with all editors. The regex implementation is slightly different between different editors.


To get what you want:
Quote
I need to be able to separate text at any specified punctuation mark onto it's own line and then insert a blank line between it and the next separation.

If this can be done simply (I only use a computer,

I have no idea of how they do what they do).

I assume this can be done via something like the find & replace box.

I would be happy to make a decent donation if anyone can devise a way of doing this (and tell me how to use it).


just use two '\n' in the replacement instead of one.
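For anyone without one of the editors mentioned, the same search-and-replace can be sketched in Python. This is only an illustration of the pattern above using Python's `re` module; the function name `split_at_marks` and the `blank_line` switch are made up for this example.

```python
import re

# The pattern from the post: lazy text, then a dot or comma, then any trailing whitespace.
PATTERN = re.compile(r"(.+?)(\.|,)\s*")

def split_at_marks(text: str, blank_line: bool = False) -> str:
    """Put each chunk that ends in '.' or ',' on its own line."""
    # One newline gives one unit per line; two newlines add a blank line between units.
    newline = "\n\n" if blank_line else "\n"
    # \1 is the chunk, \2 the punctuation mark; the matched whitespace is dropped.
    return PATTERN.sub(r"\1\2" + newline, text)

sample = "One, two.  Three."
print(split_at_marks(sample))
print(split_at_marks(sample, blank_line=True))
```

Text after the last punctuation mark is left untouched, because the pattern only matches chunks that end in a dot or comma.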




HTH?
Logged

Greetings, Stefan.
Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #4 on: November 25, 2011, 10:28:03 AM »

Hi
Thanks for the replies.
AbteriX: You have suggested using Regular Expressions; however, I only have the vaguest idea as to what a regular expression is and how it works. To use one I would guess I would have to have some software installed (you mention EmEditor & HippoEDIT, which no doubt I would have to buy and learn how to use).

RJBull: Here is an example.
Before
A breeze ruffled the neat hedges of Privet Drive, which lay silent and tidy under the inky sky, the very last place you would expect astonishing things to happen. Harry Potter rolled over inside his blankets without waking up. One small hand closed on the letter beside him and he slept on, not knowing he was special, not knowing he was famous, not knowing he would be woken in a few hours' time by Mrs Dursley's scream as she opened the front door to put out the milk bottles, nor that he would spend the next few weeks being prodded and pinched by his cousin Dudley ...He couldn't know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: To Harry Potter - the boy who lived!'

After
A breeze ruffled the neat hedges of Privet Drive, which lay silent and tidy under the inky sky, the very last place you would expect astonishing things to happen.
 
Harry Potter rolled over inside his blankets without waking up.

I might, on occasion, also need to chop this next long sentence into meaningful grammatical units at each comma.
 
One small hand closed on the letter beside him and he slept on, not knowing he was special, not knowing he was famous, not knowing he would be woken in a few hours' time by Mrs Dursley's scream as she opened the front door to put out the milk bottles, nor that he would spend the next few weeks being prodded and pinched by his cousin Dudley ...
He couldn't know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: To Harry Potter - the boy who lived!'

I hope that explains what I want to do.

skwire: At the moment I use a concordancer. I also use an online text analyser for counting sentences and other statistical info, and (from this website) a line numberer, all of which are very useful. Before I use these tools I would like to prepare the text by splitting it into individual sentences (or clauses), each on its own line, and, if at all possible, to number them.



Logged
timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #5 on: November 25, 2011, 11:45:08 AM »

It would be extremely easy to roll a tiny program to read a .txt file, split according to AbteriX's regex rules and spit out a new version with the line breaks.

At the moment if your text contained (say) numbers with decimal points, they would also be split by that expression. But it would not be difficult to cater for that, along with perhaps splitting on ?, ! etc.
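One possible way to cater for decimal points, sketched in Python: add `?` and `!` to the marks and refuse to break when the mark is followed directly by a digit. That lookahead rule is an assumption made for this sketch, not necessarily what the finished program does, and `split_keeping_numbers` is a made-up name.

```python
import re

# Marks that may end a unit: . , ? !
# The (?!\d) lookahead is an assumed rule: a mark followed directly
# by a digit (as in 3.14 or 1,000) is not treated as a break.
SAFE_PATTERN = re.compile(r"(.+?)([.,?!])(?!\d)\s*")

def split_keeping_numbers(text: str) -> str:
    """Split at . , ? ! but leave numbers such as 3.14 intact."""
    return SAFE_PATTERN.sub(r"\1\2\n", text)

print(split_keeping_numbers("Pi is 3.14, roughly. Really? Yes!"))
```

A sentence that ends in a number and is followed by a space ("He was 11. Then...") still splits, since the dot there is not followed by a digit.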
« Last Edit: November 25, 2011, 11:51:08 AM by timns » Logged

Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #6 on: November 25, 2011, 12:14:06 PM »

Hi Timns

By a small program do you mean something like a dialogue box in which I could paste some text and have it do what I ask?  If so that would be wonderful.  I know this will sound terrible and I don't want to seem unappreciative, but the less work I have to do the better because, as already stated, I only use a computer; I have very little understanding of the way they work or the associated language.
Logged
timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #7 on: November 25, 2011, 12:27:19 PM »

Quote
Hi Timns

By a small program do you mean something like a dialogue box in which I could paste some text and have it do what I ask?  If so that would be wonderful.  I know this will sound terrible and I don't want to seem unappreciative, but the less work I have to do the better because, as already stated, I only use a computer; I have very little understanding of the way they work or the associated language.

That's exactly what I mean. The heavyweight alternative is that you can already do this in Word and many text editors, but if you want a small snack-sized solution then you shall have one - it's what DC is all about!

Unless anyone else steps up before the weekend, I'll have a go at this. If anyone else DOES decide to have a go, please let me know.
Logged

Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #8 on: November 25, 2011, 12:45:55 PM »

Hi again

I will look forward to seeing this and will be forever grateful to you, and a donation will be made thereafter.  Some members might wonder why I want such a tool; well, I spend a lot of time analysing texts for meaning and understanding (as distinct from just reading them).  I am currently looking at 'Harry Potter and the Philosopher's Stone', building a time-line, discovering inconsistencies etc.

Quidnunc
Logged
AbteriX
Charter Honorary Member
***
Posts: 1,049


Member #520

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #9 on: November 25, 2011, 01:27:02 PM »

Quote
AbteriX: You have suggested using Regular Expressions; however, I only have the vaguest idea as to what a regular expression is and how it works.
Just google for 'regular expression wiki'

Quote
To use it I would guess I would have to have some software installed (you mention EmEditor & HippoEDIT which no doubt I would have to buy and learn how to use).
Not necessary, PSPad can do it also and is freeware.

But a dedicated app would be much more nifty of course.

BTW, our DonationCoder "Clipboard Help and Spell" can do it as well:
(here with an additional feature idea: 'split long lines without punctuation too at char number n' )

Logged

Greetings, Stefan.
Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #10 on: November 25, 2011, 04:10:07 PM »

AbteriX

As you will no doubt know by now, Timns has kindly offered to write a small program to do exactly what I need based on the information I have given (as opposed to using an existing program which has far more features than I need, want or understand).

However I thank you for your time and responses, it is indeed heartening to find that there are people willing and able to devote their time and energies helping a stranger solve a problem.

Quidnunc   
Logged
rjbull
Charter Member
***
Posts: 2,749

View Profile Give some DonationCredits to this forum member
« Reply #11 on: November 25, 2011, 04:33:51 PM »

@Quidnunc: If you don't know it already, you might like to check out TextSTAT - Simple Text Analysis Tool (freeware):
Quote
Concordance software for Windows, GNU/Linux and MacOS

TextSTAT is a simple programme for the analysis of texts. It reads plain text files (in different encodings) and HTML files (directly from the internet) and it produces word frequency lists and concordances from these files. This version includes a web-spider which reads as many pages as you want from a particular website and puts them in a TextSTAT-corpus. The new news-reader, too, puts news messages in a TextSTAT-readable corpus file.
TextSTAT reads MS Word and OpenOffice files. No conversion needed, just add the files to your corpus...
In TextSTAT you can use regular expression which provides you with powerful search possibilities. The programme is multilingual. Because it uses Unicode internally, TextSTAT can cope with many different languages and file encodings.

@timns: are you thinking of making checkboxes for each punctuation mark, so Quidnunc can build regular expressions without having to understand them?
Logged
timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #12 on: November 25, 2011, 04:46:36 PM »

I'll definitely have some way of controlling which characters cause a line-break. There are rather a lot of candidates when one starts looking: '.,?!"-:; and of course ♫ 
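A rough Python sketch of how a user-supplied list of characters could drive the split, so nobody has to write a regular expression by hand. This is not the actual program's code; `split_on` and its parameters are hypothetical names for illustration.

```python
import re

def split_on(text: str, marks: str, separator: str = "\n\n") -> str:
    """Break text after any character listed in `marks`."""
    if not marks:
        return text  # nothing to split on
    # re.escape makes characters such as '.', '?' and '-' literal inside the class.
    char_class = "[" + re.escape(marks) + "]"
    pattern = re.compile("(.+?)(" + char_class + r")\s*")
    # Keep the chunk and its mark, drop trailing whitespace, then add the separator.
    return pattern.sub(lambda m: m.group(1) + m.group(2) + separator, text)

print(split_on("Who? Me: yes; no.", "?:"))
```

Only the characters passed in `marks` cause a break, so in the call above the semicolon and the final dot are left alone.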
Logged

timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #13 on: November 26, 2011, 02:04:38 PM »

How's this:



You just have to enter the punctuation or other characters at the top, and dump your text into the upper pane. Clicking go produces output as shown.

I added some convenience buttons for clearing, copying and pasting.
Logged

timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #14 on: November 26, 2011, 03:01:33 PM »

Later I'll put this in my coding area, with a proper little write-up, but for now this URL:

http://head-in-the-clouds.com/game/Scissors.jnlp

... will launch the program for you. You may get a warning that it's signed by an unknown publisher. That's ok, it's me.

Instructions:

1. Select and copy the text that is to be re-formatted so it's in your PC's (or Mac or Linux box) clipboard
2. Click the 'Paste' button or use Ctrl+V in the top window
3. Click the 'Go' button
3a. Go "ooh" and "aah" in admiration (optional step)
4. Select a subset of the results, or leave everything unselected (which gets you all text in the results area), and click the 'Copy' button - your formatted text is now available on the clipboard to do whatever-it-is-you-need-to-do with it.


« Last Edit: November 26, 2011, 03:12:12 PM by timns » Logged

cranioscopical
Friend of the Site
Supporting Member
**
Posts: 4,167



see users location on a map View Profile Read user's biography. Give some DonationCredits to this forum member
« Reply #15 on: November 26, 2011, 09:27:27 PM »

Nice work, timns, and at that price it's a snip!
 
Logged

Chris
Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #16 on: November 27, 2011, 05:04:40 AM »

Timns, you are a wonderful person and so clever too!  Your program is GREAT and does exactly what I wanted and will save me loads of time.  To show my appreciation, both to you and to the others whose efforts I have taken advantage of, I have made a donation.  I'm not sure how it affects you personally but I hope you get some benefit from it.

My grateful thanks to all those in general who do this work and to you in particular.

Quidnunc
Logged
mouser
First Author
Administrator
*****
Posts: 33,294



see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #17 on: November 27, 2011, 08:33:01 AM »

It's fun to read threads like this!

P.S. In case you haven't figured it out yet, you can click on the gold coin under Timns' name to send him some of your donation -- and I encourage you to do so.
Logged
anandcoral
Honorary Member
**
Posts: 221



see users location on a map View Profile WWW Give some DonationCredits to this forum member
« Reply #18 on: November 27, 2011, 09:44:57 AM »

This is what DC is for.

Thanks @Timns for keeping the DC flag high.

Regards,

Anand
Logged
timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #19 on: November 27, 2011, 11:04:37 AM »

Ah ha! Well that's excellent news and I greatly appreciate the feedback and kind comments...

... and the generous donation!  Thank you kindly.

If you need any tweaks or twiddles just let me know.
Logged

Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #20 on: November 28, 2011, 07:42:39 AM »

Timns

If you are happy to do a tweak then could I ask for this to be included:

After splitting the text into sentences etc., to:
1) number each complete unit (a unit being the separated text, even if it wraps to another line, i.e. only complete units to be numbered, not wrapped lines).
2) put a space between the number and start of text.


Nit-picking I know but if it is possible without too much trouble then it would be even better than it already is.
Logged
timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #21 on: November 28, 2011, 10:15:19 AM »

Extremely simple to do, in fact, so no problem.

Literally like this?

1 blah blah blah blah blah,

2 blah BLAH blah blah-blah.

3 bla blah blah Blahhhhhh blah <long line>
blah blah blah.

4 and blah?

How would you like the numbers formatted?
Logged

Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #22 on: November 28, 2011, 10:40:57 AM »

Yes, that is exactly how I would like it to appear but I'm not sure what you mean by 'formatting the numbers'.  Do you mean just as numbers-1 2 3....or in some other way-e.g. Roman numerals?  If so just plain numbers will be fine.
Logged
timns
Supporting Member
**
Posts: 1,209



Veni, vidi, debuggi

see users location on a map View Profile WWW Read user's biography. Give some DonationCredits to this forum member
« Reply #23 on: November 28, 2011, 10:57:50 AM »

Quote
Yes, that is exactly how I would like it to appear but I'm not sure what you mean by 'formatting the numbers'.  Do you mean just as numbers-1 2 3....or in some other way-e.g. Roman numerals?  If so just plain numbers will be fine.

Some examples, all of which would be equally easy:

1 sentence
1. sentence
1/ sentence
(1) sentence
001 sentence
001 <tab character> sentence

etc. etc.
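The reason these are all equally easy is that each style is just a different format template around the same counter. A small Python sketch, with style names invented here purely for illustration:

```python
# Hypothetical style names mapped to format templates; {n} is the unit number.
STYLES = {
    "plain": "{n} ",
    "dot": "{n}. ",
    "slash": "{n}/ ",
    "parens": "({n}) ",
    "padded": "{n:03d} ",     # zero-padded to three digits
    "padded_tab": "{n:03d}\t",  # zero-padded, then a tab character
}

def label(n: int, style: str) -> str:
    """Return the number prefix for unit n in the chosen style."""
    return STYLES[style].format(n=n)

for name in STYLES:
    print(repr(label(1, name) + "sentence"))
```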
Logged

Quidnunc
Supporting Member
**
Posts: 16


View Profile Give some DonationCredits to this forum member
« Reply #24 on: November 28, 2011, 11:04:29 AM »


001. sentence would be great (as there could be thousands of sentences, scope to accommodate this, say up to five figures, would be good).
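A sketch of that numbering in Python, assuming the text has already been split into a list of units. The name `number_units` and the rule for widening the padding are assumptions for this example: the width starts at three digits and grows when the count needs more.

```python
from typing import Sequence

def number_units(units: Sequence[str], min_width: int = 3) -> str:
    """Prefix each unit with a zero-padded number, a dot and a space."""
    # Use at least min_width digits, more if the total count needs them.
    width = max(min_width, len(str(len(units))))
    lines = [f"{i:0{width}d}. {unit}" for i, unit in enumerate(units, start=1)]
    # A blank line between units, as in the requested layout.
    return "\n\n".join(lines)

print(number_units(["First sentence.", "Second clause,", "Third?"]))
```

Because each unit is one string, a long unit that wraps on screen still gets a single number.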

Trevor
Logged