Author Topic: IDEA - how to capitalise the first word of every line  (Read 17447 times)

stormproof

  • Participant
  • Joined in 2013
  • Posts: 3
IDEA - how to capitalise the first word of every line
« on: December 11, 2013, 07:07 AM »
I'm creating a hymn book.  The book I scanned and OCR'd had all the songs as blocks, with capitals only at the beginning of a sentence.
Whereas I want to put a capital at the start of every line.

There are just under a thousand songs so I don't want to do it by hand!!

Any ideas?  I've looked online and can't find anything except via MS Word, which can be configured to do it after every carriage return.
But as far as I can see, only as you type.

Is there a way to replace every "carriage return" with "carriage return and capitalise the next letter"?

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • Posts: 4,642
Re: IDEA - how to capitalise the first word of every line
« Reply #1 on: December 11, 2013, 07:53 AM »
Hallo Stormproof!

Kingsoft Writer may have some ways to help.

(P.S. these "fast posted screen shots" are from Chris Gingerich's NANY this year! This is what it's for!)

So suppose you have this hymn fragment etc.

oh lord
tell me to stay strong
let me hold to thy name

In Kingsoft Writer

http://i.imm.io/1m26C.png

KSoft Format ChangeCase.png

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • Posts: 4,642
Re: IDEA - how to capitalise the first word of every line
« Reply #2 on: December 11, 2013, 07:55 AM »
http://i.imm.io/1m271.png

Then you can experiment with the options.

KSoft Format ChangeCase2.png

So that hymn above becomes:

Oh lord
Tell me to stay strong
Let me hold to thy name


JoTo

  • Super Honorary
  • Charter Member
  • Joined in 2005
  • Posts: 236
Re: IDEA - how to capitalise the first word of every line
« Reply #3 on: December 11, 2013, 08:07 AM »
Or you can use Notepad++ (Freeware).

Notepad++ has a column select mode (hold the Alt key while selecting the appropriate text/columns). Then, with the first column marked, you can use the menu TextFX->TextFX Characters->UPPERCASE.

HTH
JoTo

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
Re: IDEA - how to capitalise the first word of every line
« Reply #4 on: December 11, 2013, 10:30 AM »
There's a few sed answers with s/^\(.\)/\U\1/. GNU sed also has a \u directive that changes only the next letter to uppercase, so

sed 's/./\u&/'

Although if the first character on a line is a space, you won't see an uppercase letter, so

sed 's/[[:alpha:]]/\u&/'

from this page:
http://stackoverflow...-a-file-to-uppercase

This is conducive to batch processing. Once you test and find it works as expected, you can just loop through all the files and save each result under the same file name with an additional extension.  For example, for file.txt the output could be file.txt.cap or whatever.  Once you verify the output has no errors, you can just rename the output files back to the original names.

There are free sed implementations for Windows.  Which you want to use generally is determined by whether you wish to do any shell programming such as with bash shell.  Also some need to have bash set up because special characters can conflict with those in Windows command prompts.  It's often easier just to run a bash command prompt to do the stream editing.

This particular task can probably be done fairly easily using awk.  A 'nawk' (for new awk) executable may not need the bash environment.  I'm not sure, as there have been several Windows releases and I'm sure some changes in the Linux tools that were ported.  The last time I really played with them was on XP.
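For anyone without sed to hand, the batch idea described above (process every file, write the result to file.txt.cap, check it, then rename) can be sketched in Python. This is only a sketch under assumptions: the function names, the "*.txt" pattern and the UTF-8 encoding are illustrative choices, not anything given in the thread. Like the sed one-liner with [[:alpha:]], it upper-cases the first letter found on each line, so leading spaces or a verse number do not get in the way.

```python
import re
from pathlib import Path

# First ASCII letter on a line (roughly sed's [[:alpha:]] in the C locale).
_FIRST_LETTER = re.compile(r"[A-Za-z]")


def capitalise_line(line: str) -> str:
    """Upper-case the first letter on the line; everything else is untouched."""
    return _FIRST_LETTER.sub(lambda m: m.group().upper(), line, count=1)


def capitalise_file(src: Path) -> Path:
    """Write a capitalised copy next to src, e.g. file.txt -> file.txt.cap."""
    dst = src.with_name(src.name + ".cap")
    # Encoding is an assumption; adjust it to match what the OCR tool produced.
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    dst.write_text("".join(capitalise_line(l) for l in lines), encoding="utf-8")
    return dst


def capitalise_folder(folder: Path, pattern: str = "*.txt") -> list:
    """Process every matching file in folder and return the new .cap paths."""
    return [capitalise_file(p) for p in sorted(folder.glob(pattern))]
```

Calling capitalise_folder(Path(".")) would process the current folder. The originals are never overwritten, so the .cap files can be inspected before anything is renamed.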

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,205
Re: IDEA - how to capitalise the first word of every line
« Reply #5 on: December 11, 2013, 03:26 PM »
"This particular task can probably be done fairly easily using awk.  A 'nawk' (for new awk) executable may not need the bash environment.  I'm not sure as there have been several Windows releases and I'm sure some changes in the Linux tools ported." - MilesAhead

I've always used GNU AWK for Windows, gawk.  Try Gawk for Windows.  The slightly older version I have doesn't need a shell.  I presume this version does what you need; I've found some variation between implementations in the past.

BigVent

  • Member
  • Joined in 2013
  • Posts: 36
Re: IDEA - how to capitalise the first word of every line
« Reply #6 on: December 11, 2013, 04:33 PM »
 Modify as needed.  :Thmbsup:

HTH's


Code: Autohotkey
;// Replace this literal text with your own string, or read it from a file instead:
;// FileRead, orig_str, C:\My Documents\My File.txt  ;//<== if this is used, delete everything between *from here* and *To here*
;//*from here*
orig_str=
(
i'm creating a hymn book.  The book I scanned and OCR'd had all the songs as blocks, with capitals only at the beginning of a sentence.
whereas I want to put a capital at the start of every line.

there are just under a thousand songs so I don't want to do it by hand!!

any ideas?  I've looked on line and can't find anything except via MS Word, which can be configured to do it after every carriage return.
but as far as I can see, only as you type.

is there a way to replace every "carriage return" with "carriage return and capitalise the next letter"?
)
;//*To here*

msgbox, Original text used:`n`n`n%orig_str%

temp_var := ""
loop, parse, orig_str, `n, `r
{
    if (A_LoopField != "")          ;// blank lines are skipped
    {
        word_array := StrSplit(A_LoopField, A_Space)

        ;// Title-case the first word only (T also lower-cases the rest of that word)
        str := word_array[1]
        StringUpper, str, str, T
        word_array[1] := str

        ;// Rebuild the line; every word gets a trailing space
        loop % word_array.MaxIndex()
        {
            temp_var .= word_array[A_Index] " "
        }
        temp_var .= "`n`n"
    }
}

msgbox, Final Result:`n`n`n%temp_var%

;// Press Esc to quit the script
esc::ExitApp
~BigVent
« Last Edit: December 11, 2013, 04:50 PM by BigVent »

Target

  • Honorary Member
  • Joined in 2006
  • Posts: 1,832
Re: IDEA - how to capitalise the first word of every line
« Reply #8 on: December 11, 2013, 06:24 PM »
this is a bit clumsy cos I just dashed it off quickly, but instead of entering the text into the script, loop through the files.  

And instead of looping through each line, just work with the requisite strings (should be faster)

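#SingleInstance, Force
SetBatchLines, -1

; Ask for the folder holding the .txt files (starts in the working dir)
FileSelectFolder, myfolder, *%A_WorkingDir%
if (myfolder = "")
    ExitApp

; 0 = files only, 1 = recurse into subfolders
Loop, %myfolder%\*.txt, 0, 1
{
    ToolTip, %A_Index% - %A_LoopFileName%

    ; SOURCEFILE.TXT -> SOURCEFILE_.TXT
    newfile := A_LoopFileName
    StringReplace, newfile, newfile, `., _., All

    tmp_ := ""
    ; Use the full path so files found in subfolders are read correctly
    Loop, Read, %A_LoopFileFullPath%
    {
        StringLeft, xx, A_LoopReadLine, 1      ; first character of the line
        StringUpper, xx, xx                    ; upper-case it
        StringTrimLeft, yy, A_LoopReadLine, 1  ; rest of the line, unchanged
        tmp_ .= xx . yy . "`n"
    }

    ; Write the new file next to its source file
    FileAppend, %tmp_%, %A_LoopFileDir%\%newfile%
}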

Input files must be text, and output goes to a new file, i.e. output from SOURCEFILE.TXT will be written to SOURCEFILE_.TXT

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
Re: IDEA - how to capitalise the first word of every line
« Reply #9 on: December 11, 2013, 08:36 PM »
The ahk solutions ignore the case of leading white space on a line.  One could use trim or regex to get around it.  But the advantage of a stream editor is you don't have to mess with the file details.  Command line redirection inside a batch file and you're done.  The only chore is coming up with the pattern replacement string (never my strong point).  Plus I'm inclined to think it's resource efficient.  Open a command window and kick it off to run in the background.
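The stream-editor idea (text in, text out, with the shell's redirection handling the files) can also be sketched in Python. The names below are hypothetical and this is not a replacement for sed, just an illustration of the same shape: the filter only sees lines, and leading white space is kept while the first visible character is upper-cased.

```python
import io
import sys


def capitalise_first_visible(line: str) -> str:
    """Upper-case the first non-whitespace character, keeping the indentation."""
    stripped = line.lstrip()
    indent = line[: len(line) - len(stripped)]
    return indent + stripped[:1].upper() + stripped[1:]


def filter_stream(src, dst) -> None:
    """Copy src to dst line by line, capitalising as we go."""
    for line in src:
        dst.write(capitalise_first_visible(line))


def capitalise_text(text: str) -> str:
    """Convenience wrapper that runs the same filter over an in-memory string."""
    out = io.StringIO()
    filter_stream(io.StringIO(text), out)
    return out.getvalue()


def main() -> None:
    """Entry point for use as a filter: reads stdin, writes stdout."""
    filter_stream(sys.stdin, sys.stdout)
```

If main() is called at the bottom of a script saved as, say, capfirst.py (a made-up name), it could be used from a command prompt as: python capfirst.py < file.txt > file.txt.cap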

BigVent

  • Member
  • Joined in 2013
  • Posts: 36
Re: IDEA - how to capitalise the first word of every line
« Reply #10 on: December 12, 2013, 11:03 AM »
Notes:
  • We can only assume that the OP has one single file.
  • I ran this new script on a 105,400 line txt file.  It returned in 1800 ms. (txt file is 6525 KB)
  • Fixed leading white spaces... under the assumption that the OP had leading white spaces on lines.
  • Target's method used is a little faster. Since it's a "once and done" project either method should suffice.

Anyway, Good Luck stormproof!

Code: Autohotkey
SetWorkingDir %A_ScriptDir%

;// OP -> "The book I scanned and OCR'd had all the songs as blocks, with capitals only at the beginning of a sentence."
;// We can only assume it's ONE single file
FileSelectFile, SelectedFile, 3, , Open a file, Text Documents (*.txt; *.doc)
if SelectedFile =
{
    MsgBox, You didn't select a file.
    ExitApp
}

FileRead, orig_str, %SelectedFile%
temp_var := ""
Loop, Parse, orig_str, `n, `r
{
    if (A_LoopField = "")       ;// blank lines are dropped from the output
        continue
    word_array := StrSplit(A_LoopField, A_Space)

    ;// Leading spaces show up as empty elements, so title-case the
    ;// first non-empty word however many spaces precede it
    for idx, word in word_array
    {
        if (word = "")
            continue
        StringUpper, word, word, T
        word_array[idx] := word
        break
    }

    ;// Rebuild the string (empty elements are skipped) and append to temp_var
    for idx, word in word_array
    {
        if (word != "")
            temp_var .= word " "
    }
    temp_var .= "`n"
}

IfExist, Output_File.txt
    FileDelete, Output_File.txt
FileAppend, %temp_var%, Output_File.txt

Run, Output_File.txt

esc::
    critical
    exitapp
return
~BigVent

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
Re: IDEA - how to capitalise the first word of every line
« Reply #11 on: December 12, 2013, 01:35 PM »
"We can only assume that the OP has one single file." - BigVent
Why would you jump to that conclusion?  He says he scanned a thousand songs.  I'd say it was a safer assumption that he has 1000 files.

sed was made for this kind of stuff.  Even back in DOS people were more comfortable with command line piping and redirection, stringing tools together.  Since everything went GUI, people have to load it in a window or there's some discomfort.

Edit:  I downloaded the installer, the topmost download in the list, "Complete package, except sources" from here: http://gnuwin32.sour...net/packages/sed.htm

It installed on Windows 8.  I added the folder where sed.exe lives to the PATH.  sed --help in a command prompt shows the help. Supposedly everything it needs to run is in the installer.  It's good to have for cases where an already debugged one-liner may save you a ton of work.  :)
« Last Edit: December 12, 2013, 01:56 PM by MilesAhead »

BigVent

  • Member
  • Joined in 2013
  • Posts: 36
Re: IDEA - how to capitalise the first word of every line
« Reply #12 on: December 12, 2013, 03:13 PM »
 :huh:

"Why would you jump to that conclusion?  He says he scanned a thousand songs.  I'd say it was a safer assumption that he has 1000 files." - MilesAhead

"I'm creating a hymn book. The book I scanned and OCR'd had all the songs as blocks, with capitals only at the beginning of a sentence." - OP

He didn't say I scanned *several* books... but "THE BOOK".  Ergo, my assumption.  I could ask why you thought he had thousands of items and call you out on the boards for poorly reading what he wrote... but frankly I don't care.


"It's good to have for cases where an already debugged one-liner may save you a ton of work." - MilesAhead

That's awesome!  Perhaps I learned something from my 15 minutes of work on this script.   8)
~BigVent

MilesAhead

  • Supporting Member
  • Joined in 2009
  • Posts: 7,736
Re: IDEA - how to capitalise the first word of every line
« Reply #13 on: December 12, 2013, 05:20 PM »
"He didn't say I scanned *several* books... but "THE BOOK".  Ergo, my assumption.  I could ask why you thought he had thousands of items and call you out on the boards for poorly reading what he wrote... but frankly I don't care." - BigVent

It's no big deal.  I'm just arguing for my position.  :)  The reason I think he's likely to have multiple files is my assumption that each page scanned will automatically be saved to a file. It's just an assumption.  But sed will work either way.. one file or a thousand.

"Perhaps I learned something from my 15 minutes of work on this script.   8)" - BigVent

Linux does have some awesome tools.  I keep telling myself to get into the vim editor.  But I end up surfing the boards instead.  :)  Eventually I'll tackle it though.  It's too powerful to ignore forever.  :)

TaoPhoenix

  • Supporting Member
  • Joined in 2011
  • Posts: 4,642
Re: IDEA - how to capitalise the first word of every line
« Reply #14 on: December 12, 2013, 07:28 PM »
"It's no big deal.  I'm just arguing for my position.  :)  The reason I think he's likely to have multiple files is my assumption that each page scanned will automatically be saved to a file. It's just an assumption.  But sed will work either way.. one file or a thousand." - MilesAhead

Just a random comment.

There's no assumptions when dealing with "scanning". The three rough options are "single file", "single page", and "...other".

Even my junk scanner and certainly the pro-grade one at work lets you batch pages! So no need to have single page files!

But neither would I raw-scan everything to a single file! First, that tends not to be the Way-of-part-of-the-Web.

Chapter scanning is becoming really popular - small enough to parse both on production and people side, but more than a page.

So one vote here for multi files, just not 1000 - maybe 17-30.


BigVent

  • Member
  • Joined in 2013
  • Posts: 36
Re: IDEA - how to capitalise the first word of every line
« Reply #15 on: December 12, 2013, 09:15 PM »
Either way gentlemen... Thank you for the banter.

MilesAhead - thanks for being a good sport. The tools you suggested are great. I downloaded & agree.

Tao - I scan everything to a single file... But that's just me. Handle it all at once & done.

We're just geeks plain and simple. All ways work.  :Thmbsup:

OP: if you're still reading... Find the way you prefer & roll with it.

There are many many ways to take care of what you need.

~BigVent

c.gingerich

  • Supporting Member
  • Joined in 2011
  • Posts: 748
Re: IDEA - how to capitalise the first word of every line
« Reply #16 on: December 12, 2013, 10:23 PM »
@TaoPhoenix I just happened to read this... thanks for the mention! :D


AbteriX

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 1,149
Re: IDEA - how to capitalise the first word of every line
« Reply #17 on: December 13, 2013, 01:52 AM »
IDEA - how to capitalise the first word of every line and after a dot?


I think every text editor with Format Sentence-Case should be able to do that. E.g. Notepad2.




To work on only the first char of every line use a text editor with \L, \U regex metachars like EmEditor.
Find: ^(\w)
Repl: \U\1




Or with XYplorer:
Code: Text
$file = readfile("c:\Temp\input.txt");

   $out = "";
   foreach($line, $file, "<crlf>") {
      $first = recase(substr($line, 0, 1), "upper");
      $rest  = substr($line, 1);
      $out   = $out . $first . $rest . "<crlf>";
   }
   text $out;
   //writefile("c:\Temp\out.txt", $out);





Or with PowerShell (v2.0 Syntax)

#Upper case first char in line, here also chars after dot, exclamation or question marks too:
Get-Content hymn-book.txt | ForEach-Object{ [Regex]::Replace($_, '^\w|[.!?]\s+\w', {param($mymatch) $mymatch.Value.toUpper()})}


Note: "ForEach-Object" can be shortened to "foreach" and even to just "%".
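For comparison, the same pattern can be tried in Python. Python's re module has no \U case escape in replacement strings, so a small function does the upper-casing instead; the function name is made up for this sketch. As in the PowerShell line, it works on one line at a time and touches the first character of the line plus the first character after a dot, exclamation mark or question mark.

```python
import re

# Start of line, or sentence-ending punctuation plus whitespace,
# followed by one word character.
_SENTENCE_START = re.compile(r"^\w|[.!?]\s+\w")


def capitalise_sentences(line: str) -> str:
    """Upper-case the first character of the line and of each sentence in it."""
    return _SENTENCE_START.sub(lambda m: m.group().upper(), line)


if __name__ == "__main__":
    print(capitalise_sentences("oh lord. tell me! why? thank you Jesus"))
    # prints: Oh lord. Tell me! Why? Thank you Jesus
```

Capitals that are already in the text (such as "Jesus") are left alone, because only the matched characters are changed.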


« Last Edit: December 13, 2013, 09:04 AM by AbteriX »

stormproof

  • Participant
  • Joined in 2013
  • Posts: 3
Re: IDEA - how to capitalise the first word of every line
« Reply #18 on: December 16, 2013, 04:38 PM »
Being naive and not ever having done programming, I spent a couple of days bashing through it.

The Kingsoft Office approach was great - however it removed every upper case except for the first letter of the line. 
So "thank you Jesus" became "Thank you jesus"
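The behaviour reported here looks like the usual "sentence case" operation, which lower-cases everything after the first letter. Python's built-in str.capitalize() does the same thing, so it makes a handy illustration of the difference between that and changing only the first letter. The helper name below is made up for the example, and how Kingsoft implements its option internally is not known from this thread.

```python
def upper_first_only(line: str) -> str:
    """Upper-case the first character and leave the rest of the line untouched."""
    return line[:1].upper() + line[1:]


if __name__ == "__main__":
    line = "thank you Jesus"
    print(line.capitalize())       # prints: Thank you jesus
    print(upper_first_only(line))  # prints: Thank you Jesus
```

For hymn text with names in it, the second form is the safer choice, since it never lowers a capital that is already there.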

Plus I had the bright idea of putting verse numbers like so - "2)" instead of "2."  So I had all the verses to do as well.
However, Find and Replace was very helpful, and helped to rationalise what had been quite a haphazard set of capitalisings.

I ended up with a sore bum and a muzzy head but all done.

Just to clarify - we started with a 30 year old xeroxed book, scanned in and OCR'd it to separate pages, copied and pasted to a master document, spellchecked it, corrected it and corrected it again.
Then did the capitalising bit.
Then made it so pages don't begin half way through a verse.
Then printed to PDF.

Next will be to print as required.

They are songs from the early days when we had revival.  We don't now, but it's inspiring to read what people wrote from their experience.
Be nice to get there again.

So - having used a Stone Age method this time round, and not knowing how to get into the mysteries of PowerShell etc, I'd quite like to learn.  Any recommendations?
I'm not terminally daft yet, and I sort out configuration and malware for friends.  It's just the next step and I don't know where to begin.
I'll have another similar project in a month or so, and it would be a good learning project.

What do you collectively think?

stormproof

  • Participant
  • Joined in 2013
  • Posts: 3
Re: IDEA - how to capitalise the first word of every line
« Reply #19 on: December 16, 2013, 04:39 PM »
Sorry I forgot to thank you all for your time and advice.
 :-[