Topic: [Solved] Manipulate lines in a txt file  (Read 20656 times)

Contro

  • Supporting Member
  • Joined in 2007
  • **
  • Posts: 3,940
    • View Profile
    • Donate to Member
[Solved] Manipulate lines in a txt file
« on: February 16, 2017, 03:58 PM »
I have a txt file with many lines.
Each line may contain several periods:
Phrase 1. Phrase 2.Phrase3. Phrase 4 ....... up to ten or more

From each line of the original file, I would like to create one line per phrase, except for phrase 1, which is deleted.

I would like a simple application that does this with regular expressions or similar, or an AutoHotkey script for this purpose.

That is: split each line at the periods and delete the first phrase (everything up to the first period).

Do you know a simple environment to do this?

Best Regards
« Last Edit: October 28, 2017, 12:57 PM by Contro »

Ath

  • Supporting Member
  • Joined in 2006
  • **
  • Posts: 3,629
    • View Profile
    • Donate to Member
Re: Manipulate lines in a txt file
« Reply #1 on: February 17, 2017, 01:34 AM »
Showing some actual data might help to provide the best solution. Knowing the separator between the phrases is crucial.

Simple: Do it by hand in a text editor. (Yes, that's time consuming, but simple)
Less simple: use sed or awk (both of these Unix command-line tools are available in a Windows version, or can be run using Cygwin)

Ath

« Reply #2 on: February 17, 2017, 02:45 AM »
Here is an awk one-liner for your original request. It assumes a period ('.') as the separator, ignores the first phrase on each line, trims leading and trailing whitespace from each field, and restores the period after each phrase. Note that gensub is a gawk extension, so this needs gawk rather than a plain awk:
awk -F'.' '{for(i=2;i<=NF;i++) {$i=gensub(/^[ \t]*|[ \t]*$/, "","g",$i);if($i!="") print $i"."}}'
Append this to process input.file into output.file (output.file is created or overwritten):
input.file >output.file
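
For readers without awk at hand, the same idea can be sketched in Python. This is only an illustration of the one-liner's logic, not something posted in the thread; the function name split_phrases is made up for the example.

```python
def split_phrases(line: str) -> list[str]:
    """Drop the first period-separated phrase and return the rest, one per item.

    Mirrors the awk one-liner: split on '.', skip field 1, trim spaces and
    tabs, skip empty fields, and put the period back after each phrase.
    """
    fields = line.rstrip("\n").split(".")
    phrases = []
    for field in fields[1:]:          # fields[0] is the first phrase: ignored
        text = field.strip(" \t")     # trim leading/trailing spaces and tabs
        if text:                      # skip empty fields (e.g. after the last period)
            phrases.append(text + ".")
    return phrases


if __name__ == "__main__":
    for phrase in split_phrases("Phrase 1. Phrase 2.Phrase3. Phrase 4."):
        print(phrase)
```

Like the awk version, this treats every period as a separator, so abbreviations such as "Dr." would be split too.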

anandcoral

  • Honorary Member
  • Joined in 2009
  • **
  • Posts: 783
    • View Profile
    • Free Portable Apps
    • Donate to Member
Re: Manipulate lines in a txt file
« Reply #3 on: February 17, 2017, 03:45 AM »
Hi Contro,

You can try search and replace in a text editor like Notepad++ to get what you want.

You can also try the online line-break tool at http://textmechanic....dremove-line-breaks/; it has many text manipulation options.

There is another text manipulation website that may help you: http://nimbletext.co...HowTo/ManipulateText

Regards,

Anand


Contro

« Reply #4 on: February 17, 2017, 08:39 AM »
I have tried with PSPad, with no good result.
I doubt Notepad++ will help.

I'm trying the rest.
The last one seems to be a paid option.

Ath

« Reply #5 on: February 17, 2017, 10:07 AM »
In NANY 2014 there was TIMU by phitsc, but I couldn't get it to remove the first phrase or empty lines.

Just download awk (or gawk) for windows (command-line tool) and run it using the command I've shown above... :tellme:

Ath

« Reply #6 on: February 17, 2017, 01:39 PM »
See Reply #18 below for a better-working 4-step solution using Notepad++.


Here's a '3-step' Notepad++ solution:

Settings for Replace dialog (Ctrl-H):
Search mode: Regular expression
. matches newline: Off (=not checked)

Step 1: (Remove all first phrases, trim any leading white-space from the remaining part)
Find what: [^\.]*?\.\s*(.*)
Replace with: $1\n
Press Replace all button

Step 2: (Replace all remaining phrases by the phrase and a new-line, removing any white-space in front of a phrase)
Find what: \s*([^\.]*?\.)
Replace with: $1\n
Press Replace all button

There are other issues with this type of free text: the (short) example contains shortened words like St. and Dr. that cause excess line breaks to be inserted. These can be fixed by adding another step to the Notepad++ recipe:
Step 3: (Restore shortened words with their second part)
Find what: (St\.|Dr\.)\n
Replace with: $1
Press Replace all button
Done :D

If more shortened words are found, add them to Find what inside the parentheses, each prefixed with a vertical bar, like this for adding "Mt.":
Find what: (St\.|Dr\.|Mt\.)\n

Remember: It replaces the file-contents, so either work on a copy or use Save as... if the original file needs to be preserved.

PS: Tested using Notepad++ 7.2.2 and EditPad Lite 7.5

NB: please report how it's working for your file(s) so the thread can be moved to Resolved when satisfactory.

Edit: Updated step 1 to remove leading white-space of the remaining text.
Edit2: Added step 3 to restore 'unintended' line-breaks after shortened words like Dr., St., etc.
Edit2: Replaced \1 substitutions by more generic $1.
Edit2: Also tested with EditPad Lite. PSPad isn't capable of doing step 3, so I'd advise to not use PSPad!
Edit3: Superseded by a better 4 step solution, see below, Reply #18
« Last Edit: February 18, 2017, 05:57 AM by Ath »

wraith808

  • Supporting Member
  • Joined in 2006
  • **
  • default avatar
  • Posts: 11,190
    • View Profile
    • Donate to Member
Re: Manipulate lines in a txt file
« Reply #7 on: February 17, 2017, 02:16 PM »
I want to echo one of Ath's requests that still has not been answered... Give us sample data.  It will make this conversation a lot more productive.

Contro

« Reply #8 on: February 17, 2017, 02:17 PM »
Quote from Ath:
Here's a '2-step' Notepad++ solution:

Settings for Replace dialog (Ctrl-H):
Search mode: Regular expression
. matches newline: Off (=not checked)

Step 1: (Remove all first phrases)
Find what: [^\.]*?\.(.*)
Replace with: \1\n
Press Replace all button

Step 2: (Replace all remaining phrases by the phrase and a new-line, removing any white-space in front of a phrase)
Find what: \s*([^\.]*?\.)
Replace with: \1\n
Press Replace all button
Done :D

Remember: It replaces the file-contents, so either work on a copy or use Save as... if the original file needs to be preserved.

PS: Tested using Notepad++ 7.2.2

NB: please report how it's working for your file(s) so the thread can be moved to Resolved when satisfactory.


I will try it and comment, Ath. Best Regards

Contro

« Reply #9 on: February 17, 2017, 02:47 PM »
The example file looks like this, with 19,000 lines. Each line contains several phrases separated by periods (".") (what I erroneously called "points").
1 line
Hasta aquí esta revisión sobre los conceptos de George Vithoulkas, que resumiendo lleva de nuevo a ratificar que: los síntomas de mayor jerarquía en cada caso individual de enfermedad, son los síntomas del plano mental/espiritual, seguidos por los del plano emocional y por último el plano físico. De ahí la importancia del estudio detallado y profundo de los síntomas que aparecen en el repertorio en el primer capítulo que es Mente o Psiquis.
2 line
El Dr. James Tyler Kent nació en los Estados Unidos de América en Woodhull, New York, el 31 de marzo de 1849, estudió medicina en Eclectic Medical Institute de Cincinati, Ohio y recibió su grado en 1871. Su práctica médica la comenzó a los 26 años en St. Louis, Missouri, y llegó a ser un distinguido miembro de la Eclectic National Medical Association. Su primera esposa había muerto en el año 1877 y Lucy, su segunda esposa, enfermó y fue naturalmente sometida al tratamiento de la época, tanto ortodoxo como ecléctico, pero ella no mostraba mejoría y seguía empeorando, por lo cual se consultó al Dr. Richard Phelan, que era médico homeópata. Éste hizo su prescripción y los síntomas (debilidad nerviosa, insomnio y anemia) desaparecieron rápidamente y la paciente se recuperó totalmente en poco tiempo. Por ese motivo, Kent decidió entonces estudiar homeopatía con el Dr. Phelan y llegó a considerar que la homeopatía “era el único método terapéutico basado en leyes y principios, y el único que se dirigía a la causa fundamental de la enfermedad” (Llobet, 2014). Se convirtió así en un estudioso de la homeopatía, en especial de la 5ª edición del Organon de Hahnemann2 y se convirtió completamente a la nueva escuela. Después de abandonar la Eclectic National Medical Association en 1879, ocupó la cátedra de Homeopatía en el “Homoeopathic Medical College” de St. Louis desde 1883 hasta 1888; fue profesor de Materia Médica en el “Hahnemann Medical College” de Philadelphia y además en el “Hering Medical College Hospital” de Chicago.
3 line
Es y ha sido considerado como uno de los homeópatas más destacados de la escuela de homeopatía norteamericana; fue miembro de varias asociaciones médicas como la “Illinois State Homoeopathic Medical Society”, el “American Institute of Homoeopathy” y la “International Hahnemannian Association” y miembro honorario de la “British Homoeopathic Medical Society”. En 1916, cuando disfrutaba de vacaciones, se agravó una bronquitis crónica que padecía y que luego se complicó con glomerulonefritis y falleció el 6 de junio de ese año en Stevensville (Montana).
4 line
Según el autor de este artículo, Kent fue el creador directo de su Repertorio, mientras que sus otros escritos, fueron una recopilación de los apuntes o notas que sus alumnos tomaban durante las clases que él dictaba.


and so on ....

The goal is to delete the first phrase of each line, and then split the rest of each line into separate phrases at the periods.


wraith808

« Reply #10 on: February 17, 2017, 05:53 PM »
When you say delete the first phrase of each line... in your line 1, what part would you delete?

Contro

« Reply #11 on: February 17, 2017, 06:51 PM »
I have used PSPad for several years, and Notepad++ before that. Both are excellent programs.
For this occasion I used only Notepad++, so I could follow Ath's instructions.

The two-step procedure works like a charm.

I have the Spanish version of Notepad++ installed and use it very little, only when I have many files open, simply because I am more used to PSPad.

I haven't found a similar option in PSPad. In the Spanish version of Notepad++, the option ". matches newline" corresponds to ". se ajusta a linea" (unchecked).

Following Ath's very good instructions, the first phrase is deleted, but the remaining text starts with one leading space.
So I only have to trim on the left and delete any empty lines after deleting the first phrase of each line.
This is very easy with PSPad, and I suppose with Notepad++ too.

So the first step works wonderfully with Notepad++.

Then I applied the second step to the remaining lines.

The second step works wonderfully too, this time with no need to trim on the left.


During Ath's wonderful intervention I was looking at Text Mechanic, TextCrawler and a lot more. But the simplicity of Notepad++ makes it the final winner.

The problem - my problem - is totally solved.

Best Regards
 :-* :tellme: :tellme:



Contro

« Reply #12 on: February 17, 2017, 06:55 PM »
Quote from wraith808: When you say delete the first phrase of each line... in your line 1, what part would you delete?

For example:

Phrase 1. Phrase2.
Phrase1 .Phrase2.
Phrase1. Phrase 2.

The result is Phrase2. or Phrase 2.

Delete Phrase 1, delete the period, and delete the space at the beginning of the next phrase.
That is, delete everything up to the first letter of the second phrase.
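
The rule described here (delete the first phrase, its period, and any space before the next phrase) can be written as a single regular expression. Below is a minimal Python sketch of it; the names FIRST_PHRASE and drop_first_phrase are invented for the illustration, and it treats every period as the end of a phrase.

```python
import re

# Everything from the start of the line up to and including the first period,
# plus any whitespace that follows it.
FIRST_PHRASE = re.compile(r"^[^.]*\.\s*")


def drop_first_phrase(line: str) -> str:
    """Return the line without its first phrase; lines with no period are unchanged."""
    return FIRST_PHRASE.sub("", line, count=1)


if __name__ == "__main__":
    for sample in ("Phrase 1. Phrase2.", "Phrase1 .Phrase2.", "Phrase1. Phrase 2."):
        print(drop_first_phrase(sample))
```

A line holding a single phrase comes back empty, because its only period ends the first phrase.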

 :-*

Contro

« Reply #13 on: February 17, 2017, 07:07 PM »
BTW, I like homeopathy as an energetic medicine.
I don't understand very well why it is considered a pseudoscience.
Perhaps because of the economic interests of official medicine?

What do you think?

Ahem. I am trying to find a homeopathic remedy for my lack of memory. Since it is a non-toxic medicine, there is no risk.

Recently, on my first mountain excursion to lose weight, I had an accident and was given a lot of medicines: anti-inflammatories, analgesics, and so on. But I decided to use ice and manage the ache myself.

So I caught influenza from the cold  ;D and rested many more hours because of the pain. But I discovered that I lose more kilograms the more hours I rest.

So my real medicine is rest and dreams.

 :-*

Ath

« Reply #14 on: February 18, 2017, 02:49 AM »
Quote from Contro: So I only have to trim the left and delete any empty line after deleting the first phrase of each line.
This is very easy with pspad, and I suppose too with notepad++

Hm, you didn't say you wanted to store the intermediate result too.
A small addition to step 1 would fix that:
Updated step 1:
Step 1: (Remove all first phrases, trim any leading white-space from the remaining part)
Find what: [^\.]*?\.\s*(.*)
Replace with: \1\n
Press Replace all button

I updated the original post for this Notepad++ solution.
« Last Edit: February 18, 2017, 02:55 AM by Ath »

Ath

« Reply #15 on: February 18, 2017, 02:54 AM »
Quote from Contro:
4 line
Según el autor de este artículo, Kent fue el creador directo de su Repertorio, mientras que sus otros escritos, fueron una recopilación de los apuntes o notas que sus alumnos tomaban durante las clases que él dictaba.

This is not such a good example, as it has only 1 period, so the entire line is removed in step 1...
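
The thread leaves open what should happen to such single-period lines. As a sketch only, the choice can be made explicit with a flag. The function process_line and its keep_single parameter are hypothetical names for this illustration, and the code assumes every period ends a phrase.

```python
def process_line(line: str, keep_single: bool = False) -> list[str]:
    """Split a line into phrases and drop the first one.

    If the line holds only one phrase, it is dropped entirely by default
    (the behaviour described above) or kept when keep_single is True.
    """
    parts = [part.strip() for part in line.split(".")]
    phrases = [part + "." for part in parts if part]   # restore the periods
    if len(phrases) <= 1:
        return phrases if keep_single else []
    return phrases[1:]


if __name__ == "__main__":
    print(process_line("Only one phrase."))
    print(process_line("Only one phrase.", keep_single=True))
    print(process_line("One. Two. Three."))
```

Which default is right depends on what the single-phrase lines mean in the actual file.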

Ath

« Reply #16 on: February 18, 2017, 03:23 AM »
Updated: Don't use PSPad for this operation, as its Find & Replace isn't powerful enough. Use Notepad++ or EditPad Lite (both free) instead.

Well, I also tried to run this with PSPad, but its Find & Replace is considerably less powerful than Notepad++'s in this area (and even buggy, IMHO).
Open Find & Replace (Ctrl-H)
Regular Expressions needs to be checked, and I chose Direction: Entire scope
Step 1 needs an adjustment:
Replace: $1
Step 2 needs no adjustment:
Replace: $1\n
Step 3 above can't be performed using PSPad.

Step 4 needs to be added for removing empty lines:
Choose Edit/Lines manipulation/Remove Blank Lines (Alt-E,N,M)
Voila :-\

NB: There are several reasons I dropped PSPad many years ago in favor of Notepad++, this is one of them.
Edit: Do not use PSPad for these operations. Alternatives mentioned.
« Last Edit: February 18, 2017, 04:41 AM by Ath »

Ath

« Reply #17 on: February 18, 2017, 04:18 AM »
There are other issues with this type of free text: the (short) example contains shortened words like St. and Dr. that cause excess line breaks to be inserted. These can be fixed by adding another step to the Notepad++ recipe:
Step 3: (Restore shortened words with their second part)
Find what: (St\.|Dr\.)\n
Replace with: \1
Press Replace all button

If more shortened words are found, add them to Find what inside the parentheses, each prefixed with a vertical bar, like this for adding "Mt.":
(St\.|Dr\.|Mt\.)\n

NB: This won't (can't) work with PSPad because of the limitations of its Find & Replace.
NB2: I haven't found a way to keep the lines with a single period yet, but I don't know whether that's desired/required.

Also added to the Notepad++ solution, above.
« Last Edit: February 18, 2017, 04:42 AM by Ath »

Ath

« Reply #18 on: February 18, 2017, 05:54 AM »
Here's a better, 4-step solution using Notepad++. It comes from realizing that the shortened words really mess up the removal of the 'first phrases' part.

Settings for Replace dialog (Ctrl-H):
Search mode: Regular expression
. matches newline: Off (=not checked)

Step 1: Replace the period after shortened words with a placeholder (_ is used, assuming it doesn't occur anywhere in the document!)
Find what: (Dr|Mt|St|Lt)\.
Replace with: $1_
Press Replace all button

Step 2: Remove all first phrases, trim any leading white-space from the remaining part
Find what: [^\.]*?\.\s*(.*)
Replace with: $1\n
Press Replace all button

Step 3: Replace all remaining phrases by the phrase and a new-line, removing any white-space in front of a phrase
Find what: \s*([^\.]*?\.)
Replace with: $1\n
Press Replace all button

Step 4: Restore periods after all shortened words
Find what: _
Replace with: .
Press Replace all button
Done :D
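
For anyone who prefers a script to the editor, the four steps above can be sketched in Python, working one line at a time. This is an approximation under stated assumptions rather than an exact port of the Notepad++ recipe: the names ABBREVIATIONS, PLACEHOLDER and four_step are made up, matching is case-sensitive, the underscore must not occur in the text, and any trailing text without a closing period is dropped.

```python
import re

ABBREVIATIONS = ("Dr", "Mt", "St", "Lt")   # the shortened words listed in Step 1
PLACEHOLDER = "_"                          # assumed not to occur in the document


def four_step(line: str) -> list[str]:
    """Apply the four steps to one line and return the remaining phrases."""
    # Step 1: protect the period after each shortened word.
    protect = re.compile(r"(" + "|".join(ABBREVIATIONS) + r")\.")
    text = protect.sub(r"\g<1>" + PLACEHOLDER, line)
    # Step 2: remove the first phrase and the whitespace after it.
    text = re.sub(r"^[^.]*?\.\s*", "", text, count=1)
    # Step 3: split what is left into period-terminated phrases.
    phrases = [m.strip() for m in re.findall(r"[^.]*\.", text)]
    phrases = [p for p in phrases if p != "."]   # ignore empty phrases
    # Step 4: restore the protected periods.
    return [p.replace(PLACEHOLDER, ".") for p in phrases]


if __name__ == "__main__":
    sample = "El Dr. Kent studied in Ohio. He began in St. Louis, Missouri. He consulted Dr. Phelan."
    for phrase in four_step(sample):
        print(phrase)
```

As in the editor recipe, a line with a single phrase yields nothing.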

Contro

« Reply #19 on: February 18, 2017, 08:00 AM »
I'll take all this home.
The problem is completely solved.

Thanks a lot  :tellme:

I have another goal.
Opening a new thread!


Ath

« Reply #20 on: February 18, 2017, 08:05 AM »
Quote from Contro: I'll take all this home. The problem is completely solved.
Only if you have applied the 4-step Notepad++ solution; the previous 2- and 3-step solutions don't correctly remove the first phrase in all cases.

Quote from Contro: Thanks a lot  :tellme:
You're welcome. :up:

wraith808

« Reply #21 on: February 18, 2017, 09:27 AM »
Quote from Ath:
4 line
Según el autor de este artículo, Kent fue el creador directo de su Repertorio, mientras que sus otros escritos, fueron una recopilación de los apuntes o notas que sus alumnos tomaban durante las clases que él dictaba.

This is not such a good example, as it has only 1 period, so the entire line is removed in step 1...


Well, wouldn't it be a good example?  What does he want done if this happens?

Ath

« Reply #22 on: February 19, 2017, 07:14 AM »
Quote from wraith808: Well, wouldn't it be a good example?  What does he want done if this happens?
Yep, it would be. No answer yet :(

Contro

« Reply #23 on: March 04, 2017, 05:11 AM »
If the line is Phrase1. phrase 2 . Phrase3.
then what remains is

phrase 2 . Phrase3.

Delete everything up to the beginning of phrase number 2.

Contro

« Reply #24 on: March 04, 2017, 05:14 AM »
Sorry, I was trying to quote wraith808, but it failed.

If a line is composed of several phrases, delete everything up to the beginning of the next phrase. Each phrase must end in a period, so the cut point is the first character after the period (not counting the space...).

But I already have this resolved, as described further up. I was reviewing it because I don't like to forget the thread.