Author Topic: Parsing / Filtering text (Read 8341 times)

RedPillow · « **on:** February 25, 2010, 04:27 AM »

Yo again, another topic I need help with.

I have this big list in a txt-file which contains paths.

Example paths:

Motorcycle\crap\morecrap
Car\crap\morecrap
Truck\crap\morecrap
Bike\crap\morecrap

And now, I want to delete everything after the first "\" so the lines will look like this:

Motorcycle
Car
Truck
Bike

How do I do this?
What program to use?
Possibly a script?
It can't be done with Notepad's search & replace command like this:

Search: \*
Replace: -empty-

Because it searches for the literal text "\*", not "\" followed by anything.

Suggestions?

skwire · « **Reply #1 on:** February 25, 2010, 04:44 AM »

If you're familiar with RegEx at all, here is a way to do it in AutoHotkey. You could easily adapt the RegEx portion to your own code or a more capable editor that has RegEx search capabilities.

Code: AutoHotkey

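; Sample input held in a continuation section.
Text =
(
Motorcycle\crap\morecrap
Car\crap\morecrap
Truck\crap\morecrap
Bike\crap\morecrap
)

Block := ""
; Split on linefeeds and drop any stray carriage returns.
Loop, Parse, Text, `n, `r
{
    If ( A_LoopField != "" )
    {
        ; Lazily capture everything before the first backslash into SubPat1.
        ; Lines without a backslash are skipped instead of adding a blank line.
        If ( RegExMatch( A_LoopField, "^(.+?)\\", SubPat ) )
            Block .= SubPat1 . "`r`n"
    }
}

MsgBox, % Block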
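For anyone adapting the RegEx portion to other code, here is a rough Python sketch of the same loop. It is only an illustration: the function name `first_components` and the constant names are made up for this example, and the sample lines are the ones from the question.

```python
import re

# Same sample lines as in the AutoHotkey snippet.
TEXT = r"""Motorcycle\crap\morecrap
Car\crap\morecrap
Truck\crap\morecrap
Bike\crap\morecrap
"""

# Lazy capture of everything before the first backslash, as in the post.
FIRST_PART = re.compile(r"^(.+?)\\")


def first_components(text):
    """Return the text before the first backslash of each non-empty line."""
    result = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines, like the A_LoopField != "" check
        match = FIRST_PART.match(line)
        if match:  # lines without a backslash are ignored
            result.append(match.group(1))
    return result


print("\n".join(first_components(TEXT)))
```

To run it against the real list, the text could be read from the txt-file instead of the hard-coded sample.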

ewemoa · « **Reply #2 on:** February 25, 2010, 05:16 AM »

Here's one way to use Notepad++ for this task.

Edited the images -- should be easier for subsequent viewings

1. Open file in Notepad++

2. Choose Search -> Replace

3. Ensure Cursor is at Beginning of Text and Select "Regular expression" for Search Mode

4. Fill in Appropriate Values for "Find what" and "Replace with"

   Find what: ^([^\\]+)\\.*
   Replace with: \1

5. Click "Replace All" Button

6. Examine the Results
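The same find-and-replace can also be tried outside the editor. Below is a minimal Python sketch, assuming the pattern and replacement quoted later in the thread (`^([^\\]+)\\.*` and `\1`); the function name `strip_after_first_backslash` is just a placeholder for this example.

```python
import re


def strip_after_first_backslash(text):
    r"""Apply Find: ^([^\\]+)\\.*  Replace: \1  across the whole text."""
    # re.MULTILINE lets ^ match at the start of every line, roughly what
    # "Replace All" does over a multi-line file in an editor.
    return re.sub(r"^([^\\]+)\\.*", r"\1", text, flags=re.MULTILINE)


sample = "Motorcycle\\crap\\morecrap\nCar\\crap\\morecrap\n"
print(strip_after_first_backslash(sample))
```

Lines that contain no backslash come through unchanged, since the pattern needs a backslash in order to match.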

RedPillow · « **Reply #3 on:** February 25, 2010, 05:39 AM »

Nice one ewemoa!!

I had to replace ¥'s with \'s though :]

Can you break this ^([^\\]+)\\.* apart and explain what each thing there does and the \1 also?

skwire · « **Reply #4 on:** February 25, 2010, 05:53 AM »

I had to replace ¥'s with \'s though :]
-RedPillow (February 25, 2010, 05:39 AM)

The yen symbols are actually backslashes on ewemoa's (and my) computer. It's a side effect of using Japanese as the default language on an English Windows box. I've become so accustomed to it over the years that my eyes don't even "see" them as yen symbols anymore.

ewemoa · « **Reply #5 on:** February 25, 2010, 06:30 AM »

Sorry, the environment I was using was non-English (ah, skwire has explained already) -- but you figured out the appropriate replacement

Copy-pasting much text from http://regularexpression.info/:

^([^\\]+)\\.*

^ (the leading caret): Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character.

( ) (the round brackets): Round brackets group the regex between them. They capture the text matched by the regex inside them so that it can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.

[ ] (the square brackets): The opening bracket starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. Note: in this case, the closing square bracket ends the character class in question.

^ (the caret right after the opening [): Negates the character class, causing it to match a single character not listed in the character class. (A caret placed anywhere except right after the opening [ stands for a literal caret.)

\\ (both occurrences): A backslash escapes special characters to suppress their special meaning. The pattern needs to express a literal backslash, but the backslash character has a special meaning in these contexts, so each one had to be "escaped" with another backslash.

+ (the plus): Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once. "Previous item" here means the character class of non-backslash characters.

. (the dot): Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.

* (the star): Repeats the previous item zero or more times. "Previous item" here refers to the dot. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.

^([^\\]+)\\.*

Bringing the pieces together, one English translation might be:

Match a line which:

starts with a sequence of non-backslash characters (and, oh, let's hold on to this for later reference [1]),
continues with a backslash character (the first one on the line),
and further continues with some text which we don't really care about
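The piece-by-piece reading above can be written out as a commented pattern. This is only a sketch in Python using its `re.VERBOSE` flag; the name `ANNOTATED` is made up for this example, and the pattern itself is the one from the thread.

```python
import re

# Same pattern as ^([^\\]+)\\.* , spread out with re.VERBOSE so that
# each piece can carry its own comment.
ANNOTATED = re.compile(r"""
    ^           # start of the string
    (           # open capturing group 1
      [^\\]+    # one or more characters that are not a backslash
    )           # close capturing group 1
    \\          # one literal backslash
    .*          # whatever follows, up to a line break
""", re.VERBOSE)

match = ANNOTATED.match(r"Truck\crap\morecrap")
print(match.group(1))  # the part we want to keep
print(match.group(0))  # the whole matched line
```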

As for the replacement portion:

\1

\1 through \9 are substituted with the text matched between the 1st through 9th pair of capturing parentheses. In the case in question, there is only one pair of capturing parentheses, so only \1 is available, and it holds the sequence of non-backslash characters captured at the beginning of a line.
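To see what \1 actually holds, here is a small hedged Python sketch. The variable names are arbitrary, and the square brackets in the replacement are there only to show that a backreference can be mixed with literal text.

```python
import re

PATTERN = re.compile(r"^([^\\]+)\\.*")

line = r"Bike\crap\morecrap"
match = PATTERN.match(line)

# group(1) is the text captured by the first pair of parentheses,
# i.e. what \1 refers to in the replacement string.
captured = match.group(1)

# The replacement can combine the backreference with literal characters.
replaced = PATTERN.sub(r"[\1]", line)

print(captured)
print(replaced)
```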

This description is not as complete as it might be, but perhaps it will suffice.

[1] Referred to as a "backreference".

ewemoa · « **Reply #6 on:** March 07, 2010, 09:23 PM »

Here are some samples of using "The Regex Coach" to study regular expressions:

1. Specify Regular Expression and Target String

2. Observe Tree Analysis of Regular Expression

3. Specify Replacement String

4. Step Through Regular Expression Evaluation
