Topic: Parsing / Filtering text
RedPillow
« on: February 25, 2010, 04:27:21 AM »

Yo again, another topic I need help with.

I have this big list in a txt-file which contains paths.

Example paths:

Motorcycle\crap\morecrap
Car\crap\morecrap
Truck\crap\morecrap
Bike\crap\morecrap

And now, I want to delete everything after the first "\" so the lines will look like this:

Motorcycle
Car
Truck
Bike

How do I do this?
What program to use?
Possibly a script?
It can't be done with Notepad's search & replace command like this:

Search: \*
Replace: -empty-

Because it tries to find the literal text "\*" and not "\" followed by anything.

Suggestions?
skwire
« Reply #1 on: February 25, 2010, 04:44:55 AM »

If you're familiar with RegEx at all, here is a way to do it in AutoHotkey.  You could easily adapt the RegEx portion to your own code or a more capable editor that has RegEx search capabilities.

; Sample input: one path per line, in a continuation section.
Text =
(
Motorcycle\crap\morecrap
Car\crap\morecrap
Truck\crap\morecrap
Bike\crap\morecrap
)

; Walk through the text one line at a time.
Loop, Parse, Text, `n
{
   If ( A_LoopField != "" )
   {
       ; Lazily capture everything up to the first backslash into SubPat1.
       RegExMatch( A_LoopField, "^(.+?)\\", SubPat )
       Block .= SubPat1 . "`r`n"
   }
}

; Show the collected first segments.
MsgBox, % Block
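
For anyone without AutoHotkey, the same idea can be sketched in Python. This is only an illustration and not part of skwire's script: the helper name first_segments is made up here, and it assumes the paths arrive as one newline-separated string. Unlike the AutoHotkey version, it keeps a line unchanged when the line contains no backslash.

```python
import re

# Same lazy pattern as the AutoHotkey snippet: capture as little as
# possible from the start of the line up to the first backslash.
FIRST_SEGMENT = re.compile(r"^(.+?)\\")


def first_segments(text):
    """Return the part before the first backslash of each non-empty line.

    Hypothetical helper for illustration only. Lines without a
    backslash are kept unchanged (an assumption, not the AHK behaviour).
    """
    result = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines, like the A_LoopField check
        match = FIRST_SEGMENT.match(line)
        result.append(match.group(1) if match else line)
    return result


paths = (
    "Motorcycle\\crap\\morecrap\n"
    "Car\\crap\\morecrap\n"
    "Truck\\crap\\morecrap\n"
    "Bike\\crap\\morecrap"
)
print("\n".join(first_segments(paths)))
```

For input this simple, line.split("\\", 1)[0] would probably do the same job without any regex at all.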
ewemoa
« Reply #2 on: February 25, 2010, 05:16:20 AM »

Here's one way to use Notepad++ for this task.

Edited the images -- should be easier for subsequent viewings :)

1. Open the file in Notepad++.
2. Choose Search -> Replace.
3. Make sure the cursor is at the beginning of the text, and select "Regular expression" as the Search Mode.
4. Fill in "Find what" with ^([^\\]+)\\.* and "Replace with" with \1
5. Click the "Replace All" button.
6. Examine the results.
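
The same find-and-replace can be tried outside Notepad++. Below is a rough Python sketch under a few assumptions: the function name strip_after_first_backslash is invented for this example, the text uses plain "\n" line endings, and re.MULTILINE is used so that ^ matches at the start of every line.

```python
import re


def strip_after_first_backslash(text):
    """Apply the thread's Find/Replace pair to every line of text.

    Hypothetical helper; assumes "\n" line endings.
    """
    # re.MULTILINE lets ^ match at the start of each line, not only
    # at the start of the whole string.
    return re.sub(r"^([^\\]+)\\.*", r"\1", text, flags=re.MULTILINE)


sample = "Motorcycle\\crap\\morecrap\nCar\\crap\\morecrap\n"
print(strip_after_first_backslash(sample))
```

Lines that contain no backslash are left as they are, because the pattern requires one literal backslash to match.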
« Last Edit: February 25, 2010, 08:59:31 PM by ewemoa »
RedPillow
« Reply #3 on: February 25, 2010, 05:39:50 AM »

Nice one ewemoa!!

I had to replace the ¥'s with \'s though :]

Can you break this ^([^\\]+)\\.* apart and explain what each part does, and the \1 too?
skwire
« Reply #4 on: February 25, 2010, 05:53:52 AM »

Quote from RedPillow: "I had to replace the ¥'s with \'s though :]"

The yen symbols are actually backslashes on ewemoa's (and my) computer.  It's a side effect of using Japanese as the default language on an English Windows box.  I've become so accustomed to it over the years that my eyes don't even "see" them as yen symbols anymore.
ewemoa
« Reply #5 on: February 25, 2010, 06:30:19 AM »

Sorry, the environment I was using was non-English (ah, skwire has explained already) -- but you figured out the appropriate replacement :)

Copy-pasting much text from http://regularexpression.info/:

^([^\\]+)\\.*

Piece by piece:

^ (at the very start of the pattern)

Matches at the start of the string the regex pattern is applied to (in Notepad++, the start of each line). Matches a position rather than a character.

( and )

Round brackets group the regex between them. They capture the text matched by the regex inside them so that it can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.

[ and ]

The opening square bracket starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply.  Note: in this case, the closing square bracket ends the character class in question.

^ (immediately after the opening [)

Negates the character class, causing it to match a single character not listed in the character class. (A caret placed anywhere else inside the class just specifies a literal caret.)

\\ (both occurrences)

A backslash escapes special characters to suppress their special meaning.  The pattern needs to express a literal backslash, but the backslash character has a special meaning in these contexts, so each one had to be "escaped" with another backslash.

+

Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.  "Previous item" here means the character class of non-backslash characters.

.

Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.

*

Repeats the previous item zero or more times. "Previous item" here refers to the dot.  Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.

Bringing the pieces together, one English translation might be:

Match a line which:

starts with a sequence of one or more non-backslash characters (and, oh, let's hold on to this for later reference [1]),
continues with exactly one backslash character,
and further continues with whatever text is left on the line, which we don't really care about
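
The breakdown above can be written down as an annotated pattern. This is a sketch in Python, which is not what the thread itself uses; re.VERBOSE allows whitespace and comments inside the pattern, and the name PATTERN is just a label chosen for this example.

```python
import re

# The thread's pattern, spelled out piece by piece with re.VERBOSE.
PATTERN = re.compile(
    r"""
    ^           # start of the line
    (           # open capturing group 1
      [^\\]+    # one or more characters that are not a backslash
    )           # close group 1
    \\          # one literal backslash
    .*          # the rest of the line, whatever it is
    """,
    re.VERBOSE,
)

match = PATTERN.match(r"Truck\crap\morecrap")
print(match.group(0))  # the whole line is consumed by the match
print(match.group(1))  # only the part before the first backslash
```

A line with no backslash in it does not match at all, so a replace operation would leave it untouched.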

As for the replacement portion:

\1

\1 through \9 are substituted with the text matched between the 1st through 9th pair of capturing parentheses.  In the case in question, there is only one pair of capturing parentheses, so only \1 is meaningful, and it holds the sequence of non-backslash characters captured at the beginning of the line.
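
To see what \1 stands for, here is a small Python sketch (an illustration only; the variable names are made up for this example). Python's re module happens to use the same \1 notation in replacement strings.

```python
import re

pattern = re.compile(r"^([^\\]+)\\.*")
line = r"Bike\crap\morecrap"

# \1 in the replacement is filled in with whatever group 1 captured.
replaced = pattern.sub(r"\1", line)

# The same captured text is available directly on the match object.
captured = pattern.match(line).group(1)

# Any text around \1 in the replacement is kept literally.
wrapped = pattern.sub(r"[\1]", line)

print(replaced, captured, wrapped)
```

Since the pattern matches the whole line, replacing the match with \1 alone is what throws away everything from the first backslash onward.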

This description is not as complete as it might be, but perhaps it will suffice.


[1] Referred to as a "backreference".
ewemoa
« Reply #6 on: March 07, 2010, 09:23:23 PM »

Here are some samples of using "The Regex Coach" to study regular expressions:

Specify Regular Expression and Target String


Observe Tree Analysis of Regular Expression


Specify Replacement String


Step Through Regular Expression Evaluation