Parsing / Filtering text

ewemoa:
Sorry, the environment I was using was non-English (ah, skwire has explained already) -- but you figured out the appropriate replacement :)

Copy-pasting much text from http://regularexpression.info/:

^([^\\]+)\\.*

^ (at the very start of the pattern): Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character.

^([^\\]+)\\.*

( and ): Round brackets group the regex between them. They capture the text matched by the regex inside them so that it can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.

^([^\\]+)\\.*

[ : Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. Note: in this case, the closing square bracket ends the character class in question.

^([^\\]+)\\.*

^ (right after the opening [): Negates the character class, causing it to match a single character not listed in the character class. (Placed anywhere else inside the character class, ^ specifies a literal caret.)

^([^\\]+)\\.*

\\ (both occurrences): A backslash escapes special characters to suppress their special meaning. The intent was to express a literal backslash, but the backslash character has a special meaning in these contexts, so each one had to be "escaped" with another backslash character.
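To make the escaping a bit more concrete, here is a minimal sketch in Python. Python's re module is only an assumption for illustration (it is not the tool used earlier in this thread), and the sample text is made up. The raw-string prefix r"..." keeps Python from interpreting the backslashes before the regex engine sees them.

```python
import re

# In a raw string, the regex engine receives the two characters \\ ,
# which it reads as "one literal backslash".
literal_backslash = re.compile(r"\\")

# Inside a character class the backslash must be escaped as well.
non_backslash = re.compile(r"[^\\]")

# Hypothetical sample: the actual text is C:\Temp (a single backslash).
sample = "C:\\Temp"

backslash_hits = literal_backslash.findall(sample)
other_hits = non_backslash.findall(sample)

print(len(backslash_hits))   # number of literal backslashes found
print("".join(other_hits))   # everything that is not a backslash
```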

^([^\\]+)\\.*

+ : Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once. "Previous item" here means the character class of non-backslash characters.
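The "as many as possible, then give some back" behaviour can be observed with a small experiment. This is only a sketch using Python's re module (an assumption, not the tool from the thread). The trailing e in the second pattern is a hypothetical addition whose only purpose is to force the engine to back off.

```python
import re

# Hypothetical sample: the actual text is folder\file
text = "folder\\file"

# Greedy: the class grabs every non-backslash character it can.
greedy = re.match(r"[^\\]+", text).group(0)

# Forcing a give-back: after the class, the pattern demands a literal "e".
# The engine first tries "folder" (next char is a backslash, no good),
# then "folde" (next char is "r", no good), then "fold" (next char is "e").
backed_off = re.match(r"([^\\]+)e", text)

print(greedy)               # what the class matched on its own
print(backed_off.group(1))  # what the class kept after backing off
```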

^([^\\]+)\\.*

. (the dot): Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.
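As an illustration of the "option to make the dot match line breaks", here is a short sketch in Python, chosen purely as an example flavor. In Python's re module that option is the re.DOTALL flag. Note that flavors differ in exactly which characters the dot excludes by default; Python's re excludes only \n. The sample text is made up.

```python
import re

# Hypothetical two-line sample.
text = "first\nsecond"

# Default: the dot stops at the newline.
default_match = re.match(r".*", text).group(0)

# With re.DOTALL the dot matches the newline too.
dotall_match = re.match(r".*", text, re.DOTALL).group(0)

print(repr(default_match))
print(repr(dotall_match))
```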

^([^\\]+)\\.*

* : Repeats the previous item zero or more times. "Previous item" here refers to the dot. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.

^([^\\]+)\\.*

Bringing the pieces together, one English translation might be:

Match a line which:

starts with a sequence of non-backslash characters (and, oh, let's hold on to this for later reference [1]),
continues with a backslash character,
and further continues with some text which we don't really care about
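The English translation above can be checked against a few inputs. The following is a sketch in Python (an assumed flavor, not necessarily the one used earlier in the thread), and the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r"^([^\\]+)\\.*")

# Hypothetical sample: the actual text is Documents\Reports\2010.txt
m = pattern.match("Documents\\Reports\\2010.txt")
held_for_later = m.group(1)   # the leading run of non-backslash characters
whole_match = m.group(0)      # everything the pattern consumed

# No backslash anywhere: the pattern cannot match.
no_backslash = pattern.match("readme.txt")

# Backslash in first position: [^\\]+ needs at least one character, so no match.
leading_backslash = pattern.match("\\Reports")

print(held_for_later)
print(whole_match)
print(no_backslash, leading_backslash)
```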

As for the replacement portion:

\1

Substituted with the text matched between the 1st pair of capturing parentheses (\1 through \9 refer to the 1st through 9th pairs). In the case in question, there is only one pair of capturing parentheses and they captured a sequence of non-backslash characters at the beginning of a line.
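Putting the search pattern and the \1 replacement together might look like the following in Python. This is a minimal sketch: re.sub is Python's search-and-replace function, and the list of lines is invented sample data, not the text discussed earlier in the thread.

```python
import re

# Hypothetical input lines (actual text uses single backslashes).
lines = [
    "Documents\\Reports\\2010.txt",
    "Music\\album\\track01.mp3",
    "readme.txt",  # no backslash, so the pattern does not match
]

# Replace each matching line with just the captured first segment.
result = [re.sub(r"^([^\\]+)\\.*", r"\1", line) for line in lines]

for line in result:
    print(line)
```

Lines the pattern does not match are passed through unchanged, which is why the last entry keeps its full text.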

This description is not as complete as it might be, but perhaps it will suffice.


[1] Referred to as a "backreference".

ewemoa:
Here are some samples of using "The Regex Coach" to study regular expressions:

Specify Regular Expression and Target String
[screenshot]

Observe Tree Analysis of Regular Expression
[screenshot]

Specify Replacement String
[screenshot]

Step Through Regular Expression Evaluation
[screenshot]
