Author Topic: Maybe someone can help with regular expressions (Read 7309 times)

Hilario · « **on:** December 05, 2019, 06:34 AM »

Hello,
I am trying to define a text conversion but can't work out how to do it. If someone can help, it would be wonderful.

For example, I have this text:
¿Sabrias evitar los problemas que te genera la documentación en tu negocio? ¡virtualizate!

and want to convert it to:
sabrias_evitar_los_problemas_que_te_genera_la_documentacion_en_tu_negocio_virtualizate

That means:
1- no capital letters
2- "space" converted to "_"
3- all punctuation such as ¿ ? ¡ ! . ; (etc.) deleted
4- accented characters like á/à replaced by a, é/è replaced by e, and the same for í/ì, ó/ò, ú/ù

I hope someone can answer this. Thanks in advance to that good Samaritan ;-)

c-sanchez · « **Reply #1 on:** December 08, 2019, 08:56 AM »

Why do you need to use RegEx? I tried to understand regex, but I find it really difficult and annoying. Programmers usually avoid it, which is probably why no one has answered you yet.
I think you can do this more easily with ordinary code, and keep it "human readable" too.

So, do you need this to replace text in files, or something like that?
Or to use with something like PHP, server side?
Maybe JavaScript, client side?

In any case, it is all doable without RegEx.

mouser · « **Reply #2 on:** December 08, 2019, 09:19 AM »

For replacement work like this, c-sanchez is probably right: you may be better off using a language like Python to do the search and replace.
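A minimal sketch of what that could look like in Python, assuming the four rules from the first post and using only the standard library (no regex). The function name `slugify` is made up for illustration; it is not from any particular library.

```python
import unicodedata


def slugify(text: str) -> str:
    """Turn a phrase into a lowercase, underscore-separated, accent-free string."""
    # Rule 1: no capital letters
    text = text.lower()
    # Rule 4: split accented letters into base letter + combining mark,
    # then drop the combining marks (so "ó" becomes "o", "ñ" becomes "n")
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Rule 3: keep only letters, digits, whitespace and underscores
    text = "".join(ch for ch in text if ch.isalnum() or ch.isspace() or ch == "_")
    # Rule 2: whitespace becomes "_" (runs of whitespace collapse to one "_")
    return "_".join(text.split())


if __name__ == "__main__":
    print(slugify("¿Sabrias evitar los problemas que te genera "
                  "la documentación en tu negocio? ¡virtualizate!"))
```

Note that this drops every character that is not a letter or digit, which is broader than a fixed punctuation list, and it collapses repeated spaces into a single underscore. Either choice is easy to change if that is not what you want.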

Ath · « **Reply #3 on:** December 08, 2019, 11:23 AM »

If you insist on using regexes for the conversions, then sed can do it. When downloading sed for Windows, be sure to get the latest updated version. A GNU version of sed is required, as BSD sed doesn't support the lowercase conversion \L.
This script should be saved as an ASCII/ANSI file for the y command to work properly. When it is saved as UTF-8, sed will complain that the original and replacement strings are not the same length.

```sed
# Save this file as ANSI or the y command will cause an error
# Convert to lowercase
s:(.*):\L\1:
# Convert all spaces to _
s:[[:space:]]:_:g
# Remove special characters (][ must be first in range, - must be last in range!)
s:[][?¿/.>,<;\:'"!¡@#$^&*+=\{\}|-]::g
# Replace diacritics by non-diacritics (to be completed)
y:äáàâëéèêüúùûïíìîöóòôÿýçñ:aaaaeeeeuuuuiiiiooooyycn:
```

I tried to include all the diacritics I could enter on my US-International keyboard layout, but I may have left out some that you need. Please add any that are missing.
The separator character used on all script lines is :

The command line should be something like:

```shell
sed -r -f above_script.txt <input.txt
```
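For comparison, here is a rough Python sketch of the same four steps, assuming roughly the same punctuation set and the same diacritic pairs as the sed script. The names `sed_like`, `PUNCTUATION` and `DIACRITICS` are made up for this example. Python strings work on characters rather than bytes, so the translation table does not depend on how the file is encoded, as long as Python reads it correctly.

```python
import re

# Same character pairs as the y command in the sed script
DIACRITICS = str.maketrans(
    "äáàâëéèêüúùûïíìîöóòôÿýçñ",
    "aaaaeeeeuuuuiiiiooooyycn",
)

# Roughly the same punctuation set as the sed bracket expression;
# re.escape takes care of ], [, ^ and - so their order does not matter here
PUNCTUATION = re.compile("[" + re.escape("][?¿/.>,<;:'\"!¡@#$^&*+={}|-") + "]")


def sed_like(line: str) -> str:
    """Apply the four steps to one line of text (without its newline)."""
    line = line.lower()                  # like s:(.*):\L\1:
    line = re.sub(r"\s", "_", line)      # like s:[[:space:]]:_:g
    line = PUNCTUATION.sub("", line)     # like the "remove special characters" step
    return line.translate(DIACRITICS)    # like the y command


if __name__ == "__main__":
    print(sed_like("¿Sabrias evitar los problemas? ¡virtualizate!"))
```

As with the sed script, any diacritic missing from the table is left untouched, so the two strings in `DIACRITICS` need to be extended together.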

Hilario · « **Reply #4 on:** December 09, 2019, 05:30 AM »

Thanks a lot for all the comments.
I see this is really cumbersome.
Why did I decide to use regex?
Because I thought it was the way to do this using the clipboard. I may be wrong, and there may be other ways.
Thanks a lot to Ath for the script; it helps me understand some of the ideas. All this sed usage is really complex (for me), but it clarifies the syntax.

The main idea was to SELECT a phrase and, via the clipboard, PASTE it transformed into a web-ready form.
Maybe someone has a better idea than what I was asking for.

Thanks again for the suggestions.

Stoic Joker · « **Reply #5 on:** December 09, 2019, 06:43 AM »

I've never used RegEx frequently enough to be good at it, but I have found WildGem handy for getting through it from time to time. It's a freeware graphical RegEx query builder utility written by one of the members here at DC.
