
DonationCoder.com Software > Clipboard Help+Spell

Maybe someone can help with regular expressions


Hilario:
Hello
I am trying to define a text conversion but can't figure out how to do it. If someone can help, it would be wonderful.

For example, I have this text:
¿Sabrias evitar los problemas que te genera la documentación en tu negocio? ¡virtualizate! 

and want to convert it into:
sabrias_evitar_los_problemas_que_te_genera_la_documentación_en_tu_negocio_virtualizate

That means:
1- no capital letters
2- each space converted to "_"
3- all punctuation such as ¿ ? ¡ ! . ; (etc.) deleted
4- accented characters replaced: á and à by a, é and è by e, and likewise í/ì by i, ó/ò by o, ú/ù by u
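
The four rules above can be sketched in Python with its standard `re` and `unicodedata` modules. This is a minimal sketch, not a Clipboard Help+Spell feature; the function name `slugify` is hypothetical. Instead of listing accented letters by hand, it decomposes each letter (NFD form, so "é" becomes "e" plus a combining accent) and drops the combining marks, which also turns ñ into n and ç into c. Because it applies rule 4, it produces "documentacion" without the accent.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Hypothetical helper applying rules 1-4 from the question."""
    # Rule 1: no capital letters
    text = text.lower()
    # Rule 4: split accented letters into base letter + combining mark, drop the marks
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Rule 3: delete everything that is not a letter, digit, underscore or whitespace
    text = re.sub(r"[^\w\s]", "", text)
    # Rule 2: trim the ends, then turn each run of whitespace into a single "_"
    return re.sub(r"\s+", "_", text.strip())


print(slugify("¿Sabrias evitar los problemas que te genera la documentación "
              "en tu negocio? ¡virtualizate!"))
```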

I hope someone can answer this. Thanks in advance to any good Samaritan ;-)

c-sanchez:
Why do you need to use RegEx? I tried to understand regex, but I find it really difficult and annoying; many programmers avoid it, which is probably why no one has answered you yet.
I think writing ordinary code makes it easier, and also "human readable" :P

So, do you need this to replace text in files, or something like that?
Or to use with something like PHP, on the server side?
Maybe JavaScript, on the client side?

In any case, all of it is doable without RegEx.

mouser:
For this kind of replacement, c-sanchez is probably right; you may be better off using a language like Python to do the search and replace.
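
A regex-free version along the lines c-sanchez and mouser suggest might look like this in Python. It is a sketch under the same assumptions as the question's four rules; the name `slugify_no_regex` is hypothetical. Each character is classified with plain string methods, and `split()`/`join()` collapse the spaces into underscores.

```python
import unicodedata


def slugify_no_regex(text: str) -> str:
    """Hypothetical helper: same four rules, no regular expressions."""
    out = []
    # Lowercase, then decompose accented letters into base letter + combining mark
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            continue            # drop the accent marks
        if ch.isalnum():
            out.append(ch)      # keep letters and digits
        elif ch.isspace():
            out.append(" ")     # treat tabs/newlines like ordinary spaces
        # anything else (¿ ? ¡ ! . ; etc.) is simply skipped
    # split() trims both ends and collapses runs of spaces
    return "_".join("".join(out).split())


print(slugify_no_regex("¿Sabrias evitar los problemas? ¡virtualizate!"))
```

The trade-off is readability versus compactness: each rule is a plain `if` branch, which is easier to extend than a character class, at the cost of a few more lines.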

Ath:
If you insist on using regexes for the conversions, then sed can do it. When downloading sed for Windows, be sure to get the latest version. GNU sed is required, because BSD sed doesn't support the \L lowercase conversion.
This script must be saved as an ASCII/ANSI file for the y command to work properly; when it is saved as UTF-8, sed complains that the original and replacement strings are not the same length :huh:
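
A likely explanation for that complaint, assuming the Windows sed build compares the two y strings byte by byte rather than character by character: in UTF-8 every accented letter takes two bytes, while in the Western "ANSI" code page (assumed here to be Windows-1252) it takes one. This small Python check counts the bytes of the two strings from the script below; `y_lengths` is a hypothetical helper.

```python
# The two strings from the y command in the sed script
ACCENTED = "äáàâëéèêüúùûïíìîöóòôÿýçñ"
PLAIN = "aaaaeeeeuuuuiiiiooooyycn"


def y_lengths(encoding: str) -> tuple:
    """Byte length of both y strings when the script is saved in `encoding`."""
    return len(ACCENTED.encode(encoding)), len(PLAIN.encode(encoding))


for enc in ("cp1252", "utf-8"):
    print(enc, y_lengths(enc))
# cp1252 (24, 24)
# utf-8 (48, 24)
```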

```sed
# Save this file as ANSI or the y command will cause an error
# Convert to lowercase
s:(.*):\L\1:
# Convert all whitespace characters to _
s:[[:space:]]:_:g
# Remove special characters (] must come first and - must come last in the bracket expression!)
s:[][?¿/.>,<;\:'"!¡@#$^&*\(\)+=\{\}|-]::g
# Replace diacritics by non-diacritics (to be completed; both strings must have the same length)
y:äáàâëéèêüúùûïíìîöóòôÿýçñ:aaaaeeeeuuuuiiiiooooyycn:
```

I included all the diacritics I could enter on my US-International keyboard layout, so I may have left out some you need; please add any that are missing.
The separator character used on all script lines is :

The command line should be something like:

```sh
sed -r -f above_script.txt <input.txt
```
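
For comparison, here is a rough Python port of the sed script above, one statement per sed command. It is a sketch, not an exact equivalent: the deleted-character set is taken from the characters listed in the bracket expression, and the names `convert_line` and `convert_text` are hypothetical. Because Python 3 strings are sequences of characters rather than bytes, `str.maketrans` accepts the accented table regardless of how the source file is encoded.

```python
import re

# s:[...]::g  -> characters listed in the sed bracket expression
SPECIAL_CHARS = "[]?¿/.>,<;:'\"!¡@#$^&*()+={}|-"
SPECIAL_RE = re.compile("[" + re.escape(SPECIAL_CHARS) + "]")

# y:...:...:  -> str.maketrans also requires both strings to have equal length
DIACRITICS = str.maketrans("äáàâëéèêüúùûïíìîöóòôÿýçñ",
                           "aaaaeeeeuuuuiiiiooooyycn")


def convert_line(line: str) -> str:
    line = line.lower()                 # s:(.*):\L\1:
    line = re.sub(r"\s", "_", line)     # s:[[:space:]]:_:g
    line = SPECIAL_RE.sub("", line)     # s:[...]::g
    return line.translate(DIACRITICS)   # y:äáà...:aaa...:


def convert_text(text: str) -> str:
    # sed processes its input line by line, so do the same here
    return "\n".join(convert_line(ln) for ln in text.splitlines())


print(convert_line("¿Sabrias evitar los problemas? ¡virtualizate!"))
```

As in the sed version, each whitespace character becomes its own "_" (runs are not collapsed), and a trailing space in the input leaves a trailing "_".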

Hilario:
Thanks a lot for all the comments.
I see this is really cumbersome.
Why did I decide to use regex?
Because I thought it was the option offered by the clipboard program. I may be wrong, and there may be other options.
Thanks a lot, Ath, for your script; it helped me understand some of the ideas. All this sed usage is really complex (for me), but it clarified the syntax.

The main idea was to SELECT a phrase and, with the clipboard tool, PASTE it transformed into a web-ready form.
Maybe someone has a better idea than what I was asking for.

Thanks again for the suggestions
