If you insist on using regexes for the conversions, sed can do it. When downloading sed for Windows, be sure to get the latest version. GNU sed is required, because BSD sed doesn't support the lowercase conversion \L.
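A quick way to check whether your sed build supports `\L` is a one-liner like the one below. With GNU sed, `\L` in the replacement lowercases everything that follows it, so this should print `hello world`. A sed without the extension will not produce the lowercase result.

```shell
#!/bin/sh
# \L (GNU sed extension) lowercases the rest of the replacement; & is the whole match.
result=$(printf 'Hello World\n' | sed 's/.*/\L&/')
echo "$result"
```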
This script should be saved as an ASCII/ANSI file for the y command to work properly. When it is saved as UTF-8, each accented character takes two bytes, and sed will complain that the original and replacement strings of the y command are not the same length.
# Save this file as ANSI or the y command will cause an error
# Convert to lowercase
# Convert all spaces to _
# Remove special characters (][ must be first in range, - must be last in range!)
# Replace diacritics by non-diacritics (to be completed)
I tried to include all the diacritics I could enter with my US-International keyboard layout, but I may have missed some you need. Please add any that are missing.
The separator character used on all script lines is the colon (:).
The command line should be something like:
sed -r -f above_script.txt <input.txt