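Well, the simplest regex to extract textual content as 'words' from (binary) files would be: \w+
Note that \w also matches digits and the underscore, and in Unicode-aware engines it matches accented letters as well. If you only want plain alphanumeric characters (no accented characters, underscores or punctuation), use something like [a-zA-Z0-9]+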
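To show the difference between the two patterns, here is a minimal Python sketch. Python is only my choice for illustration, and the helper name extract_words and the sample bytes are made up for the example. It uses bytes patterns, where \w is limited to ASCII letters, digits and the underscore.

```python
import re

def extract_words(data: bytes, pattern: bytes = rb"[a-zA-Z0-9]+") -> list:
    """Return every run of bytes in `data` that matches `pattern`, as text.

    Hypothetical helper for illustration. With a bytes pattern, \\w only
    matches ASCII letters, digits and '_', so decoding as ASCII is safe
    for both patterns used below.
    """
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Made-up sample: control bytes, an underscore, and a UTF-8 encoded "é".
sample = b"\x00\x01Hello\xff\xfeworld_42\x00caf\xc3\xa9!"

print(extract_words(sample, rb"\w+"))  # ['Hello', 'world_42', 'caf']
print(extract_words(sample))           # ['Hello', 'world', '42', 'caf']
```

For a real file you would read it in binary mode, e.g. open(path, "rb").read(), and pass the result to the function. You'll probably also want a minimum length such as [a-zA-Z0-9]{4,}, since random binary data contains plenty of short runs that only look like words.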
A site I've found great for learning about and trying out regular expressions is
https://regex101.com. It gives a detailed explanation of what the regex you're trying is doing, plus an index of all available expression elements.