Find words in text using Regular Expressions
-
Hello!
I need to find all Russian words in a UTF-8 string using a regular expression. I am only interested in words longer than 3 characters.
I tried to use word boundaries, but the regexp doesn't recognise Russian words as words. A "word" here is a run that contains only Russian letters and has no Russian letter immediately before or after it.
So, I have this RegExp:
(?:[^\x{0410}-\x{044f}\x{0401}\x{0451}]|^)[\x{0410}-\x{044f}\x{0401}\x{0451}]{4,}
The problem is that I miss the first word in a string. What are your suggestions?
-
More precisely: I get one more character than I need in every match, except for the first word.
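To make the symptom concrete, here is a minimal sketch in Python. Python is an assumption, since the pattern above uses PCRE-style \x{...} escapes, and the sample string is hypothetical. It reproduces the extra leading character, which appears to come from the non-capturing group consuming the character before each word. It also shows one possible variant that uses lookarounds, which check the neighbouring characters without including them in the match.

```python
import re

# Russian letters: the range А-я plus Ё and ё (same ranges as the pattern above).
RUSSIAN = "\u0410-\u044f\u0401\u0451"

# The pattern from the question, rewritten for Python's `re` module.
original = re.compile(rf"(?:[^{RUSSIAN}]|^)[{RUSSIAN}]{{4,}}")

# Possible variant: lookbehind/lookahead assert "no Russian letter here"
# without consuming the neighbouring character.
lookaround = re.compile(rf"(?<![{RUSSIAN}])[{RUSSIAN}]{{4,}}(?![{RUSSIAN}])")

text = "привет мир, это длинное предложение"  # hypothetical sample text

original_matches = original.findall(text)
lookaround_matches = lookaround.findall(text)

print(original_matches)    # every match after the first starts with a space
print(lookaround_matches)  # only the words themselves
```

In PCRE the same idea would presumably be written with (?<!...) and (?!...) around the \x{...} character class, together with the /u modifier.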
Another question: how do I write a RegExp that skips words written entirely in capital letters?
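One way this might be approached is with a negative lookahead at the start of the word that rejects a run of capitals extending to the end of the word. The sketch below is in Python, which is an assumption (the pattern in this thread uses PCRE-style escapes), and the sample string and variable names are hypothetical.

```python
import re

# All Russian letters, and the upper-case subset (А-Я plus Ё).
RUSSIAN = "\u0410-\u044f\u0401\u0451"
UPPER = "\u0410-\u042f\u0401"

skip_all_caps = re.compile(
    rf"(?<![{RUSSIAN}])"              # no Russian letter before the word
    rf"(?![{UPPER}]+(?![{RUSSIAN}]))" # reject if capitals run to the word's end
    rf"[{RUSSIAN}]{{4,}}"             # the word itself: 4 or more Russian letters
    rf"(?![{RUSSIAN}])"               # no Russian letter after the word
)

sample = "Москва это СССР, РОССИЯ и Привет"  # hypothetical sample text
words = skip_all_caps.findall(sample)
print(words)
```

Words that merely start with a capital, such as "Москва", are still matched; only words made up entirely of capitals are skipped.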