Find words in text using Regular Expressions



  • Hello!

    I need to find all Russian words in a UTF-8 string using a regular expression. I am only interested in words that are longer than 3 characters.
    I tried to use word boundaries, but the regexp engine doesn't recognise Russian words as words 😞

    A "word" here is a run that contains only Russian letters and has no Russian letter immediately before or after it.

    So, I have this RegExp:

    (?:[^\x{0410}-\x{044f}\x{0401}\x{0451}]|^)[\x{0410}-\x{044f}\x{0401}\x{0451}]{4,}

    The problem is that I miss the first word in the string. What are your suggestions?
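
A possible direction, sketched in Python rather than in the engine used above (the `\x{...}` syntax suggests a PCRE-style engine, where the same lookaround constructs should be available): the group `(?:[^...]|^)` consumes the character in front of the word, so that character becomes part of the match. A negative lookbehind and lookahead only assert "no Russian letter here" without consuming anything, and they also succeed at the start and end of the string. The function name `find_russian_words` is made up for this illustration.

```python
import re

# Russian letters: А-я (U+0410 to U+044F) plus Ё (U+0401) and ё (U+0451),
# the same set as in the original pattern.
RUSSIAN = "\u0410-\u044f\u0401\u0451"

# The lookarounds check the neighbouring characters without consuming them,
# so each match contains only the word itself.
WORD_RE = re.compile(rf"(?<![{RUSSIAN}])[{RUSSIAN}]{{4,}}(?![{RUSSIAN}])")


def find_russian_words(text):
    """Return every run of 4 or more Russian letters found in text."""
    return WORD_RE.findall(text)


print(find_russian_words("Привет, мир! Это тестовая строка"))
# ['Привет', 'тестовая', 'строка']
```

The first word is found because the lookbehind is satisfied at position 0, where there is no preceding character at all.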



  • To be more precise: each match is one character longer than I need, except for the first word.

    Another question: how do I write a regexp that skips words written entirely in capital letters?
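
For the all-capitals question, one option (again only a Python sketch, assuming "all capitals" means every letter of the word is in А-Я or Ё) is an extra negative lookahead at the start of the word. It rejects the position when a run of uppercase letters reaches the end of the word. The name `find_non_caps_words` is hypothetical.

```python
import re

LETTER = "\u0410-\u044f\u0401\u0451"  # А-я, Ё, ё
UPPER = "\u0410-\u042f\u0401"         # А-Я, Ё

NON_CAPS_WORD_RE = re.compile(
    rf"(?<![{LETTER}])"               # no Russian letter before the word
    rf"(?![{UPPER}]+(?![{LETTER}]))"  # reject if uppercase letters run to the word's end
    rf"[{LETTER}]{{4,}}"              # the word itself, 4 or more letters
)


def find_non_caps_words(text):
    """Return words of 4+ Russian letters that are not entirely uppercase."""
    return NON_CAPS_WORD_RE.findall(text)


print(find_non_caps_words("СССР распался, ПРИВЕТ Москва и Ёлка"))
# ['распался', 'Москва', 'Ёлка']
```

Words that merely start with a capital, such as "Москва", are kept. If the regexp does not have to do all the work, filtering the plain matches afterwards (for example with `str.isupper()` in Python) is a simpler alternative.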

