Ruby: Extracting Words From String


Use the split method.

   words = @string1.split(/\W+/)

This splits the string into an array, using a regular expression as the delimiter. \W matches any "non-word" character, and the "+" matches one or more of them in a row, so a run of consecutive delimiters counts as a single split point.
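As a minimal sketch of the above (the helper name extract_words and the sample strings are made up for illustration), note that a string starting with a delimiter produces an empty first element, which you may want to drop:

```ruby
# Hypothetical helper: split on runs of non-word characters.
def extract_words(text)
  # reject(&:empty?) drops the empty first element that appears
  # when the string starts with a delimiter.
  text.split(/\W+/).reject(&:empty?)
end

p extract_words("Hello, world! How are you?")
# => ["Hello", "world", "How", "are", "you"]

p "...leading dots".split(/\W+/)
# => ["", "leading", "dots"]
```

Trailing empty strings are already removed by split itself, so only the leading one needs handling.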


For me, the best way to split sentences into words is:

line.split(/[^[[:word:]]]+/)

It works well even with multilingual words and punctuation marks:

line = 'English words, Polski Żurek!!! crème fraîche...'
line.split(/[^[[:word:]]]+/)
=> ["English", "words", "Polski", "Żurek", "crème", "fraîche"]
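To see why the POSIX bracket matters, here is a small comparison sketch (the method names ascii_words and unicode_words are hypothetical). It assumes Ruby 1.9 or later with UTF-8 source, where \W treats every non-ASCII letter as a delimiter while [[:word:]] is Unicode-aware:

```ruby
# Hypothetical helpers comparing the two patterns on the same input.
def ascii_words(text)
  # \W is the complement of [a-zA-Z0-9_], so accented letters split words.
  text.split(/\W+/)
end

def unicode_words(text)
  # [[:word:]] also matches non-ASCII letters such as Ż, è and î.
  text.split(/[^[[:word:]]]+/)
end

line = 'English words, Polski Żurek!!! crème fraîche...'

p ascii_words(line)
# => ["English", "words", "Polski", "urek", "cr", "me", "fra", "che"]

p unicode_words(line)
# => ["English", "words", "Polski", "Żurek", "crème", "fraîche"]
```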


Well, you could split the string on spaces, if that's your delimiter of interest:

@string1.split(' ')
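A quick sketch of how this behaves (the method name whitespace_tokens and the sample strings are illustrative only): a single-space string argument is a special case that ignores leading whitespace and collapses runs of whitespace, but punctuation stays attached to the words.

```ruby
# Hypothetical helper wrapping the single-space form of split.
def whitespace_tokens(text)
  text.split(' ')
end

p whitespace_tokens("  Hello,   world!  ")
# => ["Hello,", "world!"]

# A regex matching one literal space does not collapse anything:
p "  a  b".split(/ /)
# => ["", "", "a", "", "b"]
```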

Or split on non-word characters or word boundaries:

\W  # Any non-word character
\b  # Any word boundary (a zero-width position, so the delimiters stay in the result)

Or on whitespace:

\s  # Any whitespace character
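For illustration, a short sketch of the boundary and whitespace variants (the method names and sample strings are made up):

```ruby
# Hypothetical helpers for the \b and \s patterns.
def split_on_boundaries(text)
  # \b matches a position, not a character, so nothing is consumed
  # and the text between words is kept as its own element.
  text.split(/\b/)
end

def split_on_whitespace(text)
  # \s+ treats any run of spaces, tabs or newlines as one delimiter.
  text.split(/\s+/)
end

p split_on_boundaries("Hello, world")
# => ["Hello", ", ", "world"]

p split_on_whitespace("one  two\tthree\nfour")
# => ["one", "two", "three", "four"]
```

If only the words are wanted, the \W+ or \s+ forms are usually the more convenient choice, since \b leaves the separators in the array.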

Hint: try testing each of these on http://rubular.com

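And note that Ruby 1.9 differs from 1.8 here: it switched to a new regex engine (Oniguruma) and made strings encoding-aware, so results on non-ASCII text can differ between the two.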