
Best way to count words in a string in Ruby?


string.split.size

Edited to explain how multiple spaces are handled.

From the Ruby String documentation page:

split(pattern=$;, [limit]) → anArray

Divides str into substrings based on a delimiter, returning an array of these substrings.

If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.

If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.

If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.

If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.

" now's  the time".split        #=> ["now's", "the", "time"]

While that documentation is for the current version of Ruby as of this edit, I learned on 1.7 (IIRC), where string.split.size also worked. I just tested it on 1.8.3.
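
To make the whitespace behaviour quoted above concrete, here is a minimal sketch. The word_count helper and the sample strings are my own illustration, not part of the documentation, and it assumes $; has been left at its default of nil.

```ruby
# Count whitespace-separated words using String#split with no arguments.
# Assumes $; is nil (the default), so split behaves as if ' ' were given.
def word_count(text)
  text.split.size
end

samples = [
  " now's  the time",     # leading space and a run of two spaces
  "tabs\tand\nnewlines",  # tabs and newlines also count as whitespace
  "",                     # empty string
  "   "                   # whitespace only
]

samples.each do |s|
  puts "#{s.inspect} -> #{word_count(s)}"
end
```

Leading whitespace and runs of whitespace do not produce empty entries, so the empty and whitespace-only strings both give a count of 0.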


I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.

The gem lets you define your own custom criteria, or use the out-of-the-box regexp, which handles most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.

counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]

# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]

# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", regexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]

The gem provides a bunch more useful methods.


If a 'word' in this case can be described as a sequence of letters that can include '-', then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):

>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]

However, other symbols can be added to the character class in the regex, for example ' to support words like "it's".
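
One caveat with splitting on single non-word characters: two separators in a row (say, a comma followed by a space) leave empty strings in the result, which inflates the count. A possible alternative, sketched below, is to match the words themselves with String#scan. The WORD_PATTERN constant and the count_words helper are hypothetical names for illustration, and the character class (letters, digits, '-' and ') is only one possible definition of a word.

```ruby
# Hypothetical word pattern: one or more letters, digits, hyphens or apostrophes.
WORD_PATTERN = /[-'a-zA-Z0-9]+/

# Count matches of the word pattern instead of splitting on separators,
# so consecutive separators never produce empty entries.
def count_words(text, pattern = WORD_PATTERN)
  text.scan(pattern).size
end

# Splitting on each non-word character keeps empty strings between separators.
p "one-way,  street".split(/[^-a-zA-Z]/)  # contains "" entries
p "one-way,  street".scan(WORD_PATTERN)   # only the words

puts count_words("one-way street")
puts count_words("it's a one-way  street, isn't it?")
```

Note that with this pattern a free-standing '-' or ' would still be counted as a word, so tighten the regex if that matters for your input.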