What is a simple way to generate keywords from a text?
Tags: python


The "high frequency English words" you describe are called stop words, and many lists of them are available. I'm not aware of any Python or Perl libraries for this, but you could store your stop word list in a binary tree or hash (or in a Python frozenset). Then, as you read each word from the input text, check whether it is in your stop list and filter it out if it is.
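As a rough illustration of the filtering step, here is a minimal Python sketch. The stop list is a tiny hand-picked sample rather than one of the published lists, and the names STOP_WORDS and remove_stop_words are made up for this example.

```python
import re

# Assumption: a tiny illustrative stop list; published lists have hundreds of entries.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "for", "in", "is", "it", "of", "on",
    "the", "to", "with", "you", "your",
})


def remove_stop_words(text):
    """Return the words of `text`, lowercased, with stop words filtered out."""
    # Treat any run of letters or apostrophes as a word.
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


if __name__ == "__main__":
    print(remove_stop_words("The cat is sitting on the mat with your hat"))
```

A frozenset gives constant-time membership tests on average, which is why it suits a list that is checked once per input word.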

Note that after you remove the stop words, you'll need to do some stemming to normalize the remaining words (strip plurals and suffixes such as -ing and -ed), then remove any duplicate keywords.
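The stemming and de-duplication steps could look something like the sketch below. The suffix stripping here is deliberately crude and is only meant to show the idea; it is not a real stemming algorithm (an established one such as the Porter stemmer handles far more cases), and naive_stem and unique_keywords are hypothetical names.

```python
def naive_stem(word):
    """Crudely strip a few common English suffixes (not a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words such as "red" or "is" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def unique_keywords(words):
    """Stem each word and drop duplicates, keeping first-seen order."""
    seen = set()
    keywords = []
    for word in words:
        stem = naive_stem(word.lower())
        if stem not in seen:
            seen.add(stem)
            keywords.append(stem)
    return keywords


if __name__ == "__main__":
    print(unique_keywords(["walking", "walked", "walks", "cats", "cat"]))
```

Because the rules are so simple, some words will be cut badly ("horses" becomes "hors"), which is acceptable only if the stems are used for grouping rather than shown to users.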


You could try the Perl module Lingua::EN::Tagger for a quick and easy solution.

A more complex module, Lingua::EN::Semtags::Engine, uses Lingua::EN::Tagger with a WordNet database to produce more structured output. Both are pretty easy to use; just check the documentation on CPAN, or use perldoc after you install the module.


To find the most frequently used words in a text, do something like this:

    #!/usr/bin/perl -w
    use strict;
    use warnings 'all';

    # Read the text:
    open my $ifh, '<', 'text.txt'
      or die "Cannot open file: $!";
    local $/;
    my $text = <$ifh>;

    # Find all the words, and count how many times they appear:
    my %words = ( );
    map { $words{$_}++ }
      grep { length > 1 && $_ =~ m/^[\@a-z-']+$/i }
        map { s/[",\.]//g; $_ }
          split /\s/, $text;

    print "Words, sorted by frequency:\n";
    my (@data_line);
    format FMT =
    @<<<<<<<<<<<<<<<<<<<<<<...     @########
    @data_line
    .
    local $~ = 'FMT';

    # Sort them by frequency:
    map { @data_line = ($_, $words{$_}); write(); }
      sort { $words{$b} <=> $words{$a} }
        grep { $words{$_} > 2 }
          keys(%words);

Example output looks like this:

    john@ubuntu-pc1:~/Desktop$ perl frequency.pl
    Words, sorted by frequency:
    for                                   32
    Jan                                   27
    am                                    26
    of                                    21
    your                                  21
    to                                    18
    in                                    17
    the                                   17
    Get                                   13
    you                                   13
    OTRS                                  11
    today                                 11
    PSM                                   10
    Card                                  10
    me                                     9
    on                                     9
    and                                    9
    Offline                                9
    with                                   9
    Invited                                9
    Black                                  8
    get                                    8
    Web                                    7
    Starred                                7
    All                                    7
    View                                   7
    Obama                                  7
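Since the question is tagged python, here is a rough Python sketch of the same counting approach, using collections.Counter. It is meant to mirror the Perl script's rules (strip quotes, commas and periods; keep tokens longer than one character made of letters, @, hyphens and apostrophes; report words seen more than twice), but it is an approximation rather than a line-for-line port. The function name word_frequencies and the min_count parameter are made up for this example.

```python
import re
from collections import Counter

# Same character class as the Perl script: letters, "@", apostrophe and hyphen.
WORD_RE = re.compile(r"[@a-z'-]+", re.IGNORECASE)


def word_frequencies(text, min_count=3):
    """Count words in `text`; return (word, count) pairs, most frequent first."""
    counts = Counter()
    for token in text.split():
        token = re.sub(r'[",.]', "", token)  # strip quotes, commas and periods
        if len(token) > 1 and WORD_RE.fullmatch(token):
            counts[token] += 1
    # Keep only words that appear at least `min_count` times.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample = "to be or not to be, that is the question: to be."
    for word, count in word_frequencies(sample, min_count=2):
        print(f"{word:<30}{count:>10}")
```

As in the Perl version, counting is case-sensitive, so "Get" and "get" are tallied separately, and common words such as "for" and "of" dominate the result unless a stop list is applied first.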