What is a simple way to generate keywords from a text?

python perl metadata

The "high frequency English words" you describe are called stop words, and many lists of them are available. I'm not aware of any Python or Perl libraries for this, but you could store your stop word list in a binary tree or hash (in Python, a frozenset works well). Then, as you read each word from the input text, check whether it is in your stop list and filter it out if it is.

Note that after you remove the stop words, you'll need to do some stemming to normalize the remaining words (strip plurals and suffixes such as -ing and -ed), and then remove any duplicate "keywords".
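As a rough illustration of those steps in Python, here is a minimal sketch. The stop list is a tiny hand-picked sample, and `naive_stem` and `extract_keywords` are hypothetical helper names; the suffix stripping is a crude stand-in for a real stemmer, not a recommendation.

```python
import re

# Tiny hand-picked stop list for illustration only; real lists are much longer.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"})


def naive_stem(word):
    """Strip a few common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words such as "red" or "bus" survive.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text):
    """Return unique stemmed keywords in order of first appearance."""
    seen = set()
    keywords = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in STOP_WORDS:
            continue  # filter out stop words
        stem = naive_stem(word)
        if stem not in seen:  # drop duplicate keywords
            seen.add(stem)
            keywords.append(stem)
    return keywords
```

Calling `extract_keywords("Walking dogs walked in the parks and the dog walks")` returns `['walk', 'dog', 'park']`. A frozenset gives constant-time membership checks, which is the main reason to prefer it over a plain list here.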


You could try the Perl module Lingua::EN::Tagger for a quick and easy solution.

A more complicated module, Lingua::EN::Semtags::Engine, uses Lingua::EN::Tagger with a WordNet database to produce more structured output. Both are pretty easy to use; check the documentation on CPAN, or use perldoc after you install the module.


To find the most frequently used words in a text, do something like this:

    #!/usr/bin/perl -w
    use strict;
    use warnings 'all';

    # Read the text:
    open my $ifh, '<', 'text.txt'
      or die "Cannot open file: $!";
    local $/;
    my $text = <$ifh>;

    # Find all the words, and count how many times they appear:
    my %words = ( );
    map { $words{$_}++ }
      grep { length > 1 && $_ =~ m/^[\@a-z-']+$/i }
        map { s/[",\.]//g; $_ }
          split /\s/, $text;

    print "Words, sorted by frequency:\n";
    my (@data_line);
    format FMT =
    @<<<<<<<<<<<<<<<<<<<<<<...     @########
    @data_line
    .
    local $~ = 'FMT';

    # Sort them by frequency:
    map { @data_line = ($_, $words{$_}); write(); }
      sort { $words{$b} <=> $words{$a} }
        grep { $words{$_} > 2 }
          keys(%words);

Example output looks like this:

    john@ubuntu-pc1:~/Desktop$ perl frequency.pl
    Words, sorted by frequency:
    for                                   32
    Jan                                   27
    am                                    26
    of                                    21
    your                                  21
    to                                    18
    in                                    17
    the                                   17
    Get                                   13
    you                                   13
    OTRS                                  11
    today                                 11
    PSM                                   10
    Card                                  10
    me                                     9
    on                                     9
    and                                    9
    Offline                                9
    with                                   9
    Invited                                9
    Black                                  8
    get                                    8
    Web                                    7
    Starred                                7
    All                                    7
    View                                   7
    Obama                                  7
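Since the question is also tagged python, roughly the same count can be sketched with `collections.Counter`. This is an approximation of the Perl script rather than a line-for-line port: `word_frequencies` is a hypothetical name, the defaults mirror the script's thresholds (words longer than one character, seen more than twice), and tokens are pulled out with a regular expression instead of being split on whitespace.

```python
import re
from collections import Counter


def word_frequencies(text, min_length=2, min_count=3):
    """Return (word, count) pairs sorted by descending count.

    Counting is case-sensitive, so "Get" and "get" are tallied separately.
    """
    # Letters, "@", apostrophes and hyphens, similar to the Perl character class.
    words = re.findall(r"[A-Za-z@'-]+", text)
    counts = Counter(w for w in words if len(w) >= min_length)
    # most_common() already orders the pairs from most to least frequent.
    return [(w, n) for w, n in counts.most_common() if n >= min_count]
```

For example, `word_frequencies("to be or not to be to", min_count=2)` returns `[('to', 3), ('be', 2)]`. As the sample output shows, raw frequency is dominated by words like "for" and "of", so in practice you would filter stop words before counting.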
