How to remove duplicate words from both sentences using a shell script?

bash shell unix uniq

Your code would remove repeated lines; both sort and uniq operate on lines, not words. (And even then, the loop is superfluous; if you wanted to do that, your code should be simplified to just sort -u my_text.txt.)

The usual fix is to split the input to one word per line; there are some complications with real-world text, but the first basic Unix 101 implementation looks like

tr ' ' '\n' <my_text.txt | sort -u

Of course, this gives you the words in a different order than in the original, and saves the first occurrence of every word. If you wanted to discard any words which occur more than once, maybe try

tr ' ' '\n' <my_text.txt | sort | uniq -c | awk '$1 == 1 { print $2 }'

(If your tr doesn't recognize \n as newline, maybe try '\012'.)
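If the original word order matters, one possible alternative to the sort-based pipelines is to skip sort and let Awk remember which words it has already printed. This is only a sketch: the function name unique_words_in_order is made up for illustration, and it assumes words are separated by plain spaces or newlines.

```shell
# Hypothetical helper: print each word once, in order of first appearance.
# Assumes the file named in $1 separates words with spaces or newlines.
unique_words_in_order() {
  # tr -s squeezes runs of spaces into a single newline (one word per line);
  # the Awk filter skips empty lines and prints a word only the first time
  # it is seen.
  tr -s ' ' '\n' <"$1" | awk 'NF && !seen[$0]++'
}
```

Called as unique_words_in_order my_text.txt, it keeps the first occurrence of each word and drops later repeats.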

Here is a dead simple two-pass Awk script which hopefully is a little bit more useful. It collects all the words into memory during the first pass over the file, then on the second, removes any words which occurred more than once.

awk 'NR==FNR { for (i=1; i<=NF; ++i) ++a[$i]; next }{ for (i=1; i<=NF; ++i) if (a[$i] > 1) $i="" } 1' my_text.txt my_text.txt

This leaves whitespace where words were removed; fixing that should be easy enough with a final sub().
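A minimal sketch of that cleanup, wrapped in a function so the file name only has to be given once. The name remove_repeated_words is hypothetical, and this version uses gsub() rather than a single sub() so that both runs of spaces and leading or trailing spaces are handled.

```shell
# Hypothetical helper: remove every word that occurs more than once in the
# file named in $1, then tidy up the whitespace the removal leaves behind.
remove_repeated_words() {
  awk 'NR == FNR { for (i = 1; i <= NF; ++i) ++a[$i]; next }  # pass 1: count words
       {
         # pass 2: blank out any word that was seen more than once
         for (i = 1; i <= NF; ++i) if (a[$i] > 1) $i = ""
         gsub(/ +/, " ")      # squeeze runs of spaces into one
         gsub(/^ | $/, "")    # trim a leading or trailing space
         print
       }' "$1" "$1"
}
```

For a file containing "the cat sat" and "the dog ran" on two lines, this should print "cat sat" and "dog ran", since only "the" occurs twice.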

A somewhat more useful program would split off any punctuation, and reduce words to lowercase (so that Word, word, Word!, and word? don't count as separate).
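As a rough illustration of that normalization, here it is applied to the simple one-word-per-line pipeline rather than to the two-pass Awk script. The function name unique_words_normalized is an assumption for this example, and treating every non-alphanumeric character as a separator is a simplification (it also splits words like "don't" at the apostrophe).

```shell
# Hypothetical helper: list each distinct word once, ignoring case and
# punctuation. Reads the file named in $1.
unique_words_normalized() {
  tr '[:upper:]' '[:lower:]' <"$1" |   # fold everything to lowercase
    tr -cs '[:alnum:]' '\n' |          # turn each run of non-alphanumerics into one newline
    sed '/^$/d' |                      # drop the empty line left by leading punctuation
    sort -u                            # keep one copy of each word
}
```

With this, Word, word, Word!, and word? all collapse to the single entry word.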


You can use this command to remove duplicate words from both sentences:

tr ' ' '\n' <my_text.txt | sort | uniq | xargs


Using awk (GNU awk):

 awk '{
        for (i=1;i<=NF;i++) {                  # Loop over each word on each line
            gsub(/[[:punct:]]/,"",$i);         # Strip out any punctuation
            cnt++;                             # Increment the running word counter
            if (!map[$i]) {                    # If the word has no entry in the array yet,
                map[$i]=cnt                    # store it with the word as index and cnt as value
            }
        }
      }
  END {
        PROCINFO["sorted_in"]="@val_num_asc";  # Traverse the array by value, numerically ascending
        for (i in map) {
            printf "%s ",i                     # Print each word followed by a space
        }
      }' filename

One-liner:

 awk '{ for (i=1;i<=NF;i++) { gsub(/[[:punct:]]/,"",$i);cnt++;if (!map[$i]) { map[$i]=cnt } } } END { PROCINFO["sorted_in"]="@val_num_asc";for (i in map) { printf "%s ",i } }' filename

NOTE: This strips out any punctuation (for example, full stops after words).
