combine multiple text files and remove duplicates


First off, you're not using the full power of cat. The loop can be replaced by just

cat data/* > dnsFull

assuming that file is initially empty.
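To see that the two are equivalent, here is a small demo with made-up files; the loop is a hypothetical reconstruction of what such a script typically looks like, not the asker's exact code:

```shell
# Set up a throwaway directory with two sample files.
dir=$(mktemp -d)
printf 'one\n' > "$dir/a"
printf 'two\n' > "$dir/b"

# Per-file loop (roughly what a typical script does):
for f in "$dir"/*; do cat "$f"; done > /tmp/loop.$$

# Single cat over the glob produces the same bytes,
# because the glob expands to the same files in the same order:
cat "$dir"/* > /tmp/glob.$$

cmp /tmp/loop.$$ /tmp/glob.$$ && echo identical
# identical

rm -r "$dir" /tmp/loop.$$ /tmp/glob.$$
```

Note the outputs are written outside the input directory, so the glob cannot pick them up.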

Then there are all those temporary files, which force programs to wait on the hard disk (commonly the slowest part of a modern computer system). Use a pipeline instead, so the stages run concurrently and nothing touches the disk until the final output:

cat data/* | sort | uniq > dnsOut

This is still wasteful since sort alone can do what you're using cat and uniq for; the whole script can be replaced by

sort -u data/* > dnsOut
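As a quick sanity check that sort -u merges and deduplicates in one step, here is a demo with two made-up input files (the names and contents are illustrative):

```shell
# Two sample files with one line in common.
dir=$(mktemp -d)
printf 'example.com\nfoo.org\n' > "$dir/a.txt"
printf 'foo.org\nbar.net\n'     > "$dir/b.txt"

# One invocation reads every file, sorts, and drops duplicates.
sort -u "$dir"/*.txt
# bar.net
# example.com
# foo.org

rm -r "$dir"
```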

If this is still not fast enough, realize that sorting takes O(n lg n) time, while deduplication via Awk's associative arrays (hash tables) takes expected linear time:

awk '{if (!a[$0]++) print}' data/* > dnsOut
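One practical difference worth noting: unlike sort -u, the Awk version preserves the order in which lines first appear, since it only tracks what it has already seen. A small demo on made-up input:

```shell
# The duplicate "foo.org" is dropped; first-seen order is preserved.
printf 'foo.org\nexample.com\nfoo.org\nbar.net\n' |
    awk '{if (!a[$0]++) print}'
# foo.org
# example.com
# bar.net
```

The same program is often written in the shorter idiomatic form awk '!a[$0]++', which relies on Awk's default action of printing the line when the pattern is true.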