Delete duplicate lines in a file in Unix

Sunday, December 30, 2012

Below is a simple way to delete duplicate lines from a file:

awk '!x[$0]++' file.txt

Explanation:

Each line is used as a key in a hash (an associative array). The first time a line is seen, the key does not yet exist, so its value is 0. Every time the line appears, the value stored for that key is incremented by 1.

The ! negates that value: when the key does not exist yet, its value is 0, so the expression is true and awk's default action prints the line. If the key already exists, its value is non-zero, the ! makes the expression false, and the duplicate line is not printed. The ++ increments the counter after the test in either case.
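Written out in full, the one-liner is equivalent to:

awk '{ if (x[$0] == 0) print; x[$0]++ }' file.txt

For example, with a hypothetical file.txt containing a few repeated lines, only the first occurrence of each line is kept:

$ cat file.txt
apple
orange
apple
banana
orange

$ awk '!x[$0]++' file.txt
apple
orange
banana

Unlike sort file.txt | uniq, this approach does not require the input to be sorted and preserves the original order of the lines.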
