Counting duplicates in a file

Tuesday, March 26, 2013

I have a file where column one holds a list of family identifiers:
AB
AB
AB
AB
SAR
SAR
EAR
Is there a way to create a new column in which each repeat is numbered, giving every occurrence its own label, i.e.
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1
Below is a pretty simple solution for this:
awk '{print $1"_"++a[$1]}' file
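The pre-increment ++a[$1] bumps the counter for the identifier before it is printed, so the first occurrence gets _1. Because the associative array a ends up holding a counter for every identifier, you can also use it in the END block if you only want to see the totals.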
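As a rough sketch of the END-block idea, the snippet below prints one total per identifier rather than a label per line. The file name families.txt and the names count and id are only placeholders for this example, and the output is piped through sort because awk does not guarantee any particular order when looping over an array with for (id in count).

```shell
# Recreate the sample input from the question (families.txt is a placeholder name).
printf '%s\n' AB AB AB AB SAR SAR EAR > families.txt

# Count each identifier in column one, then print the totals after the last line.
# The iteration order of "for (id in count)" is unspecified, so sort the result.
awk '{ count[$1]++ } END { for (id in count) print id, count[id] }' families.txt | sort
```

For the sample data this should list AB with 4, EAR with 1 and SAR with 2. If you need the totals in the order the identifiers first appear, you would have to record that order yourself, for example in a second array.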
