Saturday, December 29, 2012


This awk one-liner compares two files on multiple columns; the files do not need to be sorted:

The comparison is based on columns 1, 2, 3 and 4 of the first file and columns 1, 3, 6 and 7 of the second file.

File1
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1

File2

7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7

Command and output

awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' File1 File2
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
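
For readers who want to see why this works, here is a commented sketch of the same idea on a reduced sample of the rows above. It assumes a POSIX shell with mktemp available; the names tmp, seen and matched are only illustrative, and the membership test with "in" is a small variation on the one-liner (it avoids creating empty array entries for keys that are not found), not the original command.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)

# A reduced sample of File1 and File2 (rows copied from the post).
cat > "$tmp/File1" <<'EOF'
7000,2,1,6
7003,1,2,1
EOF

cat > "$tmp/File2" <<'EOF'
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7003,mike,1,0,0,2,1
8001,nike,1,2,4,1,8
EOF

matched=$(awk -F, '
    # NR == FNR is true only while the first file is being read:
    # remember the 4-column key of every File1 row, then skip to the next line.
    NR == FNR { seen[$1,$2,$3,$4] = 1; next }

    # Second file: print the row if its key (columns 1, 3, 6, 7) was stored.
    ($1,$3,$6,$7) in seen
' "$tmp/File1" "$tmp/File2")

printf '%s\n' "$matched"
# prints:
# 7000,john,2,0,0,1,6
# 7003,mike,1,0,0,2,1
```

Because the keys of the first file are held in an array, neither file has to be sorted, but the first file does need to fit in memory.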

2 comments:

  1. Hi Vijay, Thanks for the post.
    Can you help me out with this? I have rows like:

    50121 abc.com 28/1/2014-12:00:00
    52111 xyz.com 27/1/2014-12:00:00
    deusr abc.com 26/1/2014-12:00:00
    50121 abc.com 26/1/2014-12:00:00
    52111 abc.com 25/1/2014-12:00:00

    I removed duplicates based on first column and got the output as
    50121 abc.com 28/1/2014-12:00:00
    52111 xyz.com 27/1/2014-12:00:00
    deusr abc.com 26/1/2014-12:00:00

    but the issue is that I want to remove duplicates based on a comparison of two columns, i.e. the 1st and 2nd. I am trying to do this with the 'awk' command, but I am not getting it. Can you help me out with this, please?

  2. @Bharghav, try the below:

    awk '{a[$1,$2]=$0}END{for(i in a)print a[i]}' your_file

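
A note on the reply above: that command keeps the last row seen for each (1st, 2nd) column pair, and "for (i in a)" does not guarantee any output order. If the first occurrence should be kept and the input order preserved, a common alternative is the idiom sketched below. It is not the command from the reply; the names tmp, rows.txt, seen and deduped are illustrative, and the sample rows are the ones from the question.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d)

# Sample rows copied from the question in the first comment.
cat > "$tmp/rows.txt" <<'EOF'
50121 abc.com 28/1/2014-12:00:00
52111 xyz.com 27/1/2014-12:00:00
deusr abc.com 26/1/2014-12:00:00
50121 abc.com 26/1/2014-12:00:00
52111 abc.com 25/1/2014-12:00:00
EOF

# seen[$1,$2]++ is 0 (false) the first time a key appears, so the negation
# is true and the row is printed; later rows with the same key are skipped.
deduped=$(awk '!seen[$1,$2]++' "$tmp/rows.txt")

printf '%s\n' "$deduped"
# prints:
# 50121 abc.com 28/1/2014-12:00:00
# 52111 xyz.com 27/1/2014-12:00:00
# deusr abc.com 26/1/2014-12:00:00
# 52111 abc.com 25/1/2014-12:00:00
```

The last row survives because "52111 abc.com" differs from "52111 xyz.com" in the 2nd column; only the second "50121 abc.com" row is dropped.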