AWK one-liner for multi-column comparison of two unsorted files

29.12.12
This awk one-liner compares two unsorted files on multiple columns. The comparison key is fields 1, 2, 3 and 4 of the first file and fields 1, 3, 6 and 7 of the second file; every line of the second file whose key also appears in the first file is printed.
File1
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
File2
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
Command
awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' File1 File2
Output
7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
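
For readers who want to see how the one-liner works, here is a commented sketch of the same comparison. It is a variant, not the original command: the array name `seen`, the temporary directory and the `matched` variable are choices made for this illustration, and the membership test uses `in` instead of `a[...]`, so looking up a key from File2 does not create an empty array entry for it.

```shell
# Recreate the sample files from this post in a temporary directory
tmpdir=$(mktemp -d)
cat > "$tmpdir/File1" <<'EOF'
7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1
EOF
cat > "$tmpdir/File2" <<'EOF'
7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7
EOF

matched=$(awk -F, '
  # NR == FNR is true only while reading the first file:
  # store its four-field key, then move on to the next line
  NR == FNR { seen[$1, $2, $3, $4] = 1; next }
  # Second file: print the line if its key (fields 1, 3, 6, 7) was stored
  ($1, $3, $6, $7) in seen
' "$tmpdir/File1" "$tmpdir/File2")

printf '%s\n' "$matched"
rm -rf "$tmpdir"
```

Because the keys of the first file are held in an array, neither file has to be sorted; the output simply follows the line order of the second file.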

2 comments:

  1. Hi Vijay, thanks for the post.
    Can you help me out with this? I have rows like

    50121 abc.com 28/1/2014-12:00:00
    52111 xyz.com 27/1/2014-12:00:00
    deusr abc.com 26/1/2014-12:00:00
    50121 abc.com 26/1/2014-12:00:00
    52111 abc.com 25/1/2014-12:00:00

    I removed duplicates based on the first column and got this output:
    50121 abc.com 28/1/2014-12:00:00
    52111 xyz.com 27/1/2014-12:00:00
    deusr abc.com 26/1/2014-12:00:00

    But the issue is that I want to remove duplicates based on a comparison of two columns, i.e. the 1st and 2nd. I am trying to use the 'awk' command, but I can't get it to work. Can you help me out with this, please?

  2. @Bharghav, try the below:

    awk '{a[$1,$2]=$0}END{for(i in a)print a[i]}' your_file
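
Note that the one-liner above keeps the last line seen for each key, and the order in which `for (i in a)` visits the entries is unspecified. If the first occurrence should be kept and the input order preserved, a common alternative is sketched below. The array name `seen`, the temporary file and the `deduped` variable are illustrative choices, not part of the original reply.

```shell
# Sample rows from the question above, written to a temporary file
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
50121 abc.com 28/1/2014-12:00:00
52111 xyz.com 27/1/2014-12:00:00
deusr abc.com 26/1/2014-12:00:00
50121 abc.com 26/1/2014-12:00:00
52111 abc.com 25/1/2014-12:00:00
EOF

# seen[$1, $2]++ is 0 (false) the first time a key appears, so the negation
# is true and the line is printed; repeats of the same key are skipped
deduped=$(awk '!seen[$1, $2]++' "$tmpfile")

printf '%s\n' "$deduped"
rm -f "$tmpfile"
```

With these rows only the second `50121 abc.com` line is dropped; `52111 abc.com` stays because its second column differs from `52111 xyz.com`.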
