Saturday, December 29, 2012

AWK one-liner for multi-column comparison of two unsorted files


This awk one-liner compares two unsorted files on multiple columns.

For this to work, the first file given as input (file1.txt in my example) must be the file that has only 4 fields.

The comparison is based on the 1st, 2nd, 3rd and 4th fields of the first file and the 1st, 3rd, 6th and 7th fields of the second file.

file1.txt

7000,2,1,6
7001,2,1,7
7002,2,1,6
7003,1,2,1

file2.txt

7000,john,2,0,0,1,6
7000,john,2,0,0,1,7
7000,john,2,0,0,1,8
7000,john,2,0,0,1,9
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
7003,mike,1,0,0,2,2
7003,mike,1,0,0,2,3
7003,mike,1,0,0,2,4
8001,nike,1,2,4,1,8
8002,paul,2,0,0,2,7

Command

awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' file1.txt file2.txt

Output

7000,john,2,0,0,1,6
7001,elen,2,0,0,1,7
7002,sami,2,0,0,1,6
7003,mike,1,0,0,2,1
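
To show how the one-liner works, here is a rough sketch of the same logic spelled out with comments. The temporary directory, the array name `seen`, and the shortened sample data are my own choices for illustration. The lookup uses `(key) in seen` in place of the original `a[...]` test, so that checking a key does not create an empty array entry for it.

```shell
# Recreate a shortened sample of the two files in a temporary directory
# (directory and shortened data are assumptions for this sketch).
tmp=$(mktemp -d)
printf '%s\n' '7000,2,1,6' '7001,2,1,7' '7002,2,1,6' '7003,1,2,1' > "$tmp/file1.txt"
printf '%s\n' '7000,john,2,0,0,1,6' '7000,john,2,0,0,1,7' \
              '7001,elen,2,0,0,1,7' '8001,nike,1,2,4,1,8' > "$tmp/file2.txt"

result=$(awk -F, '
    # NR == FNR holds only while the first file is being read:
    # remember each 4-field combination as a composite key, then skip ahead.
    NR == FNR { seen[$1, $2, $3, $4] = 1; next }

    # Second file: build the key from fields 1, 3, 6 and 7 and print the
    # line (the default action) when that key was seen in the first file.
    ($1, $3, $6, $7) in seen
' "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this shortened input, only the 7000,john,2,0,0,1,6 and 7001,elen,2,0,0,1,7 lines are printed. Because all keys are held in an array, neither file needs to be sorted. Note that the NR==FNR test assumes the first file is not empty; with an empty first file, the second file would be treated as the first.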
