Summing up column values based upon ranges in another file

Friday, February 22, 2013 , 2 Comments

File 1 has ranges 3-9, 2-6 etc
    3 9
    2 6
    12 20

File2 has values: column 1 indicates the range and column 2 has values.
    1 4
    2 4
    3 5
    4 4
    5 4
    6 1
    7 1
    8 1
    9 4

I would like to calculate the sum of values (file2, column2) for ranges in file1). Eg: If range is 3-9, then sum of values will be 5+4+4+1+1+1+4 = 20

Below is the script for doing the same:

use strict;
use warnings;
use feature qw( say );

use List::Util qw( sum );
my $file1 = 'file1.txt';
my $file2 = 'file2.txt';

my @file2;
{   
   open(my $fh, '<', $file2)
      or die "Can't open $file2: $!\n";
   while (<$fh>) {
      my ($k, $v) = split;
      $file2[$k] = $v;
   }
}

{   
   open(my $fh, '<', $file1)
      or die "Can't open $file1: $!\n";
   while (<$fh>) {
      my ($start, $end) = split;
      say sum grep defined, @file2[$start .. $end];
   }
}

2 comments:

  1. This can also be done with bash, and a few commonly installed utilities (sorry about the mangled whitespace):

    #!/bin/bash

    SCALE=2

    isint ()
    { test "$1" && printf '%d' "$1" &>/dev/null; }
    isnum ()
    { test "$1" && printf '%f' "$1" &>/dev/null; }

    # Prints out the commands needed to do a summation in bc
    accumulate ()
    {
    local num=0
    local lnum=0

    echo "scale=$SCALE"
    echo "total=0"

    while read num
    do
    if isint "$num"; then
    echo "total+=$num"
    else
    echo "Value on line $lnum is not a number" >&2
    exit 1
    fi

    lnum=$(( $lnum + 1 ))
    done

    echo "total"
    }

    FILE="$1"
    shift

    if isint "$1" && isint "$2"; then
    # Open a new file handle, and send everything
    # printed to that handle to bc as commands
    exec 5> >(bc -lq)

    # Generate the sequence and join with a sorted
    # copy of the file on the 1st index.
    #
    # This will give the subset of the file we need
    # to run the summation on, but we need to pull
    # only the second column - the indexes are no
    # longer needed.
    #
    # Accumulate generates the bc commands, and sending
    # it to file descriptor 5 will send them to bc
    printf '%d\n' $(seq "$1" "$2") |
    join <(sort -n "$FILE") - -j 1 |
    cut -d' ' -f2 |
    accumulate >&5

    # Clean up the file handle once we are done
    exec 5>&-
    else
    echo "$0 file start end"
    fi

    ReplyDelete