Clueless Fundatma: Combining data from independent simulation runs using a bash script

Today I came across a problem that I have solved several times before. My simulations generate a bunch of files called stat1, stat2, ..., statN, each of which contains data like the following:

$cat stat1
567.20 0.88
45.29 3.08
296.58 21.50
0.33 0.14

The first column holds the properties measured in a particular simulation run, and the second column holds the standard error of each property. The N different "stat" files come from N independent simulation runs. When I write up the results, I like to report the average of each property over all runs together with its associated standard error. The following shell script, DataAgg.sh, creates a new file, TotalProp, which contains exactly that.

$cat TotalProp
567.49 0.24
43.57 0.45
289.91 1.61
0.67 0.10
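
For reference, the averaging can be written out as below. This is a sketch that assumes the N runs are independent and equally weighted, which is what the final sqrt($1)/n step of the script suggests. The symbols (x for a property, σ for its standard error, k for the row, and j for the run) are notation introduced here and do not appear in the script.

```latex
\bar{x}_k = \frac{1}{N} \sum_{j=1}^{N} x_k^{(j)},
\qquad
\sigma_{\bar{x}_k} = \frac{1}{N} \sqrt{\sum_{j=1}^{N} \left( \sigma_k^{(j)} \right)^2}
```

If all runs have a similar error σ, the second expression reduces to σ/√N, so the combined error shrinks as more runs are added.
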

The shell script is here:

$cat DataAgg.sh

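#!/bin/bash
# Combine stat1 ... statN into TotalProp: column 1 is the mean of each
# property over the runs, column 2 is the standard error of that mean.

i=0
for s in stat*
do
    i=$((i+1))

    if [ "$i" -eq 1 ]; then
        # First file: start the running sums.
        awk '{print $1}' "$s" > TmpProp
        awk '{print $2*$2}' "$s" > TmpErr2Prop
    else
        # Add this run's properties to the running sum.
        awk '{print $1}' "$s" > tmp
        paste tmp TmpProp > more
        awk '{print $1+$2}' more > TmpProp

        # Add this run's squared standard errors to the running sum.
        awk '{print $2*$2}' "$s" > tmp
        paste tmp TmpErr2Prop > more
        awk '{print $1+$2}' more > TmpErr2Prop
    fi
done

# Mean = sum/n; standard error of the mean = sqrt(sum of squared errors)/n.
awk '{print $1/n}' n="$i" TmpProp > more; mv more TmpProp
awk '{print sqrt($1)/n}' n="$i" TmpErr2Prop > more; mv more TmpErr2Prop
paste TmpProp TmpErr2Prop > more
awk '{printf("%6.2f\t%6.2f\n",$1, $2)}' more > TotalProp

# Clean up the temporary files.
rm -f TmpProp TmpErr2Prop more tmp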

Note that the script needs to know neither how many "stat" files there are nor how many rows each of them has. The only preconditions are that I know the common prefix ("stat") of my data files, that the files all list the same rows in the same order (paste matches them up line by line), and that they contain only the two numerical columns mentioned above.
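
The same aggregation can probably be done in a single awk pass, without any temporary files. The following is only a sketch of that idea, not the script above: the function name combine_stats and its two arguments (the file prefix and the output file) are hypothetical, and it assumes that every input file lists the same rows in the same order and that the output file name does not itself start with the prefix.

```shell
#!/bin/bash
# combine_stats PREFIX OUTFILE  (hypothetical helper, not part of DataAgg.sh)
# Averages column 1 over all files matching PREFIX* and combines the
# standard errors in column 2 as sqrt(sum of squares)/n.
combine_stats() {
    local prefix=$1 out=$2
    awk '
        FNR == 1 { nfiles++ }          # a new input file has started
        NF >= 2 {                      # skip blank or incomplete lines
            sum[FNR]  += $1            # running sum of the property in this row
            err2[FNR] += $2 * $2       # running sum of squared standard errors
            if (FNR > rows) rows = FNR # remember the largest row number seen
        }
        END {
            for (r = 1; r <= rows; r++)
                printf("%6.2f\t%6.2f\n", sum[r] / nfiles, sqrt(err2[r]) / nfiles)
        }
    ' "$prefix"* > "$out"
}
```

With the files from this post, it would be called as: combine_stats stat TotalProp. Because awk keeps the sums in memory at full precision, this variant also avoids the rounding that happens each time an intermediate sum is printed to a temporary file.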


Thursday, July 9, 2009
