## Thursday, July 9, 2009

### Combining data from independent simulation runs using a bash script

Today I came across a problem that I have solved several times before. My simulations generate a bunch of files called stat1, stat2, ... statN, each of which contains data like the following:

    $ cat stat1
    567.20 0.88
    45.29 3.08
    296.58 21.50
    0.33 0.14

The first column holds the properties measured in a particular simulation run, and the second column holds the corresponding standard errors. The N different "stat" files come from N independent simulation runs. In my final report, I like to give the average properties and their associated standard errors. The following shell script, DataAgg.sh, creates a new file, TotalProp, which contains exactly that:

    $ cat TotalProp
    567.49 0.24
    43.57 0.45
    289.91 1.61
    0.67 0.10
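For reference, the arithmetic behind such an aggregate can be written down explicitly. This is a sketch under the assumption that the N runs are independent and equally weighted; the symbols $x_i$ (the property from run $i$) and $\sigma_i$ (its standard error) are my own notation, not something that appears in the files:

```latex
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i ,
\qquad
\sigma_{\bar{x}} = \frac{1}{N} \sqrt{\sum_{i=1}^{N} \sigma_i^{2}}
```

Each row of the "stat" files is treated separately, so the formula is applied once per row. The squared standard errors are summed, not the standard errors themselves.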

The shell script is here:

    $ cat DataAgg.sh
    #!/bin/bash
    i=0
    for s in stat*
    do
        let i=i+1
        if [ $i == 1 ]; then
            # first file: initialise the running sums
            awk '{print $1}' "$s" > TmpProp
            awk '{print $2*$2}' "$s" > TmpErr2Prop
        else
            # add this run's properties to the running sum
            awk '{print $1}' "$s" > tmp
            paste tmp TmpProp > more
            awk '{print $1+$2}' more > TmpProp
            # add this run's squared standard errors to the running sum
            awk '{print $2*$2}' "$s" > tmp
            paste tmp TmpErr2Prop > more
            awk '{print $1+$2}' more > TmpErr2Prop
        fi
    done
    # mean = sum/N; standard error of the mean = sqrt(sum of squares)/N
    awk '{print $1/n}' n=$i TmpProp > more; mv more TmpProp
    awk '{print sqrt($1)/n}' n=$i TmpErr2Prop > more; mv more TmpErr2Prop
    paste TmpProp TmpErr2Prop > more
    awk '{printf("%6.2f\t%6.2f\n",$1, $2)}' more > TotalProp
    rm -f TmpProp
    rm -f TmpErr2Prop
    rm -f more
    rm -f tmp

Note that I don't need to know how many "stat" files there are, or how many rows each of them has. The only preconditions are that I know the common prefix ("stat") of my data files, that those files contain only the two numerical columns mentioned above, and that they all have the same number of rows, since the script combines them row by row.
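As a possible alternative, the same aggregation can be sketched as a single awk pass that keeps the running sums in arrays indexed by row number, so no temporary files are needed. The function name `aggregate_stats` is hypothetical (it is not part of DataAgg.sh), and the sketch assumes equally weighted, independent runs with the same number of rows in every file:

```shell
#!/bin/bash
# Hypothetical helper: average column 1 and combine the standard errors in
# column 2 across all files given as arguments, row by row.
aggregate_stats() {
    awk '
        FNR == 1 { nfiles++ }            # a new input file starts
        {
            sum[FNR]  += $1              # running sum of the property in this row
            err2[FNR] += $2 * $2         # running sum of squared standard errors
            if (FNR > nrows) nrows = FNR
        }
        END {
            for (r = 1; r <= nrows; r++)
                printf("%6.2f\t%6.2f\n", sum[r] / nfiles, sqrt(err2[r]) / nfiles)
        }
    ' "$@"
}
```

With the function defined, something like `aggregate_stats stat* > TotalProp` should produce the same kind of two-column summary. The trade-off is that all row sums are held in memory, which is harmless for files of this size.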