Tuesday, February 3, 2015

Computing Weighted Averages With Tiny Weights

Consider a set of N data-points \(\{ (w_1, R_1), (w_2, R_2), ..., (w_N, R_N) \}\), where w is the weight, and R is the observed data. We can compute a weighted mean by, \[\bar{R} = \frac{\sum\limits_{i=1}^N w_i R_i}{\sum\limits_{i=1}^N w_i}.\]
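When the weights are of ordinary magnitude, the formula can be applied directly; a quick sketch (the numbers here are made up purely for illustration):

```python
import numpy as np

# Hypothetical weights and observations, chosen for illustration.
w = np.array([0.2, 0.5, 0.3])
R = np.array([10.0, 20.0, 30.0])

# Direct implementation of the weighted-mean formula.
mean_direct = np.sum(w * R) / np.sum(w)

# np.average computes the same quantity.
mean_np = np.average(R, weights=w)

print(mean_direct, mean_np)  # both 21.0
```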

Suppose that the weights were super-tiny so that instead of the actual weights, you kept track of the logarithm of the weights \(L_i = \log w_i\). If the weights are small and disparate, a naive computer implementation of the formula, \[\bar{R} = \frac{\sum\limits_{i=1}^N e^{L_i} R_i}{\sum\limits_{i=1}^N e^{L_i}},\] would run into trouble.

This can be illustrated by a simple Python program:

import numpy as np

L = np.array([-1500, -999, -1001, -1002], float)
R = np.array([800, 50, 100, 66], float)

sumNumerator   = np.sum(np.exp(L) * R)
sumDenominator = np.sum(np.exp(L))

print(sumNumerator, sumDenominator)

This runs into underflow problems and yields the output:

0.0 0.0

We note that the numerator and denominator can be multiplied by any common number without changing the ratio. If all the data are available, this observation suggests a simple fix: rescale the logWeights by subtracting out a common scale factor, \(L_{i}^{'} = L_i - L_s\), since \[\bar{R} = \frac{e^{L_s} \sum\limits_{i=1}^N e^{L_{i}^{'}} R_i}{e^{L_s} \sum\limits_{i=1}^N e^{L_i^{'}}}.\] Continuing the Python example from above, and choosing the maximum of the logWeights as the scale factor:

Ls = max(L)
Lp = L - Ls

sumNumerator   = np.sum(np.exp(Lp) * R)
sumDenominator = np.sum(np.exp(Lp))

print(sumNumerator, sumDenominator)

# Guard against a denominator that is still vanishingly small
if sumDenominator > np.finfo(float).eps:
    weightAverage = sumNumerator / sumDenominator
    print("Scaled Weighted Average", weightAverage)

yields the output,

66.8194748359 1.1851223516
Scaled Weighted Average 56.3819210274
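For reference, SciPy's `scipy.special.logsumexp` applies the same max-shift rescaling internally, so the weighted mean can also be computed as a ratio of two stabilized log-sum-exps (a sketch using SciPy, which the post itself does not use):

```python
import numpy as np
from scipy.special import logsumexp

L = np.array([-1500, -999, -1001, -1002], float)
R = np.array([800, 50, 100, 66], float)

# logsumexp(L, b=R) computes log(sum_i R_i * exp(L_i)) stably;
# the b argument scales each exp(L_i) term.
logNumerator   = logsumexp(L, b=R)
logDenominator = logsumexp(L)

weightAverage = np.exp(logNumerator - logDenominator)
print(weightAverage)
```

This reproduces the scaled weighted average computed above.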

Suppose the data were not all available at once, but trickled in one data-point after another. The principal complication in this case is that we don't know all the weights a priori (note: if we did, then this would be a simpler problem). We can start by setting the scale factor from the first data-point, and then re-weight the accumulated sums whenever a better (larger) candidate for the scale factor is found. A Python function along these lines does the trick:
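A minimal sketch of such a streaming update (the class name `OnlineWeightedMean` is introduced here for illustration; any names not appearing above are assumptions):

```python
import numpy as np

class OnlineWeightedMean:
    """Running weighted mean where only the logWeights are known."""

    def __init__(self):
        self.Ls = -np.inf          # current scale factor: largest logWeight seen
        self.sumNumerator = 0.0    # running sum of exp(L_i - Ls) * R_i
        self.sumDenominator = 0.0  # running sum of exp(L_i - Ls)

    def update(self, logWeight, R):
        if logWeight > self.Ls:
            # Found a better scale factor: de-weight the existing sums.
            rescale = np.exp(self.Ls - logWeight)
            self.sumNumerator *= rescale
            self.sumDenominator *= rescale
            self.Ls = logWeight
        w = np.exp(logWeight - self.Ls)
        self.sumNumerator += w * R
        self.sumDenominator += w

    def mean(self):
        # Guard against a denominator that is still vanishingly small
        if self.sumDenominator > np.finfo(float).eps:
            return self.sumNumerator / self.sumDenominator
        return float('nan')

# Feed in the data-points from the earlier example one at a time.
olm = OnlineWeightedMean()
for Li, Ri in zip([-1500, -999, -1001, -1002], [800, 50, 100, 66]):
    olm.update(Li, Ri)
print(olm.mean())
```

The running result agrees with the batch computation above, since re-weighting by \(e^{L_s^{old} - L_s^{new}}\) keeps the sums consistent with the newest scale factor.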
