Friday, October 7, 2011

Floating Point Number Line

In the class I am teaching this semester, I was talking about the "discreteness" of the floating point number line, and the errors that pop up due to finite precision.


One of my students pointed out that he had heard that finite precision was behind some of the initial failures of Patriot missiles.

So I went back and did some research, and found that on Feb 25, 1991, an incoming Iraqi Scud missile killed 28 soldiers at Dhahran because the Patriot battery that was supposed to intercept it failed to do so.

The source of the problem was that the onboard 24-bit computer counted time in integer units of 1/10 of a second (so 35, 36, and 37 instead of 3.5 s, 3.6 s, and 3.7 s), and the integer was multiplied by 1/10 on demand to convert it to seconds.

The problem is that 1/10 has a non-terminating expansion (0.0001100110011...) when written in base 2. This is similar to 1/3 being non-terminating in base 10 (0.333...). When the expansion is chopped off to fit in 24 bits, the truncation error is about 1/10^7 s for every tenth of a second counted.
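To make the chopping concrete, here is a small sketch in Python using exact rational arithmetic. The helper names (binary_digits, chop) are mine, and FRAC_BITS = 23 is an assumption: it is the split of the 24-bit register into fractional bits that reproduces the "about 1/10^7" figure, not something I have verified against the actual hardware.

```python
from fractions import Fraction

def binary_digits(x, n):
    """Return the first n binary digits of a fraction 0 <= x < 1 as a string."""
    digits = []
    for _ in range(n):
        x *= 2
        bit = int(x)          # the integer part is the next binary digit
        digits.append(str(bit))
        x -= bit
    return "".join(digits)

def chop(x, frac_bits):
    """Truncate x to frac_bits binary digits after the point (no rounding)."""
    scale = 2 ** frac_bits
    return Fraction(int(x * scale), scale)

tenth = Fraction(1, 10)
print("0." + binary_digits(tenth, 24))   # the repeating 1100 pattern

FRAC_BITS = 23                           # assumption, see the note above
stored = chop(tenth, FRAC_BITS)
error = tenth - stored                   # exact error per 0.1 s tick
print(float(error))                      # roughly 9.5e-08
```

Changing FRAC_BITS changes the size of the error but not the conclusion: no finite number of bits stores 1/10 exactly.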

While this may seem small, it adds up. A battery that had been operating for 100 hours had counted 3.6 million tenths of a second, and so would have accumulated a round-off error of about 0.34 seconds. Given the speed of the incoming Scud (about 1700 m/s), the tracking was off by about 600 meters.
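The arithmetic is easy to check. The sketch below is only an illustration: clock_drift is a made-up helper, FRAC_BITS = 23 is the same assumption about how 1/10 was chopped, and the speed is the rough 1700 m/s figure quoted above.

```python
from fractions import Fraction

FRAC_BITS = 23          # assumption: fractional bits kept when storing 1/10
SPEED_M_PER_S = 1700    # rough speed quoted in the post

def clock_drift(hours, frac_bits=FRAC_BITS):
    """Seconds of clock error after `hours` of uptime, when each 0.1 s tick
    is converted with a chopped binary approximation of 1/10."""
    scale = 2 ** frac_bits
    stored_tenth = Fraction(scale // 10, scale)   # 1/10 chopped, not rounded
    ticks = hours * 3600 * 10                     # number of 0.1 s ticks
    true_time = ticks * Fraction(1, 10)
    computed_time = ticks * stored_tenth
    return true_time - computed_time

for hours in (1, 8, 100):
    drift = float(clock_drift(hours))
    print(f"{hours:>3} h: drift = {drift:.4f} s, miss = {drift * SPEED_M_PER_S:.0f} m")
```

For 100 hours this gives a drift of about 0.343 s, or roughly 584 m at 1700 m/s. The drift is proportional to uptime, so shorter runs give proportionally smaller errors.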

Interestingly, the Israelis had warned the US Army about the problem two weeks before the incident. Their solution: reboot frequently. While they did not know why this solved the problem, frequent resetting kept the clock time small (well under 100 hours), which meant a smaller accumulated round-off error.

Obviously, the problem could also have been avoided by using 1/8 or 1/16 as the unit of time, since these numbers can be represented exactly in base-2.
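A quick way to see the difference is to ask Python which of these fractions survive the trip into binary floating point unchanged. Note that this uses ordinary 64-bit floats rather than the Patriot's 24-bit arithmetic, and is_exact_in_binary is just a helper name I made up; the point about powers of two carries over.

```python
from fractions import Fraction

def is_exact_in_binary(numerator, denominator):
    """True if numerator/denominator is stored exactly as a Python float."""
    return Fraction(numerator / denominator) == Fraction(numerator, denominator)

for den in (8, 10, 16):
    print(f"1/{den}: exact = {is_exact_in_binary(1, den)}")

# Adding up an exact unit gives an exact total; adding up 0.1 does not.
print(sum([0.125] * 8) == 1.0)
print(sum([0.1] * 10) == 1.0)
```

1/8 and 1/16 come out exact, while 1/10 does not, and the two sums behave accordingly.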

Images: xkcd and Wired.com
