
Sunday, April 14, 2013

Linear Least Squares: What technique should I use? (part 1)

The quintessential least-squares problem looks like A x = b,
where A_{m \times n} is a "tall" matrix with m > n, x_{n \times 1} is the vector we are trying to "regress", and b_{m \times 1} is a vector as tall as A.

Since we have too many equations or constraints (m) compared with unknowns (n), we can't solve the problem in the usual sense. Hence we seek a least-squares solution, which can be obtained by hitting both sides of the original equation with the transpose A', to obtain the so-called normal equation, A' A x = A' b.
Now we have a linear system of n equations and n unknowns, which can be solved for x.
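
For concreteness, here is a minimal Matlab sketch of forming and solving the normal equations directly (the sizes and random data are my own, purely for illustration):

m = 100; n = 5;                  % an overdetermined system, m > n
A = randn(m, n);                 % tall m-by-n matrix
b = randn(m, 1);                 % right-hand side, as tall as A
x = (A'*A) \ (A'*b);             % solve the normal equations A'*A*x = A'*b
% For comparison, Matlab's backslash on the original system, A\b, returns the
% same least-squares solution for a full-rank tall A (via QR under the hood).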

There are three methods one can use (each is sketched in Matlab after this list):
  1. Direct solution of Normal Equations: The idea is to factorize A' A using Cholesky decomposition, and solve the resulting system using forward and back substitution. This is straightforward, fast (asymptotic cost \sim m n^2 + n^3/3), but susceptible to round-off. We will explore this last part in a little bit.
  2. QR Decomposition: The idea is to first factorize A = QR. Then, we can write the normal equations as A' A x = A' b \implies R' Q' Q R x = R' Q' b.
    Since Q has orthonormal columns (Q' Q = I), this reduces to R' R x = R' Q' b; cancelling the invertible R' leaves the triangular system R x = Q' b.
    The asymptotic cost is 2mn^2 - 2n^3/3. This method is less susceptible to round-off.
  3. SVD: We can always bring out the ultimate thermonuclear weapon. We first factorize A = U S V'. The normal equations can then be simplified as above into S V' x = U' b.
    The asymptotic cost is 2mn^2 + 11 n^3. This method is also less susceptible to round-off. An advantage is that, if required, we can throw some regularization at the problem to make it less nasty.
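
In Matlab, the three routes look roughly as follows (a minimal sketch with made-up data; chol, qr, and svd are the built-in factorizations):

A = randn(100, 5);  b = randn(100, 1);   % any tall, full-column-rank system

% 1. Normal equations via Cholesky: A'*A = R'*R, then forward/back substitution.
R = chol(A'*A);                  % upper-triangular Cholesky factor
x_chol = R \ (R' \ (A'*b));

% 2. Economy-size QR: A = Q*R, then solve the triangular system R*x = Q'*b.
[Q, R] = qr(A, 0);
x_qr = R \ (Q'*b);

% 3. Economy-size SVD: A = U*S*V', then solve S*(V'*x) = U'*b and recover x.
[U, S, V] = svd(A, 'econ');
x_svd = V * (S \ (U'*b));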
In the usual case when m \gg n, the cost of SVD and QR is comparable. However, as m \sim n, SVD becomes more than an order of magnitude slower.

Let us explore the relative merits of the three methods by considering a Matlab example.
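
As a flavor of the kind of comparison such an example might make, here is a sketch of my own (not the post's example): it builds an A with condition number around 10^6 and compares the three solutions against a known x.

% Illustrative comparison (my own construction, not the post's example).
m = 200; n = 10;
[Uo, ~] = qr(randn(m, n), 0);            % orthonormal columns, m-by-n
[Vo, ~] = qr(randn(n));                  % orthogonal n-by-n
A = Uo * diag(logspace(0, -6, n)) * Vo'; % cond(A) ~ 1e6, cond(A'*A) ~ 1e12
xtrue = randn(n, 1);
b = A * xtrue;

Rc = chol(A'*A);  x_chol = Rc \ (Rc' \ (A'*b));        % normal equations
[Q, R] = qr(A, 0);  x_qr = R \ (Q'*b);                 % QR
[U, S, V] = svd(A, 'econ');  x_svd = V * (S \ (U'*b)); % SVD

fprintf('relative error: chol %.1e, qr %.1e, svd %.1e\n', ...
    norm(x_chol - xtrue)/norm(xtrue), ...
    norm(x_qr - xtrue)/norm(xtrue), ...
    norm(x_svd - xtrue)/norm(xtrue));

The exact numbers vary from run to run, but the normal-equations error is typically several orders of magnitude larger than the QR and SVD errors, since it works with A'*A and hence with the squared condition number; this is the round-off sensitivity alluded to above.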
