Sunday, April 14, 2013

Linear Least Squares: What technique should I use? (part 1)

The quintessential least-squares problem looks like \[A x = b,\] where \(A_{m \times n}\) is a "tall" matrix with \(m > n\), \(x_{n \times 1}\) is the vector we are trying to "regress", and \(b_{m \times 1}\) is a vector as tall as \(A\).

Since we have too many equations or constraints (m) compared with unknowns (n), we can't solve the problem in the usual sense. Hence we seek a least-squares solution, which can be obtained by hitting both sides of the original equation with the transpose \(A'\), to obtain the so called normal equation, \[A' A x = A' b.\]Now we have a linear system \(n\) equations and \(n\) unknowns, which can be solved for \(x\).

There are three methods one can use:
  1. Direct solution of Normal Equations: The idea is to factorize \(A' A\) using Cholesky decomposition, and solve the resulting system using forward and back substitution. This is straightforward, fast (asymptotic cost \(\sim m n^2 + n^3/3\)), but susceptible to round-off. We will explore this last part in a little bit.
  2. QR Decomposition: The idea is to first factorize \(A = QR\). Then, we can write the normal equations as \[A' A x = b \implies R' Q'  Q R x = R' Q' b.\] Since \(Q\) is orthogonal, we can simplify the equations to the triangular system \[R x = Q' b.\] The asymptotic cost is \(2mn^2 - 2n^3/3\). This method is less susceptible to round-off.
  3. SVD: We can always bring out the ultimate thermonuclear weapon. We first factorize \(A = U S V'\). The normal equations can then be simplified as above into \[S V' x = U' b.\]The asymptotic cost is \(2mn^2 + 11 n^3\). This method is also less susceptible to round-off. An advantage is that, if required, we can throw in some regularization at a problem to make it less nasty.
In the usual case when \(m \gg n\), the cost of SVD and QR is comparable. However as \(m \sim n\), SVD becomes more than an order of magnitude slower.

Let us explore the relative merits of the three methods by considering a Matlab example.

No comments: